See also Sed Command and Grep Command

Awk is a very powerful line-oriented filter and small programming language available on all Unix-style operating systems. This page contains small tutorial awk scripts and snippets.
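
An awk program is just a list of pattern { action } rules, optionally bracketed by BEGIN and END blocks, and each rule is run against every input line. A minimal sketch of that structure (the file name structure.awk is only an illustration):

structure.awk
#!/usr/bin/awk -f
BEGIN { print "runs once, before any input is read" }        # setup
/root/ { print "line " NR " matched: " $0 }                  # pattern { action }
{ lines++ }                                                  # no pattern: runs for every line
END { print "runs once, after all input: " lines " lines" }  # teardown

Make it executable (chmod +x structure.awk) and run it with ./structure.awk /etc/passwd, or run it directly with awk -f structure.awk /etc/passwd.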

Info

<!----------------------------------------------------------------------------->

  1. Columns are automatically assigned to $1 - $n, and $0 is the entire line
  2. Awk has some built-in functions:
    1. gsub(r,s) globally replaces r with s within the line ($0)
    2. index(s,t) returns first position of string t in s (or 0 if not present)
    3. length(s) returns the number of characters in s
    4. match(s,r) tests whether s contains a sub-string matched by r
    5. split(s,a,fs) splits string s into array a using field separator fs
    6. substr(s,p,n) returns sub-string of s of length n starting at position p
    7. Others: sin(), cos(), exp(), sqrt(), rand()...
  3. Awk has some built-in variables (see the short example after this list)
    1. NF is number of fields (num of columns in that row)
    2. NR is the current line number
    3. FNR is the current line number for the current file (if using awk on multiple files)
    4. FS is the field separator; set this to change it (defaults to whitespace)
    5. RS is the record separator (i.e. the row separator, defaults to a newline)
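
A short sketch that exercises a few of these functions and variables against /etc/passwd (the file name builtins.awk and the output layout are only illustrations):

builtins.awk
#!/usr/bin/awk -f
BEGIN { FS = ":" }                # /etc/passwd fields are separated by :
{
    name = $1
    n = split($NF, parts, "/")    # $NF is the last field, e.g. /bin/bash; parts[n] is "bash"
    printf "%3d %2d %-12s %3d %s\n", NR, NF, substr(name, 1, 12), length(name), parts[n]
}

Each output line shows the line number (NR), the field count (NF), the (truncated) username, its length, and the base name of the user's shell.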

One Liners

<!----------------------------------------------------------------------------->

  1. Split a CSV into individual files, one file per distinct value of the third column
    1. awk -F, '{output=$3".csv"; print $0 > output}' input.csv
  2. Print columns 1 and 5 of lines beginning with /dev from the df command
    1. df | awk '/^\/dev/{ print $1 ": " $5 }'
    2. You can see we are filtering the output first (lines beginning with /dev), then acting on those lines: this is awk's pattern and action structure. By only using a pattern we essentially have the grep command; for example cat /etc/passwd | grep pulse is the same as awk '/pulse/' /etc/passwd
  3. Print entire lines in /etc/passwd where userid >= 500
    1. awk -F: '$3 >= 500' /etc/passwd
    2. The -F: means we are setting the field separator to a :
  4. Print only the names from /etc/passwd
    1. awk -F: '{ print $1 }' /etc/passwd
  5. Add line numbers to output
    1. awk '{ print NR, $0 }' /etc/passwd
    2. Here the , after NR prints the two values separated by a space (the output field separator); if you wanted xx-row (instead of xx row) you could use awk '{ print NR "-" $0 }' /etc/passwd
    3. Remember $0 is entire line, we could just print the first column (the usernames) with line number like this awk -F: '{ print NR, $1 }' /etc/passwd
  6. Print every 10th line
    1. awk 'NR%10 == 0' /etc/passwd (or NR%2 == 0 for every other line)
  7. Reverse the order of input (like reverse sort)
    1. awk '{ s = $0 "\n" s } END { print s }' /etc/passwd
  8. Center the output at column 40
    1. awk '{ printf "%" int(40+length($0)/2) "s\n", $0 }' /etc/passwd
  9. Print non blank lines
    1. awk 'NF' /etc/passwd
  10. Print lines longer than 80 chars
    1. awk 'length($0) > 80' /etc/passwd
  11. This replaces tty with TERMINAL only on the first line of output; all other lines are still displayed, but not altered
    1. who | awk 'NR==1 { gsub("tty", "TERMINAL"); print } NR!=1'
  12. Count words in a file
    1. awk '{ total = total + NF }; END { print total }' /etc/passwd

Scripts

<!----------------------------------------------------------------------------->

Show the highest userid number from /etc/passwd

Items in the BEGIN block happen before any lines are processed, so we set the FS (field separator) character to a :, then set a maxuid variable to 0. Then, if the 3rd column (the uid in /etc/passwd) is greater than maxuid, we set maxuid to that new value and set the maxname variable to the line's username, which is column 1. Items in the END block happen after all lines are processed.

maxuid.awk
#!/usr/bin/awk -f
BEGIN { FS = ":"; maxuid = 0 }
$3 > maxuid { maxuid = $3; maxname = $1 }
END { print maxname ": " maxuid }

Since we added the awk shebang to the file, we can make it executable (chmod +x maxuid.awk) and run it with ./maxuid.awk /etc/passwd. If we hadn't added the shebang, we could run it with awk like awk -f maxuid.awk /etc/passwd

Treat multiple lines as one line

Awk usually works on one line, but what if we have data like this:

/tmp/test.txt
Michael
Jackson
555-5551

Kevin
Jones
555-5552

We want to treat 'Michael Jackson 555-5551' as one line, etc. To do this we alter the FS field separator and the RS record separator: FS should be a newline (FS = "\n") and RS should be an empty string (RS = ""), which makes blank lines separate the records.

To display

Michael 555-5551
Kevin 555-5552

we use awk 'BEGIN { RS = ""; FS = "\n" } { print $1,$3 }' /tmp/test.txt
or to display only Kevin's record use awk 'BEGIN { RS = ""; FS = "\n" } $2 == "Jones" { print $1,$3 }' /tmp/test.txt

Get each process time in seconds

ps -ef shows each process's cumulative CPU time (the TIME column) in hh:mm:ss format. Let's print just the second column (the PID) and that time converted into seconds

seconds.awk
{
    split($7, hms, ":")                        # TIME column hh:mm:ss -> hms[1], hms[2], hms[3]
    secs = (hms[1] * 3600) + (hms[2] * 60) + hms[3]
    printf "%6d %5d\n", $2, secs               # PID and total seconds
}

And run it with ps -ef | awk -f seconds.awk. You can see we have hms, which is an array variable, and secs, a numeric variable. The printf function just prints the columns in a nicely aligned, tabbed style

Get number of processes and total memory usage per user

Notice that count is not a built-in function; it (like tot) is an array indexed by username.

totalmem.awk
$1 != "USER" { count[$1]++; tot[$1] += $6 }
END {
    for (user in tot)
        printf "%8s: %4d %8d\n", user, count[user], tot[user]
}

And run it with ps aux | awk -f totalmem.awk

Haproxy Speed Filter

An awk script for the haproxy log; it breaks each line out into columns for date, host, status, speed, size and page

#!/usr/bin/awk -f

BEGIN {
        FS = " "

        # Output as CSV
        csv=0
}

{
        client=$6
        date=$7
        time=substr(date, 14, 12)
        backend=$9
        split($10, timers, "/")
        tt=timers[4]
        status=$11
        size=$12 / 1024 #in kb
        termination=$15
        split($16, conns, "/")
        host=substr($18, 2, length($18)-2)
        subdomain=substr(host, 1, index(host, ".")-1)
        request=$20
        page=getPage(request)

        #out(date, 26)
        out(time, 12)
        #out(substr(host, 1, 30), 30)
        if (csv == 1)
                out(subdomain, 20)
        else
                out(substr(subdomain, 1, 20), 20)
        out(status, 3)
        #out(client, 21)
        out("["termination"]", 4)
        out(size, 6.1, "f")
        out(tt, 5, "d")
        out(page)

        printf("\n")

}

function out(data, pad, type) {
        if (type == "") type = "s"
        if (csv == 1)
                printf("%s", "\""data"\",")
        else
                printf("%-"pad""type"  ", data)
}

function getPage(request) {
        if (index(request, "?") > 0) 
                return substr(request, 1, index(request, "?")-1)
        else
                return request

}

CSV Parser for Dynacomm Logs
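
A gawk script (it needs gawk for the users[user][...] arrays of arrays) that parses the quoted-CSV Dynacomm log and prints a summary of connected users by transport, sent messages, distinct recipients, errors and restarts.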

#!/usr/bin/gawk -f

BEGIN {
        FS = "\",\""
}

{
        date = substr($1, 2, length($1)-1)
        user = $2
        session = $3
        socket = $4
        ip = $5
        message = substr($6, 1, length($6)-1)

        lines ++;
        if (match(message, "Connected using")) {
                # User connected
                users[user]["connected"] = 1

                if (match(message, "websocket transport")) {
                        users[user]["transport"] = "websocket"
                } else {
                        users[user]["transport"] = "xhr"
                }

        } else if (match(message, "Disconnected")) {
                # User disconnected
                users[user]["connected"] = 0

        } else if (match(message, "Sending message")) {
                sentMessages ++;
                recipients[user] = 1;

        } else if (match(message, "invalid username or password")) {
                unauthorized ++;

        } else if (match($0, "socket.io started")) {
                restarts ++;

        } else if (match(message, "Error:")) {
                errors["errors"] ++;

                if (match(message, "data returned from SSO query not valid JSON")) {
                        errors["sso-not-json"] ++;

                } else if (match(message, "could not http")) {
                        errors["dns"] ++;

                } else if (match(message, "EMFILE")) {
                        errors["emfile"] ++;

                } else if (match(message, "ENOTFOUND")) {
                        errors["enotfound"] ++;
                }
        }

}

END {

        # users is now a distinct associative array of information
        for (u in users) {
                if (users[u]["connected"] == 1) {
                        connectedUsers ++;
                        if (users[u]["transport"] == "websocket") {
                                websocketUsers ++;
                        } else {
                                xhrUsers ++;
                        }
                } else {
                        disconnectedUsers ++;
                }
        }
        print("DynaComm Log Information");
        print("------------------------");
        printf("      Sent Messages: %d\n", sentMessages);
        printf("Distinct Recipients: %d\n", length(recipients));
        printf("    Connected Users: %d\n", connectedUsers);
        printf("    Websocket Users: %d\n", websocketUsers);
        printf("  Xhr-Polling Users: %d\n", xhrUsers);
        printf(" Disconnected Users: %d\n", disconnectedUsers);
        printf("       Unauthorized: %d\n", unauthorized);
        printf("          Log Lines: %d\n", lines);

        print("\n\nDynacomm Log Issues");
        print("------------------------");
        printf("              Total: %d\n", errors["errors"]);
        printf("       SSO Not JSON: %d\n", errors["sso-not-json"]);
        printf("            SSO DNS: %d\n", errors["dns"]);
        printf("             EMFILE: %d\n", errors["emfile"]);
        printf("          ENOTFOUND: %d\n", errors["enotfound"]);
        printf("           Restarts: %d\n", restarts);
}

Resources

<!----------------------------------------------------------------------------->