Awk is a data extraction and reporting tool developed in the 1970s at Bell Labs. It performs basic text formatting on input streams and files, using a scripting language to take actions on textual data and produce formatted reports. It splits each input line into whitespace-separated fields, which are accessed as $1, $2, and so on, while the built-in variables NF (number of fields) and NR (current line number) help extract and report on specific data. Advanced uses include printing lines conditionally on field values, post-processing fields with further awk commands and sed in a pipeline, and accumulating totals in variables across the entire file.
Unix - Class7 - awk
1. UNIX - awk
Data extraction and formatted reporting tool
Presentation by Nihar R Paital
2. Introduction
Developers : Alfred Aho, Peter Weinberger, Brian Kernighan
Appears in : Version 7 UNIX onwards
Developed during : 1970s
Developed at : Bell Labs
Category : UNIX utility
Supported by : All UNIX flavors
3. Definition
The AWK utility is a data extraction and
reporting tool that uses a data-driven
scripting language consisting of a set of
actions to be taken against textual data
(either in files or data streams) for the
purpose of producing formatted reports.
4. It performs basic text formatting on an input stream (a file or input from a pipeline)
Formatting an input file
$ awk '{print $n}' filename
Example:
$ awk '{print $1}' awk.txt > awk.txt.bak
Formatting input from a pipeline (awk used as a filter)
$ generate_data | awk '{print $n}'
Example:
$ cat awk.txt | awk '{print $1}' > awk.txt.bak
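A minimal sketch showing that reading a file directly and reading the same file through a pipeline give identical output. The file names sample.txt, from_file.txt and from_pipe.txt are hypothetical, used only for this demonstration. The awk program must be wrapped in single quotes; otherwise the shell expands $1 itself and splits the program into separate words before awk sees it.

```shell
# Hypothetical sample file used only for this demonstration
printf 'alpha one\nbeta two\n' > sample.txt

# Form 1: awk reads the file named on its command line
awk '{print $1}' sample.txt > from_file.txt

# Form 2: awk reads standard input coming from a pipeline
cat sample.txt | awk '{print $1}' > from_pipe.txt

# Both files hold "alpha" and "beta"; cmp succeeds only if they are identical
cmp from_file.txt from_pipe.txt && echo "same output"
```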
Before proceeding to the next slide, please create a file named awk.txt with the following contents:
07.46.199.184 [28/Sep/2010:04:08:20] "GET /robots.txt HTTP/1.1" 200 0 "msnbot"
123.125.71.19 [28/Sep/2010:04:20:11] "GET / HTTP/1.1" 304 - "Baiduspider"
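One way to create the file from the shell is a here-document. Quoting the delimiter as 'EOF' stops the shell from expanding anything inside the two sample lines, so the double quotes and brackets are written literally.

```shell
# Write the two sample log lines to awk.txt; the quoted 'EOF' keeps the text literal
cat > awk.txt <<'EOF'
07.46.199.184 [28/Sep/2010:04:08:20] "GET /robots.txt HTTP/1.1" 200 0 "msnbot"
123.125.71.19 [28/Sep/2010:04:20:11] "GET / HTTP/1.1" 304 - "Baiduspider"
EOF

# Quick check: the file should contain 2 lines
wc -l < awk.txt
```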
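5. Basic but important concepts of awk
Syntax:
awk '{print $n}' filename
generate_data | awk '{print $n}'
The action part of an awk program is enclosed in "{" and "}", and on the command line the whole program is wrapped in single quotes so the shell passes it to awk unchanged.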
$0 is the entire line
Awk splits each line into fields automatically, using whitespace (spaces and tabs) as the delimiter; a run of several blanks counts as one separator, and leading or trailing blanks are ignored.
The fields of the current line are available as $1, $2, $3, … and so on.
NF: a special variable containing the number of fields in the current line, so the last field can be printed with $NF.
NR: a special variable containing the number of the line (record) currently being processed.
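A small check of the default field splitting and of NF and NR. The two input lines are made up for illustration: the first has extra spaces and a tab, which awk still treats as single separators.

```shell
# Made-up input: irregular spaces and a tab on line 1, two words on line 2
printf '  this  is\ta test\nsecond line\n' |
awk '{
    # NR: number of the current line; NF: number of fields on it
    print NR ": NF=" NF ", first=" $1 ", last=" $NF
}'
# Expected output:
# 1: NF=4, first=this, last=test
# 2: NF=2, first=second, last=line
```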
6. Basic Examples
$ awk '{print $0}' awk.txt
It prints every line of the file unchanged.
$ echo 'this is a test' | awk '{print $3}'
It prints "a" (the third field).
$ echo 'this is a test' | awk '{print $NF}'
It prints "test" (the last field).
$ awk '{print $1, $(NF-2)}' awk.txt
It prints the first field and the third-from-last field of each line of awk.txt.
$ awk '{print NR ") " $1 " -> " $(NF-2)}' awk.txt
Output:
1) 07.46.199.184 -> 200
2) 123.125.71.19 -> 304
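The parentheses in $(NF-2) matter. The field operator $ binds more tightly than subtraction, so $NF-2 means "the value of the last field, minus 2", not "the third-from-last field". A quick comparison on made-up input:

```shell
# Made-up input with five fields: GET / 304 - x
echo 'GET / 304 - x' | awk '{print $(NF-2)}'   # field number NF-2 = 3, prints 304

# Without parentheses, $NF is taken first and then 2 is subtracted from it
echo 'a b 10' | awk '{print $NF-2}'            # 10 - 2, prints 8
```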
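7. Advanced use of awk
The examples below read logs.txt, which holds the same two lines as awk.txt (for example, create it with cp awk.txt logs.txt).
$ awk '{print $2}' logs.txt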
Output:
[28/Sep/2010:04:08:20]
[28/Sep/2010:04:20:11]
The date field contains "/" and ":" characters.
Suppose we want to print only the date part, like this:
[28/Sep/2010
[28/Sep/2010
$ awk '{print $2}' logs.txt | awk 'BEGIN{FS=":"}{print $1}'
Output:
[28/Sep/2010
[28/Sep/2010
Here BEGIN{FS=":"} sets the field separator (FS) to a colon before the first line is read, so $1 is everything before the first ":".
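The -F command-line option also sets FS, so for this input -F: should behave the same as the BEGIN{FS=":"} form. A short side-by-side sketch using the second field of the first sample line:

```shell
# Same input as the second field of the first sample line
echo '[28/Sep/2010:04:08:20]' | awk 'BEGIN{FS=":"}{print $1}'
echo '[28/Sep/2010:04:08:20]' | awk -F: '{print $1}'
# Both commands print [28/Sep/2010
```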
$ awk '{print $2}' logs.txt | awk 'BEGIN{FS=":"}{print $1}' | sed 's/\[//'
Output:
28/Sep/2010
28/Sep/2010
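Here sed replaces the first "[" with nothing, i.e. deletes it. The "[" is escaped as "\[" because an unescaped "[" starts a bracket expression in a sed regular expression.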
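The three-stage pipeline can also be collapsed into a single awk call. This is an alternative sketch, not the approach shown on the slide: the standard awk built-in split() breaks field 2 on colons into an array, and sub() strips the leading "[" from the first piece. The array name parts is arbitrary.

```shell
# First sample line from awk.txt
line='07.46.199.184 [28/Sep/2010:04:08:20] "GET /robots.txt HTTP/1.1" 200 0 "msnbot"'

printf '%s\n' "$line" | awk '{
    split($2, parts, ":")      # parts[1] = "[28/Sep/2010", parts[2] = "04", ...
    sub(/^\[/, "", parts[1])   # remove the leading "[" from the date
    print parts[1]             # prints 28/Sep/2010
}'
```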
8. Advanced use of awk
To print only the lines whose status code is 200:
$ awk '{if ($(NF-2) == "200") {print $0}}' logs.txt
Output:
07.46.199.184 [28/Sep/2010:04:08:20] "GET /robots.txt HTTP/1.1" 200 0 "msnbot"
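Because a pattern with no action prints the whole line by default, the same filter can be written without if and print. The status field looks like a number, so awk compares it numerically with the constant 200. A sketch on the two sample lines:

```shell
# The two sample lines from awk.txt, fed in on standard input
printf '%s\n' \
  '07.46.199.184 [28/Sep/2010:04:08:20] "GET /robots.txt HTTP/1.1" 200 0 "msnbot"' \
  '123.125.71.19 [28/Sep/2010:04:20:11] "GET / HTTP/1.1" 304 - "Baiduspider"' |
awk '$(NF-2) == 200'   # pattern only: the default action prints the whole line
```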
$ awk '{a+=$(NF-2); print "Total so far:", a}' logs.txt
Output:
Total so far: 200
Total so far: 504
$ awk '{a+=$(NF-2)}END{print "Total:", a}' logs.txt
Output:
Total: 504
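When the END block runs, NR still holds the number of lines read, so the same accumulator can also report a count and an average. This is an illustrative extension of the slide's example; as there, summing status codes is only a demonstration. The NR > 0 guard avoids dividing by zero on empty input.

```shell
# The two sample lines from awk.txt, fed in on standard input
printf '%s\n' \
  '07.46.199.184 [28/Sep/2010:04:08:20] "GET /robots.txt HTTP/1.1" 200 0 "msnbot"' \
  '123.125.71.19 [28/Sep/2010:04:20:11] "GET / HTTP/1.1" 304 - "Baiduspider"' |
awk '{ total += $(NF-2) }     # add up the status-code field of every line
     END {
         # In END, NR holds the total number of lines read
         if (NR > 0) print "Total:", total, "Lines:", NR, "Average:", total / NR
     }'
# Expected output: Total: 504 Lines: 2 Average: 252
```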