AWK(1) DOMAIN/IX Reference Manual (SYS5) AWK(1)
NAME
awk - pattern scanning and processing language
USAGE
awk [ -Fc ] [ prog ] [ parameters ] [ files ]
DESCRIPTION
Awk scans input files for lines that match any of a set of
patterns specified in prog. Each pattern in prog may have an
associated action, which awk performs when a line matches the
pattern. The set of patterns may appear literally as prog, or
in a file specified as -f file. Enclose the prog string in
single quotes (' ') to protect it from the Shell.
Parameters, in the form x=... y=... etc., may be passed to
awk.
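As a minimal sketch of parameter passing (the variable name label is
hypothetical, chosen only for illustration), the assignment below is
placed before a dash so that it takes effect before the standard input
is read:

```shell
# "label" is an illustrative parameter name, not one defined by awk.
# The dash tells awk to read standard input after the assignment is made.
out=$(printf 'one\ntwo\n' | awk '{ print label, $0 }' label=item -)
echo "$out"
```

Each input line should come out prefixed with the value given to label.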
Awk reads files in the order in which they appear on the
command line. If you do not specify any files, or if you use
a dash (-) in place of a filename, it reads the standard
input. Each line is matched against the pattern portion of
every pattern-action statement; the associated action is
performed for each matched pattern.
An input line is composed of fields separated by white
space. This default can be changed by setting the FS
variable (see below). The fields are denoted $1, $2, ...; $0
refers to the entire line.
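The following sketch illustrates default field splitting. The sample
input is made up; note that a run of several blanks still counts as a
single separator:

```shell
# Split a line on white space and show one field and the whole line.
out=$(echo 'alpha   beta gamma' | awk '{ print $2; print $0 }')
echo "$out"
```

The first line printed is the second field, and the second is the input
line exactly as it was read.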
A pattern-action statement has the following form:
pattern { action }
A missing action means print the line; a missing pattern
always matches. An action is a sequence of statements. A
statement is one of the following:
if ( conditional ) statement [ else statement ]
while ( conditional ) statement
for ( expression ; conditional ; expression ) statement
break
continue
{ [ statement ] ... }
variable = expression
print [ expression-list ] [ >expression ]
printf format [ , expression-list ] [ >expression ]
next # skip remaining patterns on this input line
exit # skip the rest of the input
Statements are terminated by semicolons, newlines, or right
braces. An empty expression-list stands for the whole line.
Expressions take on string or numeric values as appropriate,
and are built using the operators +, -, *, /, %, and con-
catenation (indicated by a blank). The following C opera-
tors are also valid in expressions: ++, --, +=, -=, *=, /=,
and %=. Variables may be scalars, array elements (denoted
by x[i]), or fields. Variables are initialized to the null
string. Array subscripts may be any string, not necessarily
numeric; this allows for a form of associative memory.
String constants are placed in double quotes (" ").
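A short sketch of the expression rules above, assuming made-up input
and the hypothetical names total, label, and never_set. It shows a
string used as an array subscript, concatenation by juxtaposition, the
% operator, and the null initial value of a variable:

```shell
out=$(printf 'apple 3\npear 4\napple 5\n' | awk '
{ total[$1] += $2 }                      # subscript is the string in field 1
END {
    label = "apple" "=" total["apple"]   # blanks between operands concatenate
    print label
    print total["pear"] % 3              # remainder of 4 divided by 3
    print "[" never_set "]"              # a variable never assigned is the null string
}')
echo "$out"
```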
The print statement prints its arguments on the standard
output (or on a file if >expr is present), separated by the
current output field separator, and terminated by the output
record separator. The printf statement formats its expres-
sion list according to the printf (3S) format.
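The difference between print and printf can be sketched as follows.
The separator values are arbitrary choices for illustration; print
honors OFS and ORS, while printf writes only what its format asks for:

```shell
out=$(echo 'a b 3.14159' | awk '
BEGIN { OFS = "-"; ORS = "!\n" }
{
    print $1, $2                   # fields joined by OFS, ended by ORS
    printf "%5.2f|%s\n", $3, $1    # printf ignores OFS and ORS
}')
echo "$out"
```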
The built-in function called length returns the length of
its argument taken as a string, or of the whole line if no
argument exists. There are also built-in functions known as
exp, log, sqrt, and int. The last function truncates its
argument to an integer. The function substr(s, m, n) returns
the n-character substring of s that begins at position m. The
function sprintf(fmt, expr, expr, ...) formats the expressions
according to the printf (3S) format given by fmt and returns
the resulting string.
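A brief sketch exercising several of these built-in functions on one
made-up input line; the variable s is introduced only for illustration:

```shell
out=$(echo 'hello world' | awk '{
    print length                     # length of the whole line
    print length($1)                 # length of the first field
    print substr($0, 7, 5)           # 5 characters starting at position 7
    print int(7.9)                   # truncation toward zero
    print sqrt(16)
    s = sprintf("%03d:%s", 7, $2)    # format into a string instead of printing
    print s
}')
echo "$out"
```

Positions in substr count from 1, so position 7 is the first character
of the second word here.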
Patterns are arbitrary Boolean combinations (!, ||, &&, and
parentheses) of regular and relational expressions. A pat-
tern may consist of two patterns separated by a comma; in
this case, the action is performed for all lines between an
occurrence of the first pattern and the next occurrence of
the second.
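The two pattern forms just described might be combined as in this
sketch, which uses invented input. The first rule is a comma-separated
range; the second is a Boolean combination of regular expressions:

```shell
out=$(printf '1 x\n2 start\n3 y\n4 stop\n5 z\n' | awk '
/start/, /stop/          { print "range:", $0 }
(/x/ || /z/) && !/1/     { print "bool:", $0 }
')
echo "$out"
```

The range rule fires on the line containing start, on the line
containing stop, and on everything in between.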
Regular expressions must be surrounded by slashes and appear
as in egrep, which appears under grep (1). Isolated regular
expressions in a pattern apply to the entire line. Regular
expressions may also occur in relational expressions.
A relational expression is one of the following:
expression matchop regular-expression
expression relop expression
where a relop is any of the six relational operators in C,
and a matchop is either a tilde (~) for contains, or an exc-
lamation point and a tilde (!~) for does not contain. A
conditional is an arithmetic expression, a relational
expression, or a Boolean combination of these.
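As an illustration of relational expressions used as patterns, the
sketch below applies both match operators and one C relational
operator to made-up name and age data:

```shell
out=$(printf 'alice 30\nbob 17\ncarol 45\n' | awk '
$1 ~ /^[ab]/   { print $1, "starts with a or b" }
$1 !~ /o/      { print $1, "has no o" }
$2 >= 18       { print $1, "is an adult" }
')
echo "$out"
```

Every rule is tried against every line, so one input line can produce
several output lines.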
The special patterns BEGIN and END may be used to capture
control before the first input line is read and after the
last. BEGIN must be the first pattern; END must be the
last.
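A minimal sketch of BEGIN and END bracketing the per-line rules; the
counter n is a hypothetical variable name:

```shell
out=$(printf 'a\nb\nc\n' | awk '
BEGIN { print "header" }       # runs before any input is read
      { n++ }                  # runs once for each input line
END   { print "lines:", n }    # runs after the last line
')
echo "$out"
```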
A single character c may be used to separate the fields by
starting the program with:
BEGIN { FS = "c" }
or by using the -Fc option.
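The two ways of choosing a separator can be compared with a sketch
like this one, which assumes colon-separated sample input:

```shell
line='root:x:0:0'
# Set the separator inside the program ...
a=$(echo "$line" | awk 'BEGIN { FS = ":" } { print $1, $3 }')
# ... or with the -F option on the command line.
b=$(echo "$line" | awk -F: '{ print $1, $3 }')
echo "$a"
echo "$b"
```

Both commands should print the same first and third fields.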
Other variable names with special meanings include NF, the
number of fields in the current record; NR, the ordinal
number of the current record; FILENAME, the name of the
current input file; OFS, the output field separator (default
blank); ORS, the output record separator (default newline);
and OFMT, the output format for numbers (default %.6g).
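The special variables can be observed together in a sketch such as the
following. The temporary file and the OFS and OFMT values are
assumptions made for the sake of the example:

```shell
tmp=$(mktemp)
printf 'a b c\nd e\n' > "$tmp"
out=$(awk '
BEGIN { OFS = ":"; OFMT = "%.2f" }
{ print FILENAME, NR, NF, NR / 4 }   # the non-integer result is printed using OFMT
' "$tmp")
rm -f "$tmp"
echo "$out"
```

Each output line gives the file name, the record number, the field
count, and a fraction formatted to two decimal places.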
EXAMPLES
To print lines longer than 72 characters, specify the fol-
lowing:
length > 72
To print the first two fields in opposite order, use this:
{ print $2, $1 }
To add the first column, and then print the sum and average,
specify this:
{ s += $1 }
END { print "sum is", s, " average is", s/NR }
To print fields in reverse order, specify the following:
{ for (i = NF; i > 0; --i) print $i }
To print all lines between start/stop pairs, use this:
/start/, /stop/
To print all lines whose first field is different from the
previous one, use this:
$1 != prev { print; prev = $1 }
To print the file, filling in page numbers starting at 5,
specify the following:
/Page/ { $2 = n++; }
{ print }
command line: awk -f program n=5 input
CAUTIONS
Input white space is not preserved on output if fields are
involved.
No explicit conversions exist between numbers and strings.
To force an expression to be treated as a number, add zero
to it. To force it to be treated as a string, concatenate
the null string ("") to it.
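The effect of forcing a type can be seen in a comparison. In this
sketch the variables a and b are hypothetical; concatenating the null
string makes the comparison textual, while adding zero makes it
numeric:

```shell
out=$(echo '10 9' | awk '{
    a = $1 ""     # force string
    b = $2 ""     # force string
    if (a < b) print "as strings: 10 sorts before 9"
    if ($1 + 0 > $2 + 0) print "as numbers: 10 is greater than 9"
}')
echo "$out"
```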
RELATED INFORMATION
grep (1), lex (1), sed (1), malloc (3C), printf (3S).
AWK-4 Printed 6/10/85