LEX(1)  —  USER COMMANDS

NAME

lex − generator of lexical analysis programs

SYNOPSIS

lex [ -tvfn ] [ filename ] ...

DESCRIPTION

lex generates programs to be used in simple lexical analysis of text.  Each filename (standard input by default) contains regular expressions to search for, and actions written in C to be executed when those expressions are found.

A C source program, lex.yy.c, is generated; compile it as follows:

cc lex.yy.c -ll

This program, when run, copies unrecognized portions of the input to the output, and executes the associated C action for each regular expression that is recognized.  The actual string matched is left in yytext, an external character array.
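To make the copy-or-act behavior concrete, here is a minimal hand-written sketch in C (not lex output) of a scanner with a single rule, [0-9]+, whose action writes the matched text inside brackets. The function name bracket_numbers is hypothetical, and the local yytext/yyleng only mimic lex's names: unlike lex's yytext array, this yytext is a pointer into the input and is not NUL-terminated.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical sketch of a generated scanner with one rule, [0-9]+,
 * whose action writes the matched text inside brackets.
 * Characters that match no rule are copied to the output unchanged. */
void bracket_numbers(const char *in, char *out)
{
    while (*in != '\0') {
        if (isdigit((unsigned char)*in)) {
            /* yytext/yyleng mimic lex's names; this yytext is a pointer
             * into the input, not a NUL-terminated copy. */
            const char *yytext = in;
            size_t yyleng = 0;
            while (isdigit((unsigned char)yytext[yyleng]))
                yyleng++;                /* take the longest run of digits */
            *out++ = '[';                /* the rule's action */
            memcpy(out, yytext, yyleng);
            out += yyleng;
            *out++ = ']';
            in += yyleng;                /* resume scanning after the match */
        } else {
            *out++ = *in++;              /* unrecognized: copied through */
        }
    }
    *out = '\0';
}
```

For example, the input a12b3 becomes a[12]b[3], while text containing no digits passes through untouched.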

When more than one expression can match the input, lex takes the longest match; among matches of equal length, the expression that appears first in the file is chosen.  The expressions may contain square brackets to indicate character classes, as in [abx-z] to indicate a, b, x, y, and z; and the operators *, +, and ?, which mean, respectively, any nonnegative number, any positive number, or either zero or one occurrences of the preceding character or character class.  The "dot" character (.) is the class of all ASCII characters except NEWLINE.
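The class and repetition operators can be illustrated with a small C sketch. The helpers in_class, span_class, match_star, match_plus, and match_opt are hypothetical: in_class tests membership in a class body such as abx-z (literal characters and ranges only, no negation or escapes), and the match_ functions return how many leading characters [cls]*, [cls]+, and [cls]? would consume, or -1 for no match.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if c belongs to the class body cls, e.g. "abx-z" for [abx-z].
 * Handles literal characters and a-b ranges only (no ^ negation, no escapes). */
int in_class(const char *cls, int c)
{
    for (size_t i = 0; cls[i] != '\0'; i++) {
        if (cls[i + 1] == '-' && cls[i + 2] != '\0') {
            if (c >= cls[i] && c <= cls[i + 2])
                return 1;
            i += 2;                      /* skip the '-' and the range end */
        } else if (cls[i] == c) {
            return 1;
        }
    }
    return 0;
}

/* Number of leading characters of s that belong to the class (greedy). */
size_t span_class(const char *s, const char *cls)
{
    size_t n = 0;
    while (s[n] != '\0' && in_class(cls, (unsigned char)s[n]))
        n++;
    return n;
}

/* [cls]* : zero or more, always succeeds. */
long match_star(const char *s, const char *cls)
{
    return (long)span_class(s, cls);
}

/* [cls]+ : one or more, -1 if the first character is not in the class. */
long match_plus(const char *s, const char *cls)
{
    size_t n = span_class(s, cls);
    return n > 0 ? (long)n : -1;
}

/* [cls]? : zero or one, always succeeds. */
long match_opt(const char *s, const char *cls)
{
    return span_class(s, cls) > 0 ? 1 : 0;
}
```

With these, [abx-z]* consumes the first three characters of abz! and stops at the !, while [abx-z]+ fails outright on input that begins with a character outside the class.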

Parentheses for grouping and the vertical bar for alternation are also supported.  The notation r{d,e} in a rule indicates between d and e instances of the regular expression r.  It has higher precedence than |, but lower than *, ?, +, and concatenation.  The caret character (^) at the beginning of an expression permits a successful match only immediately after a NEWLINE, and the character $ at the end of an expression requires a trailing NEWLINE.
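Bounded repetition and the two anchors reduce to simple checks. In the C sketch below, the hypothetical match_digits_repeat models [0-9]{d,e} by taking at most e digits greedily and failing if fewer than d were found; anchored_start_ok and anchored_end_ok (also hypothetical) model ^ and $. As an assumption of this sketch, the very start of the input is treated as if it followed a NEWLINE.

```c
#include <assert.h>
#include <ctype.h>

/* Sketch of [0-9]{d,e}: take at most e digits from the start of s and
 * succeed only if at least d were found. Returns the match length or -1. */
int match_digits_repeat(const char *s, int d, int e)
{
    int n = 0;
    while (n < e && isdigit((unsigned char)s[n]))
        n++;
    return n >= d ? n : -1;
}

/* ^ : a match beginning at p (inside buf) is allowed only right after a
 * NEWLINE; this sketch also accepts the very start of the input. */
int anchored_start_ok(const char *buf, const char *p)
{
    return p == buf || p[-1] == '\n';
}

/* $ : a match of length len beginning at p is allowed only if a NEWLINE
 * follows it. The NEWLINE itself is not part of the match. */
int anchored_end_ok(const char *p, int len)
{
    return p[len] == '\n';
}
```

On the input 12345, [0-9]{2,3} matches only 123, leaving 45 for the next match; a single digit followed by a letter does not match at all.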

The character / in an expression indicates trailing context; only the part of the expression up to the slash is returned in yytext, although the remainder of the expression must follow in the input stream.
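Trailing context means the text after the slash is checked but not consumed. A minimal C sketch for a rule head/tail with literal head and tail strings follows; the function name match_trailing is hypothetical. It returns the length that would go into yytext (the head only), or -1 when either part fails to match.

```c
#include <assert.h>
#include <string.h>

/* Sketch of the rule  head/tail  with literal head and tail.
 * Returns strlen(head), the length placed in yytext, if s begins with head
 * immediately followed by tail; otherwise -1. The tail is only looked at,
 * so scanning would resume right after the head. */
int match_trailing(const char *s, const char *head, const char *tail)
{
    size_t hl = strlen(head);
    if (strncmp(s, head, hl) != 0)
        return -1;                       /* head not present */
    if (strncmp(s + hl, tail, strlen(tail)) != 0)
        return -1;                       /* required context missing */
    return (int)hl;
}
```

For ab/cd on the input abcd, yytext would hold ab, and cd would remain in the input for the next match.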

An operator character may be used as an ordinary character if it is enclosed in double quotes (") or preceded by \.

Three subroutines, defined as macros, are expected: input() to read a character; unput(c) to push a character back onto the input; and output(c) to write an output character.  They are defined in terms of the standard streams, but you can override them: the macros input and output read from and write to the files yyin and yyout, which default to stdin and stdout, respectively.  The generated program is a function named yylex(), and the library contains a main() that calls it.  The action REJECT on the right side of a rule causes the current match to be rejected and the next suitable match to be executed; the function yymore() accumulates additional characters into the same yytext instead of replacing its contents; and the function yyless(p) pushes back the portion of the matched string beginning at p, which should lie between yytext and yytext+yyleng.
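The push-back operations can be pictured with an input buffer and a read position. The C sketch below is hypothetical throughout (struct src, src_input, src_unput, and src_yyless are not lex names): src_input returns 0 at end of input, which mirrors the end-of-file value of the classic lex input() macro, and src_yyless follows the pointer form yyless(p) described above by moving the read position back to p.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical input source standing in for lex's input()/unput(). */
struct src {
    const char *text;   /* whole input, NUL-terminated */
    size_t pos;         /* index of the next character to read */
};

/* input(): next character, or 0 at end of input. */
int src_input(struct src *s)
{
    return s->text[s->pos] != '\0' ? (unsigned char)s->text[s->pos++] : 0;
}

/* unput(c): push back a character. Because this sketch's text is read-only,
 * it can only push back the character that was actually read last. */
void src_unput(struct src *s, int c)
{
    assert(s->pos > 0 && (unsigned char)s->text[s->pos - 1] == c);
    s->pos--;
}

/* yyless(p): p points into the current match [yytext, yytext + yyleng];
 * everything from p onward goes back to the input for rescanning.
 * yytext must point into s->text. Returns the new yyleng. */
size_t src_yyless(struct src *s, const char *yytext, size_t yyleng, const char *p)
{
    assert(p >= yytext && p <= yytext + yyleng);
    s->pos = (size_t)(p - s->text);
    return (size_t)(p - yytext);
}
```

If a rule had matched all of abc, calling src_yyless with p at the b keeps a as the match and makes bc the next characters read.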

In a lex program, any line beginning with a blank is assumed to contain only C text and is copied; if it precedes %%, it is copied into the external definition area of the lex.yy.c file.  All rules should follow a %%, as in YACC.  Lines preceding %% that begin with a nonblank character define the string on the left to be the remainder of the line; the definition can be used later by surrounding its name with {}.  Note that curly brackets do not imply parentheses; only string substitution is done.
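Because a definition is substituted as plain text, an operator placed after {name} applies only to the last element of the definition. The hypothetical C helper expand_def below performs that purely textual substitution for a single definition; with AB defined as ab, the pattern {AB}+ becomes ab+ (a followed by one or more b), not a repetition of ab. Writing ({AB})+ supplies the grouping explicitly.

```c
#include <assert.h>
#include <string.h>

/* Replace every {name} in pattern with body, purely textually and without
 * adding parentheses. Handles one definition; out must be large enough.
 * Braces that do not spell {name}, such as r{2,3}, are copied unchanged. */
void expand_def(const char *pattern, const char *name, const char *body, char *out)
{
    size_t nl = strlen(name);
    while (*pattern != '\0') {
        if (pattern[0] == '{' && strncmp(pattern + 1, name, nl) == 0
                && pattern[1 + nl] == '}') {
            strcpy(out, body);           /* paste the definition as-is */
            out += strlen(body);
            pattern += nl + 2;           /* skip "{name}" */
        } else {
            *out++ = *pattern++;
        }
    }
    *out = '\0';
}
```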

The external names generated by lex all begin with the prefix yy or YY. 

Certain table sizes for the resulting finite-state machine can be set in the definitions section:

%p n
number of positions is n (default 2000)

%n n
number of states is n (default 500)

%t n
number of parse tree nodes is n (default 1000)

%a n
number of transitions is n (default 3000)

The use of one or more of the above automatically implies the -v option, unless the -n option is used.

OPTIONS

-t Place the result on the standard output instead of in the file lex.yy.c.

-v Print a one-line summary of statistics about the generated analyzer.

-n Opposite of -v; -n is the default.

-f ‘Faster’ compilation: don’t bother to pack the resulting tables; limited to small programs.

EXAMPLES

lex lexcommands

would draw lex instructions from the file lexcommands, and place the output in lex.yy.c. 

%%
[A-Z]	putchar(yytext[0]+'a'-'A');
[ ]+$	;
[ ]+	putchar(' ');

is an example of a lex program. It converts upper case to lower, removes blanks at the end of lines, and replaces multiple blanks by single blanks.
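For comparison, the same three transformations can be written by hand in C. The function fold_and_squeeze is a hypothetical sketch, not what lex generates: it applies the three rules to a whole string, and, like [ ]+$, it deletes a run of blanks only when a NEWLINE follows, so blanks at the very end of input without a NEWLINE are squeezed to one blank rather than removed.

```c
#include <assert.h>
#include <string.h>

/* Hand-written equivalent of the three-rule example, applied to a string:
 * [A-Z] -> lower case; blanks before a NEWLINE deleted; other runs of
 * blanks squeezed to a single blank; everything else copied. */
void fold_and_squeeze(const char *in, char *out)
{
    while (*in != '\0') {
        if (*in >= 'A' && *in <= 'Z') {
            *out++ = (char)(*in++ - 'A' + 'a');   /* rule 1 */
        } else if (*in == ' ') {
            const char *end = in;
            while (*end == ' ')
                end++;                             /* longest run of blanks */
            if (*end != '\n')                      /* rule 2 needs a NEWLINE */
                *out++ = ' ';                      /* rule 3 */
            in = end;                              /* NEWLINE left for copying */
        } else {
            *out++ = *in++;                        /* unmatched: copied */
        }
    }
    *out = '\0';
}
```

For instance, Hello   World followed by two blanks and a NEWLINE comes out as hello world followed directly by the NEWLINE.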

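D	[0-9]
%%
if	printf("IF statement\n");
[a-z]+	printf("tag, value %s\n",yytext);
0{D}+	printf("octal number %s\n",yytext);
{D}+	printf("decimal number %s\n",yytext);
"++"	printf("unary op\n");
"+"	printf("binary op\n");
"/*"	{
		int c;	/* skip the body of a C comment */
	loop:
		while ((c = input()) != '*' && c != 0)
			;	/* input() returns 0 at end of file */
		if (c == 0)
			return 0;	/* end of file inside a comment */
		switch (input())
		{
		case '/': break;	/* end of the comment */
		case '*': unput('*');	/* falls through and rescans */
		default: goto loop;
		}
	}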
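The rule order in this example matters because lex prefers the longest match and, among equally long matches, the rule listed first. The C sketch below models that choice for the first four rules only; the names rule_len and pick_rule are hypothetical, and the rule numbers 0 to 3 follow the order of the rules above.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Length each rule would match at the start of s (0 = no match).
 * Rule 0: if   Rule 1: [a-z]+   Rule 2: 0{D}+   Rule 3: {D}+ */
static size_t rule_len(int rule, const char *s)
{
    size_t n = 0;
    switch (rule) {
    case 0:
        return strncmp(s, "if", 2) == 0 ? 2 : 0;
    case 1:
        while (s[n] >= 'a' && s[n] <= 'z')
            n++;
        return n;
    case 2:
        if (s[0] != '0')
            return 0;
        while (isdigit((unsigned char)s[1 + n]))
            n++;
        return n > 0 ? n + 1 : 0;      /* needs at least one digit after 0 */
    case 3:
        while (isdigit((unsigned char)s[n]))
            n++;
        return n;
    }
    return 0;
}

/* Choose the rule to run: longest match wins; on a tie the strict '>'
 * keeps the rule listed first. Returns -1 if no rule matches. */
int pick_rule(const char *s)
{
    int best = -1;
    size_t best_len = 0;
    for (int r = 0; r < 4; r++) {
        size_t len = rule_len(r, s);
        if (len > best_len) {
            best = r;
            best_len = len;
        }
    }
    return best;
}
```

So if runs the IF action (it ties with [a-z]+ but is listed first), iffy is reported as a tag (the longer match wins), and 017 is reported as octal because 0{D}+ precedes {D}+.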

SEE ALSO

yacc(1), sed(1V)

Lex − A Lexical Analyzer Generator, in Programming Utilities for the Sun Workstation. 

Sun Release 3.5  —  Last change: 23 July 1986