REGEX(S) UNIX System V REGEX(S)
Name
regex, regcmp - compiles and executes regular expressions.
Syntax
char *regcmp(string1[,string2, ...],(char *)0);
char *string1, *string2, ...;
char *regex(re,subject[,ret0, ...]);
char *re, *subject, *ret0, ...;
extern char * __loc1;
Description
The regcmp routine compiles a regular expression and returns
a pointer to the compiled form. The malloc(S) routine is
used to create space for the vector. It is the user's
responsibility to free unneeded space so allocated. A zero
return from regcmp indicates an incorrect argument.
regcmp(CP) has been written to generally preclude the need
for this routine at execution time.
The regex routine executes a compiled pattern against the
subject string. Additional arguments are passed to receive
values back. regex returns zero on failure or a pointer to
the next unmatched character on success. A global character
pointer __loc1 points to where the match began. regcmp and
regex were derived from the editor, ed(C); however, the
syntax and semantics have been changed slightly. The
following are the valid symbols and their associated
meanings.
[]*.^ These symbols retain the meaning they have in ed(C).
$ Matches the end of the string.
\n Matches a newline.
- Within brackets the minus means through. For
example, [a-z] is equivalent to [abcd...xyz]. The
- can appear as itself only if used as the last or
first character. For example, the character class
expression []-] matches the characters ] and -.
+ A regular expression followed by + means ``one or
more times''. For example, [0-9]+ is equivalent
to [0-9][0-9]*.
{m} {m,} {m,u}
Integer values enclosed in {} indicate the number
of times the preceding regular expression is to be
applied. m is the minimum number and u is a
number, less than 256, which is the maximum. If
only m is present (e.g., {m}), it indicates the
exact number of times the regular expression is to
be applied. {m,} is analogous to {m,infinity}.
The plus (+) and star (*) operations are
equivalent to {1,} and {0,} respectively.
( ... )$n The value of the enclosed regular expression is to
be returned. The value will be stored in the
(n+1)th argument following the subject argument.
At present, at most ten enclosed regular
expressions are allowed. regex makes its
assignments unconditionally.
( ... ) Parentheses are used for grouping. An operator,
e.g. *, +, {}, can work on a single character or
a regular expression enclosed in parentheses. For
example, (a*(cb+)*)$0.
By necessity, all the above defined symbols are special.
They must, therefore, be escaped to be used as themselves.
See Also
ed(C), regcmp(CP), free(S), malloc(S)
Examples
Example 1:
char *cursor, *newcursor, *ptr;
...
newcursor = regex((ptr=regcmp("^\n",(char*)0)),cursor);
free(ptr);
This example will match a leading newline in the subject
string pointed at by cursor.
Example 2:
char ret0[9];
char *newcursor, *name;
...
name = regcmp("([A-Za-z][A-Za-z0-9]{0,7})$0",(char*)0);
newcursor = regex(name,"123Testing321",ret0);
This example will match through the string Testing3 and will
return the address of the character after the last matched
character (the start of the subject string plus 11). The
string Testing3 will be copied to the character array ret0.
Example 3:
#include "file.i"
char *string, *newcursor;
...
newcursor = regex(name,string);
This example applies a precompiled regular expression in
file.i (see regcmp(CP)) against string.
Example 4:
char *ptr, *newcursor;
ptr = regcmp("[a-[=i=][:digit:]]*",(char*)0);
newcursor = regex(ptr, "123CHICO321");
It is assumed in this example that the current locale's
collation rules specify the following sequence:
A, a, B, b, C, c, CH, Ch, ch, D, d, E, e, F, f, G, g,
H, h, I, i.....
The characters I and i are also both in the same ``primary''
collation group.
The following characters are all members of the digit ctype
class:
0, 1, 2, 3, 4, 5, 6, 7, 8, 9
This example will match through the string ``123CHIC'' and
return the address of the character ``O'' in the string.
Notes
The user program may run out of memory if regcmp is called
iteratively without freeing the vectors that are no longer
required. The following user-supplied replacement for
malloc(S) reuses the same vector, saving time and space:
/* user's program */
...
char *
malloc(n)
unsigned n;
{
    /* same vector on every call; never pass it to free(S) */
    static int rebuf[256];
    return (char *)rebuf;
}
(printed 6/20/89)