regexp(3) — Subroutines
OSF
NAME
advance, compile, step − Regular-expression compile and match routines
SYNOPSIS
#define INIT declarations
#define GETC() getc code
#define PEEKC() peek code
#define UNGETC(c) ungetc code
#define RETURN(ptr) return code
#define ERROR(val) error code

#include <regexp.h>

char *compile(
        char *instring,
        char *expbuf,
        char *endbuf,
        int eof);

int step(
        char *string,
        char *expbuf);

int advance(
        char *string,
        char *expbuf);

extern char *loc1, *loc2, *locs;
PARAMETERS
instring
Specifies a string to be passed to the compile() function. The compile() function never uses instring directly, but your macros can. For example, you can pass the string containing the pattern as the instring parameter and use the INIT macro to set a pointer to the beginning of that string. When your macros do not use instring, call compile() with ((char *) 0) for this parameter.
expbuf
Points to the character array in which the compiled regular expression is stored.
endbuf
Points to the location immediately following the character array in which the compiled regular expression is stored. If the compiled expression does not fit in (endbuf - expbuf) bytes, compile() calls ERROR(50).
eof
Specifies the character that marks the end of the regular expression. For example, in ed this character is usually / (slash).
string
Points to the null-terminated string that the step() or advance() function searches for a match.
DESCRIPTION
The compile(), advance(), and step() functions perform general-purpose regular-expression matching.
The compile() function takes a simple regular expression as input and produces a compiled expression that can be used with the step() and advance() functions.
The following six macros are used by the compile() function and must be defined in your program before the #include <regexp.h> statement. The GETC(), PEEKC(), and UNGETC() macros operate on the regular expression supplied as input to compile().
INIT
The INIT macro holds the declarations and initializations that the other macros depend on. In the regexp.h header file, INIT is expanded immediately after the compile() parameter declarations and the opening { (left brace) of its body. Your INIT declarations must end with a ; (semicolon).
The INIT macro is frequently used to set a register variable to point to the beginning of the regular expression, so that GETC(), PEEKC(), and UNGETC() can use this pointer in their definitions. Alternatively, you can use INIT to declare external variables that GETC(), PEEKC(), and UNGETC() need.
GETC()
The GETC() macro returns the value of the next character (byte) in the regular-expression pattern. Successive calls to GETC() return successive characters of the regular expression.
PEEKC()
The PEEKC() macro returns the next character (byte) in the regular expression without consuming it. Successive calls to PEEKC() with no intervening GETC() return the same byte, which is also the next character that GETC() returns.
UNGETC(c)
The UNGETC() macro causes the next call to GETC() or PEEKC() to return the c parameter. No more than one character of pushback is ever needed, because c is always the last character read by GETC(). The value of the UNGETC() macro is always ignored.
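To make the GETC(), PEEKC(), and UNGETC() contract concrete, here is a minimal sketch that reuses the grep-style definitions shown in the EXAMPLE section. The function read_number() is hypothetical and not part of regexp.h; it only imitates the way a compiler might read the count in a \{m,n\} sequence: it consumes digits with GETC(), pushes back the first non-digit with UNGETC(), and inspects that byte again with PEEKC(). Because INIT refers to instring, the function parameter must carry that name.

```c
/* grep-style macros: INIT declares a cursor sp that starts at instring,
   and the other macros move or read through it. */
#define INIT      register char *sp = instring;
#define GETC()    (*sp++)
#define PEEKC()   (*sp)
#define UNGETC(c) (--sp)

/* Hypothetical helper, not part of regexp.h: reads a decimal number the
   way a compiler might read m in \{m,n\}.  On return, *next holds the
   first non-digit, which is pushed back rather than consumed. */
static int read_number(char *instring, char *next)
{
    INIT
    int n = 0;
    int c;

    /* GETC() consumes one byte per call. */
    while ((c = GETC()) >= '0' && c <= '9')
        n = n * 10 + (c - '0');

    /* The non-digit was the last byte GETC() returned, so one byte of
       pushback is enough. */
    UNGETC(c);

    /* PEEKC() reads that byte again without consuming it. */
    *next = PEEKC();
    return n;
}
```

For example, read_number("12,5", &next) returns 12 and leaves next set to ','.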
RETURN(ptr)
The RETURN() macro is the normal exit from the compile() function. The ptr parameter points to the character following the last character of the compiled regular expression, which is useful in programs that manage their own memory allocation.
ERROR(val)
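The ERROR() macro is the abnormal exit from the compile() function. The val parameter is an error number, described in the ERRORS section of this reference page. Control must not come back from ERROR() to the code in compile() that called it; for example, ERROR() can terminate the program or transfer control elsewhere with longjmp().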
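The following sketch shows one way to wire RETURN() and ERROR() together with the expbuf/endbuf bound. The names toy_compile() and last_error are hypothetical. toy_compile() does no real compilation: it only copies literal bytes up to the eof delimiter, so that the overflow check behind error 50 and the returned end pointer are easy to see. Here ERROR() records its argument and makes toy_compile() return NULL, which is one way of making sure control does not continue past the error.

```c
#include <stddef.h>

/* Hypothetical: records the val passed to ERROR() so the caller can see it. */
static int last_error;

#define INIT        register char *sp = instring;
#define GETC()      (*sp++)
/* Normal exit: hand back the pointer just past the stored bytes. */
#define RETURN(ptr) return (ptr)
/* Abnormal exit: record val and leave toy_compile() immediately. */
#define ERROR(val)  do { last_error = (val); return NULL; } while (0)

/* Hypothetical stand-in for compile(): copies literal bytes from instring
   into expbuf until the eof delimiter or the end of instring, enforcing
   the same endbuf bound that produces error 50. */
static char *toy_compile(char *instring, char *expbuf, char *endbuf, int eof)
{
    INIT
    int c;

    last_error = 0;
    while ((c = GETC()) != eof && c != '\0') {
        if (expbuf >= endbuf)
            ERROR(50);              /* regular expression overflow */
        *expbuf++ = (char)c;
    }
    RETURN(expbuf);
}
```

With char buf[4], toy_compile("ab/", buf, &buf[4], '/') returns buf + 2, telling the caller that two bytes are in use; toy_compile("abcdef", buf, &buf[4], '/') returns NULL and sets last_error to 50.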
The step() function finds the first substring of the string parameter that matches the compiled expression pointed to by the expbuf parameter. If there is no match, step() returns 0 (zero). If there is a match, step() returns a nonzero value and sets two global character pointers: loc1, which points to the first character of the matching substring, and loc2, which points to the character immediately following it. When the regular expression matches the entire string, loc1 points to the first character of string and loc2 points to the null character that terminates string.
The step() function uses the integer variable circf, which compile() sets when the regular expression begins with a ^ (circumflex). When circf is set, step() tries to match the regular expression only at the beginning of the string. If you compile more than one regular expression before using the first one, save the value of circf after compiling each expression, and restore the saved value before each corresponding call to step().
The advance() function tests whether an initial substring of the string parameter matches the expression pointed to by the expbuf parameter. The step() function calls advance() with the same expbuf: it moves a pointer through the characters of string and calls advance() at each position until advance() returns a nonzero value, which indicates a match, or until the end of string is reached. To require unconditionally that the match begin at the start of string, call advance() directly instead of step().
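The interplay of step(), advance(), loc1, loc2, and circf can be sketched with a tiny backtracking matcher. This is not the regexp.h implementation: the names toy_compile(), toy_step(), and toy_advance() are hypothetical, the "compiled" expression is simply the pattern text, and only literal bytes, . (period), c*, a leading ^, and a trailing $ are handled.

```c
/* Same roles as the regexp.h globals. */
char *loc1, *loc2;  /* start of the match, and the byte just past it */
int circf;          /* nonzero when the pattern began with ^ */

/* Hypothetical stand-in for compile(): records a leading ^ in circf and
   returns the rest of the pattern text as the "compiled" expression. */
static char *toy_compile(char *pattern)
{
    circf = (*pattern == '^');
    return circf ? pattern + 1 : pattern;
}

/* Does an initial substring of lp match ep?  Sets loc2 on success. */
static int toy_advance(char *lp, char *ep)
{
    if (*ep == '\0') {                      /* pattern exhausted: match */
        loc2 = lp;
        return 1;
    }
    if (ep[1] == '*') {                     /* c*: longest run, then back up */
        char *start = lp;
        while (*lp != '\0' && (ep[0] == '.' || *lp == ep[0]))
            lp++;
        for (;;) {
            if (toy_advance(lp, ep + 2))
                return 1;
            if (lp == start)
                return 0;
            lp--;
        }
    }
    if (ep[0] == '$' && ep[1] == '\0') {    /* trailing $: end of string */
        if (*lp != '\0')
            return 0;
        loc2 = lp;
        return 1;
    }
    if (*lp != '\0' && (ep[0] == '.' || *lp == ep[0]))
        return toy_advance(lp + 1, ep + 1);
    return 0;
}

/* Calls toy_advance() at each position of p1, or only at the first
   position when circf is set; sets loc1 to where the match starts. */
static int toy_step(char *p1, char *ep)
{
    if (circf) {
        loc1 = p1;
        return toy_advance(p1, ep);
    }
    do {
        if (toy_advance(p1, ep)) {
            loc1 = p1;
            return 1;
        }
    } while (*p1++ != '\0');
    return 0;
}
```

To follow the circf rule above, save circf right after each toy_compile() call and assign the saved value back before the corresponding toy_step() call. On the string "say hello world", the pattern hel*o sets loc1 to the h and loc2 to the space after the o, while toy_advance() on the same string returns 0 because no match starts at the first character.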
When advance() encounters an * (asterisk) or a \{\} sequence in the regular expression, it advances its pointer into the string being matched as far as possible and then calls itself recursively to try to match the remainder of the regular expression. As long as there is no match, advance() backs up along the string until it finds a match or reaches the point in the string where the * or \{\} item began matching.
It is sometimes desirable to stop this backing up before the initial position in the string is reached. If, during the backing-up process, the current position in the string equals the position that the global character pointer locs points to, advance() breaks out of the backing-up loop and returns 0 (zero).
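The backing-up loop for a c* item, including the locs check, can be sketched as follows. As in any such illustration, toy_advance() is hypothetical and works on the raw pattern text (literal bytes, . and c* only) rather than on compile() output. It tests locs before each retry, so a retry that would start at locs ends the search with 0.

```c
#include <stddef.h>

char *loc2;   /* set just past the match, as advance() does */
char *locs;   /* if backing up reaches this position, give up */

/* Hypothetical sketch: does an initial substring of lp match ep? */
static int toy_advance(char *lp, char *ep)
{
    if (*ep == '\0') {              /* pattern exhausted: match */
        loc2 = lp;
        return 1;
    }
    if (ep[1] == '*') {
        char *start = lp;
        /* Advance as far as possible over bytes matching ep[0]. */
        while (*lp != '\0' && (ep[0] == '.' || *lp == ep[0]))
            lp++;
        /* Back up one byte at a time, retrying the rest of the pattern. */
        for (;;) {
            if (lp == locs)
                return 0;           /* locs reached: break out with 0 */
            if (toy_advance(lp, ep + 2))
                return 1;
            if (lp == start)
                return 0;           /* back at the initial match point */
            lp--;
        }
    }
    if (*lp != '\0' && (ep[0] == '.' || *lp == ep[0]))
        return toy_advance(lp + 1, ep + 1);
    return 0;
}
```

With the string "aaab" and the pattern a*ab, toy_advance() succeeds after backing up once, leaving loc2 at the terminating null. Setting locs to the second a (string + 2) makes the same call return 0, because backing up reaches locs before the remainder ab can match.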
EXAMPLE
The following is an example of the regular expression macros and calls from the grep command:
#define INIT      register char *sp = instring;
#define GETC()    (*sp++)
#define PEEKC()   (*sp)
#define UNGETC(c) (--sp)
#define RETURN(c) return (c);
#define ERROR(c)  regerr()
#include <regexp.h>
. . .
compile(patstr, expbuf, &expbuf[ESIZE], '\0');
. . .
if (step(linebuf, expbuf))
        succeed();
. . .
NOTES
Two versions of these functions are available. The first, for XPG3 applications, supports internationalized simple regular expressions. The second, for System V applications, supports simple regular expressions without internationalization.
BSD applications use different functions for regular expression handling. See the re_comp() and re_exec() functions.
AES Support Level:
Trial use
RETURN VALUES
Upon successful completion, the compile() function calls the RETURN() macro. Upon failure, this function calls the ERROR() macro.
Whenever a successful match occurs, the step() and advance() functions return a nonzero value. Upon failure, these functions return 0 (zero).
ERRORS
If the compile() function fails, the ERROR() macro is called with an error number as its argument. The possible error numbers are:
Error   Meaning
11      Range endpoint too large
16      Bad number
25      \digit out of range
36      Illegal or missing delimiter
41      No remembered search string
42      There is a \(\) pair imbalance
43      Too many \(\) pairs (maximum is 9)
44      More than two numbers given in the \{\} pair
45      A } character expected after \
46      First number exceeds second in the \{\} pair
49      There is a [ ] pair imbalance
50      Regular expression overflow
70      Invalid endpoint in range expression
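An ERROR() handler usually needs to report what went wrong. The sketch below maps each error number in the table above to its meaning. The name regerr_message() is hypothetical; an ERROR() macro could, for example, print its result before terminating the program.

```c
#include <stddef.h>

/* Hypothetical helper: returns the meaning of a compile() error number,
   or NULL for a number not listed in the ERRORS table. */
static const char *regerr_message(int val)
{
    switch (val) {
    case 11: return "Range endpoint too large";
    case 16: return "Bad number";
    case 25: return "\\digit out of range";
    case 36: return "Illegal or missing delimiter";
    case 41: return "No remembered search string";
    case 42: return "There is a \\(\\) pair imbalance";
    case 43: return "Too many \\(\\) pairs (maximum is 9)";
    case 44: return "More than two numbers given in the \\{\\} pair";
    case 45: return "A } character expected after \\";
    case 46: return "First number exceeds second in the \\{\\} pair";
    case 49: return "There is a [ ] pair imbalance";
    case 50: return "Regular expression overflow";
    case 70: return "Invalid endpoint in range expression";
    default: return NULL;
    }
}
```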
RELATED INFORMATION
Functions: ctype(3), re_comp(3)