regcomp(3) — Subroutines
NAME
regcomp, regerror, regexec, regfree − Compares string to regular expression
LIBRARY
Standard C Library (libc.a)
SYNOPSIS
#include <sys/types.h>
#include <regex.h>
int regcomp(
        regex_t *preg,
        const char *pattern,
        int cflags);
size_t regerror(
        int errcode,
        const regex_t *preg,
        char *errbuf,
        size_t errbuf_size);
int regexec(
        const regex_t *preg,
        const char *string,
        size_t nmatch,
        regmatch_t *pmatch,
        int eflags);
void regfree(
        regex_t *preg);
PARAMETERS
cflags    Contains the bitwise inclusive OR of flags for regcomp().
The cflags parameter is the bitwise inclusive OR of zero or more of the following flags. These flags are defined in the /usr/include/regex.h file.
REG_EXTENDED
Uses extended regular expressions.
REG_ICASE
Ignores case in match.
REG_NOSUB
When this flag is set, the regexec() function reports only success or failure and ignores the pmatch parameter. If this flag is not set, the regcomp() function sets the preg->re_nsub field to the number of parenthetic subexpressions found in the pattern parameter.
REG_NEWLINE
Treats newline as a special character marking the end and beginning of lines.
pattern    Contains the basic or extended regular expression to be compiled by regcomp().
The default regular expression type for the pattern parameter is a basic regular expression. An application can specify extended regular expressions with the REG_EXTENDED flag. If the REG_NOSUB flag is not set in the cflags parameter, the regcomp() function sets the re_nsub field of the preg structure to the number of parenthetic subexpressions found in the pattern parameter. Subexpressions are delimited by a \( (backslash left parenthesis) and \) (backslash right parenthesis) pair in basic regular expressions, or by ( ) (parentheses) in extended regular expressions.
preg    Points to the structure that contains the compiled basic or extended regular expression.
errcode    Identifies the error code.
errbuf    Points to the buffer where regerror() stores the message text.
errbuf_size    Specifies the size of the errbuf buffer.
string    Contains the data to be matched.
nmatch    Specifies the number of elements in the pmatch array, that is, the maximum number of substrings to report.
pmatch    Contains the array of offsets into the string parameter that match the corresponding subexpression in the preg parameter.
eflags    Contains the bitwise inclusive OR of zero or more of the flags controlling the customizable behavior of the regexec() function.
The eflags parameter modifies the interpretation of the contents of the string parameter. The value for this parameter is formed by bitwise inclusive ORing zero or more of the following flags, which are defined in the /usr/include/regex.h file.
REG_NOTBOL
The first character of the string pointed to by the string parameter is not the beginning of the line. Therefore, the ^ (circumflex), when taken as a special character, does not match the beginning of the string parameter.
REG_NOTEOL
The last character of the string pointed to by the string parameter is not the end of the line. Therefore, the $ (dollar sign), when taken as a special character, does not match the end of the string parameter.
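The effect of the two eflags values can be seen with a small helper. The following is a minimal sketch, not part of the interface: the function name match_with_eflags() is hypothetical, and it assumes extended regular expressions.

```c
#include <sys/types.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: compiles pattern as an extended regular
 * expression and runs it once against string with the given eflags.
 * Returns 1 on a match, 0 on no match, and -1 on any error.
 */
int match_with_eflags(const char *pattern, const char *string, int eflags)
{
    regex_t preg;
    int rc;

    if (regcomp(&preg, pattern, REG_EXTENDED) != 0)
        return -1;
    /* nmatch is 0, so pmatch is ignored and may be NULL. */
    rc = regexec(&preg, string, 0, NULL, eflags);
    regfree(&preg);
    if (rc == 0)
        return 1;
    return (rc == REG_NOMATCH) ? 0 : -1;
}
```

With this helper, the pattern ^abc matches the string abcdef when eflags is 0, but not when REG_NOTBOL is passed, because ^ no longer matches at the start of the string. REG_NOTEOL has the corresponding effect on a pattern such as def$.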
DESCRIPTION
The regcomp(), regerror(), regexec(), and regfree() functions perform regular expression matching. The regcomp() function compiles a regular expression and the regexec() function compares the compiled regular expression to a string. The regerror() function returns text associated with an error condition encountered by regcomp() or regexec(). The regfree() function frees the internal storage allocated for the compiled regular expression.
The regcomp() function compiles the basic or extended regular expression specified by the pattern parameter and places the output in the preg structure.
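As an illustration of what regcomp() stores in the preg structure, the sketch below compiles a pattern and reads back the re_nsub field. The helper name count_subexpressions() is hypothetical; it returns -1 if compilation fails.

```c
#include <sys/types.h>
#include <regex.h>

/*
 * Hypothetical helper: returns the number of parenthetic
 * subexpressions that regcomp() found in pattern, or -1 if the
 * pattern does not compile. cflags should not include REG_NOSUB,
 * because re_nsub is only guaranteed to be set without it.
 */
long count_subexpressions(const char *pattern, int cflags)
{
    regex_t preg;
    long nsub;

    if (regcomp(&preg, pattern, cflags) != 0)
        return -1;          /* preg is undefined here; do not free it */
    nsub = (long)preg.re_nsub;
    regfree(&preg);
    return nsub;
}
```

The same text can count differently depending on the expression type. The pattern \(a\)\(b\) has two subexpressions as a basic regular expression. The pattern (a)(b) has two only when REG_EXTENDED is set; without it, the parentheses are ordinary characters and the count is 0.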
The regexec() function compares the null-terminated string in the string parameter against the compiled basic or extended regular expression in the preg parameter. If a match is found, the regexec() function returns a value of 0 (zero). The regexec() function returns a nonzero value if there is no match or if there is an error.
If the value of the nmatch parameter is 0 (zero), or if the REG_NOSUB flag was set on the call to the regcomp() function, the regexec() function ignores the pmatch parameter. Otherwise, the pmatch parameter points to an array of at least the number of elements specified by the nmatch parameter. The regexec() function fills in the elements of the array pointed to by the pmatch parameter with offsets of the substrings of the string parameter. The elements of the pmatch array correspond to the parenthetic subexpressions of the original pattern parameter that was specified to the regcomp() function.
The pmatch[i].rm_so field is the byte offset of the beginning of the substring, and the pmatch[i].rm_eo field is one greater than the byte offset of the end of the substring. Subexpression i begins at the ith matched open parenthesis, counting from 1. Element 0 (zero) of the array corresponds to the entire pattern. Unused elements of the pmatch array, up to pmatch[nmatch-1], are filled with -1. If the pattern contains more subexpressions than the nmatch parameter allows (the pattern itself counts as a subexpression), the regexec() function still performs the match but records only the first nmatch substrings: the entire match and the first nmatch-1 subexpressions.
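The offsets in the pmatch array can be turned back into substrings by copying rm_eo - rm_so bytes starting at rm_so. The following sketch assumes two hypothetical helpers, find_setting() and copy_submatch(), and a fixed illustrative pattern; none of these names are part of the interface.

```c
#include <sys/types.h>
#include <regex.h>
#include <string.h>

/*
 * Hypothetical helper: copies the substring described by one pmatch
 * element into out (always null-terminated, truncated to outsz - 1
 * bytes). Returns the substring length, or -1 if the element is
 * unused (rm_so == -1) or outsz is 0.
 */
long copy_submatch(const char *string, const regmatch_t *m,
                   char *out, size_t outsz)
{
    size_t len, n;

    if (m->rm_so == -1 || outsz == 0)
        return -1;
    len = (size_t)(m->rm_eo - m->rm_so);   /* rm_eo is one past the end */
    n = (len < outsz - 1) ? len : outsz - 1;
    memcpy(out, string + m->rm_so, n);
    out[n] = '\0';
    return (long)len;
}

/*
 * Hypothetical helper: looks for "name=digits" in line and stores
 * the offsets in pm[0..2]. Returns the regexec() result, or the
 * regcomp() error code if the pattern does not compile.
 */
int find_setting(const char *line, regmatch_t pm[3])
{
    regex_t preg;
    int rc;

    rc = regcomp(&preg, "([a-z]+)=([0-9]+)", REG_EXTENDED);
    if (rc != 0)
        return rc;
    rc = regexec(&preg, line, 3, pm, 0);   /* whole match + 2 groups */
    regfree(&preg);
    return rc;
}
```

For a line consisting of two spaces followed by width=120; the helper fills pm[0] with offsets 2 and 11, pm[1] with 2 and 7, and pm[2] with 8 and 11, so copy_submatch() yields width and 120 for the two subexpressions.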
When matching a basic or extended regular expression, any given parenthetic subexpression of the pattern parameter can participate in the match of several different substrings of the string parameter; however, it may not match any substring even though the pattern as a whole did match. The following rules are used to determine which substrings to report in the pmatch parameter when matching regular expressions:
•If a subexpression in a regular expression participated in the match several times, the offset of the last matching substring is reported in the pmatch parameter.
•If a subexpression did not participate in a match, then the byte offset in the pmatch parameter is a value of -1.
•If subexpression i is contained in subexpression j, and a match of subexpression j is reported in pmatch[j], the match or non-match reported in pmatch[i] follows the preceding two rules, but within the substring reported in pmatch[j] rather than within the whole string.
•If subexpression i is contained in subexpression j and the byte offsets in pmatch[j] have a value of -1, the byte offsets in pmatch[i] also have a value of -1.
•If a subexpression matched a zero-length string, both byte offsets in the pmatch parameter are the offset of the byte immediately following the zero-length string.
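The reporting rules above can be checked with a helper that exposes the raw offsets. This is a sketch only: match_offsets() is a hypothetical name, it assumes extended regular expressions, and the offsets described afterward are what a conforming implementation is expected to report.

```c
#include <sys/types.h>
#include <regex.h>

/*
 * Hypothetical helper: compiles pattern as an extended regular
 * expression, matches it against string, and fills pm[0..nmatch-1].
 * Returns the regexec() result, or the regcomp() error code.
 */
int match_offsets(const char *pattern, const char *string,
                  regmatch_t *pm, size_t nmatch)
{
    regex_t preg;
    int rc;

    rc = regcomp(&preg, pattern, REG_EXTENDED);
    if (rc != 0)
        return rc;
    rc = regexec(&preg, string, nmatch, pm, 0);
    regfree(&preg);
    return rc;
}
```

Matching (ab|cd)+(x)? against the string abcd shows the first two rules: subexpression 1 participates twice and pm[1] reports the last substring, offsets 2 and 4, while subexpression 2 does not participate and both of its offsets are -1. Matching a(b*)c against the string ac shows the zero-length case: both offsets in pm[1] are 1.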
If the REG_NOSUB flag was set in the cflags parameter in the call to the regcomp() function, and the nmatch parameter is not equal to 0 (zero) in the call to the regexec() function, the content of the pmatch array is unspecified.
If the REG_NEWLINE flag was not set in the cflags parameter when the regcomp() function was called, then a newline character in the pattern or string parameter is treated as an ordinary character. If the REG_NEWLINE flag was set when the regcomp() function was called, the newline character is treated as an ordinary character, except as follows:
•A newline character in the string parameter is not matched by a . (dot) outside of a bracket expression or by any form of a nonmatching list.
•A ^ (circumflex) in the pattern parameter, when used to specify expression anchoring, matches the zero-length string immediately after a newline character in the string parameter, regardless of the setting of the REG_NOTBOL flag.
•A $ (dollar sign) in the pattern parameter, when used to specify expression anchoring, matches the zero-length string immediately before a newline character in the string parameter, regardless of the setting of the REG_NOTEOL flag.
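The three REG_NEWLINE rules can be observed by compiling the same pattern with and without the flag. The helper below is a minimal sketch; the name match_with_cflags() is hypothetical, and REG_EXTENDED is always added for simplicity.

```c
#include <sys/types.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if pattern, compiled as an extended
 * regular expression with the extra cflags, matches anywhere in
 * string; 0 if it does not; -1 on error.
 */
int match_with_cflags(const char *pattern, const char *string, int cflags)
{
    regex_t preg;
    int rc;

    if (regcomp(&preg, pattern, REG_EXTENDED | cflags) != 0)
        return -1;
    rc = regexec(&preg, string, 0, NULL, 0);
    regfree(&preg);
    if (rc == 0)
        return 1;
    return (rc == REG_NOMATCH) ? 0 : -1;
}
```

For a three-line string such as "first\nsecond\nthird", the pattern ^second$ matches only when REG_NEWLINE is passed, because the anchors then apply at the embedded newline characters. Conversely, first.second and first[^a-z]second match "first\nsecond" only without REG_NEWLINE, because the flag stops the dot and the nonmatching list from matching a newline.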
The regerror() function returns the text associated with the specified error code. If the regcomp() or regexec() function fails, it returns a nonzero error code. If this return value is assigned to the errcode parameter, the regerror() function returns the text of the associated message.
The regfree() function frees any memory allocated by the regcomp() function associated with the preg parameter. An expression defined by the preg parameter is no longer treated as a compiled basic or extended regular expression after it is given to the regfree() function.
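Because regerror() returns the number of bytes the full message requires, a caller can ask for the size first and then allocate a buffer that cannot truncate the text. The following sketch assumes a hypothetical helper named regex_error_text(); it relies on regerror() ignoring errbuf when errbuf_size is 0.

```c
#include <sys/types.h>
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical helper: returns a malloc()ed copy of the message for
 * errcode, sized by asking regerror() for the required length first.
 * The caller frees the result. Returns NULL if memory cannot be
 * allocated.
 */
char *regex_error_text(int errcode, const regex_t *preg)
{
    size_t need;
    char *text;

    /* With a zero-size buffer, regerror() only reports the size. */
    need = regerror(errcode, preg, NULL, 0);
    if (need == 0)
        return NULL;
    text = malloc(need);
    if (text == NULL)
        return NULL;
    (void)regerror(errcode, preg, text, need);
    return text;
}
```

A caller would typically invoke this helper when regcomp() or regexec() returns a nonzero value other than REG_NOMATCH, print the text, and free it. The regfree() function is called only for a preg structure that regcomp() compiled successfully.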
EXAMPLES
1.The following example demonstrates how the REG_NOTBOL flag can be used with the regexec() function to find all substrings in a line that match a pattern supplied by a user. The main() function in the example accepts two input strings from the user. The match() function in the example uses regcomp() and regexec() to search for matches.
#include <sys/types.h>
#include <regex.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

#define SLENGTH 80

int match(char *pattern, char *string);

int main(void)
{
    char patt[SLENGTH], strng[SLENGTH];
    char *eol;

    (void)setlocale(LC_ALL, "");
    printf("Enter a regular expression:");
    if (fgets(patt, SLENGTH, stdin) == NULL)
        return 1;                       /* No input */
    if ((eol = strchr(patt, '\n')) != NULL)
        *eol = '\0';                    /* Replace newline with null */
    else
        return 1;                       /* Line entered too long */
    printf("Enter string to compare\nString: ");
    if (fgets(strng, SLENGTH, stdin) == NULL)
        return 1;                       /* No input */
    if ((eol = strchr(strng, '\n')) != NULL)
        *eol = '\0';                    /* Replace newline with null */
    else
        return 1;                       /* Line entered too long */
    return match(patt, strng);
}

/* Counts the matches of pattern in string. Returns 0 on success, 1 on error. */
int match(char *pattern, char *string)
{
    char message[SLENGTH];
    char *start_search;
    int error, count;
    size_t msize;
    regex_t preg;
    regmatch_t pmatch;

    error = regcomp(&preg, pattern, REG_ICASE | REG_EXTENDED);
    if (error) {
        msize = regerror(error, &preg, message, SLENGTH);
        printf("%s\n", message);
        if (msize > SLENGTH)
            printf("Additional text lost\n");
        return 1;                       /* preg is undefined; do not free it */
    }
    error = regexec(&preg, string, 1, &pmatch, 0);
    if (error == REG_NOMATCH) {
        printf("No matches in string\n");
        regfree(&preg);
        return 0;
    } else if (error != 0) {
        msize = regerror(error, &preg, message, SLENGTH);
        printf("%s\n", message);
        if (msize > SLENGTH)
            printf("Additional text lost\n");
        regfree(&preg);
        return 1;
    }
    count = 0;
    start_search = string;
    while (error == 0) {
        count++;
        /* Offsets are relative to start_search; resume after this match. */
        start_search += pmatch.rm_eo;
        if (pmatch.rm_so == pmatch.rm_eo) {
            /* Empty match: step one byte so the search makes progress. */
            if (*start_search == '\0')
                break;
            start_search++;
        }
        /* The rest of the line does not begin at a line start. */
        error = regexec(&preg, start_search, 1, &pmatch, REG_NOTBOL);
    }
    printf("There are %i matches\n", count);
    regfree(&preg);
    return 0;
}
2.The following example finds out which subexpressions in the regular expression have matches in the string. This example uses the same main() program as the preceding example. This example does not specify REG_EXTENDED in the call to regcomp() and, consequently, uses basic regular expressions, not extended regular expressions.
#define MAX_MATCH 10

/* Reports which subexpressions matched. Returns 0 on success, 1 on error. */
int match(char *pattern, char *string)
{
    char message[SLENGTH];
    int error;
    size_t msize, count, matches_tocheck;
    regex_t preg;
    regmatch_t pmatch[MAX_MATCH];

    error = regcomp(&preg, pattern, REG_ICASE);
    if (error) {
        msize = regerror(error, &preg, message, SLENGTH);
        printf("regcomp: %s\n", message);
        if (msize > SLENGTH)
            printf("Additional text lost\n");
        return 1;
    }
    /* pmatch[0] holds the entire match, so only MAX_MATCH - 1
       subexpressions fit in the array. */
    if (preg.re_nsub > MAX_MATCH - 1) {
        printf("There are %lu subexpressions, checking %i\n",
            (unsigned long)preg.re_nsub, MAX_MATCH - 1);
        matches_tocheck = MAX_MATCH - 1;
    } else {
        printf("There are %lu subexpressions in re\n",
            (unsigned long)preg.re_nsub);
        matches_tocheck = preg.re_nsub;
    }
    error = regexec(&preg, string, MAX_MATCH, &pmatch[0], 0);
    if (error == REG_NOMATCH) {
        printf("String did not contain match for entire re\n");
        regfree(&preg);
        return 0;
    } else if (error != 0) {
        msize = regerror(error, &preg, message, SLENGTH);
        printf("regexec: %s\n", message);
        if (msize > SLENGTH)
            printf("Additional text lost\n");
        regfree(&preg);
        return 1;
    } else
        printf("String contained a match for the entire re\n");
    /* Element 0 is the entire match; 1..matches_tocheck are subexpressions. */
    for (count = 0; count <= matches_tocheck; count++) {
        if (pmatch[count].rm_so != -1) {
            printf("Subexpression %lu matched in string\n",
                (unsigned long)count);
            printf("Match starts at %ld. Byte after match is %ld\n",
                (long)pmatch[count].rm_so, (long)pmatch[count].rm_eo);
        } else
            printf("Subexpression %lu had NO match\n",
                (unsigned long)count);
    }
    regfree(&preg);
    return 0;
}
RETURN VALUES
Upon successful completion, the regcomp() function returns a value of 0 (zero). Otherwise, regcomp() returns one of the following nonzero values indicating the type of failure. If the regcomp() function fails, the contents of the preg parameter are undefined. If the regexec() function finds a match, the function returns a value of 0 (zero). If the regexec() function does not find a match or fails for another reason, the function returns one of the following nonzero values indicating the type of failure, and the contents of the pmatch parameter are undefined.
REG_BADBR
The contents within the pair \{ (backslash left brace) and \} (backslash right brace) are unusable: Not a number, number too large, more than two numbers, or first number larger than second.
REG_BADPAT
There is an unusable basic regular expression or extended regular expression.
REG_BADRPT
The ?, *, or + symbols are not preceded by a valid basic regular expression or an extended regular expression.
REG_EBRACE
The use of a pair of \{ (backslash left brace) and \} (backslash right brace) or {} (braces) is unbalanced.
REG_EBRACK
The use of [] (square brackets) is unbalanced.
REG_ECOLLATE
There is an unusable collating element referenced.
REG_ECTYPE
There is an unusable character class type referenced.
REG_EESCAPE
There is a trailing \ (backslash) in the pattern.
REG_ENOSYS
Function is unsupported.
REG_EPAREN
The use of a pair of \( (backslash left parenthesis) and \) (backslash right parenthesis) or () (parentheses) is unbalanced.
REG_ERANGE
There was an unusable endpoint in the range expression.
REG_ESPACE
There is insufficient memory space.
REG_ESUBREG
The number in \digit is unusable or in error.
REG_NOMATCH
The regexec() function failed to match.
If the regcomp() function detects an illegal basic or extended regular expression, it may return REG_BADPAT, or it may return an error code that more precisely describes the error.
The regerror() function returns the number of bytes required to store the message. This value may be greater than the value of the errbuf_size parameter.
The regfree() function does not return a value.
ERRORS
These functions do not set errno to indicate an error.
RELATED INFORMATION
Commands: grep(1).