REGCMP(3X) — HP-UX
NAME
regcmp, regex − compile and execute regular expression
SYNOPSIS
char *regcmp (string1 [, string2, ...], (char *)0)
char *string1, *string2, ...;
char *regex (re, subject[, ret0, ...])
char *re, *subject, *ret0, ...;
extern char *__loc1;
DESCRIPTION
Regcmp compiles a regular expression and returns a pointer to the compiled form. Malloc(3C) is used to create space for the vector. It is the user’s responsibility to free unneeded space so allocated. A NULL return from regcmp indicates an incorrect argument.
Regex executes a compiled pattern against the subject string. Additional arguments are passed to receive values back. Regex returns NULL on failure or a pointer to the next unmatched character on success. A global character pointer __loc1 points to where the match began.
Regcmp and regex were largely borrowed from the editor, ed(1); however, the syntax and semantics have been changed slightly. The following are the valid symbols and their associated meanings:
[]*.^ These symbols retain the meaning they have in ed(1).
$ Matches the end of the string; \n matches a new-line.
- Used within brackets the hyphen signifies a character range. For example, [a-z] is equivalent to [abcd...xyz]. The - can represent itself only if used as the first or last character. For example, the character class expression []-] matches the characters ] and -.
+ A regular expression followed by + means one or more times. For example, [0-9]+ is equivalent to [0-9][0-9]*.
{m} {m,} {m,u}
Integer values enclosed in {} indicate the number of times the preceding regular expression can be applied. The value m is the minimum number and u is a number, less than 256, which is the maximum. The syntax {m} indicates the exact number of times the regular expression can be applied. The syntax {m,} is analogous to {m,infinity}. The plus (+) and star (*) operations are equivalent to {1,} and {0,} respectively.
( ... )$n The value of the enclosed regular expression is returned. The value is stored in the (n+1)th argument following the subject argument. A maximum of ten enclosed regular expressions are allowed. Regex makes its assignments unconditionally.
( ... ) Parentheses are used for grouping. An operator, such as *, +, or {}, can work on a single character or a regular expression enclosed in parentheses. For example, (a*(cb+)*)$0.
Since all of the above defined symbols are special characters, they must be escaped to be used as themselves.
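The escaping rule above can be sketched as a small helper that prepares a literal string for use inside a pattern. This is only an illustration under stated assumptions: the text does not name the escape character, so a backslash is assumed here, as in ed(1); the helper name escape_literal and the exact set of escaped characters (the symbols listed above plus the backslash itself) are hypothetical and not part of libPW.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of libPW.
 * Copies src into dst, placing a backslash (assumed escape character)
 * before every symbol that the list above treats as special.
 * Returns the number of characters written, not counting the
 * terminating NUL, or (size_t)-1 if dst is too small.
 */
size_t escape_literal(const char *src, char *dst, size_t dstsz)
{
    /* Symbols from the list above, plus the backslash itself. */
    static const char special[] = "[]*.^$-+{}()\\";
    size_t n = 0;

    assert(src != NULL && dst != NULL);

    for (; *src != '\0'; src++) {
        size_t need = (strchr(special, *src) != NULL) ? 2 : 1;

        /* Keep one byte in reserve for the terminating NUL. */
        if (n + need + 1 > dstsz)
            return (size_t)-1;
        if (need == 2)
            dst[n++] = '\\';
        dst[n++] = *src;
    }
    if (n + 1 > dstsz)
        return (size_t)-1;
    dst[n] = '\0';
    return n;
}
```

The result could then be passed to regcmp as one of its string arguments. Whether regcmp accepts a backslash before every one of these symbols has not been verified here.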
These routines are kept in /lib/libPW.a.
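The equivalences stated for +, * and {m,u}, and the bracket rule for the hyphen, can be checked on a system without libPW. The sketch below does not call regcmp or regex. It uses the POSIX regcomp(3) and regexec(3) routines from <regex.h> with REG_EXTENDED, a different API whose extended syntax shares these particular operators. The helper name full_match is hypothetical, and nothing here describes how regcmp is implemented.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper built on POSIX <regex.h>, not on libPW.
 * Returns 1 if the whole of text matches pattern, 0 if it does not,
 * and -1 if the pattern is too long or fails to compile.
 */
int full_match(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    int n, rc;

    assert(pattern != NULL && text != NULL);

    /* Anchor at both ends so that only whole-string matches count. */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;

    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);               /* release the compiled form */
    return rc == 0;
}
```

With this helper, [0-9]+, [0-9][0-9]* and [0-9]{1,} accept the same strings, a{2,3} accepts "aa" and "aaa" but not "a" or "aaaa", and []-] accepts only the characters ] and -.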
EXAMPLES
Example 1:
char *cursor, *newcursor, *ptr;
...
newcursor = regex((ptr = regcmp("^\n", (char *)0)), cursor);
free(ptr);
This example matches a leading new-line in the subject string to which the cursor points.
Example 2:
char ret0[9];
char *newcursor, *name;
...
name = regcmp("([A-Za-z][A-Za-z0-9_]{0,7})$0", (char *)0);
newcursor = regex(name, "123Testing321", ret0);
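This example matches the string “Testing3” within the subject and returns the address of the character after the last matched character (the address of the subject string plus 11). The string “Testing3” is copied to the character array ret0, which has room for its eight characters and the terminating null.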
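The offsets in Example 2 can be reproduced without libPW. The sketch below is an analogue only: it uses the POSIX regcomp(3) and regexec(3) routines from <regex.h>, not regcmp and regex, and the helper name first_name is hypothetical. The start offset it reports plays the role of __loc1, the end offset plays the role of the pointer returned by regex, and the copied substring plays the role of ret0.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper built on POSIX <regex.h>, not on libPW.
 * Finds the first name of one to eight characters in subject, copies
 * it to out, and stores the offsets where the match starts and ends.
 * Returns 0 on success, or -1 if there is no match, the pattern
 * fails to compile, or out is too small.
 */
int first_name(const char *subject, char *out, size_t outsz,
               size_t *start, size_t *end)
{
    regex_t re;
    regmatch_t m[2];
    size_t len;
    int rc;

    assert(subject != NULL && out != NULL);
    assert(start != NULL && end != NULL);

    if (regcomp(&re, "([A-Za-z][A-Za-z0-9_]{0,7})", REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, subject, 2, m, 0);
    regfree(&re);               /* release the compiled form */
    if (rc != 0)
        return -1;

    /* m[1] is the parenthesized group, the analogue of ( ... )$0. */
    len = (size_t)(m[1].rm_eo - m[1].rm_so);
    if (len + 1 > outsz)
        return -1;
    memcpy(out, subject + m[1].rm_so, len);
    out[len] = '\0';

    *start = (size_t)m[0].rm_so;    /* analogue of __loc1 */
    *end = (size_t)m[0].rm_eo;      /* analogue of regex's return value */
    return 0;
}
```

For the subject "123Testing321" the helper reports a match that starts at offset 3 and ends at offset 11, and it copies "Testing3" into a nine-byte buffer.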
WARNINGS
The user program might run out of memory if regcmp is called iteratively without freeing the vectors that are no longer required.
SEE ALSO
ed(1), malloc(3C).
Hewlett-Packard Company — May 11, 2021