     REGEX(S)                  UNIX System V                  REGEX(S)



     Name
          regex, regcmp - compiles and executes regular expressions.

     Syntax
          char *regcmp(string1[,string2, ...],(char *)0);
          char *string1, *string2, ...;

          char *regex(re,subject[,ret0, ...]);
          char *re, *subject, *ret0, ...;
          extern char * __loc1;

     Description
          The regcmp routine compiles a regular expression, formed by
          concatenating its string arguments up to the terminating
          (char *)0, and returns a pointer to the compiled form.  The
          malloc(S) routine is used to create space for the compiled
          form.  It is the user's responsibility to free unneeded space
          so allocated.  A zero (null pointer) return from regcmp
          indicates an incorrect argument.  regcmp(CP) has been written
          to generally preclude the need for this routine at execution
          time.
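
          As the Syntax section shows, regcmp takes a list of strings
          ended by (char *)0 and treats them as one expression.  The
          storage it returns comes from malloc(S).  The sketch below
          illustrates only that calling convention and ownership rule.
          join_pattern is a hypothetical helper that concatenates its
          arguments into freshly allocated storage; it compiles nothing.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of regcmp): join a (char *)0-terminated
 * list of strings into one malloc'd buffer, the way regcmp treats its
 * arguments as a single expression.  The caller owns the result and must
 * free() it; NULL is returned only if malloc fails. */
char *join_pattern(const char *first, ...)
{
    va_list ap;
    const char *s;
    size_t len = 0;
    char *buf, *end;

    assert(first != NULL);              /* at least one piece is required */

    /* Pass 1: add up the lengths of all pieces. */
    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, char *))
        len += strlen(s);
    va_end(ap);

    buf = malloc(len + 1);              /* +1 for the terminating NUL */
    if (buf == NULL)
        return NULL;

    /* Pass 2: copy the pieces end to end. */
    end = buf;
    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, char *)) {
        size_t n = strlen(s);
        memcpy(end, s, n);
        end += n;
    }
    va_end(ap);
    *end = '\0';
    return buf;
}
```

          For instance, join_pattern("([A-Za-z]", "[A-Za-z0-9]{0,7})$0",
          (char *)0) yields the pattern string used in Example 2.  The
          caller must free it afterwards.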

          The regex routine executes a compiled pattern against the
          subject string.  Additional arguments, if present, receive
          the values of subexpressions marked with ( ... )$n.  regex
          returns zero on failure or a pointer to the next unmatched
          character on success.  A global character pointer, __loc1,
          points to where the match began.  regcmp and regex were
          derived from the editor, ed(C); however, the syntax and
          semantics have been changed slightly.  The following are the
          valid symbols and their associated meanings.
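
          Many current systems provide the POSIX regcomp(3) and
          regexec(3) functions instead of regcmp and regex.  The sketch
          below uses them to imitate the regex conventions described
          above:
          - a null return on failure;
          - otherwise a pointer just past the match;
          - the match start recorded as __loc1 would record it;
          - one subexpression copied into a caller-supplied buffer.

          It is a stand-in, not the original routine.  The name
          posix_regex and its parameters are hypothetical.  Patterns use
          POSIX extended syntax, with no $n suffix.  POSIX numbers
          groups by the position of their opening parenthesis, so the
          first group plays the role of ( ... )$0.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical stand-in for regex(re, subject, ret0) built on POSIX
 * regcomp/regexec with extended syntax.  On success it returns a pointer
 * to the first unmatched character, sets *loc1 to where the match began
 * (the role of __loc1) and, if ret0 is non-NULL, copies the text of the
 * first parenthesized group into it, NUL-terminated.  It returns NULL if
 * the pattern does not compile or does not match, or if the group text
 * does not fit in ret0size bytes. */
const char *posix_regex(const char *pattern, const char *subject,
                        const char **loc1, char *ret0, size_t ret0size)
{
    regex_t re;
    regmatch_t m[2];                      /* [0] whole match, [1] group 1 */
    const char *next = NULL;

    assert(pattern != NULL && subject != NULL && loc1 != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return NULL;                      /* bad pattern: treat as failure */

    if (regexec(&re, subject, 2, m, 0) == 0) {
        next = subject + m[0].rm_eo;      /* first character after the match */
        *loc1 = subject + m[0].rm_so;     /* where the match began */
        if (ret0 != NULL && m[1].rm_so != -1) {
            size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
            if (len < ret0size) {
                memcpy(ret0, subject + m[1].rm_so, len);
                ret0[len] = '\0';
            } else {
                next = NULL;              /* refuse to overflow ret0 */
            }
        }
    }
    regfree(&re);                         /* POSIX counterpart of free(ptr) */
    return next;
}
```

          Called with the pattern ([A-Za-z][A-Za-z0-9]{0,7}) and the
          subject 123Testing321, it copies Testing3 into a 9-byte
          buffer.  It returns a pointer 11 characters into the subject,
          in line with the description of Example 2.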

          []*.^     These symbols retain their meaning in ed(C).

          $         Matches the end of the string.

          \n        Matches the newline.

          -         Within brackets, the minus means ``through''.  For
                    example, [a-z] is equivalent to [abcd...xyz].  The
                    - can appear as itself only if used as the first or
                    last character.  For example, the character class
                    expression []-] matches the characters ] and -.

          +         A regular expression followed by + means ``one or
                    more times''.  For example, [0-9]+ is equivalent
                    to [0-9][0-9]*.

          {m} {m,} {m,u}
                    Integer values enclosed in {} indicate the number
                    of times the preceding regular expression is to be
                    applied.  m is the minimum number and u is a
                    number, less than 256, which is the maximum.  If
                    only m is present (e.g., {m}), it indicates the
                    exact number of times the regular expression is to
                    be applied.  {m,} is analogous to {m,infinity}.
                    The plus (+) and star (*) operations are
                    equivalent to {1,} and {0,} respectively.

          ( ... )$n The value of the enclosed regular expression is to
                    be returned.  The value will be stored in the
                    (n+1)th argument following the subject argument;
                    for example, $0 stores into the first argument
                    after the subject.  At present, at most ten
                    enclosed regular expressions are allowed.  regex
                    makes its assignments unconditionally.

          ( ... )   Parentheses are used for grouping.  An operator,
                    e.g., *, +, or {}, can work on a single character
                    or on a regular expression enclosed in parentheses.
                    For example, (a*(cb+)*)$0.

          By necessity, all the above defined symbols are special.
          They must, therefore, be escaped to be used as themselves.
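
          When a pattern is assembled from literal text, each special
          symbol in that text needs a preceding backslash.  The helper
          below is a hypothetical sketch.  It assumes that a backslash
          before any of [ ] * . ^ $ + { } ( ) or \ yields that character
          literally.  It leaves - alone because the minus is special
          only within brackets.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd copy of lit with a backslash
 * before every character assumed special outside brackets, so that the
 * text would match literally.  Returns NULL if malloc fails; the caller
 * frees the result. */
char *quote_literal(const char *lit)
{
    static const char special[] = "[]*.^$+{}()\\";   /* assumed set */
    char *out, *o;

    assert(lit != NULL);

    out = malloc(2 * strlen(lit) + 1);   /* worst case: every char escaped */
    if (out == NULL)
        return NULL;

    for (o = out; *lit != '\0'; lit++) {
        if (strchr(special, *lit) != NULL)
            *o++ = '\\';                 /* escape the special symbol */
        *o++ = *lit;
    }
    *o = '\0';
    return out;
}
```

          For example, quote_literal("a.b*(c)$") returns the string
          a\.b\*\(c\)\$.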

     See Also
          ed(C), regcmp(CP), free(S), malloc(S)

     Examples
          Example 1:

               char *cursor, *newcursor, *ptr;
                    ...
               newcursor = regex((ptr=regcmp("^\n",(char*)0)),cursor);
               free(ptr);

          This example will match a leading newline in the subject
          string pointed at by cursor.

          Example 2:

               char ret0[9];
               char *newcursor, *name;
                    ...
               name = regcmp("([A-Za-z][A-Za-z0-9]{0,7})$0",(char*)0);
               newcursor = regex(name,"123Testing321",ret0);

          This example will match through the string Testing3 and will
          return the address of the character after the last matched
          character (offset 11 in the subject string, the character
          2).  The string Testing3 will be copied to the character
          array ret0.

          Example 3:

               #include "file.i"
               char *string, *newcursor;
                    ...
               newcursor = regex(name,string);

          This example applies a precompiled regular expression in
          file.i (see regcmp(CP)) against string.

          Example 4:

               char *ptr, *newcursor;

               ptr = regcmp("[a-[=i=][:digit:]]*",(char*)0);
               newcursor = regex(ptr, "123CHICO321");

          It is assumed in this example that the current locale's
          collation rules specify the following sequence:

               A, a, B, b, C, c, CH, Ch, ch, D, d, E, e, F, f, G, g,
               H, h, I, i.....

          The characters I and i are also both in the same ``primary''
          collation group.

          The following characters are all members of the digit ctype
          class:

               0, 1, 2, 3, 4, 5, 6, 7, 8, 9

          This example will match through the string ``123CHIC'' and
          return the address of the character ``O'' in the string.

     Notes
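          The user program may run out of memory if regcmp is called
          iteratively without freeing the compiled expressions that are
          no longer required.  The following user-supplied replacement
          for malloc(S) reuses a single static buffer, saving both time
          and space.  Because every call returns the same buffer, each
          call to regcmp overwrites the previously compiled expression,
          and the result must not be passed to free(S):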

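               /* user's program */
                    ...
               char *
               malloc(n)
               unsigned n;
               {
                    /* one buffer, reused by every call */
                    static char rebuf[256];

                    /* refuse requests the buffer cannot hold */
                    return (n <= sizeof rebuf) ? rebuf : (char *)0;
               }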

                                             (printed 6/20/89)



Typewritten Software • bear@typewritten.org • Edmonds, WA 98026