grep(1) — Commands

OSF

NAME

grep, egrep, fgrep − Searches a file for a pattern

SYNOPSIS

grep [-bcilnqsvw] [-p paragraph_separator] pattern | -e pattern [file ... ]

egrep [-bchilnsv] pattern | -e pattern | -f pattern_file [file ... ]

fgrep [-bchilnsvx] -e pattern | -f pattern_file [file ... ]

The grep, egrep, and fgrep commands search the specified files (standard input by default) for lines containing characters that match the specified pattern, and then write matching lines to standard output.

FLAGS

While most flags can be combined, some combinations result in one flag overriding another. For example, if you specify -n and -l, the output includes filenames only (as specified by -l) and thus does not include line numbers (as specified by -n).
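The override described above can be observed directly. The following is a minimal sketch, assuming a POSIX shell and a system that provides mktemp; the file sample.txt and its contents are hypothetical and exist only for the demonstration.

```shell
# Hypothetical sample file, created only for this demonstration.
tmp=$(mktemp -d)
printf 'alpha\nbeta\nalpha beta\n' > "$tmp/sample.txt"

# -n alone: each matching line is prefixed with its line number.
numbered=$(grep -n 'alpha' "$tmp/sample.txt")
printf '%s\n' "$numbered"

# -n together with -l: only the filename is written, so the
# line numbers requested by -n do not appear.
names_only=$(grep -n -l 'alpha' "$tmp/sample.txt")
printf '%s\n' "$names_only"

rm -rf "$tmp"
```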

-b
Precedes each line by the block number on which it was found. Use this flag to help find disk block numbers by context.

-c
Displays only a count of matching lines.

-e pattern
Specifies a pattern. This works the same as a simple pattern, but is useful when the pattern begins with a - (dash).

-f pattern_file
Specifies a file that contains patterns (egrep and fgrep only).

-h
Suppresses reporting of filenames when multiple files are processed (fgrep and egrep only).

-i
Ignores the case of letters in locating pattern; that is, uppercase and lowercase in the input are considered to be identical. (For grep and fgrep, -y can be specified in place of -i; the effect is the same.)

-l
Lists the name of each file with lines matching pattern. Each filename is listed only once; filenames are separated by newline characters. If -l is specified with standard input, grep reports standard input (or the local equivalent) in place of a filename, but egrep and fgrep exit with a return value of 1 (see EXIT VALUES).

-n
Precedes each line with its relative line number in the file.

-p paragraph_separator
Displays the entire paragraph containing matched lines. Paragraphs are delimited by paragraph separators, paragraph_separator, which are patterns in the same form as the search pattern. Lines containing the paragraph separators are used only as separators; they are never included in the output. The default paragraph separator is a blank line (grep only).

-q
Suppresses all output except error messages. This is useful for checking status (grep only).

-s
For grep, suppresses error messages about inaccessible files. For egrep and fgrep, suppresses all output except error messages, which is useful for checking status.

-v
Displays all lines except those that match the specified pattern. Useful for filtering unwanted lines out of a file.

-w
Searches for the expression as a word (the pattern bracketed by nonalphanumeric characters or by the beginning or end of the line) (grep only). See ex(1).

-x
Displays lines that match the pattern exactly with no additional characters (fgrep only).
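Several of the flags above can be tried on a small file. This is only a sketch: the file log.txt and its contents are hypothetical, mktemp is assumed to be available, and the behavior shown is that of a current POSIX-style grep rather than a verified OSF/1 system.

```shell
# Hypothetical log file, created only for this demonstration.
tmp=$(mktemp -d)
printf 'Error: disk full\nok\nerror: retry\nok\n' > "$tmp/log.txt"

# -c: write a count of matching lines instead of the lines themselves.
count=$(grep -c 'ok' "$tmp/log.txt")

# -i: ignore case, so both "Error" and "error" are counted.
count_ci=$(grep -c -i 'error' "$tmp/log.txt")

# -v: write the lines that do NOT match the pattern.
not_ok=$(grep -v 'ok' "$tmp/log.txt")

# -q: write nothing; only the exit status is examined.
if grep -q 'disk full' "$tmp/log.txt"; then
    status_msg="found"
else
    status_msg="not found"
fi

printf '%s\n' "$count" "$count_ci" "$not_ok" "$status_msg"
rm -rf "$tmp"
```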

DESCRIPTION

Three versions of the grep command permit you to specify the matching pattern in varying levels of complexity:

grep

The grep command searches for patterns that are limited regular expressions as described under Regular Expressions. The grep command uses a compact, nondeterministic algorithm.

egrep

The egrep command searches for patterns that are full regular expressions, except for \( and \), and with the addition of the following rules:

• A regular expression followed by a + (plus sign) matches one or more occurrences of the regular expression.

• A regular expression followed by a ? (question mark) matches zero or one occurrence of the regular expression.

• Two regular expressions separated by a | (vertical bar) or by a newline character match either expression.

• A regular expression can be enclosed in ( ) (parentheses) for grouping.

The order of precedence of operators is [], then *, ?, and +, then concatenation, then | and the newline character.

The egrep command uses a deterministic algorithm that needs exponential space.
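The additional egrep operators can be sketched as follows. The example uses grep -E, which is assumed here to be the POSIX spelling of egrep on current systems; the file words.txt and its contents are hypothetical.

```shell
# Hypothetical word list, created only for this demonstration.
tmp=$(mktemp -d)
printf 'color\ncolour\ncolouur\ncat\ndog\nbird\n' > "$tmp/words.txt"

# ? : zero or one occurrence of the preceding RE.
optional=$(grep -E '^colou?r$' "$tmp/words.txt")

# + : one or more occurrences of the preceding RE.
repeated=$(grep -E '^colou+r$' "$tmp/words.txt")

# | inside ( ): either alternative, with the anchors applied to the group.
either=$(grep -E '^(cat|dog)$' "$tmp/words.txt")

printf '%s\n' "$optional" "$repeated" "$either"
rm -rf "$tmp"
```

Without the parentheses, the pattern ^cat|dog$ would be read as "^cat" or "dog$", because concatenation binds more tightly than the vertical bar.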

fgrep

The fgrep command searches for patterns that are fixed strings.

Command Usage

All versions of grep precede the matched line with the name of the file containing it if you specify more than one file (except when the -h flag is specified).
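The filename prefix can be seen by searching one file and then two. This is a sketch under stated assumptions: the files a.list and b.list are hypothetical, and grep -F -h is assumed to behave like fgrep -h (the -h flag is widely supported but is not guaranteed on every grep).

```shell
# Hypothetical files, created only for this demonstration.
tmp=$(mktemp -d)
printf 'rose\ntulip\n' > "$tmp/a.list"
printf 'daisy\nrose\n' > "$tmp/b.list"

# One file: matching lines are written without a filename.
one_file=$(cd "$tmp" && grep -F 'rose' a.list)

# Two files: each matching line is preceded by its filename.
two_files=$(cd "$tmp" && grep -F 'rose' a.list b.list)

# -h: the filename prefix is suppressed again.
no_names=$(cd "$tmp" && grep -F -h 'rose' a.list b.list)

printf '%s\n' "$one_file" "$two_files" "$no_names"
rm -rf "$tmp"
```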

Lines are limited to 512 bytes; longer lines are broken into multiple lines of 512 or fewer bytes (grep only). Paragraphs (under the -p flag) are currently limited to a length of 5000 bytes.

Running grep on a non-text file (for example, an .o file) produces unpredictable results and is discouraged.

Regular Expressions (REs)

The following REs match a single character:

character
An ordinary character (one other than one of the special pattern-matching characters) matches itself.

.
A . (dot) matches any single character except for the newline character.

[string]
A string enclosed in [ ] (brackets) matches any one character in that string. In addition, certain pattern-matching characters have special meanings within brackets:

^
If the first character of string is a ^ (circumflex), the RE [^string] matches any character except the characters in string and the newline character. A ^ has this special meaning only if it occurs first in the string.

−
You can use a − (dash) to indicate a range of consecutive characters. The characters that fall within a range are determined by the current collating sequence, which is defined by the LC_COLLATE environment variable. For example, [a-d] is equivalent to [abcd] in the traditional ASCII collating sequence, but under French collation rules the range would also include accented forms of a (for example, [aàâbcd]).

A range can include a multicharacter collating element enclosed within bracket-period delimiters ([. .]). These collating symbols are necessary for languages that treat some strings as individual collating elements. For example, in Spanish, the strings ch and ll are each collating symbols (that is, the Spanish primary sort order is a, b, c, ch, d,...,k, l, ll, m,...). The bracket-period delimiters in the RE syntax distinguish multicharacter collating elements from a list of the individual characters that make up the element. When using Spanish collation rules, [[.ch.]] is treated as an RE matching the sequence ch, while [ch] is treated as an RE matching c or h. In addition, [a-[.ch.]] matches a, b, c, and ch.

A collating sequence can define equivalence classes for characters. An equivalence class is a set of collating elements that all sort to the same primary location. They are enclosed within bracket-equal delimiters ([= =]). An equivalence class generally is designed to deal with primary-secondary sorting; that is, for languages like French that define groups of characters as sorting to the same primary location, and then having a tie-breaking, secondary sort. For example, if e, é, and è belong to the same equivalence class, then [[=e=]fg], [[=é=]fg], and [[=è=]fg] are each equivalent to [eéèfg].

The − (dash) character loses its special meaning if it occurs first ([−string]), if it immediately follows an initial circumflex ([^−string]), or if it appears last ([string−]) in the string.

]
When the ] (right bracket) is the first character in the string ([]string]) or when it immediately follows an initial circumflex ([^]string]), it is treated as a part of the string rather than as the string terminator.

\special_character
A \ (backslash) followed by a special pattern-matching character matches the special character itself (as a literal character). These special pattern-matching characters are as follows:

. * [ \
Always special, except when they appear within [ ] (brackets).

^
Special at the beginning of an entire pattern or when it immediately follows the left bracket of a pair of brackets ([^...]).

$
Special at the end of an entire pattern.

In addition, the character used to delimit an entire pattern is special for that pattern. (For example, see how / (slash) is used in the g subcommand.)

[: :]
A character class name enclosed in bracket-colon delimiters matches any of the set of characters in the named class. Members of each of the sets are determined by the current setting of the LC_CTYPE environment variable. The supported classes are: alpha, upper, lower, digit, alnum, xdigit, space, print, punct, graph, cntrl. Here is an example of how to specify one of these classes:

[[:lower:]]

This matches any lowercase character for the current locale.
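The bracket rules in this section can be sketched as follows. The example sets LC_ALL=C so that ranges and classes follow the traditional ASCII rules; that setting, the file chars.txt, and its contents are assumptions made for a reproducible demonstration.

```shell
# Hypothetical input, created only for this demonstration.
tmp=$(mktemp -d)
printf 'abc\nABC\n123\na-b\n]x\n' > "$tmp/chars.txt"

# Character class: lines containing at least one lowercase letter.
lower=$(LC_ALL=C grep '[[:lower:]]' "$tmp/chars.txt")

# Leading ^ inside brackets negates: lines whose first character
# is not in the range a-z.
negated=$(LC_ALL=C grep '^[^a-z]' "$tmp/chars.txt")

# A dash placed last in the string is literal: digit or dash.
literal_dash=$(LC_ALL=C grep '[0-9-]' "$tmp/chars.txt")

# A right bracket placed first in the string is literal.
literal_bracket=$(LC_ALL=C grep '[]]' "$tmp/chars.txt")

printf '%s\n' "$lower" "$negated" "$literal_dash" "$literal_bracket"
rm -rf "$tmp"
```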

Japanese Language Support

Equivalence classes in character ranges are not supported when Japanese Language Support is enabled. To avoid unpredictable results when using a range expression to match a class of characters, use a character class expression rather than a standard range expression.

Forming Patterns

The following rules describe how to form patterns from REs:

• An RE that consists of a single, ordinary character matches that same character in a string.

• An RE followed by an * (asterisk) matches zero or more occurrences of the character that the RE matches. For example, the following pattern:

ab*cd

matches each of the following strings:

acd
abcd
abbcd
abbbcd

but not the following string:

abd

If there is any choice, the longest matching leftmost string is chosen. For example, given the following string:

122333444

the pattern .* matches 122333444, the pattern .*3 matches 122333, and the pattern .*2 matches 122.

• An RE followed by:

\{number\}
Matches exactly number occurrences of the character matched by the RE.

\{number,\}
Matches at least number occurrences of the character matched by the RE.

\{number1,number2\}
Matches any number of occurrences of the character matched by the RE from number1 to number2, inclusive. The values of number1 and number2 must be integers from 0 to 255, inclusive. Whenever a choice exists, this pattern matches as many occurrences as possible. Note that if number is 0 (zero), pattern matches the beginning of the line.

• You can combine REs into patterns that match strings containing the same sequence of characters. For example, AB\*CD matches the string AB*CD and [A-Za-z]*[0-9]* matches any string that contains any combination of ASCII alphabetic characters (including none), followed by any combination of numerals (including none).

• The character sequence \(pattern\) matches pattern and saves it into a numbered holding space. Using this sequence, up to nine patterns can be saved on a line. Counting from left to right on the line, the first pattern saved is placed in the first holding space, the second pattern is placed in the second holding space, and so on. The character sequence \n matches the nth saved pattern, which is placed in the nth holding space. (The value of n is a digit, 1-9.) Thus, the following pattern:

\(A\)\(B\)C\2\1

matches the string ABCBA. You can nest patterns to be saved in holding spaces. Whether the enclosed patterns are nested or in a series, \n refers to the nth occurrence, counting from the left, of the delimiting characters, \). You can also use \n expressions in replacement strings as well as address patterns.
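The pattern-forming rules above can be sketched with grep on a small file. Because grep writes whole lines, the longest-leftmost rule is illustrated with expr, which is assumed here to use the same limited REs and which prints the text saved by \( \); the file strings.txt and its contents are hypothetical.

```shell
# Hypothetical input, created only for this demonstration.
tmp=$(mktemp -d)
printf 'acd\nabcd\nabbcd\nabbbcd\nabd\nABCBA\nABCAB\nAB*CD\n' > "$tmp/strings.txt"

# * : zero or more occurrences of b between a and cd.
star=$(grep '^ab*cd$' "$tmp/strings.txt")

# \{2,3\} : from two to three occurrences of b, inclusive.
interval=$(grep '^ab\{2,3\}cd$' "$tmp/strings.txt")

# \( \) save text; \2 and \1 match the saved text again.
backref=$(grep '^\(A\)\(B\)C\2\1$' "$tmp/strings.txt")

# \* matches a literal asterisk.
escaped=$(grep 'AB\*CD' "$tmp/strings.txt")

# Longest leftmost match: .*3 extends to the last 3 in the string.
longest=$(expr '122333444' : '\(.*3\)')

printf '%s\n' "$star" "$interval" "$backref" "$escaped" "$longest"
rm -rf "$tmp"
```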

Restricting What Patterns Match

A pattern can be restricted to match from the beginning of a line, up to the end of the line, or the entire line:

• A ^ (circumflex) at the beginning of a pattern causes the pattern to match only a string that begins in the first character position on a line.

• A $ (dollar sign) at the end of a pattern causes that pattern to match only if the last matched character is the last character (not including the newline character) on a line.

• The construction ^pattern$ restricts the pattern to matching only an entire line.

In addition, the null pattern (that is, //) duplicates the previous pattern.
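The anchoring rules in this section can be sketched as follows, assuming a hypothetical file lines.txt with the contents shown.

```shell
# Hypothetical input, created only for this demonstration.
tmp=$(mktemp -d)
printf 'error\nerror: disk\nfatal error\nno errors here\n' > "$tmp/lines.txt"

# ^ : the match must start in the first character position.
begins=$(grep '^error' "$tmp/lines.txt")

# $ : the match must end at the last character of the line.
ends=$(grep 'error$' "$tmp/lines.txt")

# ^pattern$ : the pattern must match the entire line.
whole=$(grep '^error$' "$tmp/lines.txt")

printf '%s\n' "$begins" "$ends" "$whole"
rm -rf "$tmp"
```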

EXAMPLES

1. To search several files for a string of characters, enter:

fgrep 'strcpy' *.c

This searches for the string strcpy in all files in the current directory with names ending in .c.

2. To count the number of lines that match a pattern, enter:

fgrep -c '{' pgm.c
fgrep -c '}' pgm.c

This displays the number of lines in pgm.c that contain left and right braces. If you do not put more than one { or } on a line in your C programs, and if the braces are properly balanced, then the two numbers displayed will be the same. If the numbers are not the same, then you can display the lines that contain braces in the order that they occur in the file with the command:

egrep -n '{|}' pgm.c

3. To display all lines in a file that begin with an ASCII letter, enter:

grep '^[a-zA-Z]' pgm.s

Note that because fgrep searches only for fixed strings and does not interpret pattern-matching characters, the following command causes fgrep to search only for the literal string ^[a-zA-Z] in pgm.s:

fgrep '^[a-zA-Z]' pgm.s

4. To display all lines that contain ASCII letters in parentheses or digits in parentheses (with spaces optionally preceding and following the letters or digits), but not letter-digit combinations in parentheses, enter:

egrep \
'\( *([a-zA-Z]*|[0-9]*) *\)' my.txt

This command displays lines in my.txt such as (y) or ( 783902), but not (alpha19c). Note that with egrep, \( and \) match parentheses in the text and ( and ) are special characters that group parts of the pattern. With grep, the reverse is true; use ( and ) to match parentheses and \( and \) to group characters.

5. To display all lines that do not match a pattern, enter:

grep -v '^#'

This displays all lines that do not begin with a # (number sign).

6. To display the names of files that contain a pattern, enter:

fgrep -l 'rose' *.list

This searches the files in the current directory that end with .list and displays the names of those files that contain at least one line containing the string rose.

7. To display all lines that contain uppercase characters, enter:

grep '[[:upper:]]' pgm.s

8. To display all lines that begin with a range of characters that includes a multicharacter collating symbol, enter:

grep '^[a-[.ch.]]' pgm.s

With your locale set to a Spanish locale, this command matches all lines that begin with a, b, c, or ch.

EXIT VALUES

The exit values of the grep, egrep, and fgrep commands are as follows:

0
A match was found.

1
No match was found.

2
A syntax error was found or a file was inaccessible, even if matches were found.
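The three exit values can be captured in a script as sketched below. The file names are hypothetical, and the || form is used only so that the sketch also runs in shells that stop on a nonzero status.

```shell
# Hypothetical input, created only for this demonstration.
tmp=$(mktemp -d)
printf 'hello\n' > "$tmp/file.txt"

# A match is found: exit value 0.
found=0
grep 'hello' "$tmp/file.txt" > /dev/null || found=$?

# No match is found: exit value 1.
missing=0
grep 'absent' "$tmp/file.txt" > /dev/null || missing=$?

# The file is inaccessible: exit value 2.
failed=0
grep 'hello' "$tmp/no-such-file" > /dev/null 2>&1 || failed=$?

printf '%s %s %s\n' "$found" "$missing" "$failed"
rm -rf "$tmp"
```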

RELATED INFORMATION

Commands: ed(1)/red(1), ex(1), sed(1), sh(1).

OSF/1 User’s Guide.
