This chapter describes how to search directories and files for keywords and strings using the SunOS command grep.
To search for a particular character string in a file, use the grep command. The basic syntax of the grep command is:
$ grep string file
where string is the word or phrase you want to find, and file is the file to be searched.
A string is one or more characters; a single letter is a string, as is a word or a sentence. Strings may include "white space," punctuation, and invisible (control) characters.
For example, to find Edgar Allan Poe's telephone extension, type grep, all or part of his name, and the file containing the information:
$ grep Poe extensions
Edgar Allan Poe x72836
$
Note that more than one line may match the pattern you give:
$ grep Allan extensions
David Allan x76438
Edgar Allan Poe x72836
$ grep Al extensions
Louisa May Alcott x74236
David Allan x76438
Edgar Allan Poe x72836
$
grep is case-sensitive; that is, the pattern must match the uppercase and lowercase letters of the text exactly:
$ grep allan extensions
$ grep Allan extensions
David Allan x76438
Edgar Allan Poe x72836
$
Note that grep found nothing on the first try because no entry contains "allan" spelled with a lowercase "a."
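To reproduce the searches above, here is a minimal POSIX shell sketch. It assumes mktemp is available and recreates the extensions file in a scratch directory using only the three entries this chapter shows; the real file may well contain more.

```shell
#!/bin/sh
# Work in a scratch directory (assumes mktemp -d is available).
cd "$(mktemp -d)" || exit 1

# Recreate the sample extensions file from the three entries shown above.
printf '%s\n' 'Louisa May Alcott x74236' \
              'David Allan x76438' \
              'Edgar Allan Poe x72836' > extensions

upper=$(grep Allan extensions)      # two lines match
lower=$(grep allan extensions)      # no line matches: the case differs
al_count=$(grep -c Al extensions)   # "Al" occurs in all three lines

printf 'grep Allan:\n%s\n' "$upper"
printf 'grep allan found %s line(s)\n' "$(grep -c allan extensions)"
printf 'grep Al found %s line(s)\n' "$al_count"
```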
grep is often used as a "filter" with other commands: it lets you screen out unneeded lines from a command's output. To use grep as a filter, pipe the output of the command through grep. The symbol for pipe is "|".
The following example lists the files whose names end in ".ps" and that were last modified in May:
$ ls -l *.ps | grep May
The first part of this command line,
ls -l *.ps
produces a long listing of those files, one line per file.
The second part,
| grep May
pipes that list through grep, looking for the pattern May:
$ ls -l *.ps | grep May
-rw-r--r--  1 elvis   2356 May 22 12:56 clock.ps
-rw-r--r--  1 elvis   5644 May 22 15:07 buttons.ps
$
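grep reads standard input when it is given no file name, which is what makes it usable in a pipeline. The following sketch, assuming a POSIX shell, feeds some ls -l style lines to grep through a pipe; the clock.ps and buttons.ps lines come from the example above, while the notes.ps line is invented for contrast.

```shell
#!/bin/sh
# Hypothetical "ls -l" lines; notes.ps is invented for contrast.
listing='-rw-r--r--  1 elvis   2356 May 22 12:56 clock.ps
-rw-r--r--  1 elvis   1790 Apr  3 09:10 notes.ps
-rw-r--r--  1 elvis   5644 May 22 15:07 buttons.ps'

# The pipe sends the listing to grep, which passes on only lines with "May".
may_files=$(printf '%s\n' "$listing" | grep May)
printf '%s\n' "$may_files"
```

Keep in mind that grep matches "May" anywhere on the line, so a file named, say, May.ps would also be listed.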
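To find a pattern that is more than one word long, enclose the string in single or double quotation marks. Without them, the shell splits the phrase into separate arguments, and grep treats every word after the first as the name of a file to search: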
$ grep "Louisa May" extensions
Louisa May Alcott x74236
$
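The following sketch (POSIX shell; the scratch directory and the two-line extensions file are assumptions) shows what the quotation marks change. Without them, grep receives Louisa as the pattern and both May and extensions as files to search:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'Louisa May Alcott x74236' 'David Allan x76438' > extensions

# Quoted: one pattern ("Louisa May") and one file.
quoted=$(grep "Louisa May" extensions)

# Unquoted: the pattern is just "Louisa"; grep also tries to open a file
# named "May", which does not exist (its error message is discarded here).
unquoted=$(grep Louisa May extensions 2>/dev/null)

printf 'quoted:   %s\n' "$quoted"
printf 'unquoted: %s\n' "$unquoted"
```

Because grep was given two file names in the unquoted case, it also prefixes the match with the name of the file it came from.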
grep can search for a string in groups of files. When you give it more than one file to search, it prints the name of the file containing each match, followed by a colon, then the matching line:
$ grep ar *
actors:Humphrey Bogart
alaska:Alaska is the largest state in the United States.
wilde:book. Books are well written or badly written.
$
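A sketch of the same behavior, assuming a POSIX shell and mktemp; the file names follow the example above, but their contents are hypothetical. The prefix appears only when more than one file is searched:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
# File names follow the example above; the contents are hypothetical.
printf '%s\n' 'Humphrey Bogart' 'Lauren Bacall' > actors
printf '%s\n' 'Alaska is the largest state in the United States.' > alaska
printf '%s\n' 'Nothing to see here.' > hinterland

many=$(grep ar *)        # several files: each match is prefixed "file:"
one=$(grep ar actors)    # a single file: no prefix
printf '%s\n' "$many" "$one"
```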
To search for all the lines of a file that don't contain a certain string, use the -v option to grep. The following example, run in the user medici's home directory, shows all of the lines in those files that don't contain the letter e:
$ ls
actors  alaska  hinterland  tutors  wilde
$ grep -v e *
actors:Mon Mar 14 10:00 PST 1936
wilde:That is all.
$
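The -v option works the same way on standard input. In this sketch (POSIX shell; the input lines are hypothetical apart from the two matches shown above), only the lines without an e survive:

```shell
#!/bin/sh
# -v inverts the match: print only the lines that do NOT contain "e".
kept=$(printf '%s\n' 'That is all.' 'The end.' 'Mon Mar 14 10:00 PST 1936' | grep -v e)
printf '%s\n' "$kept"
```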
You can also use the grep command to search for targets defined as patterns using regular expressions. Regular expressions consist of letters and numbers, in addition to characters with special meaning to grep. Many of these special characters, called metacharacters, also have special meaning to the shell, so they need to be quoted or escaped. Whenever you use a grep regular expression at the command prompt, surround it with quotation marks, or escape metacharacters (such as & ! . * $ ? and \) with a backslash (\).
A caret (^) indicates the beginning of the line. So the command:
$ grep '^b' list
finds any line in the file list starting with "b."
A dollar-sign ($) indicates the end of the line. The command:
$ grep 'b$' list
displays any line in which "b" is the last character on the line. And the command:
$ grep '^b$' list
displays any line in list where "b" is the only character on the line.
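The three anchor patterns can be compared side by side. This sketch assumes a POSIX shell and uses a shell variable holding four hypothetical lines in place of the file list:

```shell
#!/bin/sh
# Four hypothetical lines standing in for the file list.
list='b
bob
about
crab'

starts=$(printf '%s\n' "$list" | grep '^b')   # b, bob
ends=$(printf '%s\n' "$list" | grep 'b$')     # b, bob, crab
only=$(printf '%s\n' "$list" | grep '^b$')    # b
printf 'start:\n%s\nend:\n%s\nonly:\n%s\n' "$starts" "$ends" "$only"
```

The line "about" contains a "b" but matches none of the three patterns, because the "b" is neither first nor last.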
Within a regular expression, a dot (.) matches any single character. So the command:
$ grep 'an.' list
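matches any line in which "an" is followed by at least one more character, so it finds words such as "any," "and," and "management," and also "plan" when it is followed by a space or punctuation mark, because spaces count as characters, too.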
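The following sketch (POSIX shell; the four sample lines are hypothetical) shows that the dot needs a real character to match, so "plan" at the very end of a line is not found:

```shell
#!/bin/sh
# Hypothetical sample lines standing in for the file list.
list='any time
a plan
plan ahead
management'

# "a plan" is skipped: nothing follows its "an".
matches=$(printf '%s\n' "$list" | grep 'an.')
printf '%s\n' "$matches"
```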
When an asterisk (*) follows a character, grep interprets it as "zero or more instances of that character." When the asterisk follows a pattern that matches a single character, such as a dot (.) or a bracketed list, grep interprets it as "zero or more instances of characters matching that pattern."
Because it includes zero occurrences, the asterisk can behave in ways you might not expect. Suppose you want to find all lines containing words with the letters "qu" in them. Typing:
$ grep 'qu*' list
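appears to work, but only because "q" is nearly always followed by "u" in English; the pattern actually matches any line containing a "q." Likewise, 'n*' matches every line, since every line contains zero or more n's, so to find lines containing the letter "n" you would have to type: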
$ grep 'nn*' list
To find lines containing the pattern "nn" (two or more n's in a row), you would have to type:
$ grep 'nnn*' list
Try each pattern with one fewer "n" (for example, 'n*' instead of 'nn*') to see the difference.
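One way to see the difference is to count the matching lines for each pattern. This is a sketch assuming a POSIX shell; the five sample lines, and the helper function count, are hypothetical:

```shell
#!/bin/sh
# Hypothetical sample lines standing in for the file list.
list='apple
banana
inn
Iraq
quiet'

# Hypothetical helper: count the lines of $list that match pattern $1.
count() { printf '%s\n' "$list" | grep -c "$1"; }

n0=$(count 'n*')     # zero or more n's: matches all 5 lines
n1=$(count 'nn*')    # one or more n's: banana, inn
n2=$(count 'nnn*')   # two or more n's in a row: inn
q=$(count 'qu*')     # a q plus zero or more u's: quiet, and also Iraq
printf '%s\n' "n*: $n0" "nn*: $n1" "nnn*: $n2" "qu*: $q"
```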
To match zero or more occurrences of any character, which in practice matches every line in list, type:
$ grep '.*' list
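As a quick check, this sketch (POSIX shell; hypothetical three-line input that includes an empty line) counts the lines that '.*' matches. The pattern is quoted so that the shell cannot first expand .* into the names of hidden files, such as .profile, in the current directory:

```shell
#!/bin/sh
# Hypothetical input; the middle line is empty.
list='first line

third line'

# '.*' can match zero characters, so even the empty line counts.
all=$(printf '%s\n' "$list" | grep -c '.*')
printf 'lines matched by .*: %s\n' "$all"
```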
Suppose you want to find lines in the text that have a dollar sign ($) in them. Preceding the dollar sign in the regular expression with a backslash (\) tells grep to ignore (escape) its special meaning. This is true for the other metacharacters (& ! . * ? and \ itself) as well.
For example, the command
$ grep '^\.' list
matches lines starting with a period, and is especially useful when searching for nroff or troff formatting requests, which begin with a period. The quotation marks keep the shell from stripping the backslash before grep sees it.
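To see why the backslash matters, this sketch (POSIX shell; the troff lines are hypothetical) compares the escaped pattern with an unescaped one, in which the dot matches any first character:

```shell
#!/bin/sh
# Hypothetical troff source.
src='.TH GREP 1
Some ordinary text.
.SH NAME'

requests=$(printf '%s\n' "$src" | grep '^\.')   # only lines that start with a period
anychar=$(printf '%s\n' "$src" | grep -c '^.')  # unescaped: any first character
printf '%s\n' "$requests"
printf "'^.' matched %s line(s)\n" "$anychar"
```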
The following table, Table 4-1, provides a list of the more commonly used search pattern elements you can use with grep.
Table 4-1 grep Search Pattern Elements
Character | Matches
---|---
^ | The beginning of a text line
$ | The end of a text line
. | Any single character
[...] | Any single character in the bracketed list or range
[^...] | Any character not in the list or range
* | Zero or more occurrences of the preceding character or regular expression
.* | Zero or more occurrences of any single character
\ | Escapes special meaning of next character
Note that these search characters can also be used in vi text editor searches.
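Here is a small sketch of the bracketed forms from Table 4-1, assuming a POSIX shell and four hypothetical words:

```shell
#!/bin/sh
# Hypothetical input words.
words='cat
cot
cut
c9t'

in_list=$(printf '%s\n' "$words" | grep 'c[ao]t')   # a or o in the middle: cat, cot
not_in=$(printf '%s\n' "$words" | grep 'c[^ao]t')   # anything but a or o: cut, c9t
digit=$(printf '%s\n' "$words" | grep 'c[0-9]t')    # a range of digits: c9t
printf '%s\n' "$in_list" "$not_in" "$digit"
```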
As shown earlier, you use quotation marks to surround text that you want to be interpreted as one word. For example, you would type the following to use grep to search all files for the phrase "dang it, boys":
$ grep "dang it, boys" *
Single quotation marks (') can also be used to group multiword phrases into single units. Single quotation marks also make sure that certain characters, such as $, are interpreted literally. (In the C shell, the history metacharacter ! is always interpreted as such, even inside quotation marks, unless you escape it with a backslash.) In any case, it is a good idea to escape characters such as & ! $ ? . ; and \ when you want them taken as ordinary typographical characters.
For example, if you type:
$ grep $ list
you will see all the lines in list. However, if you type:
$ grep '\$' list
you will see only those lines with the "$" character in them.
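Both commands can be checked with a sketch like the following, which assumes a POSIX shell and uses two hypothetical lines in place of list:

```shell
#!/bin/sh
# Hypothetical input standing in for the file list.
list='Price: $5
No price given'

all=$(printf '%s\n' "$list" | grep -c '$')     # $ alone means end of line: every line
dollars=$(printf '%s\n' "$list" | grep '\$')   # escaped: a literal dollar sign
printf 'grep $ matched %s line(s)\n' "$all"
printf '%s\n' "$dollars"
```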
For more information on the grep(1) command, refer to the man Pages(1): User Commands.