GREP is a filter that searches input files, or the standard input, for lines that contain matches for one or more patterns called regular expressions and displays those matching lines.
GREP combines most features of UNIX grep and fgrep. GREP has many other advantages over FIND besides using regular expressions:
The two versions operate the same and have the same features, except that the 32-bit version supports long filenames. If you typically run GREP in a DOS box under Windows 9x or NT, the 32-bit version is the one you want.
You may wish to rename the version you use more often to the simpler GREP.EXE. All the examples in this user guide will assume you've done that. Otherwise, just substitute GREP16 or GREP32 wherever you see GREP in the examples.
grep /? | more
The full command form is either of
grep [options] ["regexp"] [<inputfile] [>outputfile]
grep [options] ["regexp"] inputfiles [>outputfile]
In the first form, GREP is a filter, taking its input from the standard input (most likely piped from some other command). In the second form, GREP takes its input from any number of input files, possibly specified with paths and wildcards.
In both forms, the optional outputfile will receive the matching lines (or other output, depending on the output options). For output to the screen, omit > and outputfile.
"regexp" is a regular expression; see below for how to construct one. A regular expression is normally required on the command line; however, if you use the /F option, regular expressions will be taken from a file or the keyboard instead of the command line.
The command-line options, and the values returned through ERRORLEVEL, are explained below.
Example:
grep /I "pic[t\s]" \proj\*.cob >prn
will examine every COBOL source file in the PROJ directory and print every line that contains a picture clause ("pic" followed by either "t" or a space) in caps or lower case (the /I option).
grep /I /S "pic[t\s]" \*.cob >prn
will examine every COBOL source file in all directories on the current disk (the /S option).
GREP can read text files just fine, whether lines are separated by the DOS-style carriage return plus line feed or the UNIX-style line feed only. See below for binary files.
grep "regexp" ..\*.c *.h d:\dir1\dir2\orich?.htm
The separator between directories in a path can be a backslash "\" or forward slash "/".
If input file names or paths contain spaces, you must enclose them in double quotes. This is a DOS restriction, not a feature only of GREP. For instance,
grep "regexp" c:\Program Files\My Office\*
contains three file specs, namely c:\Program, Files\My, and Office\*. That's probably not what you meant. Double quotes preserve your intended meaning:
grep "regexp" "c:\Program Files\My Office\*"
/R option for details.
If a text file contains null characters (ASCII 0) or Control-Z characters (ASCII 26), GREP cannot process it correctly in text mode and you must use the /R option to invoke binary mode.
When reading a very long input text line, GREP processes it in chunks. Please see the description of the /W option for details.
abc* to include all files, with any extension or none, whose names start with abc; with the 16-bit version you need abc*.* to get the same result. This matches what you get with a DIR command in 16-bit or 32-bit DOS.
/A option.
dirname\* or dirname\*.*, depending on your operating system. Like other warnings, this one will never appear if the /Q option is in effect.
With the /S option, GREP will search not only the files indicated on the command line, but also the files in subdirectories.
For example, with the command
grep /S "regexp" \hazax*.* *.c g:\mumble\*.htm
GREP will examine all files on the entire current drive whose names start with hazax; then it will look at all C source files in the current directory and all subdirectories under it; finally it will look at all HTML files in directory g:\mumble and all subdirectories under it.
Perhaps a more realistic example is this: you have a document about Vandelay Industries somewhere on your disk, but you can't remember where. This command should find it:
grep Vandelay /S \*.*
(or \* with 32-bit GREP). You may also want to use the /I option if you can't remember whether "Vandelay" was capitalized normally.
Subdirectory search follows the normal file-searching rules: hidden
and system subdirectories are normally ignored. (Yes, you have them if
you have Windows 9x.)
The /A option also applies during subdirectory search: with /S and /A together, GREP will search every subdirectory. There's no way to search every subdirectory but only normal files, or to search only normal subdirectories but to search for hidden files in them.
You may want to know in what order GREP examines files when the /S option is set. Ordinarily, GREP examines all files in the first file argument, including the subdirectory tree, then proceeds to the second file argument, and so on. However, when you use the /S option and none of the file arguments contains a path, GREP will look first for all those files in the current directory, then for all of them in the first subdirectory, and so on.
If you give GREP a filename argument that doesn't exist, it will normally tell you, unless you used the /Q option. However, when you specify /S (search subdirectories), GREP can't give such a warning because the specified file may exist in some subdirectory.
The /D option will show you every directory and wildcard search as GREP performs it. The output also contains lots of other stuff, but the file visits all contain the string "GX:".
You have a lot of freedom about how you enter options. You can use a leading hyphen or slash mark; you can use upper- or lower-case letters; you can leave spaces between options or combine them. For instance, the following are just some of the different ways of turning on the /P3 and /B options:
/p3 -b /b/P3 /p3B -B/P3 -P3 -b
This user guide will always use capital letters for the options, to make it easier to distinguish letter l and figure 1.
/?
more or a similar filter, like this:
grep /? | more
You can also redirect this information. For instance,
grep /? >prn
will send the help text to the printer.
/0 or /1
/0 returns 0 if there are matches or 1 if there are no matches; /1 returns 1 for matches or 0 for no matches. For more details, see Return values below.
/A
The /A option also modifies the action of the /S option (if present), determining whether subdirectories marked hidden or system will be searched.
/Dfile
Since the debugging information can be voluminous, if you want to see it at all you will usually want to specify an output file: file must follow the D with no intervening space, and the filename ends at the next space. GREP will append to the file if it already exists. If you want to display debugging information to the screen, omit the filename and specify the plain /D option. Be careful not to specify any other options between /D and the next space, or they'll be taken as a filename.
You can weed through the debugging output to some extent. GREP writes the following unique strings on most lines of output, so you can send debug output to a file and then grep the file for
grep GC:
parsing the command line
grep GM:
matching regular expression against input files
grep GR:
parsing and interpreting the regular expression
grep GX:
expanding directory and file specs, including subdirectories
/Ffile
file must follow the /F with no intervening space, and the filename ends at the next space.
If you use a minus sign as the filename (/F- option), GREP will accept regular expressions from standard input. Don't do this if you are redirecting input from a file with the < character!
/I
Caution: the /I option does not apply to 8-bit characters (characters 128-255). Because there are many different encoding schemes, GREP doesn't know which characters above 127 correspond to each other as upper and lower case on your computer. Therefore, if you want case-blind comparisons, you must explicitly code any 8-bit upper and lower case in your regular expression. For instance, to search for the French word "thé" in upper or lower case, code it as th[éÉE] since é can be upper-cased as É or as plain E. The "th", being 7-bit ASCII characters, will be found as upper or lower case by the /I option.
(You may need to code 8-bit characters like éÉ specially if you enter them on the command line; see Special rules for the command line below.)
/Q
/D option).
/R
A text file has lines ending with carriage return (ASCII 13), line feed (ASCII 10), or both; and the first Control-Z (ASCII 26) marks the end of file. Also, a text file doesn't contain any NUL characters (ASCII 0). Binary files, on the other hand, may have NUL and Control-Z characters in the middle, and often don't have "lines" separated by anything.
DOS doesn't mark files as binary or text, and therefore GREP has no way to know which a given file may be. By default it treats all files as text, but if you specify the /R option then GREP will treat all files as binary. There's no way to treat some input files as text and others as binary within a single GREP command.
When GREP reads files in binary mode, there's no such thing as a line, so GREP reads files in blocks of characters. The block size is given by the /W option. Since there is no such thing as a line, the ^ and $ characters (start and end of line) in a regular expression are treated as normal characters.
Only the input files are read in binary mode. Regardless of the /R option, when you use the /F option to read the regular expressions from a file, that file is read in normal text mode. Also, if you don't specify any input files, GREP always scans the standard input (possibly piped with | or redirected with <) in text mode.
/S
/V
/Wwidth
Text mode (no /R option)
The CR/LF (ASCII 13 or 10 or both) at the end of a line doesn't count against the specified width. If GREP reads a long line from the input, it will break it after width+1 characters and treat the remainder as a separate line. The whole line gets scanned, but any match that starts before the break and ends after the break will be missed. Therefore, if possible you should set width large enough to hold the longest line in the file.
If GREP does find any lines longer than the specified or default width, it will display a warning message at the end of execution, telling you the length of the longest line. (This warning is suppressed by the /Q option.)
In text mode, GREP will ignore anything on a line after the first null (ASCII 0), and it will ignore the rest of the file after a Control-Z (ASCII 26). Any files that contain these characters must be scanned in binary mode for accurate results.
Binary mode (/R option also specified)
Since binary files are usually not line oriented, depending on the width it is possible that a match might start in one block and end in the next, and thus be missed by GREP. One sure cure, if you have enough memory, is to specify a width at least as great as the file size. Failing that, you can minimize the problem by using a width that is large compared to the length of your regular expression, or by scanning twice with two different widths.
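The effect of the block size on matches can be illustrated with a small Python sketch. This is not GREP's code: it assumes blocks of exactly width characters, uses Python's re module in place of GREP's matcher, and the function name scan_in_blocks is invented for the illustration.

```python
import re

def scan_in_blocks(data, pattern, width):
    """Search each fixed-size block separately, the way a block-oriented reader would."""
    hits = []
    for start in range(0, len(data), width):
        block = data[start:start + width]
        # A match is seen only if it lies entirely inside one block.
        hits.extend(m.group(0) for m in re.finditer(pattern, block))
    return hits

data = "xxxxxxNEEDLExxxx"   # "NEEDLE" occupies offsets 6 through 11

# Width 8 gives blocks 0-7 and 8-15: the match straddles the break and is missed.
print(scan_in_blocks(data, "NEEDLE", 8))
# Width 6 gives blocks 0-5, 6-11, 12-15: the match fits inside one block.
print(scan_in_blocks(data, "NEEDLE", 6))
# A width at least as great as the data size can never split a match.
print(scan_in_blocks(data, "NEEDLE", len(data)))
```

Scanning with two different widths helps because the block boundaries fall at different offsets, so a match split by one width is likely to sit whole inside a block of the other.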
/Z
If you use the /Z option on the command line, any options in the environment variable will be disregarded, and so will any preceding options on the command line. This can be useful in batch files, to make sure that the action of GREP is controlled only by the options on the command line, and not by any settings in the environment variable.
The /Z option is the only single-letter option whose effect can't be reversed. If you use /Z more than once, GREP disregards the environment variable and all command-line options up through the last /Z.
Before going through the options, let's take a moment to look at some of the possible output formats. By default, GREP's output is similar to that of DOS FIND:
---------- GREP.C
op_showhead = ShowNoHeads;
else if (op_showhead == ShowNoHeads)
op_showhead = ShowNoHeads;
---------- GREP_MAT.C
op_showhead == ShowNoHeads)
However, the /U option produces UNIX grep-style output like this:
GREP.C: op_showhead = ShowNoHeads;
GREP.C: else if (op_showhead == ShowNoHeads)
GREP.C: op_showhead = ShowNoHeads;
GREP_MAT.C: op_showhead == ShowNoHeads)
As you can see, the main difference is that DOS-style output has the filename as a header above the group of matching lines from that file, and UNIX-style output has the name of the file on every matching line.
Now, here are the options that control what GREP outputs and how it is formatted:
/B
/C
Lines are counted, not matches. If a match occurs several
times on a line, or several regular expressions match the same line,
the line is counted only once.
/H
grep /H "Directory" <inputfile | other program
/L
/V option, display the names of files that contain no matches. (This is the same as the L option in UNIX grep.)
/N
/N option looks like this:
---------- GREP.C
[ 144] op_showhead = ShowNoHeads;
[ 178] else if (op_showhead == ShowNoHeads)
[ 366] op_showhead = ShowNoHeads;
---------- GREP_MAT.C
[ 98] op_showhead == ShowNoHeads)
With both /N and the /U option together, the UNIX-style output looks like this:
GREP.C:144: op_showhead = ShowNoHeads;
GREP.C:178: else if (op_showhead == ShowNoHeads)
GREP.C:366: op_showhead = ShowNoHeads;
GREP_MAT.C:98: op_showhead == ShowNoHeads)
UNIX-style output is suitable for use with the excellent freeware editor Vim.
/Pbefore,after
Either number can be 0. For instance, use /P0,4 if you want to show every match and the four lines that follow it.
If you use the /P option, you probably want to use the /N option as well, to display line numbers. In that case, the punctuation of the line numbers will distinguish which lines are actual matches and which are displayed for context. Here is some DOS-style output from a run with the options /P1,1N set:
---------- GREP.C
143 if (opcount >= argc)
[ 144] op_showhead = ShowNoHeads;
145
177 PRTDBG "with each matching line");
[ 178] else if (op_showhead == ShowNoHeads)
179 PRTDBG "NO");
365 if (myToggle('L') || myToggle('U') || myToggle('H'))
[ 366] op_showhead = ShowNoHeads;
367 else if (myToggle('B'))
---------- GREP_MAT.C
97 op_showwhat == ShowMatchCount ||
[ 98] op_showhead == ShowNoHeads)
99 headered = TRUE;
As you can see, the actual matches have square brackets around the line numbers, and the context lines do not.
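The way the before and after numbers select context lines can be sketched in Python. This is an illustration only, not GREP's implementation: the function name context_listing and the exact column widths are assumptions, and the line numbers of the matches are taken as given rather than found by a search.

```python
def context_listing(lines, match_numbers, before, after):
    """Return display rows for matches plus context; lines are numbered from 1."""
    matches = set(match_numbers)
    shown = set()
    for n in matches:
        first = max(1, n - before)            # do not run off the top of the file
        last = min(len(lines), n + after)     # or off the bottom
        shown.update(range(first, last + 1))
    rows = []
    for n in sorted(shown):
        text = lines[n - 1]
        if n in matches:
            rows.append(f"[{n:4d}] {text}")   # bracketed number: an actual match
        else:
            rows.append(f" {n:4d}  {text}")   # bare number: shown for context only
    return rows

source = ["alpha", "beta", "gamma MATCH", "delta", "epsilon", "zeta"]
for row in context_listing(source, [3], before=1, after=1):
    print(row)
```

Collecting the line numbers in a set means that overlapping context windows from nearby matches are shown once rather than repeated.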
The 16-bit GREP has to allocate space for the preview lines within the same 64 K data segment as all other data. Consequently, if you specify a moderately large value, particularly with a large line width, you may get a message that GREP can't allocate space for the lines. To resolve this, use the 32-bit version if possible; otherwise either reduce the first number after /P, or use the /W option to reduce the line width. (The after number has no effect on memory use.)
/R
The /R option, described earlier, makes GREP read files in binary mode, and that has a side effect on the output format.
In normal text mode, any matching lines are displayed character for character. If the line contains any non-printing characters, like tab (ASCII 9) or Control-X (ASCII 24), they are treated like any other character. But in binary mode, non-printing characters are displayed using their numeric value in hex, such as <09> or <18>.
/U
There's one small difference from UNIX grep output: UNIX grep suppresses the filename when there is only one input file, but GREP assumes that if you didn't want the filename you wouldn't have specified the /U option. Neither GREP nor UNIX grep displays a filename if input comes from a file via < redirection.
Some combinations of output options are logically incompatible. For instance, /H/L makes no sense (don't list filenames, and list only filenames with matches). In such cases, GREP will turn off one of the incompatible options and tell you what it did (unless you suppress such messages with the /Q option).
The incompatibilities are just common sense, but are listed here for
completeness:
/B | overrides /H; ignored with /L or /U |
/C | overrides /H, /L, /N, /P |
/H | ignored with /B, /C, /L, /U |
/L | overrides /B, /H, /N, /P, /U; ignored with /C |
/N | ignored with /C or /L |
/P | ignored with /C or /L |
/U | overrides /B and /H; ignored with /L |
The following characters are special if they occur in the listed contexts:
backslash (\), always
period (.), asterisk (*), plus sign (+), and left square bracket ([), anywhere except within square brackets
caret (^), only at the beginning of the regular expression or immediately after a left square bracket
dollar sign ($), only at the end of the regular expression
minus sign (-), only between square brackets
Here are the rules for a regular expression:
\
).
Example: to search for the string "^abc\def", you must put backslashes before the two special characters to make GREP treat them as normal characters and not give them special meanings, so that \^abc\\def is your regular expression.
You can use any character from space through character 255. If using 8-bit characters or certain special characters on the command line, see Special rules for the command line below.
If you specify the /I option, any letter A-Z or a-z that you specify will match both the capital and the lower case of that letter. Other letters are not affected by the /I option.
[ ]
).
Examples: [aA] will match an upper- or lower-case letter A; sno[wr]ing will match "snowing" or "snoring".
You can indicate a character range with the minus sign (-). Examples: [0-9] will match any single digit, and [a-zA-Z] will match any English letter. To match any Western European letter (under most recent versions of Windows, in North America and Western Europe), use [a-zA-ZÀ-ÖØ-öø-ÿ]. (That regular expression will work fine on the command line with 16-bit GREP or in a file [/F option] with either GREP. But to enter it on the command line with 32-bit GREP, you must use numeric sequences for the 8-bit characters, for example [a-zA-Z\192-\214\216-\246\248-\255]. See "Special rules for the command line" below.)
A character class can contain both ranges and single characters, and the order doesn't matter as long as each range within the class is written low-high.
^
).
Examples: [^0-9 ] matches any character except a digit or a space; the[^a-z] matches "the" followed by anything except a lower-case letter.
Note: The negative character class matches any character not within the square brackets, but it does match a character. For instance, the[^a-z] matches "the" followed by something other than a lower-case letter; it does not match "the" at the end of a line because then "the" is not followed by any characters. Please see the extended example at the end of these rules for further explanation.
A plus sign (+) after a character or character class matches one or more occurrences; an asterisk (*) matches zero or more occurrences.
Examples: snor+ing matches "snoring", "snorring", "snorrring", and so on, but not "snoing". snor*ing matches "snoing", "snoring", and so on.
Used with a character class, the plus sign and asterisk match any multiple characters in the class, not only multiple occurrences of the same character. For instance, sno[rw]+ing matches "snowing", "snorwing", "snowrring", and so on.
Obligatory example: [A-Za-z_]+[A-Za-z0-9_]* matches a C or C++ identifier, which is at least one letter or underscore, followed by any number of letters, digits, and underscores.
A caret (^, ASCII 94) at the start of a regular expression means that the pattern starts at the beginning of a line in the file(s) being searched. A dollar sign ($, ASCII 36) at the end of a regular expression means that the pattern ends at the end of a line in the file(s) being searched. If these characters occur anywhere else, they are treated as normal characters.
Example: ^[wW]hereas matches the word "Whereas" or "whereas" at the start of a line, but not in the middle of a line. Blanks are not ignored, so if you want to find that word whenever it's the first word of the line, you need to use a pattern like ^ *[wW]hereas to allow for indentation.
Examples: ^$ will find lines that contain no characters at all. ^ *$ will match lines that contain no characters or contain only spaces. ^ +$ will match lines that contain only spaces, but not empty lines.
Examples: ^[A-Za-z]+$ will find every line that contains nothing but English letters. ^ *[A-Za-z]+ *$ will find every line that contains exactly one English word, possibly preceded or followed by blanks.
These characters for start of line and end of line have no special meaning in binary mode, which is controlled by the /R option. In binary mode, the ^ and $ are treated as normal characters.
/I option to make the search case blind, and concentrate on constructing the regular expressions. At first glance, [^a-z]the[^a-z] seems adequate: anything other than a letter, followed by "the", followed by anything but a letter. That lets in "the" and rules out "then" and "mother". But it also rules out "the" at the beginning or end of a line. Remember that a negative character class does insist on matching some character. So the solution is to have four regular expressions, for "the" at the beginning, middle, or end of a line, or on a line by itself:
^the[^a-z]
[^a-z]the[^a-z]
[^a-z]the$
^the$
So to search for just the occurrences of the word "the", you'd put those four lines in a file and then use the /F option on GREP.
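To see why all four expressions are needed, here is a small Python sketch that tries them against a few sample lines. It uses Python's re module as a stand-in for GREP's matcher (these particular expressions use only constructs that mean the same thing in both), lower-cases each line in place of the /I option, and the name has_word_the is invented for the illustration.

```python
import re

# The four expressions from the text: "the" at the start, in the middle,
# at the end of a line, or on a line by itself.
PATTERNS = [r"^the[^a-z]", r"[^a-z]the[^a-z]", r"[^a-z]the$", r"^the$"]

def has_word_the(line):
    """True if any of the four expressions matches the line, ignoring case."""
    lowered = line.lower()          # stands in for the /I option
    return any(re.search(p, lowered) for p in PATTERNS)

for sample in ["The cat", "see the cat", "that is the", "the", "then", "mother"]:
    print(repr(sample), has_word_the(sample))
```

Dropping any one of the four expressions loses one of the first four sample lines, while "then" and "mother" are rejected by all of them.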
/F option), the above rules are sufficient. But when you enter a regular expression on the command line, you also have to contend with command-line parsing, which changes the meanings of some characters before GREP ever sees them. Putting double quotes around the expression may help, but it doesn't avoid all problems.
Please remember, the cautions and special rules in this section apply only when you enter a regular expression on the command line. Please ignore this section when using either form of the /F option, which I recommend when your regular expression is at all complicated.
If your regular expression begins with a minus (-) or slash (/), GREP will try to interpret it as an option. Example: if you're searching for the string "-in-law", GREP will think you're trying to turn on the options /I, /N, and so on. To avoid this problem, use a leading backslash (\-in-law).
If your regular expression contains certain special characters like <, =, and |, DOS will give those characters their special DOS meaning and GREP will never see them.
So you must use special "escape sequences" to represent those characters in a regular expression on the command line:
instead of | you can use any of |
---|---|
< (less) | \l \60 \0x3C \074 |
> (greater) | \g \62 \0x3E \076 |
| (vertical bar) | \v \124 \0x7C \0174 |
" (double quote) | \" \34 \0x22 \042 |
, (comma) | \c \44 \0x2C \054 |
; (semicolon) | \i \59 \0x3B \073 |
= (equal) | \q \61 \0x3D \075 |
(space) | \s \32 \0x20 \040 |
(tab) | \t \9 \0x09 \011 |
(escape) | \e \27 \0x1B \033 |
You can enter any character as a numeric sequence, not just the special characters in the above list. You can use decimal, hex (leading 0x), or octal (leading zero). Example: capital A would be \65, \0x41, or \0101.
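The decimal, hex, and octal forms can be checked with a short Python sketch that decodes the escape sequences listed above. It is an illustration, not GREP's parser: the name decode_escapes is invented, and the limits on how many digits are consumed (two hex digits, three octal digits after the leading zero, three decimal digits) are assumptions, since the guide does not say where GREP stops reading a number.

```python
import re

# Letter escapes copied from the table above.
NAMED = {"l": "<", "g": ">", "v": "|", '"': '"', "c": ",",
         "i": ";", "q": "=", "s": " ", "t": "\t", "e": "\x1b"}

# A backslash followed by: another backslash, hex (0x..), octal (leading
# zero), decimal, or one of the letter escapes. Digit limits are assumed.
ESCAPE = re.compile(
    r'\\(\\|0[xX][0-9A-Fa-f]{1,2}|0[0-7]{0,3}|[1-9][0-9]{0,2}|[lgvciqste"])')

def decode_escapes(text):
    """Replace command-line escape sequences with the characters they stand for."""
    def replace(m):
        body = m.group(1)
        if body == "\\":
            return m.group(0)               # doubled backslash: leave untouched
        if body in NAMED:
            return NAMED[body]
        if body[:2].lower() == "0x":
            return chr(int(body, 16))       # hex, leading 0x
        if body.startswith("0"):
            return chr(int(body, 8))        # octal, leading zero
        return chr(int(body, 10))           # plain decimal
    return ESCAPE.sub(replace, text)

print(decode_escapes(r"\65 \0x41 \0101"))   # three spellings of capital A
print(decode_escapes(r"pic[t\s]"))          # \s becomes a space
```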
Finally, if your regular expression contains 8-bit characters, Microsoft's 32-bit startup code (not DOS) will translate these characters from a DOS character set to a Windows character set, which is probably not what you want. To avoid this problem, either enter the regular expression in a file (/Ffile), let GREP prompt you to enter it from the keyboard (/F-), or use the numeric sequences to enter characters. Example: In a regular expression on the command line, instead of actually typing the character é, enter it as \233 or \0xE9 or \0351.
Remember, the rules in this section are required only to get around parsing problems on the command line. These escape sequences are not needed, and don't work, in regular expressions in a file or when you use the /F- option to enter regular expressions on separate lines from the keyboard.
ORS_GREP environment variable. You have the same freedom as on the command line: leading slashes or hyphens, space separation or options run together, caps or lower case.
Only options can be put in the environment variable. If you want to "can" a regular expression, put it in a file and put /Ffile in the environment variable.
If you have some options in the environment variable but you don't want one of them for a particular run of GREP, you don't have to edit the environment variable. You can make most changes on the command line, like this:
The /Z option on the command line makes GREP disregard the environment variable (as well as any preceding options on the command line).
/N in the environment variable. Then if you don't want line numbers in a particular run of GREP, just specify /N on the command line for that run to cancel the /N option set in the environment variable.
/0 and /1, which set return values from GREP, override each other. The latest one specified on the command line will be effective.
The /D option and /F option, if set in the environment variable, cannot be turned off on the command line. However, you can specify different files on the command line with either of those options.
/P and /W in the environment variable can be overridden by different settings on the command line. You can use /P0 to request no context lines.
If you want to disregard all options in the environment variable, use the /Z option on the command line.
If you're ever in doubt about the interaction of options between the command line and the environment variable, simply type
grep /d
and GREP will tell you all the option settings in effect.
IF ERRORLEVEL in a batch file.
255 | bad option, or other error on the command line or regular expression |
254 | specified file not available |
253 | insufficient memory: try reducing values specified with the /P option or /W option, or use the 32-bit GREP if possible |
252 | internal error in expanding a regular expression |
2 | help message displayed (/? option, or nothing specified on the command line) |
0 | program ran to completion (whether or not there were any matches) |
You might want to use GREP in a batch file or a makefile and take different actions depending on whether matches were found or not. To do this, use the /0 or /1 option.
With the /1 option, GREP returns these values of ERRORLEVEL:
0 | no matches were found |
1 | one or more matches were found |
2-255 | as above |
/0 is the opposite: it returns these ERRORLEVEL values:
0 | one or more matches were found |
1 | no matches were found |
2-255 | as above |
The /0 or /1 option lets you tell GREP which value to return if matches are found.
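For scripts that call GREP from another program rather than a batch file, the two conventions can be folded into one helper. The Python sketch below is only an illustration of the tables above: the function name matches_found is invented, and the return value is assumed to have been obtained already (for example from the returncode of a subprocess call).

```python
def matches_found(errorlevel, option):
    """Interpret a GREP return value under the /0 or /1 option.

    Returns True if matches were found and False if not; raises for the
    values of 2 and above, which signal help output or an error.
    """
    if option not in ("/0", "/1"):
        raise ValueError("option must be '/0' or '/1'")
    if errorlevel >= 2:
        raise RuntimeError(
            f"GREP reported help or an error (ERRORLEVEL {errorlevel})")
    # /0 returns 0 when matches are found; /1 returns 1 when matches are found.
    found_value = 0 if option == "/0" else 1
    return errorlevel == found_value

print(matches_found(0, "/0"))
print(matches_found(0, "/1"))
```

Raising on values of 2 and above keeps an error such as a missing file from being mistaken for "no matches".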
If an input line contains a NUL character (ASCII 0), GREP will ignore any later characters on that line. A text file should never contain a NUL character, but if it does you should be able to read it by using the /R option.
GREP's regular expressions are slightly different from UNIX grep's. Specifically, to accommodate DOS command-line parsing, GREP defines quite a few more escape characters like \c and \s, as well as numeric escapes. On the other hand, GREP does not (yet) implement ?, \<, \(, \{, and \| in regular expressions.
version 5.0, 2000-05-07 -- program changes:
/R option to read and display files in binary mode
/W option to set the line width (formerly fixed at 255 characters), and warn the user if longer lines were found
/Z option to reset all options
z-a; previously they were silently treated like the three characters "z", "-", "a"
grep with no options or regular expression; instead, suggest using grep /? |more
d:*, the drive was ignored
version 5.0, 2000-05-07 -- changes in user guide:
version 5.1, 2000-05-31: