(This is the user guide in plain text. If you have any Web
browser, you may prefer to read CMPUSER.HTM, which has all the
same text but is better formatted and contains many helpful
hyperlinks.)



                         CMP -- Compare Text Files
                           User Guide and History

         program and documentation by Stan Brown, Oak Road Systems
                     release 4.3, revised 22 May 2000
          Copyright (c) 1994-2000 by Stan Brown, Oak Road Systems
  ------------------------------------------------------------------------

CMP will compare files (or groups of files) and report any differences.
Output is suitable for piping, or processing by other programs. A value
returned in ERRORLEVEL lets batch files take action based on whether files
are the same or differ.

        Why CMP?
        License and warranty
        System requirements
        Installation
        User instructions
        Options
            General options
            Options affecting comparison
            Options affecting output
        Environment variable
        Return values
        How spaces and tabs are handled
        What's new?


  ------------------------------------------------------------------------

                                  Why CMP?

  ------------------------------------------------------------------------
CMP improves on the DOS utilities, COMP.COM and the newer FC.EXE, in
several respects:

   * CMP comes in both 16-bit and 32-bit versions. The 16-bit version runs
     in virtually any DOS, and the 32-bit version recognizes long
     filenames.
   * CMP returns status values that can be useful in batch files.
   * CMP can compare files whether or not spaces have been compressed to
     tabs (as is done by many editors), and will disregard differences due
     solely to different spacing within a line or to blank lines. (You can
     turn these features on and off with command-line options.)
   * CMP lets you configure the output format in various ways, optionally
     suppressing various levels of messages.
   * CMP lets you specify a line width for comparison, as well as a number
     of lines for look-ahead. This lets you use available memory most
     efficiently.
   * CMP lets you store often-used options in an environment variable
     instead of typing them on the command line every time.

  ------------------------------------------------------------------------

                            License and warranty

  ------------------------------------------------------------------------
CMP is shareware. If you use it past a 30-day evaluation period, you are
morally and legally bound to register and pay for it. Please see the file
LICENSE.TXT for full details, including support and warranty information.

  ------------------------------------------------------------------------

                            System requirements

  ------------------------------------------------------------------------
The 16-bit version runs under DOS 2.0 or higher, including a DOS box under
Windows. The 32-bit version requires a DOS box under Windows 98, Win95, or
Win NT 4.0. (I fully expect it to run in Windows 2000, but have not tested
it.)

The two versions operate the same and have the same features, except that
the 32-bit version supports long filenames and more lines per file (about
2000 million versus about 32 thousand). If you typically run CMP in a DOS
box under Windows 9x or NT, the 32-bit version is the one you want.

  ------------------------------------------------------------------------

                                Installation

  ------------------------------------------------------------------------
There is no special installation procedure. Simply move CMP16.EXE,
CMP32.EXE, or both to any convenient directory in your path.

You may wish to rename the version you use more often to the simpler
CMP.EXE. All the examples in this user guide will assume you've done that.
Otherwise, just substitute CMP16 or CMP32 wherever you see CMP in the
examples.

  ------------------------------------------------------------------------

                             User instructions

  ------------------------------------------------------------------------
For a quick summary of operating instructions, type

        cmp

The full command form is one of

        cmp [options] file1 file2 [>reportfile]
        cmp [options] files directory [>reportfile]

In the second form, files may be any number of file specs, possibly
containing wildcards, and directory may be a disk letter (with colon) or
path (with or without trailing backslash). Please be aware that the 16-bit
and 32-bit CMP programs expand wildcards slightly differently because the
32-bit version supports long filenames. Thus the 32-bit version would
expand abc* to include all files, with any extension or none, whose names
start with abc; with the 16-bit version you need abc*.* to get the same
result.

Example:

        cmp -L5 zonk1 b:zonk2

will compare file zonk1 (on the current drive and directory) to file zonk2
in the current directory of drive b, limiting look-ahead to five lines (the
-L5 option).

Another example:

        cmp a:*.doc xx.htm b:

will compare all the *.doc files in the current directory of drive a, plus
xx.htm in the current directory of the current disk, to files of the same
names in the current directory of drive b.

  ------------------------------------------------------------------------

                                  Options

  ------------------------------------------------------------------------
CMP's operation can be modified by several options, either on the command
line or in an environment variable (see below).

Here is a quick list of all the options:
 /0   /1   /?   /B   /D   /E   /F   /I   /L   /N   /Q   /QQ   /T   /W   /Z

You have a lot of freedom about how you enter options. You can use a
leading hyphen or slash mark; you can use upper- or lower-case letters. You
can leave spaces between options or combine them. For instance, the
following are just some of the different ways of turning on the W100 and B
options:

        /w100 /b    /w100/b    /w100B    -W100-B    -W100 -b

This document will always use capital letters for the options, to make it
easier to distinguish letter l and figure 1.

General options

Quite a number of options control the comparison mechanism and the output
format. Those options are grouped in separate sections below. This section
explains the others.

/?
     Display a help message and option summary and exit with no further
     processing. You can redirect this information. For instance,

             cmp /? >prn

     will send the help text to the printer.

/0 and /1
     These options let you control the values that CMP returns in the DOS
     error level. /0 returns 0 if there are differences or 1 if there are
     no differences; /1 returns 1 for differences or 0 for no differences.
     For more details, see Return values below.

/D
     displays debugging information. Debugging information includes whether
     you're running the 16-bit or 32-bit version, the value of the
     environment variable, and the values of all options specified or
     implied. This information is normally suppressed, but you may find it
     helpful if CMP seems to behave in a way you don't expect.

/Z
     Reset all options to their default values.

     If you use the /Z option on the command line, any options in the
     environment variable will be disregarded, and so will any preceding
     options on the command line. This can be useful in batch files, to
     make sure that the action of CMP is controlled only by the options on
     the command line, and not by any settings in the environment variable.

     The /Z option is the only one whose effect can't be reversed. If you
     use /Z more than once, CMP disregards the environment variable and
     all command-line options up through the last /Z.

Options affecting comparison

This section explains the options that affect how CMP does its comparisons.

/B
     toggles between "compress any run of spaces and/or tabs into a single
     space for comparison purposes" and "don't compress whitespace within a
     line". The default is to compress, so that CMP normally considers
     "a    b" and "a b" and "a  {tab} b" identical.

     Note that runs of spaces and/or tabs are compressed to a single space,
     not completely removed. Thus CMP will always consider "ab" (with no
     space between "a" and "b") different from "a b" (any spaces or tabs
     between "a" and "b").
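
     As a rough illustration of the compression rule, here is a sketch in
     Python. It is not CMP's actual code, and the function name
     normalize_spacing is made up for this example. It squeezes each run
     of spaces and tabs to one space and drops trailing spaces and tabs,
     which this guide says are always disregarded.

```python
import re

def normalize_spacing(line):
    """Hypothetical model of CMP's default whitespace handling."""
    # Trailing spaces and tabs are disregarded, per this guide.
    line = line.rstrip(" \t")
    # Each run of spaces and/or tabs becomes exactly one space.
    return re.sub(r"[ \t]+", " ", line)

# The three spellings from the text normalize to the same string,
# while "ab" (no space at all) remains different from "a b".
variants = [normalize_spacing(s) for s in ("a    b", "a b", "a  \t b")]
print(variants)
print(normalize_spacing("ab") == normalize_spacing("a b"))
```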

     Regardless of this option, CMP will always disregard spaces and tabs
     at the ends of lines. Some more esoteric details are given below in
     "How spaces and tabs are handled".

/E
     toggles between "ignore blank lines" (the default) and "treat blank
     lines like any others". Normally, CMP ignores any lines of length 0,
     and lines that contain only spaces and tabs. Specify the /E option to
     make CMP keep track of blank lines and report added or deleted blank
     lines as differences.

/I
     toggles between "ignore case" and "consider A-Z different from a-z".
     The default is to treat upper and lower case as different. Because of
     limitations in the MSC library, this option affects only the English
     letters A through Z. Non-English lower-case letters are always
     considered different from the corresponding upper-case letters.

/Llook-ahead
     sets the look-ahead to look-ahead lines from each file. The default is
     20 lines in the 16-bit version and 100 lines in the 32-bit version.

     The significance of look-ahead is this. Suppose CMP finds, after lines
     28-31 of file 1 match lines 38-41 of file 2, that line 32 of file 1
     doesn't match line 42 of file 2. In this case, CMP has to look ahead
     at line 33 of file 1 and line 43 of file 2.

                   file 1               file 2
             ==================   ==================
             (28) line a          (38) line a
             (29) line b          (39) line b
             (30) line c          (40) line c
             (31) line d          (41) line d
             (32) line e          (42) something different
             (33: look ahead)     (43: look ahead)

     Maybe they match, or maybe line 43 of file 2 matches line 32 of file 1
     (meaning that line 42 of file 2 is new in that file and doesn't exist
     in file 1). The /L option tells CMP how many lines to look ahead
     trying to find a match after lines that don't match. If CMP examines
     that number of lines from both files without finding a match, it will
     report that fact and stop processing. (If you wish, you can then
     re-run CMP with a higher /L value.)
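
     One way to picture the look-ahead search is the Python sketch below.
     It is only an assumption about how such a search could work: this
     guide does not describe CMP's actual matching algorithm, and the
     name resync and its return convention are invented for the example.
     Line positions are zero-based list indexes, not the line numbers
     shown in the diagram above.

```python
def resync(lines1, lines2, i, j, look_ahead):
    """After a mismatch at lines1[i] / lines2[j], find the nearest match.

    Returns offsets (di, dj) with lines1[i + di] == lines2[j + dj], where
    neither offset exceeds look_ahead, or None if there is no such pair.
    """
    # Try small combined offsets first so that the nearest match wins.
    for total in range(1, 2 * look_ahead + 1):
        for di in range(total + 1):
            dj = total - di
            if di > look_ahead or dj > look_ahead:
                continue
            if i + di < len(lines1) and j + dj < len(lines2):
                if lines1[i + di] == lines2[j + dj]:
                    return di, dj
    return None

file1 = ["line a", "line b", "line c", "line d", "line e", "line f"]
file2 = ["line a", "line b", "line c", "line d", "something different",
         "line e", "line f"]
# The first mismatch is at index 4 in both lists.
print(resync(file1, file2, 4, 4, look_ahead=20))
```

     In this example the result (0, 1) means that the mismatched line of
     file 1 reappears one line later in file 2, so the intervening line
     is new in file 2. A result of None corresponds to the case where
     CMP gives up and you would re-run with a higher /L value.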

     There's no specific limit for look-ahead by itself, but /L and /W
     (below) have a combined limit. In the 16-bit version, 64 K (65,536
     bytes) is available for look-ahead, and look-ahead times (width+2)
     must not exceed that value. In the 32-bit version, the look-ahead and
     width are limited only by available memory (including virtual memory).
     In either version, if you exceed the available space with the combined
     /L and /W options, CMP will display a message inviting you to choose
     lower values.
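
     The 16-bit limit can be checked with a little arithmetic. The Python
     sketch below simply restates the rule given above, that look-ahead
     times (width+2) must not exceed 65,536; the function names are
     hypothetical and are not part of CMP.

```python
LOOKAHEAD_BYTES_16BIT = 65536  # the 64 K stated in this guide

def fits_16bit(look_ahead, width):
    """True if the /L and /W values satisfy the 16-bit combined limit."""
    return look_ahead * (width + 2) <= LOOKAHEAD_BYTES_16BIT

def max_look_ahead_16bit(width):
    """Largest /L value that the rule allows for a given /W value."""
    return LOOKAHEAD_BYTES_16BIT // (width + 2)

print(fits_16bit(20, 254))        # the 16-bit defaults, /L20 /W254
print(max_look_ahead_16bit(254))  # largest /L at the default width
```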

/T
     toggles between "during comparison, replace each tab with the number
     of spaces necessary to reach the next tab stop" and "treat a tab as an
     ordinary character". The default is to expand tabs, and the tab stops
     occur every 8 columns.

     The /T option has no effect unless you also use /B to turn off the
     compression of runs of spaces.

     Some more esoteric details are given below in "How spaces and tabs are
     handled".
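
     Tab expansion to stops every 8 columns can be sketched as follows
     in Python. This illustrates the rule as stated above and is not
     CMP's own code; the name expand_tabs is an assumption made for the
     example.

```python
def expand_tabs(line, tab_width=8):
    """Replace each tab with the spaces needed to reach the next stop."""
    out = []
    col = 0  # current column, counting from 0
    for ch in line:
        if ch == "\t":
            pad = tab_width - (col % tab_width)
            out.append(" " * pad)
            col += pad
        else:
            out.append(ch)
            col += 1
    return "".join(out)

# "ab" occupies columns 0-1, so the tab expands to 6 spaces.
print(repr(expand_tabs("ab\tc")))
```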

/Wwidth
     sets the significant line width to width characters; the default is
     254. CMP will examine each line only up to this width, and will
     display an error message for any lines that exceed it. CMP will also
     tell you at the end that some lines were truncated, reporting the
     greatest line width in either file. That makes it easy for you to
     re-run CMP with a higher /W value if you suspect that some lines
     contain differences beyond the original width.

     You can suppress the messages about truncation of individual lines by
     using the /Q option, but CMP will still display the message at the end
     so you'll know that some lines were not examined completely and what
     you can do about it.

     The effective width of a line, which is measured against /Wwidth, may
     be different from that line's length in characters, depending on how
     spaces and tabs are handled (see below). If you want to know the
     actual maximum effective line width in a file, simply compare the file
     to itself with a small width value and the /Q option to suppress
     messages, like this:

             cmp /QW10 file1 file1

     The maximum value for /W depends on the value given for /L (above).

Options affecting output

This section explains the options that affect CMP's output. In addition to
those listed in this section, the /B /E /T options, above, will affect the
formatting of the lines reported as different.

/Fn
     Format line numbers in a field of n. This lets you ensure that
     reported differences all line up. (You might wonder why CMP doesn't
     just figure the necessary width on its own. To do that, CMP would have
     to read each file an extra time, just to count lines. That would slow
     the program down significantly.)

     n is a minimum field width. If you specify /F4, line numbers for any
     differences in lines 1 through 9999 will be right justified in a
     four-character field. Any larger line numbers will take additional
     positions to the right, like this:

             1.  99>text1a
             2.  99>text1b
             2. 100>text1c

             1.2398>text2a
             2.2399>text2b

             1.23468>text3a
             1.23469>text3b
             2.23469>text3c

     If you prefer to left justify line numbers in a field of stated width,
     put a minus sign before n. For instance, the output under the /F-4
     option would line up like the above, but spaces would appear after the
     short line numbers instead of before.

     The default is /F0, which displays each line number with no padding,
     like this:

             1.99>text1a
             2.99>text1b
             2.100>text1c

             1.2398>text2a
             2.2399>text2b

             1.23468>text3a
             1.23469>text3b
             2.23469>text3c
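
     The padding rule can be modelled in a few lines of Python. This is
     a sketch that reproduces the sample output shown above, not CMP's
     actual formatting code; format_prefix and its parameters are
     hypothetical names.

```python
def format_prefix(file_no, line_no, field=0, sep=">"):
    """Build the prefix that precedes each reported line.

    field is the /F value: positive right-justifies the line number in
    at least that many columns, negative left-justifies it, and 0 means
    no padding.
    """
    digits = str(line_no)
    if field < 0:
        digits = digits.ljust(-field)
    else:
        digits = digits.rjust(field)
    return f"{file_no}.{digits}{sep}"

for file_no, line_no in [(1, 99), (2, 100), (1, 23468)]:
    print(format_prefix(file_no, line_no, field=4) + "text")
```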

/Nstr
     Separate line numbers from lines by str instead of the default >
     character. You can specify a string of up to six characters; the
     string is terminated by the next space or tab. Don't use quotes with
     this option unless you want them in the output.

     If you want certain characters like =, |, <, or space in your
     separator, you can't simply type them because DOS gives them special
     meanings. Use special "numeric escape sequences" to represent those
     characters in the /N option. For example, to make your output look
     like this:

             1. 99 : text1a
             2. 99 : text1b
             2.100 : text1c

             1.398 : text2a
             2.399 : text2b

     use the sequence \32 to represent the space character, like this:

             cmp /N\32:\32 /F3 file1 file2

     The numeric escape sequences are a backslash (\) followed by the
     numeric value of the character, up to three decimal digits. A leading
     0 denotes octal; a leading 0x or 0X denotes hexadecimal. Here are some
     sample sequences:

                      instead of          use any of
                      (space)             \32  \0x20 \040
                      (tab)               \9   \0x09 \011
                      < (less)            \60  \0x3C \074
                      = (equal)           \61  \0x3D \075
                      > (greater)         \62  \0x3E \076
                      | (vertical bar)    \124 \0x7C \0174
                      " (double quote)    \34  \0x22 \042

     The above are only examples: you can enter any character as a numeric
     sequence. For example, capital A would be \65, \0x41, or \0101.
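
     To make the three notations concrete, here is a Python sketch that
     decodes such sequences. It is an illustration only: this guide does
     not say exactly how CMP decides where a sequence ends, so the digit
     limits in the pattern (two hex digits, three octal digits after the
     leading 0, three decimal digits) are assumptions, and
     decode_separator is a made-up name.

```python
import re

# Hex is tried first, then octal (leading 0), then plain decimal.
_ESCAPE = re.compile(r"\\(0[xX][0-9A-Fa-f]{1,2}|0[0-7]{0,3}|[1-9][0-9]{0,2})")

def decode_separator(text):
    """Turn numeric escape sequences in a /N string into characters."""
    def to_char(match):
        digits = match.group(1)
        if digits[:2].lower() == "0x":
            return chr(int(digits, 16))
        if digits.startswith("0"):
            return chr(int(digits, 8))
        return chr(int(digits, 10))
    return _ESCAPE.sub(to_char, text)

print(repr(decode_separator(r"\32:\32")))
```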

/Q
     Suppress the logo, any warning messages about individual truncated
     lines (see /W, above), and the final display of line counts for the
     two files. If any lines were truncated, a single message will still
     appear at the end of processing.

     For even quieter operation, use the /QQ option, described immediately
     below. (Separate /Q and /QQ options exist for historical reasons. /QQ
     was added in response to user requests, rather than change the
     operation of /Q, which existing users might be depending on.)

/QQ
     In addition to turning on the /Q option, suppress the blank lines
     between difference blocks, and send the header (identification of
     files) and footer (summary counts of differences found) to stderr
     instead of stdout. The result is that, if you have the /QQ option
     turned on, you can redirect the output of CMP and you will get only
     the difference lines from the two files. You still get line numbers,
     but by using the /F option you can force them to a fixed format that
     is easily stripped away.

     Example:

             cmp /QQ /F6 file1 file2 >output

     will send just the different lines to the file called output.
     Non-essential messages will be suppressed, because the /QQ option
     turns on the /Q option. Essential messages will appear on your screen
     because they are written to stderr and are not redirected. Assuming
     each file has fewer than a million lines, each line written to the
     output file will have a 9-character prefix: file number (1 or 2), a
     period, a six-digit line number field, and the separator character >.

  ------------------------------------------------------------------------

                            Environment variable

  ------------------------------------------------------------------------
If you use certain options frequently, you can put them in the ORS_CMP
environment variable. You have the same freedom as on the command line:
leading slashes or hyphens, space separation or options run together, caps
or lower case.

CMP processes the environment variable before any command-line options,
which means that an option on the command line will override the
corresponding option in the environment variable.

The toggles, /B /E /I /Q /QQ /T, reverse their state every time you specify
them. So if you usually want case-blind comparisons, put /I in the
environment variable. Then, if you want case-sensitive comparisons for a
particular run, simply put /I on the command line and that will reverse the
setting from the environment variable. To alter the settings of other
options, like /L and /F, simply put the option on the command line with the
desired setting.

Particularly in a batch file, you may want to be sure that the environment
variable, if set, doesn't affect the option settings. To ensure this, put
the /Z option first on the command line.
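
The interplay of toggles, the environment variable, and /Z can be modelled
with a short Python sketch. It is a simplification under stated
assumptions, not CMP's real option parser: it handles only the toggles /B
/E /I /T plus /Z, expects options already split into single letters, and
uses the invented name toggled_options.

```python
TOGGLES = ("B", "E", "I", "T")

def toggled_options(env_options, cmd_options):
    """Return the set of toggles left in their non-default state.

    Options from the environment variable are processed first, then the
    command line. Each toggle flips its state; Z resets everything seen
    so far to the defaults.
    """
    flipped = dict.fromkeys(TOGGLES, False)
    for opt in list(env_options) + list(cmd_options):
        opt = opt.upper()
        if opt == "Z":
            flipped = dict.fromkeys(TOGGLES, False)
        elif opt in flipped:
            flipped[opt] = not flipped[opt]
    return {name for name, state in flipped.items() if state}

print(toggled_options(["I"], []))     # /I from the environment only
print(toggled_options(["I"], ["I"]))  # command line reverses it
```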

If you have any question about which options are in effect, simply use /D
on the command line to display all option values.

  ------------------------------------------------------------------------

                               Return values

  ------------------------------------------------------------------------
By default, CMP will return one of the following values to DOS, and you can
test the return value with IF ERRORLEVEL in a batch file.



You might want to use CMP in a batch file or a makefile and take different
actions depending on whether two files are the same or different. To do
this, use the /0 or /1 option. The /1 option emulates UNIX diff by
returning an error level of 1 if the files are different or 0 if they're
the same. /0 is the opposite: it returns 0 if the files are different or 1
if they're the same. In other words, the /0 or /1 option gives the value
CMP should return if differences are found.
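
The rule for /0 and /1 can be written out as a tiny Python function. This
sketch covers only the same-or-different outcome described in the
paragraph above; it does not model any other return values, and exit_code
is a hypothetical name.

```python
def exit_code(differences_found, value_if_different):
    """Return value under /0 (pass 0) or /1 (pass 1)."""
    if value_if_different not in (0, 1):
        raise ValueError("only /0 and /1 are described here")
    if differences_found:
        return value_if_different
    return 1 - value_if_different

print(exit_code(True, 1), exit_code(False, 1))  # /1, like UNIX diff
print(exit_code(True, 0), exit_code(False, 0))  # /0, the opposite
```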

  ------------------------------------------------------------------------

                      How spaces and tabs are handled

  ------------------------------------------------------------------------
This section gives some more details about the effects of the /B and /T
options, which control the treatment of spaces and tabs within a line.

CMP applies the /B and /T option settings while reading each line from the
file. In fact, CMP actually makes the changes to its own in-RAM copy of
each line, so that when differences are found CMP displays the transformed
line.

CMP always ignores any spaces and tabs at the end of a line, regardless of
the options. CMP also ignores any difference between the UNIX line-ending
convention (LF only) and the DOS convention (CR+LF).

There can be some interaction between the /B and /T option settings and the
/Wwidth setting. The /W option specifies the maximum effective line width,
but the effective line width of a line can be less or greater than the
actual length of that line in characters:

   * If you don't specify /B, runs of spaces and tabs are squeezed to a
     single space, so the effective line width can be less than the actual
     width.
   * If you specify /B and not /T, tabs are expanded to a run of spaces, so
     the effective line width can be greater than the actual line length.
   * If you specify /B and /T, tabs and spaces are treated as ordinary
     characters.
   * Spaces and tabs at the end of a line count against the line width even
     though they aren't used in deciding whether the line is the same as
     the corresponding line from the other file.

For this reason, if any line's effective width exceeds the /W width, CMP
will tell you the maximum effective width at the end of the run.

Since CMP normally disregards the above differences in spacing within a
line, as well as completely blank lines, if the program finds no
differences it will report that the files are "effectively identical". If
you want to compare for character-by-character identity, including spaces,
tabs, and blank lines, specify the /BET options. Then if the program finds
no differences it will report that the files are identical.

  ------------------------------------------------------------------------

                                What's new?

  ------------------------------------------------------------------------

v4.3, 2000-05-22
     add the /Z option; update the logo message to use the URL for Oak Road
     Systems; expand the help message; suggest "cmp /?|more" when the user
     types cmp with no files
v4.2, 1999-10-31
     add the /F option, the /N option, and the /QQ option; send the help
     message to stdout instead of stderr as previously; reorganize this
     documentation file and add many hyperlinks and a few small
     clarifications
v4.1b, 1999-08-04
     update contact information (new physical address and URL); simplify
     registration options; add site license pricing (no changes to code or
     documentation)
v4.1a, 1999-02-20
     no changes to code or documentation, only updated contact information
     (new ISP)
v4.1, 1999-01-09
     Add the /I and /D options. Split the confusing three-valued /Bn option
     into separate /B and /T toggle-type options. Change the 32-bit default
     to /L100. Improve diagnostics for a bad option in the environment
     variable. Convert documentation to HTML from Word for Windows.
v4.0, 1998-11-18
     Package the existing version 4.0 for shareware release: revise
     documents without changing the software.
v4.0, 06/98
     Allow multiple filespecs before a directory name, not just one
     filespec with wildcards. Support long filenames in the new 32-bit
     version.
v3.4, 10/97
     Add the /0 and /1 options; systematize all return values. No longer
     require the trailing backslash on a directory argument. Instead of
     "effectively identical", report a more specific phrase when the files
     are not significantly different based on the /B and /E options.
v3.3, 07/97
     Compress sequences of spaces and tabs to a single space; add the /B
     option to control that feature and tab expansion. Add the /Q option.
     Make the format of command-line options more flexible, and scan the
     ORS_CMP environment variable for options.
v3.0, 07/94
     Allocate string arrays far, allowing larger combined values of /L and
     /W.
v2.4, 11/89
     Default to /L20 (previously /L10).
v2.1, 03/85
     Expand tabs in input lines to the appropriate number of spaces.
v1.1, 10/84
     Allow wildcards in the first file argument.
v1.0, 08/84
     initial version
