M4.DOC                  V 1.5  19 Dec 91




Copyright 1986,1991  Michael M Rubenstein

M4 is an MSDOS version of the Unix (tm) m4 macro processor.  All
functions of the Unix version 7 m4 are supported as well as a
number of extensions.  This version was written without reference
to the Unix source and should not be expected to match the Unix
version if used in an undocumented manner.

M4 requires MSDOS or PCDOS version 2.0 or higher.


Modification Log

1.7               22 Jun 94   Removed swapcmd.

1.6               13 Jan 92   Added hashing of defined names for
                  faster lookup.

1.5               13 Dec 91   Added output, redefine, trace,
                  changeparen, changecomma.  Removed emit.

1.4               24 Jan 91   Added char, nulout, and swapcmd.
                  Fixed minor bugs.

1.3               7 Sep 88    Changed to allow high bit
                  characters in input. Fixed bug in nobuiltin and
                  undefine.  Fixed bug in maketemp.

1.2               22 May 88   Added printf.

1.1               16 Jan 87   Added emit, ignorenl, and getenv
                  builtin macros.  Modified diversion files to
                  use the TMP environment variable for directory.
                  Cleaned up code.


Usage

M4 is invoked with the command

     m4 [-o<output file>] [<input file>...]

For example, to apply m4 to the files file1.m4 file2 file3.fil,
the command

     m4 file1.m4 file2 file3.fil

is used.

Output is initially to stdout unless the -o switch is used to
specify another output file.

If no input files are given, input is taken from stdin.  Any of
the input files may be specified as -, in which case it is taken
from stdin.



Description

M4 reads one or more text files and writes the modified output to
stdout (which may, of course, be redirected).  When a macro is
encountered, it is evaluated and the result is rescanned.  Text
may be quoted by enclosing it in `' (an opening grave accent and
a closing apostrophe; note that these are not the same
character).  When quoted text is encountered, the quotes are
removed and the text is not scanned for macros.  However, if the
text is used in a macro evaluation and is rescanned, macro
substitution will take place.  Quotes may be nested to allow text
to be rescanned several times without evaluation.

The "define" built in macro permits definition of new macros.
For example

     define(A, 3)

defines A to be 3.  No space is permitted between the macro name,
"define", and the left parenthesis.  If a macro name is not
followed immediately by a left parenthesis, it has no arguments.

Macro names consist of letters, underscores, and digits and must
begin with a letter or underscore.  Case is significant.  Macro
names are recognized only when they appear as words (i.e.,
surrounded by characters which may not appear in a macro name).

Macros are expanded at the earliest opportunity.  Therefore, the
sequence

     define(A, 3)
     define(B, A)

would define A and B as 3.  However, since the "A" in the second
line is replaced by 3 before definition takes place, changing the
value of A will not affect B.  If the order of the definitions
were changed to

     define(B, A)
     define(A, 3)

the value of B would change with that of A (actually, B would be
replaced by A, which would then be replaced by its value).

Alternatively, one could write

     define(A, 3)
     define(B, `A')

The quotes prevent A from being evaluated in the second line, so
B is again defined as A.  In general, quoting the second argument
of "define" is useful to prevent evaluation before definition.
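
As a minimal sketch of this behavior (the trailing dnl, described
under Built In Macros, is used only to keep stray new lines out
of the output), the quoted definition keeps B tied to the current
value of A:

```m4
dnl B holds the text A, not the value 3
define(`A', 3)dnl
define(`B', `A')dnl
B
dnl give A a new value; the name is quoted so it is not expanded
define(`A', 4)dnl
B
```

Going by the rules above, the first B should produce 3 and the
second should produce 4, because B expands to A and A is looked
up only when B is used.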

M4 always strips off one level of quotes when it evaluates
something.  Thus, if one wishes to use the word "define" in the
text, it must be quoted.

     int `define';

will be changed to

     int define;

by M4.
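
The effect of nesting can be sketched with a macro A defined as
3 (an arbitrary choice for illustration); each extra level of
quoting survives one more evaluation:

```m4
define(`A', 3)dnl
A
`A'
``A''
```

Following the rule that one level of quotes is stripped each
time, these three lines should produce 3, then A, then `A'.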

Quoting is necessary when redefining a macro.  For example, the
sequence

     define(A, 2)
     define(A, 3)

would leave A defined as 2 since the second line would expand to

     define(2, 3)

The proper sequence is

     define(A, 2)
     define(`A', 3)

When a macro is redefined, the previous definition becomes
inactive.  It is not, however, deleted.  If the macro is later
undefined, the previous definition will be reinstated.
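
A short sketch of this stacking, reusing the macro A from the
example above (undefine is described under Built In Macros):

```m4
define(`A', 2)dnl
define(`A', 3)dnl
A
dnl remove the newest definition; the older one comes back
undefine(`A')dnl
A
```

As described, the first A should produce 3 and the second should
produce 2.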

Macros may have arguments.  The definition

     define(sum, `($1 + $2)')

could be used to generate code to add two numbers.  For example,

     sum(a, 3)

would generate

     (a + 3)

Omitted arguments are replaced by a null string, so

     sum(a)

would generate

     (a + )

Excess arguments are ignored.

A macro may use up to 9 arguments, $1, $2, ..., $9.  $0 is the
name of the macro.
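
To illustrate how arguments are matched up, here is a sketch
using a hypothetical macro named swap (not part of M4) that
emits its first two arguments in reverse order:

```m4
define(`swap', `$2, $1')dnl
swap(x, y)
swap(x, y, z)
swap(x)
```

By the rules above, the first two lines should both produce
"y, x" (the excess argument z is ignored), and the last should
produce ", x" because the omitted second argument is replaced by
a null string.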

Software Tools (B. W. Kernighan and P. J. Plauger,
Addison-Wesley, Inc., 1976) contains a similar macro processor
and a much more extensive discussion of its usage.



Built In Macros

aquote(<leftquote>,<rightquote>,...)
       Defines up to 4 pairs of alternate quotes.  Alternate
       quotes prevent scanning for macros, but, unlike regular
       quotes, are not removed from the text.  Default is no
       alternate quotes.

       This is an extension in this version of M4.  Its primary
       purpose is to prevent evaluation of macros in strings of a
       language being preprocessed.

changecomma(<comma char>)
       Changes the comma character.  Default is ",".  This is the
       character which is used to separate arguments in macro
       invocation.  If <comma char> is empty the default (",") is
       used.

       This is an extension in this version of M4.

changeparen(<parentheses characters>)
       Changes the characters which are used to enclose the
       arguments of a macro.  Default is "()".  If <parentheses
       characters> is empty, the default ("()") is used.  If
       <parentheses characters> only contains one character, this
       macro has no effect.

       This is an extension in this version of M4.

changequote(<leftquote>, <rightquote>)
       Changes the quote characters.  Default quotes are `'.

changearg(<character>)
       Changes the argument flag character (default "$").  Note
       that the argument flag character is processed at expansion
       time, not at definition time.  This can lead to strange
       results if this macro appears after any definitions.

char(<integer>,...)
       The character with ASCII value <integer>.

       This is an extension in this version of M4.

comment(<character>)
       Changes the comment character (default #).  All input from
       a comment character to the end of the line is simply
       passed to the output.  If no argument, there will be no
       comment character.

       This is an extension in this version of M4.

decr(<integer>)
       <integer> - 1

define(<name>,<value>)
       Defines <name> as <value>.  Up to 9 arguments may be
       used.  Arguments are referenced in the definition
       (<value>) by preceding a digit (1-9) by a special
       character (default "$").  Argument 0 is the name of the
       macro.

       If a previous definition for <name> exists, it is
       overridden.  The previous definition is not deleted and
       will be restored if the new definition is undefined.

       See also changearg.

divert(<integer>)
       Sends output to diversion file <integer> if <integer> is
       from 1 to 9.  Restores normal output if <integer> is 0 or
       omitted.  Discards output if <integer> is outside of the
       range 0 through 9.

       Diversions are normally copied to the output in numeric
       order after all input.  This can be modified by undivert
       (q.v.).

       Diversion files are placed in the directory specified by
       the TMP environment variable.  For example, to store
       diversion files in C:\TEMPS, use

                  set TMP=C:\TEMPS
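
       As a small sketch of the default ordering (the two words
       are arbitrary text, and dnl is used only to swallow the
       new lines after the divert calls):

       ```m4
       divert(1)dnl
       second
       divert(0)dnl
       first
       ```

       Going by the description above, "first" should appear in
       the output before "second", since diversion 1 is copied
       only after all input has been read.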

divnum
       The number of the currently active diversion.  0 if no
       diversion.

dnl
       Deletes all characters through the next newline.  This is
       useful for preventing excess new lines from being copied
       to the output during definition.  For example, the
       sequence

                  define(A, something)
                  define(B, somethingelse)

       will be translated into two new lines.  By adding "dnl" to
       the end of each line, this can be prevented.
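
       A short sketch of the same two definitions with dnl added
       (the names are quoted here, which is generally advisable):

       ```m4
       define(`A', `something')dnl
       define(`B', `somethingelse')dnl
       A and B
       ```

       Only the last line should contribute to the output, giving
       "something and somethingelse" with no blank lines before
       it.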

dumpdef(<name1>,<name2>,...)
       Displays the definitions of <name1>, etc. on stderr.
       <name1>, etc. should usually be quoted so that they are
       not expanded before dumpdef sees them.  If there are no
       arguments, all definitions are displayed.

errprint(<format>,<string1>,...,<string8>)
       Prints to stderr.  <format> is as in C fprintf.  This will
       not be meaningful if <format> contains any non-string
       format codes.

eval(<expression>)
       The evaluation of the integer expression <expression>.
       Permitted operators in decreasing order of precedence are

                  + (unary), - (unary)
                  *, /, %
                  + (binary), - (binary)
                  ==, !=, <, <=, >, >=
                  !
                  & or &&
                  | or ||


       Note that "&" and "&&" are equivalent, as are "|" and
       "||".  "a && b" gives 1 if both a and b are nonzero, 0
       otherwise.  "a || b" gives 1 if either a or b is nonzero,
       0 otherwise.  Unlike the similar C expressions, however,
       evaluation is not stopped once the value can be
       determined.
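
       A few sample expressions, offered as a sketch of the
       precedence and logical rules listed above (written
       without spaces to keep the input as plain as possible):

       ```m4
       eval(2+3*4)
       eval(7%3)
       eval(1&&0)
       eval(0||5)
       ```

       Under those rules the results should be 14 (the
       multiplication binds more tightly than the addition), 1,
       0, and 1.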

getenv(<name>)
       If there is an environment variable <name>, the value of
       this variable.  Otherwise, null.  Note that because of the
       way MSDOS handles environment variables, <name> should
       always be upper case.

       This is an extension in this version of M4.

ifdef(<name>,<value1>,<value2>)
       If <name> is defined then <value1> else <value2>.  <name>
       should usually be quoted so that it is not expanded before
       ifdef sees it.
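
       A minimal sketch, assuming hypothetical macro names DEBUG
       and VERBOSE chosen only for illustration:

       ```m4
       define(`DEBUG', 1)dnl
       ifdef(`DEBUG', `tracing on', `tracing off')
       ifdef(`VERBOSE', `chatty', `quiet')
       ```

       Since DEBUG is defined and VERBOSE is not, the two lines
       should produce "tracing on" and "quiet".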

ifelse(<str1>,<str2>,<value1>,<value2>)
       If <str1>=<str2> then <value1> else <value2>.

       <value2> can be replaced by
       <str3>,<str4>,<value3>,<value4>, in which case the result
       is

                  if <str1>=<str2> then <value1>
                  else if <str3>=<str4> then <value3>
                  else <value4>

       This can be extended in the obvious manner.
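
       The extended form can be sketched with a hypothetical
       macro named kind (not part of M4) that classifies its
       argument:

       ```m4
       define(`kind', `ifelse($1, 0, `zero', $1, 1, `one', `many')')dnl
       kind(0)
       kind(1)
       kind(7)
       ```

       Following the if/else chain above, these should produce
       zero, one, and many.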

ignorenl(<integer>)
       If the argument is nonzero, new lines in the input stream
       will not be copied to the output.  If zero (the
       default) new lines will be handled normally.  If the
       argument is omitted, the value is 0 if ignorenl is set or
       1 if not.

       This is an extension in this version of M4.

include(<filename>)
       The contents of the file <filename>.  It is an error if
       <filename> does not exist.  See also sinclude.

incr(<integer>)
       <integer> + 1

index(<string1>,<string2>)
       The position (origin 0) in <string1> where <string2>
       begins.  -1 if <string2> does not occur in <string1>.

len(<string>)
       The length of <string>.
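
       A brief sketch of index and len on an arbitrary sample
       string (the strings are quoted so that no word in them is
       mistaken for a macro):

       ```m4
       index(`hello world', `world')
       index(`hello world', `xyz')
       len(`hello')
       ```

       With positions counted from 0, these should produce 6, -1,
       and 5.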

macro(<character>)
       Defines the macro character.  Macros will only be
       recognized if preceded by the macro character.  Note that
       this only applies to macro invocations.  The macro
       character is not used in the definition of a macro.  If no
       argument, macros will always be recognized.  For example,
       to define `this' as `that' when the macro character is &,
       the definition would be

                  &define(`this', `that')

       "&this" would be changed to "that", but "this" would be
       unchanged.

       This is an extension in this version of M4.

maketemp(<string>)
       Generates a unique filename.  <string> must end with six
       "X"s, which are replaced in such a way that the result is
       a unique file name.

msdos
       Defined as null in the MSDOS version.  See also unix.

output(<filename>,<append>)
       If <filename> is null, returns the name of the current
       output file.

       If <filename> is not null, sets the current output to
       <filename>.  If <append> is "a", the output is appended to
       the file, otherwise an existing file is overwritten.

       If <filename> is "-", output is set to stdout.  If it is
       "--", output is set to stderr.

       This is an extension in this version of M4.

printf(<format>,<string1>,...,<string8>)
       Prints to current output.  <format> is as in C fprintf.
       This will not be meaningful if <format> contains any non-
       string format codes.

       This is an extension in this version of M4.

nobuiltin
       Removes the definitions of all built in macros.  Deletes
       all characters through the next newline.  Note that if any
       macros are defined which use built in macros, they will
       not expand properly.  nobuiltin is primarily useful when
       all user defined macros are simple substitutions.

       This is an extension in this version of M4 to facilitate
       use as a preprocessor.

nulout(<character>)
       Causes <character> to be ignored on output.  This
       facilitates use of a flag or separator character which
       will not appear in the final output.  If <character> is
       omitted, there will be no ignored output character
       (default).

       This is an extension in this version of M4.

redefine(<name>,<value>)
       Defines <name> as <value>.  Up to 9 arguments may be
       used.  Arguments are referenced in the definition
       (<value>) by preceding a digit (1-9) by a special
       character (default "$").  Argument 0 is the name of the
       macro.

       If a previous definition for <name> exists, it is deleted.

       This is an extension in this version of M4.

       See also changearg.

sinclude(<filename>)
       The contents of the file <filename>.  If <filename> does
       not exist, null.  See also include.

substr(<string>,<start>,<length>)
       The substring of <string> starting at position <start>
       (origin 0) and length <length>.  If <length> is omitted,
       the result is the rest of the string.
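
       For illustration, a sketch using an arbitrary sample
       string:

       ```m4
       substr(`hello world', 0, 5)
       substr(`hello world', 6, 3)
       substr(`hello world', 6)
       ```

       Counting positions from 0, these should produce hello,
       wor, and world.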

syscmd(<string>)
       The operating system command <string> is executed.  If
       <string> is empty, invokes the system command shell.

trace(<macro>, ...)
       Starts tracing the macros named (which should normally be
       quoted so that they are not expanded before trace sees
       them).  When a traced macro is invoked, its name and
       arguments are printed on the trace file.

       This is an extension in this version of M4.

       See also untrace and tracefile.

tracefile(<filename>,<append>)
       If <filename> is null, returns the name of the current
       trace output file.

       If <filename> is not null, sets the current trace output
       to <filename>.  If <append> is "a", the output is appended
       to the file, otherwise an existing file is overwritten.

       If <filename> is "-", trace output is set to stdout.  If
       it is "--", trace output is set to stderr.

       Trace output is initially set to stderr.

       This is an extension in this version of M4.


translit(<string1>,<string2>,<string3>)
       <string1> with characters in <string2> replaced by the
       corresponding characters in <string3>.  If <string3> is
       shorter than <string2>, characters without an entry in
       <string3> are deleted.
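
       A small sketch of both replacement and deletion, using
       arbitrary sample strings:

       ```m4
       translit(`hello', `el', `ip')
       translit(`a-b-c', `-')
       translit(`shout', `abcdefghijklmnopqrstuvwxyz', `ABCDEFGHIJKLMNOPQRSTUVWXYZ')
       ```

       The first line maps e to i and l to p, which should give
       hippo.  In the second, <string3> is omitted, so the
       hyphens have no replacement and should be deleted, giving
       abc.  The third should give SHOUT.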

undefine(<name>)
       Removes the definition of <name>.  <name> should usually
       be quoted so that it is not expanded before undefine sees
       it.

undivert(<integer>)
       Copies (and empties) diversion <integer>.  If <integer> is
       omitted, all diversions are copied.  The diversions are
       not rescanned for macros.

unix
       Defined as null in the Unix version.  See also msdos.

untrace(<macro>, ...)
       Terminates tracing of the named macros.  If no macros are
       specified, all tracing is stopped.

       This is an extension in this version of M4.

       See also trace and tracefile.



For the Hacker

Source code is provided.  M4 has been compiled for MSDOS and
Windows NT with Microsoft Visual C++ and Watcom C++ and for UNIX
for a Sun workstation with acc and gcc, but it should be very
easy to modify for other C compilers which support ANSI prototype
function definitions.

Executables are included for DOS and Windows NT.  These were
compiled with the Microsoft compilers.  The NT version has not
been tested, but should run under Windows 95.

You'll want to make sure that either UNIX or DOS is defined
(Windows NT and Windows 95 count as DOS).  For the MSDOS
compilers listed above, this is done automatically.  Also, look
through the code for the few #defines needed to accommodate
idiosyncrasies (a.k.a. bugs) of the various compilers.

When compiling for a system such as MSDOS in which stack space is
limited, make sure that the program has enough stack; at least
32K is recommended.
