grep.h
=============================================================================

    The grep* and glob* functions provide regular expression parsing and
    matching. The main difference between the two sets is what they are
    designed for. The grep* functions are general-purpose and follow the
    *IX regular expression syntax. The glob* functions are designed for
    matching filenames against wildcards. They are also modelled after
    the *IX environment and as such are a lot more versatile than the
    limited wildcard capabilities of MS-DOS.

    The grep* regular expression compiler recognizes the following tokens:

    c		- normal character, matches this character (case-sensitive)
    \       - backslash, escapes any character following it. For example,
    		  '\$' matches '$', '\\' matches '\' and '\i' matches 'i'.
    ^		- circumflex, at start of expression matches beginning of line.
    $		- dollar sign at end of expression matches the end of line.
    .		- period, matches any one character (there has to be one).
    *		- star, matches zero or more occurrences of preceding expression.
    +		- plus, matches one or more occurrences of preceding expression.
    		  The plus and the star must follow a valid expression.
    []		- square brackets, match any one of the characters in the set of
    		  characters included between them. If the first character is
    		  the circumflex, '^', then matches any character that is not
    		  in the set between the brackets. You can specify ranges of
    		  characters, like this [a-zA-Z] for the whole alphabet, or
    		  [0-9aeyuio] for all digits and the vowels. If the '-' is the
    		  first or last character in the set, it loses its special
    		  meaning. For a range to be valid, its first character must
    		  have a lower character code than its second.
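
    The set rules above can be sketched in a few lines of C. This is an
    illustration only, not the library's code: set_matches() is a
    hypothetical helper that takes the text following the opening '[' and
    applies the rules as described (leading '^' negates, '-' is literal
    when first or last, a range needs a smaller first character).

```c
/*
 * Hypothetical helper (not part of grep.h): test whether character c is in
 * the set whose body starts at `set`, just after the opening '['.
 * Returns 1 if c is matched, 0 if it is not, and -1 if the set is malformed
 * (no closing ']' or a range whose first character is not the smaller one).
 */
int set_matches(const char *set, char c)
{
    const char *p = set;
    unsigned char uc = (unsigned char)c;
    int negate = 0, found = 0;

    if (*p == '^') {                /* leading '^' inverts the set */
        negate = 1;
        p++;
    }
    while (*p != ']') {
        unsigned char lo = (unsigned char)*p;

        if (lo == '\0')
            return -1;              /* set was never terminated */
        if (p[1] == '-' && p[2] != ']' && p[2] != '\0') {
            /* a real range: '-' has a character on both sides */
            unsigned char hi = (unsigned char)p[2];

            if (lo >= hi)
                return -1;          /* first character must be smaller */
            if (uc >= lo && uc <= hi)
                found = 1;
            p += 3;
        } else {
            /* plain member, including a '-' that is first or last */
            if (uc == lo)
                found = 1;
            p++;
        }
    }
    return negate ? !found : found;
}
```

    For instance, set_matches("0-9aeyuio]", 'y') reports a match, while
    set_matches("z-a]", 'm') reports a malformed set.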

    The glob* compiler recognizes the following tokens:

    c		- normal character, matches the same character (case-sensitive)
    ?		- matches any one character (there has to be one)
    *		- matches zero or more characters (longest match)
    []		- same as brackets for grep* (see above)

    Note that even though they were designed for filename matching, the
    glob* functions do not validate the filenames in any way; they simply
    match the text against the pattern. The only differences from grep*
    are the tokens recognized and the way the '*' is processed.
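
    To make the glob* tokens concrete, here is a minimal stand-alone
    matcher. It is a sketch under stated assumptions, not the library's
    implementation: sketch_glob() and sketch_set() are hypothetical names,
    the pattern is interpreted directly instead of being compiled first,
    and the whole text is assumed to have to match the pattern.

```c
/* Hypothetical helper: match c against a set body (the text after '[').
 * Stores the position just past ']' in *end and returns 1 or 0;
 * returns -1 if the set is not terminated. */
static int sketch_set(const char *p, char c, const char **end)
{
    int negate = 0, found = 0;

    if (*p == '^') {                    /* leading '^' inverts the set */
        negate = 1;
        p++;
    }
    for (; *p != ']'; p++) {
        if (*p == '\0')
            return -1;
        if (p[1] == '-' && p[2] != ']' && p[2] != '\0') {
            if ((unsigned char)c >= (unsigned char)p[0] &&
                (unsigned char)c <= (unsigned char)p[2])
                found = 1;
            p += 2;                     /* skip '-' and the range end */
        } else if (*p == c) {
            found = 1;
        }
    }
    *end = p + 1;
    return negate ? !found : found;
}

/* Hypothetical sketch: returns 1 if all of text matches pattern, else 0. */
int sketch_glob(const char *pattern, const char *text)
{
    const char *next;

    for (; *pattern != '\0'; pattern++, text++) {
        switch (*pattern) {
        case '*': {
            /* try the longest candidate first, then give characters back */
            const char *t = text;

            while (*t != '\0')
                t++;
            for (;; t--) {
                if (sketch_glob(pattern + 1, t))
                    return 1;
                if (t == text)
                    return 0;
            }
        }
        case '?':
            if (*text == '\0')          /* there has to be one character */
                return 0;
            break;
        case '[':
            if (*text == '\0' ||
                sketch_set(pattern + 1, *text, &next) != 1)
                return 0;
            pattern = next - 1;         /* loop increment steps past ']' */
            break;
        default:
            if (*pattern != *text)      /* case-sensitive comparison */
                return 0;
            break;
        }
    }
    return *text == '\0';
}
```

    With this sketch, sketch_glob("tes*e", "test_this_here") succeeds and
    sketch_glob("?at", "at") fails, since '?' needs a character to consume.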

    Each regular expression must be compiled before it can be used. The
    compiled form is stored in a buffer that is currently 512 bytes long,
    so the longest pattern that can be compiled depends on the complexity
    of the expression. This should be adequate for most needs.

    The compiling scheme is simple. Each normal character is stored
    verbatim and the special sequences are encoded. Sets are encoded with
    a start and an end symbol. Ranges have only a start symbol, followed
    by the characters that begin and end the range. In grep* mode, a star
    is stored as an escape symbol followed by the pattern it affects,
    terminated with the End-of-Pattern symbol; pluses are stored the same
    way. In glob* mode, a star is stored simply as a special symbol and
    there are no pluses. Each compiled pattern is terminated by the
    End-of-Pattern symbol, followed by the NUL string terminator.

    The external variable grepError will hold the error that caused the
    last requested operation to fail (applies only to compiling). The
    values that it can take are defined in the header file.

    The compiling functions return a non-zero value on success and 0 on
    error. The grep* matching functions return a pointer to the first
    character of the match in the string, or NULL if there is no match.
    The glob* matching functions return a non-zero value on a match and
    0 otherwise.

    The globbing functions work much like wildcard expansion on the UNIX
    shells' command line. For example, extensions have no real meaning
    and the '*' matches everything (not limited to name or extension).
    Also, sets are allowed. For example, 'tes*e' will match 'tese',
    'teste', 'tes.e', 'test_this_here', etc.

    If you need to test a lot of input against the same pattern, it is
    recommended that you call the appropriate *_compile() function once
    and then call *_match() for each input. On the other hand, if the
    pattern changes from call to call, use the grep() and glob()
    functions, which take both the pattern and the input string to
    match. They compile the expression internally, so you do not need
    to do that yourself. Because they compile the expression every time
    they are called, they tend to run a lot slower.

    Note that since the grep* and glob* functions use the same static
    buffer for the compiled expression, each call to a compiling
    function will overwrite it.
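
    The effect of the shared buffer can be shown with a toy model. The
    functions below are stand-ins invented for this illustration
    (toy_compile() and toy_match() are not the grep.h routines, and the
    "compiled" form is just a copy of the pattern), but the single static
    buffer behaves as described above.

```c
#include <string.h>

/* Toy stand-ins, NOT the real grep.h functions: one shared static buffer
 * holds the "compiled" pattern, mirroring the arrangement described above. */
static char toy_buffer[512];

/* "Compile" by copying; returns non-zero on success, 0 if it does not fit. */
int toy_compile(const char *pattern)
{
    if (strlen(pattern) >= sizeof toy_buffer)
        return 0;
    strcpy(toy_buffer, pattern);
    return 1;
}

/* "Match" by plain substring search against whatever was compiled last. */
const char *toy_match(const char *text)
{
    return strstr(text, toy_buffer);
}

/* Returns 1 if the first pattern is still in effect after a second
 * compile, 0 if it has been overwritten. */
int first_pattern_survives(void)
{
    toy_compile("alpha");
    toy_compile("beta");                     /* overwrites "alpha" */
    return toy_match("alpha only") != NULL;  /* actually looks for "beta" */
}
```

    In this model first_pattern_survives() returns 0. To alternate
    between two patterns, recompile before each match.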

    The example GLOBTEST.BAT file will test the globbing function. You
    will need to compile GREPTEST.C for it to work. You can see
    why some of the things fail and why others work.

    Currently, the grep* mode does not recognize parentheses, so you
    cannot group expressions. Also, none of the extended regular
    expressions are recognized, only the basic ones.


    GREP_COMPILE, GLOB_COMPILE
    -------------------------------------------------------------------------

        Summary   Compile a pattern to internal format

        Syntax    int grep_compile(const char *pattern);
                  int glob_compile(const char *pattern);

        Remarks   These functions compile a regular expression to the
                  internal format used by the matching functions. Note that
                  static buffers are used by the compilers and matchers.
                  This makes the code non-reentrant, sorry. This also makes
                  it impossible to scan for several regular expressions at
                  a time (since each compilation overwrites the previous
                  one). You must call the appropriate compile function
                  before you can use the matching routines. For acceptable
                  patterns, refer to the documentation above.

        Return    On success, both functions return non-zero values.
                  On error (illegal or overly long pattern), both return 0.


    GREP_MATCH, GLOB_MATCH
    -------------------------------------------------------------------------

        Summary   Match the compiled pattern against user-supplied text

        Syntax    char* grep_match(const char *text);
                  int   glob_match(const char *text);

        Remarks   These functions match the pattern compiled by
                  grep_compile() or glob_compile(), respectively, against
                  the text in the 'text' parameter.

        Return    grep_match() returns a pointer to the beginning of
                  the text that matched (NULL if no match).
                  glob_match() returns a non-zero value if the text
                  argument matches the wildcard pattern (0 if it does not).


    GREP, GLOB
    -------------------------------------------------------------------------

        Summary   Compile and match in a single step

        Syntax    char* grep(const char *pattern, const char *text);
                  int   glob(const char *pattern, const char *text);

        Remarks   These routines compile and match in a single step. You
                  can use these instead of the compile/match combo if the
                  pattern will be changing between calls (if you will be
                  making multiple matches using the same pattern, you
                  should use the functions above).

        Return    Same as the matching routines.


    GREPERROR
    -------------------------------------------------------------------------

        Summary   Contains the error that caused the compile/match to fail

        Syntax    extern int grepError;

        Remarks   This external variable is set by the compile and match
                  routines to signify the type of error that caused the
                  function to fail. You can use any of the following
                  constants to test its value:

                      GREP_OK      // no errors
                      GREP_BADESC  // escape character was followed by EOS
                      GREP_OVRFLOW // pattern too complex, buffer overflow
                      GREP_BADSET  // bad set specification, not terminated
                      GREP_BADPAT  // '*' or '+' not allowed here
                      GREP_BADSTAR // '*' followed another '*'
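
        Example   A possible way to turn the error code into a message.
                  This is a sketch: grep_error_text() is a hypothetical
                  helper, not part of the library, and the enum values
                  are placeholders so that the fragment compiles on its
                  own. With the real header, include it and drop the
                  enum, then pass grepError to the helper after a failed
                  compile.

```c
/* Placeholder values (assumed) so the sketch stands alone; the real
 * constants and their values come from the header file. */
enum {
    GREP_OK,
    GREP_BADESC,
    GREP_OVRFLOW,
    GREP_BADSET,
    GREP_BADPAT,
    GREP_BADSTAR
};

/* Hypothetical helper, not part of grep.h: describe an error code. */
const char *grep_error_text(int code)
{
    switch (code) {
    case GREP_OK:      return "no errors";
    case GREP_BADESC:  return "escape character was followed by EOS";
    case GREP_OVRFLOW: return "pattern too complex, buffer overflow";
    case GREP_BADSET:  return "bad set specification, not terminated";
    case GREP_BADPAT:  return "'*' or '+' not allowed here";
    case GREP_BADSTAR: return "'*' followed another '*'";
    default:           return "unknown error";
    }
}
```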

