NAME
OTC_Regex - Class for performing regular expression matches.
SYNOPSIS
#include <OTC/text/regex.hh>
class OTC_Regex : public OTC_Pattern
{
public:
static os_typespec* get_os_typespec();
~OTC_Regex();
OTC_Regex(char const* thePattern);
OTC_Boolean isValid() const;
OTC_Boolean match(char const* theString);
u_int start() const;
u_int start(u_int theIndex) const;
u_int length() const;
u_int length(u_int theIndex) const;
inline OTC_Range range() const;
OTC_Range range(u_int theIndex) const;
inline void modw(char const* theString);
inline char const* error() const;
static OTC_Regex& whiteSpace();
static OTC_Regex& optWhiteSpace();
static OTC_Regex& nonWhiteSpace();
static OTC_Regex& alphabetic();
static OTC_Regex& lowerCase();
static OTC_Regex& upperCase();
static OTC_Regex& alphaNumeric();
static OTC_Regex& identifier();
static OTC_Regex& matchingQuotes();
protected:
inline void compile();
OTC_Boolean re_comp(char const* pat);
int re_exec(char const* lp);
void re_modw(char const* s);
int re_subs(char const* src, char* dst);
virtual void re_fail(char const* msg, char op);
};
CLASS TYPE
Concrete
DESCRIPTION
This class can be used to determine if some string matches a
particular regular expression. In addition, information can be
obtained about the match, so that string substitutions can be
undertaken.
The pattern style is like that for ex(1).
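The substitution use case mentioned above can be sketched as follows.
This is only an illustration: it uses std::regex from the C++ standard
library with its POSIX basic grammar as a stand-in for OTC_Regex, and
the helper name replaceFirst is hypothetical, not part of the OTC
library. The position and length reported for the match play the role
of start() and length().

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper (not part of OTC): replace the first portion of
// theString matched by thePattern with theReplacement.
std::string replaceFirst(std::string const& theString,
                         std::string const& thePattern,
                         std::string const& theReplacement)
{
  std::regex re(thePattern, std::regex::basic);
  std::smatch m;
  if (!std::regex_search(theString, m, re))
    return theString;  // no match, so nothing to substitute

  // m.position(0) and m.length(0) stand in for start() and length().
  std::size_t const start = static_cast<std::size_t>(m.position(0));
  std::size_t const length = static_cast<std::size_t>(m.length(0));

  std::string result = theString;
  result.replace(start, length, theReplacement);
  return result;
}
```

For example, replaceFirst("one 123 two", "[0-9][0-9]*", "N") yields
"one N two", and a string with no digits is returned unchanged.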
INITIALISATION
OTC_Regex(char const* thePattern);
PATTERN COMPILATION
inline void compile();
ERRORS
OTC_Boolean isValid() const;
Returns OTCLIB_TRUE if the pattern is valid.
PATTERN MATCHING
OTC_Boolean match(char const* theString);
Returns OTCLIB_TRUE if theString matches the most recently compiled
pattern. If no pattern has been compiled then OTCLIB_FALSE is
returned.
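The interplay between validity and matching described above can be
sketched with a small wrapper. This is an assumption-laden stand-in,
not the OTC implementation: the class name CheckedPattern is
hypothetical, std::regex with the POSIX basic grammar replaces the
library's own engine, and match() is assumed to search for the pattern
anywhere in the string.

```cpp
#include <regex>
#include <string>

// Hypothetical stand-in for the isValid()/error()/match() trio.
class CheckedPattern
{
  public:
    explicit CheckedPattern(char const* thePattern)
    {
      try
      {
        myRegex.assign(thePattern, std::regex::basic);
        myValid = true;
      }
      catch (std::regex_error const& e)
      {
        // Compilation failed: remember why, and stay invalid.
        myError = e.what();
      }
    }

    bool isValid() const { return myValid; }
    std::string const& error() const { return myError; }

    // Returns false when no pattern was successfully compiled.
    bool match(char const* theString) const
    {
      return myValid && std::regex_search(theString, myRegex);
    }

  private:
    std::regex myRegex;
    bool myValid = false;
    std::string myError;
};
```

A caller would typically check isValid() once after construction and
only then rely on the result of match().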
The following functions return information about the portion of a
string that a pattern matched. A value of 0 for theIndex refers to
the match for the complete pattern. A value greater than zero refers
to the match for a subpattern tagged with the \(\) notation. All
functions return 0 if no match has occurred.
u_int start() const;
Returns the index into the string where the
matched portion started.
u_int start(u_int theIndex) const;
Returns the index into the string where the match began for the
tagged portion indicated by theIndex.
u_int length() const;
Returns the length of the matched portion.
u_int length(u_int theIndex) const;
Returns the length of the match for the tagged portion indicated by
theIndex.
inline OTC_Range range() const;
Returns a range object for the matched
portion of the string.
OTC_Range range(u_int theIndex) const;
Returns a range object for the matched tagged portion of the string
indicated by theIndex.
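To illustrate how an index of 0 and indices greater than zero relate
to the matched text, the following sketch collects the matched
portion for every index. It uses std::regex with the POSIX basic
grammar as a stand-in for OTC_Regex; the helper name matchedPortions
is hypothetical, and position(i) and length(i) of the standard match
object merely play the role of start(theIndex) and length(theIndex).

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of OTC): element 0 is the text matched
// by the complete pattern, elements 1..n the text matched by each
// tagged subpattern. The vector is empty if there is no match.
std::vector<std::string> matchedPortions(std::string const& theString,
                                         std::string const& thePattern)
{
  std::vector<std::string> portions;
  std::regex re(thePattern, std::regex::basic);
  std::smatch m;
  if (std::regex_search(theString, m, re))
  {
    for (std::size_t i = 0; i != m.size(); ++i)
    {
      std::size_t const start = static_cast<std::size_t>(m.position(i));
      std::size_t const length = static_cast<std::size_t>(m.length(i));
      portions.push_back(theString.substr(start, length));
    }
  }
  return portions;
}
```

With the pattern \([a-z]*\)=\([a-z]*\) and the string "key=value",
index 0 gives "key=value", index 1 gives "key" and index 2 gives
"value".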
CUSTOMISATION
inline void modw(char const* theString);
The characters in the null terminated string theString are added to
the set of characters identified as being in a word. If theString is
null or is zero length, then the set of characters is reset to the
default.
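The add-or-reset behaviour described for modw() can be pictured with a
small character table. This is a sketch only: the class WordChars is
hypothetical, and the default set is assumed to be A-Z, a-z, 0-9 and
_, following the definition of a word given under REGULAR EXPRESSIONS.

```cpp
#include <bitset>

// Hypothetical model of the word character set (not the OTC code).
class WordChars
{
  public:
    WordChars() { reset(); }

    // Adds the characters of theString to the set. A null or zero
    // length string resets the set to the assumed default.
    void modw(char const* theString)
    {
      if (theString == nullptr || *theString == '\0')
      {
        reset();
        return;
      }
      for (; *theString != '\0'; ++theString)
        mySet.set(static_cast<unsigned char>(*theString));
    }

    bool inWord(char c) const
    {
      return mySet.test(static_cast<unsigned char>(c));
    }

  private:
    void reset()
    {
      mySet.reset();
      for (char c = 'a'; c <= 'z'; ++c) mySet.set(static_cast<unsigned char>(c));
      for (char c = 'A'; c <= 'Z'; ++c) mySet.set(static_cast<unsigned char>(c));
      for (char c = '0'; c <= '9'; ++c) mySet.set(static_cast<unsigned char>(c));
      mySet.set(static_cast<unsigned char>('_'));
    }

    std::bitset<256> mySet;
};
```

After modw("-") a hyphen counts as part of a word, so a string such as
"well-known" would be treated as a single word; modw("") undoes this.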
BUILTIN REGULAR EXPRESSIONS
Some commonly used regular expressions are available as predefined
objects, returned by the following static member functions:
static OTC_Regex& whiteSpace();
static OTC_Regex& optWhiteSpace();
Optionally matches white space.
static OTC_Regex& nonWhiteSpace();
static OTC_Regex& alphabetic();
Matches alpha characters.
static OTC_Regex& lowerCase();
Matches lower case characters.
static OTC_Regex& upperCase();
Matches upper case characters.
static OTC_Regex& alphaNumeric();
Matches alphanumeric characters.
static OTC_Regex& identifier();
static OTC_Regex& matchingQuotes();
Matches a string delimited by double quotes. Note that a quote
preceded by a backslash is not ignored; that quote will instead be
seen as the terminating quote.
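The caveat about backslashes can be seen in the following sketch. The
actual pattern behind matchingQuotes() is not given here, so the
pattern "[^"]*" used below is an assumption, as is the helper name
firstQuoted; std::regex with the POSIX basic grammar stands in for
OTC_Regex.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of OTC): returns the first double
// quoted portion of theString, quotes included, or an empty string.
std::string firstQuoted(std::string const& theString)
{
  // Assumed pattern: a quote, any run of non-quotes, a quote.
  static std::regex const re("\"[^\"]*\"", std::regex::basic);
  std::smatch m;
  if (!std::regex_search(theString, m, re))
    return std::string();
  return m.str(0);
}
```

Given the text say "a\"b" now, the match stops at the quote that
follows the backslash, so the result is "a\" rather than the full
"a\"b".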
REGULAR EXPRESSIONS
Regular expressions can be of the following forms:
[1] char
matches itself, unless it is a special character
(metachar): . \ [ ] * + ^ $
[2] .
matches any character.
[3] \
matches the character following it, except when followed by a left
or right round bracket, a digit 1 to 9, a left or right angle
bracket or one of the characters "bnfrt" (see [7], [8], [9] and
[12]). It is used as an escape character for all other
meta-characters, and itself. When used in a set ([4]), it is
treated as an ordinary character.
[4] [set]
matches one of the characters in the set. If the first character
in the set is "^", it matches a character NOT in the set. A
shorthand S-E is used to specify a set of characters S up to E,
inclusive. The special characters "]" and "-" have no special
meaning if they appear as the first chars in the set.
examples:
[a-z] matches any lowercase alpha
[^]-] matches any char except ] and -
[^A-Z] matches any char except uppercase alpha
[a-zA-Z] matches any alpha
[5] *
any regular expression form [1] to [4], followed by closure char
(*) matches zero or more matches of that form.
[6] +
same as [5], except it matches one or more.
[7]
a regular expression in the form [1] to [10], enclosed as \(form\)
matches what form matches. The enclosure creates a set of tags,
used for [8] and for pattern substitution. The tagged forms are
numbered starting from 1.
[8]
a \ followed by a digit 1 to 9 matches whatever a previously
tagged regular expression ([7]) matched.
[9] \< \>
a regular expression starting with a \< construct and/or ending
with a \> construct, restricts the pattern matching to the
beginning of a word, and/or the end of a word. A word is defined
to be a character string beginning and/or ending with the
characters A-Z a-z 0-9 and _. It must also be preceded and/or
followed by any character outside those mentioned.
[10]
a composite regular expression xy where x and y are in the form
[1] to [10] matches the longest match of x followed by a match for
y.
[11] ^ $
a regular expression starting with a ^ character and/or ending
with a $ character, restricts the pattern matching to the
beginning of the line, or the end of line. [anchors] Elsewhere in
the pattern, ^ and $ are treated as ordinary characters.
[12] \b \n \f \r \t
these are used in a regular expression to denote the special
characters backspace, newline, form feed, carriage return and tab.
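Several of the forms above can be tried out with a short test helper.
This sketch does not use OTC_Regex itself: it relies on std::regex
with the POSIX basic grammar, which shares the set, closure, tag and
anchor notation of forms [4], [5], [7], [8] and [11]. The + closure of
form [6] and the word constructs of form [9] are not part of that
grammar and are left out. The helper name found is hypothetical, and
it is assumed that a match may begin anywhere in the string.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of OTC): true if thePattern, read as
// a POSIX basic regular expression, matches somewhere in theString.
bool found(std::string const& thePattern, std::string const& theString)
{
  std::regex re(thePattern, std::regex::basic);
  return std::regex_search(theString, re);
}
```

With this helper, [^A-Z] is found in "ABCd" but not in "ABC" (form
[4]); ab*c is found in both "ac" and "abbbc" (form [5]);
\([a-z][a-z]*\) \1 is found in "hello hello" but not in "hello world"
(forms [7] and [8]); ^abc is found in "abcdef" but not in "xabc", and
abc$ is found in "xabc" but not in "abcx" (form [11]).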
NOTES
This uses the regex routines in v07i021 of volume 7
of comp.sources.misc.
SEE ALSO
ex(1), OTC_Pattern
LIBRARY
OTC
AUTHOR(S)
Graham Dumpleton
COPYRIGHT
Copyright 1991 1992 OTC LIMITED
Copyright 1994 DUMPLETON SOFTWARE CONSULTING PTY LIMITED