


FLEX(1)                   USER COMMANDS                   FLEX(1)



NAME
     flex - fast lexical analyzer generator

SYNOPSIS
     flex [ -dfirstvFILT -c[efmF] -Sskeleton_file ] [ filename ]

DESCRIPTION
     _f_l_e_x is a rewrite of _l_e_x intended  to  right  some  of  that
     tool's  deficiencies:  in particular, _f_l_e_x generates lexical
     analyzers much faster, and the analyzers use smaller  tables
     and run faster.

OPTIONS
     In addition  to  lex's  -t  flag,  flex  has  the  following
     options:

     -d   makes the generated scanner run in debug mode.  Whenever
          a pattern is recognized the scanner will write to stderr
          a line of the form:

              --accepting rule #n

          Rules are numbered  sequentially  with  the  first  one
          being 1.

     -f   has the same effect as lex's -f flag (do not compress
          the scanner tables); the mnemonic changes from fast
          compilation to (take your pick) full table or fast
          scanner.  The actual compilation takes longer, since
          flex is I/O bound writing out the big table.

          This option is equivalent to -cf (see below).

     -i   instructs flex to generate a case-insensitive scanner.
          The case of letters given in the flex input patterns
          will be ignored, and the rules will be matched
          regardless of case.  The matched text given in yytext
          will keep its original case (i.e., it will not be
          folded).

     -r   specifies that the scanner uses the REJECT action.

     -s   causes the default rule (that unmatched scanner input
          is echoed to stdout) to be suppressed.  If the scanner
          encounters input that does not match any of its  rules,
          it  aborts  with  an  error.  This option is useful for
          finding holes in a scanner's rule set.

     -v   has the same meaning as for lex (print to stderr a
          summary of statistics of the generated scanner).  Many
          more statistics are printed, though,  and  the  summary
          spans  several lines.  Most of the statistics are mean-
          ingless to the casual flex user.



Sun Release 3.4     Last change: 13 May 1987                    1

     -F   specifies that the fast scanner table representation
          should be used.  This representation is about as fast
          as the full table representation (-f), and for some
          sets  of patterns will be considerably smaller (and for
          others, larger).  In general, if the pattern  set  con-
          tains  both  "keywords"  and  a catch-all, "identifier"
          rule, such as in the set:

               "case"    return ( TOK_CASE );
               "switch"  return ( TOK_SWITCH );
               ...
               "default" return ( TOK_DEFAULT );
               [a-z]+    return ( TOK_ID );

          then you're better off using the full table representa-
          tion.  If only the "identifier" rule is present and you
          then use a hash table or some such to detect the
          keywords, you're better off using -F.

          This option is equivalent to -cF (see below).
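
          As a rough illustration of the second approach, the
          sketch below looks up the text matched by a lone
          "identifier" rule in a keyword table.  The token values
          and the helper name classify_word() are assumptions made
          for this example, and a sorted table searched with
          bsearch() stands in for the hash table:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical token codes; the values are illustrative only. */
enum { TOK_ID = 256, TOK_CASE, TOK_DEFAULT, TOK_SWITCH };

struct keyword {
    const char *name;
    int token;
};

/* Must stay sorted by name, because bsearch() requires sorted input. */
static const struct keyword keywords[] = {
    { "case",    TOK_CASE },
    { "default", TOK_DEFAULT },
    { "switch",  TOK_SWITCH },
};

/* bsearch() comparison: key is the matched text, elem a table entry. */
static int compare_keyword(const void *key, const void *elem)
{
    return strcmp((const char *)key,
                  ((const struct keyword *)elem)->name);
}

/* Return the keyword's token, or TOK_ID if text is not a keyword. */
int classify_word(const char *text)
{
    const struct keyword *kw = bsearch(text, keywords,
        sizeof keywords / sizeof keywords[0], sizeof keywords[0],
        compare_keyword);

    return kw ? kw->token : TOK_ID;
}
```

          The action of the single [a-z]+ rule would then
          presumably be something like
          "return ( classify_word( yytext ) );".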

     -I   instructs flex to generate an interactive scanner.
          Normally, scanners generated by flex always look ahead
          one character before deciding that a rule has been
          matched.  At the possible cost of some scanning
          overhead (it's not clear that more overhead is involved),
          flex will generate a scanner which only looks ahead
          when needed.  Such scanners are called interactive
          because  if you want to write a scanner for an interac-
          tive system such as a command shell, you will  probably
          want  the user's input to be terminated with a newline,
          and without -I the user will have to type  a  character
          in addition to the newline in order to have the newline
          recognized.  This leads to dreadful interactive perfor-
          mance.

          If all this seems too confusing, here's the general
          rule:  if  a  human  will  be  typing  in input to your
          scanner, use -I, otherwise don't;  if  you  don't  care
          about how fast your scanners run and don't want to make
          any assumptions about the input to your scanner, always
          use -I.

          Note, -I cannot be used in conjunction with full or
          fast tables, i.e., the -f, -F, -cf, or -cF flags.

     -L   instructs flex to not generate  #line  directives  (see
          below).

     -T   makes flex run in trace mode.  It will generate a lot
          of  messages to standard out concerning the form of the
          input   and   the   resultant   non-deterministic   and
          deterministic finite automatons.  This option is mostly
          for use in maintaining flex.

     -c[efmF]
          controls the degree of table compression.  -ce directs
          flex to construct equivalence classes, i.e., sets of
          characters which have identical lexical properties (for
          example, if the only appearance of digits in the flex
          input is in the character class "[0-9]" then the digits
          '0', '1', ..., '9' will all be put in the same
          equivalence class).  -cf specifies that the full
          scanner tables should be generated - flex should not
          compress the tables by taking advantage of similar
          transition functions for different states.  -cF
          specifies that the alternate fast scanner representation
          (described above under the -F flag) should be used.
          -cm directs flex to construct meta-equivalence classes,
          which are sets of equivalence classes (or characters,
          if equivalence classes are not being used) that are
          commonly used together.  A lone -c specifies that the
          scanner tables should be compressed but neither
          equivalence classes nor meta-equivalence classes should
          be used.
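
          The idea behind equivalence classes can be sketched as
          follows.  This is not flex's own algorithm, only an
          illustration under simple assumptions: the function
          build_equiv_classes() and its member[][] table are
          hypothetical names, and two characters are treated as
          equivalent when they belong to exactly the same
          character classes:

```c
#define NCHARS 256

/*
 * Assign an equivalence-class number to every character.  member[k][c]
 * is nonzero when character c belongs to the k-th character class that
 * appears in the patterns.  Two characters get the same class number
 * exactly when they belong to the same set of character classes.
 * Returns the number of equivalence classes found.
 */
int build_equiv_classes(unsigned char member[][NCHARS], int nsets,
                        int ec[NCHARS])
{
    int nclasses = 0;

    for (int c = 0; c < NCHARS; c++) {
        ec[c] = -1;

        /* Look for an earlier character with identical membership. */
        for (int p = 0; p < c && ec[c] < 0; p++) {
            int same = 1;

            for (int k = 0; k < nsets && same; k++)
                same = (member[k][c] != 0) == (member[k][p] != 0);
            if (same)
                ec[c] = ec[p];
        }

        if (ec[c] < 0)
            ec[c] = nclasses++;   /* first character of a new class */
    }
    return nclasses;
}
```

          With "[0-9]" as the only character class, the sketch
          puts all ten digits in one class and every other
          character in a second one, so a transition table would
          need two columns instead of 256.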

          The options -cf or  -cF  and  -cm  do  not  make  sense
          together - there is no opportunity for meta-equivalence
          classes if the table is not being  compressed.   Other-
          wise the options may be freely mixed.

          The default setting is -cem which specifies  that  flex
          should   generate   equivalence   classes   and   meta-
          equivalence classes.  This setting provides the highest
          degree   of  table  compression.   You  can  trade  off
          faster-executing scanners at the cost of larger  tables
          with the following generally being true:

              slowest            smallest
                         -cem
                         -ce
                         -cm
                         -c
                         -c{f,F}e
                         -c{f,F}
              fastest            largest


     -Sskeleton_file
          overrides the default skeleton  file  from  which  flex
          constructs its scanners.  You'll never need this option
          unless you are doing flex maintenance or development.

INCOMPATIBILITIES WITH LEX
     _f_l_e_x is fully compatible with _l_e_x with the following  excep-
     tions:

     -    There is no run-time library to link with.  You needn't
          specify -ll when linking, and you must supply a main
          program.  (Hacker's note: since the lex library
          contains a main() which simply calls yylex(), you
          actually can be lazy and not supply your own main
          program and link with -ll.)

     -    lex's %r (Ratfor scanners) and %t  (translation  table)
          options are not supported.

     -    The do-nothing -n flag is not supported.

     -    When definitions are expanded, flex  encloses  them  in
          parentheses.  With lex, the following

              NAME    [A-Z][A-Z0-9]*
              %%
              foo{NAME}?      printf( "Found it\n" );
              %%

          will not match the string "foo" because when the macro
          is expanded the rule is equivalent to
          "foo[A-Z][A-Z0-9]*?" and the precedence is such that
          the '?' is associated with "[A-Z0-9]*".  With flex, the
          rule will be expanded to "foo([A-Z][A-Z0-9]*)?" and so
          the string "foo" will match.

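          The difference in grouping can be tried out with the
          POSIX regcomp() and regexec() routines.  This is only an
          approximation: POSIX extended regular expressions are
          not lex patterns, the helper full_match() is a
          hypothetical name, the '^' and '$' anchors are added so
          that the whole string must match, and the lex-style
          pattern is written with explicit parentheses to show
          where the '?' binds:

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if text matches the POSIX extended regex, else 0. */
int full_match(const char *pattern, const char *text)
{
    regex_t re;
    int matched;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;               /* pattern did not compile */
    matched = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return matched;
}

/* lex's textual expansion: '?' applies only to "[A-Z0-9]*". */
const char *lex_style = "^foo[A-Z]([A-Z0-9]*)?$";

/* flex's parenthesized expansion: '?' applies to all of NAME. */
const char *flex_style = "^foo([A-Z][A-Z0-9]*)?$";
```

          Under these assumptions full_match( flex_style, "foo" )
          succeeds while full_match( lex_style, "foo" ) fails,
          since the lex-style pattern still requires one
          upper-case letter after "foo".
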
     -    yymore() is not supported.

     -    The undocumented lex-scanner internal variable yylineno
          is not supported.

     -    If your input uses REJECT, you must run flex  with  the
          -r  flag.   If you leave out the flag, the scanner will
          abort at run-time with a message that the  scanner  was
          compiled without the flag being specified.

     -    The input() routine is not redefinable, though it may
          be called to read characters following whatever has
          been matched by a rule.  If input() encounters an
          end-of-file the normal yywrap() processing is done.  A
          ``real'' end-of-file is returned as EOF.

          Input can be  controlled  by  redefining  the  YY_INPUT
          macro.       YY_INPUT's     calling     sequence     is
          "YY_INPUT(buf,result,max_size)".   Its  action  is   to
          place up to max_size characters in the character buffer
          "buf" and  return  in  the  integer  variable  "result"

          either the number of characters read or the constant
          YY_NULL (0 on Unix systems) to indicate EOF.
          The default YY_INPUT reads from the file-pointer "yyin"
          (which is by default stdin), so if you just want to
          change  the input file, you needn't redefine YY_INPUT -
          just point yyin at the input file.

          A sample redefinition of YY_INPUT (in the first section
          of the input file):

              %{
              #undef YY_INPUT
              #define YY_INPUT(buf,result,max_size) \
                  { \
                  int c = getchar(); \
                  result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
                  }
              %}

          You can also add in things like keeping track of the
          input line number this way; but don't expect your
          scanner to go very fast.
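
          A minimal sketch of such line counting, written as an
          ordinary C function rather than as the macro itself.
          The names fill_buffer() and input_lineno are assumptions
          made for this example and are not provided by flex.
          Note that the counter reflects lines read into the
          buffer, which may run ahead of the text the scanner has
          actually matched:

```c
#include <stdio.h>

/* Hypothetical line counter; not a variable that flex provides. */
int input_lineno = 1;

/*
 * Read up to max_size characters from fp into buf, counting newlines
 * as they go by.  Returns the number of characters stored, or 0 (the
 * value this page gives for YY_NULL on Unix) at end-of-file.
 */
int fill_buffer(FILE *fp, char *buf, int max_size)
{
    int n = 0;
    int c;

    while (n < max_size && (c = getc(fp)) != EOF) {
        buf[n++] = (char)c;
        if (c == '\n')
            input_lineno++;
    }
    return n;
}
```

          YY_INPUT could then presumably be redefined along the
          lines of "result = fill_buffer( yyin, buf, max_size );".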

     -    output() is not supported.  Output from the ECHO  macro
          is done to the file-pointer "yyout" (default stdout).

     -    Trailing context is restricted to patterns  which  have
          either  a  fixed-sized  leading  part  or a fixed-sized
          trailing part.  For  example,  "a*/b"  and  "a/b*"  are
          okay,  but  not  "a*/b*".  This restriction is due to a
          bug in the trailing context algorithm given in
          Principles of Compiler Design (and Compilers -
          Principles, Techniques, and Tools) which can result in
          mismatches.  Try the following lex program

              %%
              x+/xy           printf( "I found \"%s\"\n", yytext );

          on the input "xxy".  (If anyone knows of a  fast  algo-
          rithm for finding the beginning of trailing context for
          an arbitrary pair of regular expressions, please let me
          know!) If you must have arbitrary trailing context, you
          can use yyless() to effect it.

     -    flex reads only one input file, while  lex's  input  is
          made up of the concatenation of its input files.

ENHANCEMENTS
     -    Exclusive start-conditions can be declared by using %x
          instead of %s.  These start-conditions have the property
          that when they are active, no other rules are active.
          Thus  a  set  of  rules  governed by the same exclusive
          start condition describe a scanner which is independent
          of  any  of  the  other  rules in the flex input.  This
          feature makes it easy to specify "mini-scanners"  which

          scan  portions of the input that are syntactically dif-
          ferent from the rest (e.g., comments).
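
          The effect of such a "mini-scanner" can be pictured
          with a hand-written analogue in plain C.  This is not
          flex output and not how flex implements start
          conditions; the function strip_comments() and the state
          names are hypothetical.  The point is only that while
          the COMMENT state is active, none of the ordinary
          "rules" are consulted:

```c
enum state { INITIAL, COMMENT };

/*
 * Copy src to dst with C-style comments removed.  While the COMMENT
 * state is active only the comment "rules" apply, which is the
 * property an exclusive start condition gives a set of rules.
 * dst must have room for strlen(src) + 1 bytes.
 */
void strip_comments(const char *src, char *dst)
{
    enum state st = INITIAL;

    while (*src != '\0') {
        if (st == INITIAL) {
            if (src[0] == '/' && src[1] == '*') {
                st = COMMENT;       /* enter the mini-scanner */
                src += 2;
            } else {
                *dst++ = *src++;    /* ordinary rules apply here */
            }
        } else {
            if (src[0] == '*' && src[1] == '/') {
                st = INITIAL;       /* leave the mini-scanner */
                src += 2;
            } else {
                src++;              /* all other input is skipped */
            }
        }
    }
    *dst = '\0';
}
```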

     -    flex dynamically resizes its internal tables, so direc-
          tives  like  "%a  3000"  are not needed when specifying
          large scanners.

     -    The scanning routine  generated  by  flex  is  declared
          using  the  macro YY_DECL. By redefining this macro you
          can change the routine's name and its calling sequence.
          For example, you could use:

              #undef YY_DECL
              #define YY_DECL float lexscan( a, b ) float a, b;

          to give it the name lexscan, returning a float, and
          taking two floats as arguments.

     -    flex generates #line directives mapping  lines  in  the
          output to their origin in the input file.

     -    You  can  put  multiple  actions  on  the  same   line,
          separated with semi-colons.  With lex, the following

              foo    handle_foo(); return 1;

          is truncated to

              foo    handle_foo();

          flex does not truncate the action.   Actions  that  are
          not enclosed in braces are terminated at the end of the
          line.

     -    Actions can be begun with %{ and terminated with %}. In
          this  case,  flex  does  not count braces to figure out
          where the action ends - actions are terminated  by  the
          closing  %}.  This  feature is useful when the enclosed
          action has extraneous braces in it (usually in comments
          or  inside inactive #ifdef's) that throw off the brace-
          count.

     -    All of the scanner actions  (e.g.,  ECHO,  yywrap  ...)
          except the unput() and input() routines, are written as
          macros, so they can be redefined if  necessary  without
          requiring a separate library to link to.

FILES
     _f_l_e_x._s_k_e_l
          skeleton scanner

     _f_l_e_x._f_a_s_t_s_k_e_l
          skeleton scanner for -f and -F

     _f_s_k_e_l_c_o_m._h
          common definitions for skeleton files

     _f_s_k_e_l_d_e_f._h
          definitions for compressed skeleton file

     _f_a_s_t_s_k_e_l_d_e_f._h
          definitions for -f, -F skeleton file

SEE ALSO
     lex(1)

     M. E. Lesk and E. Schmidt, LEX - Lexical Analyzer Generator

AUTHOR
     Vern Paxson, with the help of many ideas and  much  inspira-
     tion  from Van Jacobson.  Original version by Jef Poskanzer.
     Fast table representation is a partial implementation  of  a
     design done by Van Jacobson.  The implementation was done by
     Kevin Gong and Vern Paxson.

     Thanks to the many flex beta-testers, especially Casey  Lee-
     dom,  Nick  Christopher,  Chris  Faylor, Eric Goldman, Craig
     Leres, Mohamed el Lozy, Esmond Pitt, Jef Poskanzer, and Dave
     Tallman.   Thanks  to  John Gilmore, Bob Mulcahy, Rich Salz,
     and Richard Stallman  for  help  with  various  distribution
     headaches.

     Send comments to:

          Vern Paxson
          Real Time Systems
          Bldg. 46A
          Lawrence Berkeley Laboratory
          1 Cyclotron Rd.
          Berkeley, CA 94720

          (415) 486-6411

          vern@lbl-{csam,rtsg}.arpa
          ucbvax!lbl-csam.arpa!vern


     The Atari ST port was done by Michael Vishchers.

DIAGNOSTICS
     _f_l_e_x _s_c_a_n_n_e_r _j_a_m_m_e_d - a scanner compiled with -s has encoun-
     tered  an  input  string  which wasn't matched by any of its
     rules.

     _f_l_e_x _i_n_p_u_t _b_u_f_f_e_r _o_v_e_r_f_l_o_w_e_d -  a  scanner  rule  matched  a
     string  long enough to overflow the scanner's internal input
     buffer (as large as BUFSIZ in "/usr/include/stdio.h").   You
     can edit _f_s_k_e_l_c_o_m._h and increase YY_BUF_SIZE and YY_MAX_LINE
     to increase this limit.

     _R_E_J_E_C_T _u_s_e_d _a_n_d _s_c_a_n_n_e_r _w_a_s _n_o_t _g_e_n_e_r_a_t_e_d _u_s_i_n_g  -_r  -  just
     like it sounds.  Your scanner uses REJECT. You must run flex
     on the scanner description using the -r flag.

     _o_l_d-_s_t_y_l_e _l_e_x _c_o_m_m_a_n_d _i_g_n_o_r_e_d - the flex  input  contains  a
     lex command (e.g., "%n 1000") which is being ignored.

BUGS
     Use of unput() or input() trashes  the  current  yytext  and
     yyleng.

     Use of unput() to push back more text than was  matched  can
     result  in the pushed-back text matching a beginning-of-line
     ('^') rule even though it didn't come at  the  beginning  of
     the line.

     Nulls are not allowed in flex inputs or  in  the  inputs  to
     scanners  generated by flex.  Their presence generates fatal
     errors.

     Do not mix trailing context with the '|'  operator  used  to
     specify  that  multiple rules use the same action.  That is,
     avoid constructs like:

             foo/bar      |
             bletch       |
             bugprone     { ... }

     They can result in subtle mismatches.  This is actually  not
     a  problem  if there is only one rule using trailing context
     and it is the first in the list (so the above  example  will
     actually  work okay).  The problem is due to fall-through in
     the action switch  statement,  causing  non-trailing-context
     rules  to  execute the trailing-context code of their fellow
     rules.  This should be fixed, as it's a nasty  bug  and  not
     obvious.   The  proper  fix  is  for  flex  to  spit  out  a
     FLEX_TRAILING_CONTEXT_USED #define and then have the  backup
     logic  in a separate table which is consulted for each rule-
     match, rather than as part of the rule action.  The place to
     do  the  tweaking is in add_accept() - any kind soul want to
     be a hero?

     The pattern:

          x{3}

     is considered to be  variable-length  for  the  purposes  of
     trailing context, even though it has a clear fixed length.

     Due to both buffering of input and  read-ahead,  you  cannot
     intermix  calls  to,  for example, getchar() with flex rules
     and expect it to work.  Call input() instead.

     The total table entries listed by the -v flag exclude the
     table entries needed to determine what rule has been
     matched.  The number of such entries is equal to the number
     of  DFA  states if the scanner was not compiled with -r, and
     greater than the number of states if it was.

     The scanner run-time speeds have not been optimized as  much
     as they deserve.  Van Jacobson's work shows that they can go
     quite a bit faster still.