man pages section 1: User Commands

Updated: Wednesday, July 27, 2022
 
 

re2c (1)

Name

re2c - compile regular expressions to code

Synopsis

re2c  [OPTIONS] INPUT [-o OUTPUT]

re2go [OPTIONS] INPUT [-o OUTPUT]

Description

RE2C(1)                                                                RE2C(1)



NAME
       re2c - compile regular expressions to code

SYNOPSIS
       re2c  [OPTIONS] INPUT [-o OUTPUT]

       re2go [OPTIONS] INPUT [-o OUTPUT]

DESCRIPTION
       re2c is a tool for generating fast lexical analyzers for C, C++ and Go.

SYNTAX
       A  re2c program consists of normal code intermixed with re2c blocks and
       directives.  Each re2c block may  contain  definitions,  configurations
       and  rules.   Definitions are of the form name = regexp;  where name is
       an identifier that consists of letters,  digits  and  underscores,  and
       regexp  is a regular expression.  Regular expressions may contain other
       definitions, but recursion is not allowed and each name should be
       defined before it is used.  Configurations are of the form
       re2c:config =
       value; where config is the configuration descriptor and value can be  a
       number, a string or a special word.  Rules consist of a regular expres-
       sion followed by a semantic action (a block of code enclosed  in  curly
       braces  {  and  }, or a raw one line of code preceded with := and ended
       with a newline that is not followed by a  whitespace).   If  the  input
       matches  the regular expression, the associated semantic action is exe-
       cuted.  If multiple rules match, the longest  match  takes  precedence.
       If  multiple rules match the same string, the earlier rule takes prece-
       dence.  There are two special rules: default rule *  and  EOF  rule  $.
       The default rule should always be defined; it has the lowest priority
       regardless of its place and matches any code unit (not necessarily a
       valid character, see encoding support).  The EOF rule matches the end
       of input; it should be defined if the corresponding EOF handling method
       is used.  If start conditions are used, rules have a more complex
       syntax.
       All  rules  of  a  single  block  are  compiled  into  a  deterministic
       finite-state  automaton  (DFA)  and encoded in the form of a program in
       the target language.  The generated code interfaces with the outer pro-
       gram  by  the  means  of a few user-defined primitives (see the program
       interface section).  Reusable blocks allow sharing  rules,  definitions
       and configurations between different blocks.

EXAMPLE
   Input file
          // re2c $INPUT -o $OUTPUT -i
          #include <assert.h>                 //
                                              // C/C++ code
          int lex(const char *YYCURSOR)       //
          {
              /*!re2c                         // start of re2c block
              re2c:define:YYCTYPE = char;     // configuration
              re2c:yyfill:enable = 0;         // configuration
              re2c:flags:case-ranges = 1;     // configuration
                                              //
              ident = [a-zA-Z_][a-zA-Z_0-9]*; // named definition
                                              //
              ident { return 0; }             // normal rule
              *     { return 1; }             // default rule
              */
          }                                   //
                                              //
          int main()                          //
          {                                   // C/C++ code
              assert(lex("_Zer0") == 0);      //
              return 0;                       //
          }                                   //


   Output file
          /* Generated by re2c */
          // re2c $INPUT -o $OUTPUT -i
          #include <assert.h>                 //
                                              // C/C++ code
          int lex(const char *YYCURSOR)       //
          {

          {
              char yych;
              yych = *YYCURSOR;
              switch (yych) {
              case 'A' ... 'Z':
              case '_':
              case 'a' ... 'z': goto yy4;
              default: goto yy2;
              }
          yy2:
              ++YYCURSOR;
              { return 1; }
          yy4:
              yych = *++YYCURSOR;
              switch (yych) {
              case '0' ... '9':
              case 'A' ... 'Z':
              case '_':
              case 'a' ... 'z': goto yy4;
              default: goto yy6;
              }
          yy6:
              { return 0; }
          }

          }                                   //
                                              //
          int main()                          //
          {                                   // C/C++ code
              assert(lex("_Zer0") == 0);      //
              return 0;                       //
          }                                   //


OPTIONS
       -? -h --help
              Show help message.

       -1 --single-pass
              Deprecated. Does nothing (single pass is the default now).

       -8 --utf-8
              Generate  a  lexer  that  reads  input  in UTF-8 encoding.  re2c
              assumes that character range is 0 -- 0x10FFFF and character size
              is 1 byte.

       -b --bit-vectors
              Optimize conditional jumps using bit masks. Implies -s.

       -c --conditions --start-conditions
              Enable  support of Flex-like "conditions": multiple interrelated
              lexers within one block. Option --start-conditions is  a  legacy
              alias; use --conditions instead.

       --case-insensitive
              Treat  single-quoted  and double-quoted strings as case-insensi-
              tive.

       --case-inverted
              Invert the meaning of single-quoted and  double-quoted  strings:
              treat  single-quoted strings as case-sensitive and double-quoted
              strings as case-insensitive.

       --case-ranges
              Collapse consecutive cases in a switch statement into a range
              of  the  form case low ... high:. This syntax is an extension of
              the C/C++ language, supported by compilers like GCC,  Clang  and
              Tcc.  The main advantage over using single cases is smaller gen-
              erated C code and faster generation time, although for some com-
              pilers  like  Tcc  it also results in smaller binary size.  This
              option doesn't work for the Go backend.

       -e --ecb
              Generate a lexer that reads  input  in  EBCDIC  encoding.   re2c
              assumes that character range is 0 -- 0xFF and character size is 1
              byte.

       --empty-class <match-empty | match-none | error>
              Define  the  way  re2c  treats  empty  character  classes.  With
              match-empty (the default) empty class matches empty input (which
              is illogical, but backwards-compatible).  With match-none
              empty  class  always  fails  to  match.   With error empty class
              raises a compilation error.

       --encoding-policy <fail | substitute | ignore>
              Define the way re2c treats Unicode surrogates.  With  fail  re2c
              aborts with an error when a surrogate is encountered.  With sub-
              stitute re2c silently replaces surrogates with  the  error  code
              point  0xFFFD.  With ignore (the default) re2c treats surrogates
              as normal code points. The Unicode standard says that standalone
              surrogates  are  invalid,  but real-world libraries and programs
              behave in different ways.

       -f --storable-state
              Generate a lexer which can store its inner state.  This is  use-
              ful  in  push-model lexers which are stopped by an outer program
              when there is not enough input, and then resumed when more input
              becomes available. In this mode users should additionally define
              YYGETSTATE() and YYSETSTATE(state) macros  and  variables  yych,
              yyaccept and state as part of the lexer state.

       -F --flex-syntax
              Partial  support for Flex syntax: in this mode named definitions
              don't need the equal sign and  the  terminating  semicolon,  and
              when used they must be surrounded by curly braces. Names without
              curly braces are treated as double-quoted strings.

       -g --computed-gotos
              Optimize conditional jumps using  non-standard  "computed  goto"
              extension (which must be supported by the compiler). re2c gener-
              ates jump tables only in complex cases with a lot of conditional
              branches.   Complexity   threshold   can   be   configured  with
              cgoto:threshold configuration.  This  option  implies  -b.  This
              option doesn't work for the Go backend.

       -I PATH
              Add  PATH to the list of locations which are used when searching
              for include files. This option is  useful  in  combination  with
              /*!include:re2c  ...  */  directive.  Re2c looks for FILE in the
              directory of including file and in the  list  of  include  paths
              specified by -I option.

       -i --no-debug-info
              Do  not output #line information. This is useful when the gener-
              ated code is tracked by some version control system or IDE.

       --input <default | custom>
              Specify the API used by the generated  code  to  interface  with
              user-defined code. Option default is the C API based on pointer
              arithmetic (it is the default for the C backend). Option  custom
              is the generic API (it is the default for the Go backend).

       --input-encoding <ascii | utf8>
              Specify  the  way  re2c  parses regular expressions.  With ascii
              (the default) re2c handles input as ASCII-encoded: any  sequence
              of  code  units  is  a sequence of standalone 1-byte characters.
              With utf8 re2c handles  input  as  UTF8-encoded  and  recognizes
              multibyte characters.

       --lang <c | go>
              Specify  the  output  language. Supported languages are C and Go
              (the default is C).

       --location-format <gnu | msvc>
              Specify location format in messages.   With  gnu  locations  are
              printed as 'filename:line:column: ...'.  With msvc locations are
              printed as 'filename(line,column) ...'.  Default is gnu.

       --no-generation-date
              Suppress date output in the generated file.

       --no-version
              Suppress version output in the generated file.

       -o OUTPUT --output=OUTPUT
              Specify the OUTPUT file.

       -P --posix-captures
              Enable submatch extraction with POSIX-style capturing groups.

       -r --reusable
              Allows reuse of re2c rules with /*!rules:re2c */ and /*!use:re2c
              */  blocks.  Exactly  one rules-block must be present. The rules
              are saved and used by every use-block that  follows,  which  may
              add its own rules and configurations.

       -S --skeleton
              Ignore user-defined interface code and generate a self-contained
              "skeleton" program.  Additionally,  generate  input  files  with
              strings  derived  from  the regular grammar and compressed match
              results that are used  to  verify  "skeleton"  behavior  on  all
              inputs.  This option is useful for finding bugs in optimizations
              and code generation. This option doesn't work for the  Go  back-
              end.

       -s --nested-ifs
              Use  nested if statements instead of switch statements in condi-
              tional jumps. This usually results in more efficient  code  with
              non-optimizing compilers.

       -T --tags
              Enable submatch extraction with tags.

       -t HEADER --type-header=HEADER
              Generate a HEADER file that contains an enum with condition
              names.  Requires the -c option.

       -u --unicode
              Generate a lexer that reads UTF32-encoded  input.  Re2c  assumes
              that  character  range  is 0 -- 0x10FFFF and character size is 4
              bytes. This option implies -s.

       -V --vernum
              Show version information in MMmmpp format (major, minor, patch).

       --verbose
              Output a short message in case of success.

       -v --version
              Show version information.

       -w --wide-chars
              Generate a lexer that reads  UCS2-encoded  input.  Re2c  assumes
              that  character  range  is  0  -- 0xFFFF and character size is 2
              bytes. This option implies -s.

       -x --utf-16
              Generate a lexer that reads UTF16-encoded  input.  Re2c  assumes
              that  character  range  is 0 -- 0x10FFFF and character size is 2
              bytes. This option implies -s.

   Debug options
       -D --emit-dot
              Instead of normal output generate lexer graph  in  .dot  format.
              The  output  can  be  converted  to  an  image  with the help of
              Graphviz (e.g. something like dot -Tpng -odfa.png dfa.dot).

       -d --debug-output
              Emit YYDEBUG in the generated code.  YYDEBUG should  be  defined
              by  the user in the form of a void function with two parameters:
              state (lexer state or -1) and symbol (current  input  symbol  of
              type YYCTYPE).

       --dump-adfa
              Debug option: output DFA after tunneling (in .dot format).

       --dump-cfg
              Debug  option:  output  control  flow graph of tag variables (in
              .dot format).

       --dump-closure-stats
              Debug option: output statistics on the number of states in  clo-
              sure.

       --dump-dfa-det
              Debug  option:  output DFA immediately after determinization (in
              .dot format).

       --dump-dfa-min
              Debug option: output DFA after minimization (in .dot format).

       --dump-dfa-tagopt
              Debug option: output DFA after tag optimizations (in  .dot  for-
              mat).

       --dump-dfa-tree
              Debug  option:  output DFA under construction with states repre-
              sented as tag history trees (in .dot format).

       --dump-dfa-raw
              Debug  option:  output  DFA  under  construction  with  expanded
              state-sets (in .dot format).

       --dump-interf
              Debug  option:  output  interference  table produced by liveness
              analysis of tag variables.

       --dump-nfa
              Debug option: output NFA (in .dot format).
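
       As an illustration of the -d option described above, here is a minimal
       sketch of a YYDEBUG definition in C.  It assumes that YYCTYPE is
       unsigned char; the trace_len counter and the message format are
       hypothetical and not part of re2c.

```c
#include <assert.h>
#include <stdio.h>

typedef unsigned char YYCTYPE;   /* assumption: 1-byte code units */

static int trace_len = 0;        /* hypothetical counter of trace calls */

/* A void function with two parameters, as the -d option expects:
 * state is the lexer state (or -1), symbol is the current input symbol. */
void YYDEBUG(int state, YYCTYPE symbol)
{
    assert(state >= -1);
    ++trace_len;
    fprintf(stderr, "state %d, symbol 0x%02x\n", state, (unsigned)symbol);
}
```

       Writing the trace to stderr keeps it separate from whatever the lexer
       itself prints to stdout.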

   Internal options
       --dfa-minimization <moore | table>
              Internal option: DFA minimization algorithm used  by  re2c.  The
              moore option is the Moore algorithm (it is the default). The ta-
              ble option is the "table  filling"  algorithm.  Both  algorithms
              should produce the same DFA up to states relabeling; table fill-
              ing is simpler and much slower and serves as a reference  imple-
              mentation.

       --eager-skip
              Internal  option:  make  the  generated  lexer advance the input
              position eagerly -- immediately after reading the input  symbol.
              This  changes  the  default  behavior when the input position is
              advanced lazily -- after transition  to  the  next  state.  This
              option is implied by --no-lookahead.

       --no-lookahead
              Internal  option:  use  TDFA(0) instead of TDFA(1).  This option
              has effect only with --tags or --posix-captures options.

       --no-optimize-tags
              Internal option: suppress optimization of tag variables (useful
              for debugging).

       --posix-closure <gor1 | gtop>
              Internal  option:  specify  shortest-path algorithm used for the
              construction of epsilon-closure with POSIX disambiguation seman-
              tics:  gor1  (the default) stands for Goldberg-Radzik algorithm,
              and gtop stands for "global topological order" algorithm.

       --posix-prectable <complex | naive>
              Internal option: specify the algorithm  used  to  compute  POSIX
              precedence  table. The complex algorithm computes precedence ta-
              ble in one traversal of tag history tree and has quadratic  com-
              plexity  in  the  number  of TNFA states; it is the default. The
              naive algorithm has worst-case cubic complexity in the number of
              TNFA  states,  but  it  is  much simpler than complex and may be
              slightly faster in non-pathological cases.

       --stadfa
              Internal option: use staDFA algorithm for  submatch  extraction.
              The  main  difference with TDFA is that tag operations in staDFA
              are placed in states, not on transitions.

   Warnings
       -W     Turn on all warnings.

       -Werror
              Turn warnings into errors. Note that this option  alone  doesn't
              turn  on  any warnings; it only affects those warnings that have
              been turned on so far or will be turned on later.

       -W<warning>
              Turn on warning.

       -Wno-<warning>
              Turn off warning.

       -Werror-<warning>
              Turn on warning and treat it as an error (this implies  -W<warn-
              ing>).

       -Wno-error-<warning>
              Don't  treat  this  particular warning as an error. This doesn't
              turn off the warning itself.

       -Wcondition-order
              Warn if the generated program makes implicit  assumptions  about
              condition numbering. One should use either the -t, --type-header
              option or the /*!types:re2c*/ directive to generate a mapping of
              condition names to numbers and then use the autogenerated condi-
              tion names.

       -Wempty-character-class
              Warn if a regular expression contains an empty character  class.
              Trying  to  match  an  empty  character class makes no sense: it
              should always fail.  However, for backwards  compatibility  rea-
              sons  re2c  allows  empty  character  classes and treats them as
              empty strings.  Use  the  --empty-class  option  to  change  the
              default behavior.

       -Wmatch-empty-string
              Warn  if  a  rule is nullable (matches an empty string).  If the
              lexer runs in a loop and the empty match is  unintentional,  the
              lexer may unexpectedly hang in an infinite loop.

       -Wswapped-range
              Warn  if  the  lower  bound of a range is greater than its upper
              bound. The default  behavior  is  to  silently  swap  the  range
              bounds.

       -Wundefined-control-flow
              Warn  if  some input strings cause undefined control flow in the
              lexer (the faulty patterns are reported). This is the most  dan-
              gerous and most common mistake. It can be easily fixed by adding
              the default rule * which has the lowest  priority,  matches  any
              code unit, and consumes exactly one code unit.

       -Wunreachable-rules
              Warn about rules that are shadowed by other rules and will never
              match.

       -Wuseless-escape
              Warn if a symbol is escaped when it shouldn't be.   By  default,
              re2c  silently  ignores such escapes, but this may as well indi-
              cate a typo or an error in the escape sequence.

       -Wnondeterministic-tags
              Warn if a tag has n-th degree  of  nondeterminism,  where  n  is
              greater than 1.

       -Wsentinel-in-midrule
              Warn  if  the sentinel symbol occurs in the middle of a rule ---
              this may cause reads past the end of buffer, crashes  or  memory
              corruption in the generated lexer. This warning is only applica-
              ble if the sentinel method of checking for the end of  input  is
              used.   It  is set to an error if re2c:sentinel configuration is
              used.

PROGRAM INTERFACE
       Re2c has a flexible interface that gives the user both the freedom  and
       the  responsibility to define how the generated code interacts with the
       outer program.  There are two major options:

       o Pointer API.  It is also called "default API", since it was  histori-
         cally  the  first,  and for a long time the only one.  This is a more
         restricted API based  on  C  pointer  arithmetics.   It  consists  of
         pointer-like  primitives YYCURSOR, YYMARKER, YYCTXMARKER and YYLIMIT,
         which are normally defined as pointers of type YYCTYPE*.  Pointer API
         is  enabled  by default for the C backend, and it cannot be used with
         other backends that do not have pointer arithmetics.



       o Generic API.  This is a less restricted  API  that  does  not  assume
         pointer   semantics.   It  consists  of  primitives  YYPEEK,  YYSKIP,
         YYBACKUP, YYBACKUPCTX, YYSTAGP, YYSTAGN, YYMTAGP, YYMTAGN, YYRESTORE,
         YYRESTORECTX,  YYRESTORETAG,  YYSHIFT,  YYSHIFTSTAG,  YYSHIFTMTAG and
         YYLESSTHAN.  For the C backend generic API is  enabled  with  --input
         custom option or re2c:flags:input = custom; configuration; for the Go
         backend it is enabled by default.  Generic API was added  in  version
         0.14.   It is intentionally designed to give the user as much freedom
         as possible in redefining the input model and the semantics  of  dif-
         ferent  actions  performed  by the generated code. As an example, one
         can override YYPEEK to check for the end of input before reading  the
         input character, or do some logging, etc.

       Generic API has two styles:

       o Function-like.   This  style  is  enabled with re2c:api:style = func-
         tions; configuration, and it is the default for C  backend.  In  this
         style  API  primitives  should be defined as functions or macros with
         parentheses, accepting the necessary arguments. For example, in C the
         default pointer API can be defined in function-like style generic API
         as follows:

            #define  YYPEEK()                 *YYCURSOR
            #define  YYSKIP()                 ++YYCURSOR
            #define  YYBACKUP()               YYMARKER = YYCURSOR
            #define  YYBACKUPCTX()            YYCTXMARKER = YYCURSOR
            #define  YYRESTORE()              YYCURSOR = YYMARKER
            #define  YYRESTORECTX()           YYCURSOR = YYCTXMARKER
            #define  YYRESTORETAG(tag)        YYCURSOR = tag
            #define  YYLESSTHAN(len)          YYLIMIT - YYCURSOR < len
            #define  YYSTAGP(tag)             tag = YYCURSOR
            #define  YYSTAGN(tag)             tag = NULL
            #define  YYSHIFT(shift)           YYCURSOR += shift
            #define  YYSHIFTSTAG(tag, shift)  tag += shift



       o Free-form.  This style is enabled with  re2c:api:style  =  free-form;
         configuration,  and  it  is the default for Go backend. In this style
         API primitives can be  defined  as  free-form  pieces  of  code,  and
         instead  of  arguments  they  have interpolated variables of the form
         @@{name}, or optionally just @@ if there is only one argument. The @@
         text  is  called  "sigil". It can be redefined to any other text with
         re2c:api:sigil configuration. For example, the  default  pointer  API
         can be defined in free-form style generic API as follows:

            re2c:define:YYPEEK       = "*YYCURSOR";
            re2c:define:YYSKIP       = "++YYCURSOR";
            re2c:define:YYBACKUP     = "YYMARKER = YYCURSOR";
            re2c:define:YYBACKUPCTX  = "YYCTXMARKER = YYCURSOR";
            re2c:define:YYRESTORE    = "YYCURSOR = YYMARKER";
            re2c:define:YYRESTORECTX = "YYCURSOR = YYCTXMARKER";
            re2c:define:YYRESTORETAG = "YYCURSOR = @@{tag}";
            re2c:define:YYLESSTHAN   = "YYLIMIT - YYCURSOR < @@{len}";
            re2c:define:YYSTAGP      = "@@{tag} = YYCURSOR";
            re2c:define:YYSTAGN      = "@@{tag} = NULL";
            re2c:define:YYSHIFT      = "YYCURSOR += @@{shift}";
            re2c:define:YYSHIFTSTAG  = "@@{tag} += @@{shift}";
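
       The remark above that YYPEEK can be overridden to check for the end of
       input can be sketched in function-like style as follows.  The struct
       input type, its fields and the helper is_ident_start are assumptions
       made for illustration.  The lex function is a hand-written stand-in
       for generated code (it mirrors the identifier rule from the EXAMPLE
       section), not actual re2c output.

```c
#include <assert.h>
#include <stddef.h>

typedef char YYCTYPE;

/* Hypothetical input model: a buffer with an explicit length and an
 * index, instead of a NUL-terminated pointer. */
struct input {
    const YYCTYPE *buf;
    size_t len;
    size_t pos;
};

/* Function-like generic API over `in` (a struct input * in scope).
 * YYPEEK yields 0 at the end of input instead of reading past the buffer. */
#define YYPEEK()  (in->pos < in->len ? in->buf[in->pos] : 0)
#define YYSKIP()  (++in->pos)

static int is_ident_start(YYCTYPE c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

/* Stand-in for the code re2c would generate for
 *     ident = [a-zA-Z_][a-zA-Z_0-9]*;
 * Returns 0 if an identifier starts at the current position, 1 otherwise. */
int lex(struct input *in)
{
    YYCTYPE yych = YYPEEK();
    if (!is_ident_start(yych)) {
        YYSKIP();                /* default rule consumes one code unit */
        return 1;
    }
    do {
        YYSKIP();
        yych = YYPEEK();
    } while (is_ident_start(yych) || (yych >= '0' && yych <= '9'));
    return 0;
}
```

       With these definitions the matcher never reads beyond len, so the
       input does not need a terminating NUL character.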

   API primitives
       Here is a list of API primitives that may be used by the generated code
       in order to interface with the outer  program.   Which  primitives  are
       needed depends on multiple factors, including the complexity of regular
       expressions, input representation, buffering, the use of  various  fea-
       tures and so on.  All the necessary primitives should be defined by the
       user in the form of macros, functions, variables, free-form  pieces  of
       code  or any other suitable form.  Re2c does not (and cannot) check the
       definitions, so if anything is missing or defined incorrectly the  gen-
       erated code will not compile.

       YYCTYPE
              The  type  of  the  input  characters  (code units).  For ASCII,
              EBCDIC and UTF-8 encodings it should be 1-byte unsigned integer.
              For  UTF-16  or  UCS-2 it should be 2-byte unsigned integer. For
              UTF-32 it should be 4-byte unsigned integer.

       YYCURSOR
              A pointer-like l-value that stores the  current  input  position
              (usually  a pointer of type YYCTYPE*). Initially YYCURSOR should
              point to the first input character. It is advanced by the gener-
              ated  code.   When  a  rule  matches, YYCURSOR points to the one
              after the last matched character. It is used only in the default
              C API.

       YYLIMIT
              A  pointer-like  r-value  that  stores the end of input position
              (usually a pointer of type YYCTYPE*). Initially  YYLIMIT  should
              point to the one after the last available input character. It is
              not changed by the generated code.  The lexer compares YYCURSOR
              to YYLIMIT in order to determine whether there are enough input
              characters left.  YYLIMIT is used only in the default C API.

       YYMARKER
              A pointer-like l-value (usually a pointer of type YYCTYPE*) that
              stores the position of the latest matched rule.  It is used to
              restore the YYCURSOR position if a longer match fails and the
              lexer needs to roll back.  Initialization is not needed.
              YYMARKER is used only in the default C API.

       YYCTXMARKER
              A pointer-like l-value that stores the position of the  trailing
              context  (usually a pointer of type YYCTYPE*). No initialization
              is needed.  It is used only in the default C API, and only  with
              the lookahead operator /.

       YYFILL API  primitive  with one argument len.  The meaning of YYFILL is
              to provide at least len more input characters or  fail.  If  EOF
              rule  is  used, YYFILL should always return to the calling func-
              tion; the return value should be zero on success and non-zero on
              failure. If EOF rule is not used, YYFILL return value is ignored
              and it should not return on failure. Maximal  value  of  len  is
              YYMAXFILL,  which can be generated with /*!max:re2c*/ directive.
              The  definition  of  YYFILL  can  be  either  function-like   or
              free-form  depending  on  the  API style (see re2c:api:style and
              re2c:define:YYFILL:naked).

       YYMAXFILL
              An integral constant equal to the  maximal value of YYFILL argu-
              ment.  It can be generated with /*!max:re2c*/ directive.

       YYLESSTHAN
              A  generic  API  primitive  with one argument len.  It should be
              defined as an r-value of boolean type that equals  true  if  and
              only if there are fewer than len input characters left.  The
              definition can be either function-like or free-form depending
              on the API style (see re2c:api:style).

       YYPEEK A generic API primitive with no arguments.  It should be defined
              as an r-value of type YYCTYPE that is equal to the character  at
              the  current  input position. The definition can be either func-
              tion-like  or  free-form  depending  on  the  API   style   (see
              re2c:api:style).

       YYSKIP A  generic  API  primitive  with  no  arguments.  The meaning of
              YYSKIP is to advance the current input position by  one  charac-
              ter.  The  definition  can  be either function-like or free-form
              depending on the API style (see re2c:api:style).

       YYBACKUP
              A generic API primitive  with  no  arguments.   The  meaning  of
              YYBACKUP  is  to save the current input position, which is later
              restored with YYRESTORE.  The definition should be either  func-
              tion-like   or   free-form  depending  on  the  API  style  (see
              re2c:api:style).

       YYRESTORE
              A generic API primitive with no arguments.  The meaning of YYRE-
              STORE  is  to  restore  the  current input position to the value
              saved by  YYBACKUP.   The  definition  should  be  either  func-
              tion-like   or   free-form  depending  on  the  API  style  (see
              re2c:api:style).

       YYBACKUPCTX
              A generic API primitive with zero  arguments.   The  meaning  of
              YYBACKUPCTX  is  to save the current input position as the posi-
              tion of the trailing context, which is later restored  by  YYRE-
              STORECTX.   The  definition  should  be  either function-like or
              free-form depending on the API style (see re2c:api:style).

       YYRESTORECTX
              A generic API primitive with no arguments.  The meaning of YYRE-
              STORECTX  is to restore the trailing context position saved with
              YYBACKUPCTX.  The definition should be either  function-like  or
              free-form depending on the API style (see re2c:api:style).

       YYRESTORETAG
              A  generic  API primitive with one argument tag.  The meaning of
              YYRESTORETAG is to restore the trailing context position to  the
              value  of tag.  The definition should be either function-like or
              free-form depending on the API style (see re2c:api:style).

       YYSTAGP
              A generic API primitive with one argument tag.  The  meaning  of
              YYSTAGP  is to set tag value to the current input position.  The
              definition should be either function-like or free-form depending
              on the API style (see re2c:api:style).

       YYSTAGN
              A  generic  API primitive with one argument tag.  The meaning of
              YYSTAGN is to set tag value to null (or some default value). The
              definition should be either function-like or free-form depending
              on the API style (see re2c:api:style).

       YYMTAGP
              A generic API primitive with one argument tag.  The  meaning  of
              YYMTAGP is to append the current position to the history of tag.
              The definition  should  be  either  function-like  or  free-form
              depending on the API style (see re2c:api:style).

       YYMTAGN
              A  generic  API primitive with one argument tag.  The meaning of
              YYMTAGN is to append null (or some other default) value  to  the
              history  of  tag.  The definition can be either function-like or
              free-form depending on the API style (see re2c:api:style).

       YYSHIFT
              A generic API primitive with one argument shift.  The meaning of
              YYSHIFT  is to shift the current input position by shift charac-
              ters (the shift value may be negative). The  definition  can  be
              either  function-like  or  free-form  depending on the API style
              (see re2c:api:style).

       YYSHIFTSTAG
              A generic  API primitive with two arguments, tag and shift.  The
              meaning  of YYSHIFTSTAG is to shift tag by shift characters (the
              shift value may be negative).   The  definition  can  be  either
              function-like  or  free-form  depending  on  the  API style (see
              re2c:api:style).

       YYSHIFTMTAG
              A generic API primitive with two arguments, tag and shift.   The
              meaning  of YYSHIFTMTAG is to shift the latest value in the his-
              tory of tag by shift characters (the shift value  may  be  nega-
              tive).    The  definition  should  be  either  function-like  or
              free-form depending on the API style (see re2c:api:style).

       YYMAXNMATCH
              An integral constant equal to the maximal number of POSIX
              capturing groups in a rule.  It is generated with the
              /*!maxnmatch:re2c*/ directive.

       YYCONDTYPE
              The type of the condition enum.  It should be  generated  either
              with /*!types:re2c*/ directive or -t --type-header option.

       YYGETCONDITION
              An  API  primitive with zero arguments.  It should be defined as
              an r-value of type YYCONDTYPE that is equal to the current  con-
              dition identifier. The definition can be either function-like or
              free-form depending on the API  style  (see  re2c:api:style  and
              re2c:define:YYGETCONDITION:naked).

       YYSETCONDITION
              An  API primitive with one argument cond.  The meaning of YYSET-
              CONDITION is to set the current condition  identifier  to  cond.
              The  definition  should  be  either  function-like  or free-form
              depending   on   the   API   style   (see   re2c:api:style   and
              re2c:define:YYSETCONDITION@cond).

       YYGETSTATE
              An  API  primitive with zero arguments.  It should be defined as
              an r-value of integer type that is equal to  the  current  lexer
              state. Should be initialized to -1. The definition can be either
              function-like or free-form  depending  on  the  API  style  (see
              re2c:api:style and re2c:define:YYGETSTATE:naked).

       YYSETSTATE
              An API primitive with one argument state.  The meaning of YYSET-
              STATE is to set the current lexer state to state.   The  defini-
              tion  should  be  either function-like or free-form depending on
              the  API  style  (see  re2c:api:style   and   re2c:define:YYSET-
              STATE@state).

       YYDEBUG
              A  debug  API  primitive  with  two arguments. It can be used to
              debug the generated code (with -d --debug-output option).  YYDE-
              BUG  should  return  no  value  and  accept two arguments: state
              (either a DFA state index or -1) and symbol (the  current  input
              symbol).

       yych   An l-value of type YYCTYPE that stores the current input charac-
              ter.  User definition is necessary only with -f --storable-state
              option.

       yyaccept
              An  l-value  of unsigned integral type that stores the number of
              the latest matched rule.  User definition is necessary only with
              -f --storable-state option.

       yynmatch
              An  l-value  of unsigned integral type that stores the number of
              POSIX capturing groups in the matched rule.  Used only  with  -P
              --posix-captures option.

       yypmatch
              An array of l-values that are used to hold the tag values corre-
              sponding to the capturing  parentheses  in  the  matching  rule.
              Array  length must be at least yynmatch * 2 (usually YYMAXNMATCH
              * 2 is a good  choice).   Used  only  with  -P  --posix-captures
              option.
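
       As a rough illustration of the position-handling primitives listed
       above, the sketch below defines several of them in function-like style
       and replays by hand a call sequence of the kind re2c would generate.
       The struct input type, its field names and the replay_primitives
       helper are assumptions made for this example and are not part of re2c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lexer state; the struct and field names are illustrative. */
struct input {
    const char *cur; /* current input position        */
    const char *mar; /* position saved by YYBACKUP    */
    const char *ctx; /* position saved by YYBACKUPCTX */
};

/* Function-like definitions; each expects a `struct input *in` in scope. */
#define YYBACKUP()              (in->mar = in->cur)
#define YYRESTORE()             (in->cur = in->mar)
#define YYBACKUPCTX()           (in->ctx = in->cur)
#define YYRESTORECTX()          (in->cur = in->ctx)
#define YYRESTORETAG(tag)       (in->cur = (tag))
#define YYSTAGP(tag)            ((tag) = in->cur)
#define YYSTAGN(tag)            ((tag) = NULL)
#define YYSHIFT(shift)          (in->cur += (shift))
#define YYSHIFTSTAG(tag, shift) ((tag) += (shift))

/* Replays a hand-written call sequence and returns the final cursor
   offset from the start of buf (buf must hold at least 4 characters). */
static ptrdiff_t replay_primitives(const char *buf)
{
    struct input state = { buf, NULL, NULL };
    struct input *in = &state;
    const char *t = NULL;

    YYSHIFT(1);         /* cursor at offset 1                 */
    YYBACKUP();         /* marker at offset 1                 */
    YYSHIFT(3);         /* speculative read up to offset 4    */
    YYSTAGP(t);         /* tag t at offset 4                  */
    YYRESTORE();        /* cursor back at offset 1            */
    YYSHIFTSTAG(t, -2); /* tag t moved to offset 2            */
    YYRESTORETAG(t);    /* cursor follows the tag to offset 2 */
    return in->cur - buf;
}
```

       With a six-character buffer such as "abcdef", replay_primitives ends
       with the cursor at offset 2.  In a real lexer these calls are emitted
       by re2c; only the definitions are supplied by the user.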

   Directives
       Below  is the list of all directives provided by re2c (in no particular
       order).  More information on each directive can be found in the related
       sections.

       /*!re2c ... */
              A standard re2c block.

       %{ ... %}
              A standard re2c block in -F --flex-support mode.

       /*!rules:re2c ... */
              A reusable re2c block (requires -r --reuse option).

       /*!use:re2c ... */
              A   block   that  reuses  previous  rules-block  specified  with
              /*!rules:re2c ... */ (requires -r --reuse option).

       /*!ignore:re2c ... */
              A block whose contents are ignored and cut off from the  output
              file.

       /*!max:re2c*/
              This  directive  is  substituted  with  the  macro-definition of
              YYMAXFILL.

       /*!maxnmatch:re2c*/
              This directive  is  substituted  with  the  macro-definition  of
              YYMAXNMATCH (requires -P --posix-captures option).

       /*!getstate:re2c*/
              This directive is substituted with conditional dispatch on lexer
              state (requires -f --storable-state option).

       /*!types:re2c ... */
              This directive is substituted with the definition  of  condition
              enum (requires -c --conditions option).

       /*!stags:re2c ... */, /*!mtags:re2c ... */
              These  directives  allow one to specify a template piece of code
              that is expanded for  each  s-tag/m-tag  variable  generated  by
              re2c. This block has two optional configurations: format = "@@";
              (specifies the template where @@ is substituted with the name of
              each  tag variable), and separator = ""; (specifies the piece of
              code used to join the generated pieces for different  tag  vari-
              ables).

       /*!include:re2c FILE */
              This  directive allows one to include FILE (in the same sense as
              #include directive in C/C++).

       /*!header:re2c:on*/
              This directive marks the start of header file. Everything  after
              it  and  up  to  the following /*!header:re2c:off*/ directive is
              processed by re2c and written to the header file specified  with
              -t --type-header option.

       /*!header:re2c:off*/
              This  directive  marks  the  end  of  header  file  started with
              /*!header:re2c:on*/.
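
       As a small illustration of the /*!stags:re2c ... */ directive described
       above, the fragment below would declare one variable per s-tag.  The
       const char * type is an assumption about how the lexer represents
       input positions; other representations need a different template.

```
/*!stags:re2c format = "const char *@@;"; */
```

       If re2c had generated the tag variables yyt1 and yyt2, this would be
       expected to expand to roughly const char *yyt1;const char *yyt2; (the
       pieces are joined with the default empty separator).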

   Configurations
       re2c:flags:t, re2c:flags:type-header
              Specify the name of the generated header file  relative  to  the
              directory  of  the  output file. (Same as -t, --type-header com-
              mand-line option except that the filepath is relative.)

       re2c:flags:input
              Same as --input command-line option.

       re2c:api:style
              Allows one to specify the style of generic API. Possible  values
              are  functions  and free-form. With functions style (the default
              for the C backend) API primitives  behave  like  functions,  and
              re2c  generates parentheses with an argument list after the name
              of each primitive.  With free-form style (the default for the Go
              backend) re2c treats API definitions as interpolated strings and
              substitutes argument placeholders with the actual argument  val-
              ues.   This  option  can be overridden by options for individual
              API primitives, e.g. re2c:define:YYFILL:naked for YYFILL.

       re2c:api:sigil
              Allows one to specify the "sigil" symbol  (or  string)  that  is
              used  to  recognize  argument placeholders in the definitions of
              generic API primitives.  The default value is @@.   Placeholders
              start with sigil, followed by the argument name in curly braces.
              For example, if sigil is set to $, then placeholders  will  have
              the  form  ${name}. Single-argument APIs may use shorthand nota-
              tion without the name in braces. This option can  be  overridden
              by    options    for    individual    API    primitives,    e.g.
              re2c:define:YYFILL@len for YYFILL.

       re2c:define:YYCTYPE
              Defines YYCTYPE (see the user interface section).

       re2c:define:YYCURSOR
              Defines C API primitive YYCURSOR (see the  user  interface  sec-
              tion).

       re2c:define:YYLIMIT
              Defines  C  API  primitive  YYLIMIT (see the user interface sec-
              tion).

       re2c:define:YYMARKER
              Defines C API primitive YYMARKER (see the  user  interface  sec-
              tion).

       re2c:define:YYCTXMARKER
              Defines C API primitive YYCTXMARKER (see the user interface sec-
              tion).

       re2c:define:YYFILL
              Defines API primitive YYFILL (see the user interface section).

       re2c:define:YYFILL@len
              Specifies the sigil used for  argument  substitution  in  YYFILL
              definition.   Defaults   to  @@.   Overrides  the  more  generic
              re2c:api:sigil configuration.

       re2c:define:YYFILL:naked
              Allows one to override re2c:api:style for YYFILL.  Value 0  cor-
              responds to free-form API style.

       re2c:yyfill:enable
              Defaults  to 1 (YYFILL is enabled). Set this to zero to suppress
              the generation of YYFILL. Use warnings (-W option) and re2c:sen-
              tinel  configuration  to  verify that the generated lexer cannot
              read past the end of input, as this might introduce severe secu-
              rity issues to your programs.

       re2c:yyfill:parameter
              Controls  the  argument  in  the parentheses that follow YYFILL.
              Defaults to 1, which means that the argument  is  generated.  If
              zero,   the   argument   is  omitted.  Can  be  overridden  with
              re2c:define:YYFILL:naked or re2c:api:style.

       re2c:eof
              Specifies the sentinel symbol used with EOF rule $ to check  for
              the end of input in the generated lexer. The default value is -1
              (EOF rule is not used). Other possible values include all  valid
              code units. Only decimal numbers are recognized.

       re2c:sentinel
              Specifies  the  sentinel symbol used with the sentinel method of
              checking for the end of input in the generated lexer  (the  case
              when  bounds  checking  is disabled with re2c:yyfill:enable = 0;
              and EOF rule $ is not used). This configuration does not  affect
              code  generation. It is used by re2c to verify that the sentinel
              symbol is not allowed in the middle of  the  rule,  and  prevent
              possible  reads  past  the end of buffer in the generated lexer.
              The default value is -1 (re2c assumes that the  sentinel  symbol
              is  0,  which  is  the  most common case). Other possible values
              include all valid code units. Only decimal  numbers  are  recog-
              nized.

       re2c:define:YYLESSTHAN
              Defines generic API primitive YYLESSTHAN (see the user interface
              section).

       re2c:yyfill:check
              Setting this to zero suppresses the generation of the YYFILL
              check (YYLESSTHAN in generic API or YYLIMIT-based comparison in
              default C API). This configuration is useful when the necessary
              input is always available. It defaults to 1 (the check is
              generated).

       re2c:label:yyFillLabel
              Allows one to change the prefix of YYFILL labels (used with  EOF
              rule or with storable states).

       re2c:define:YYPEEK
              Defines  generic  API  primitive  YYPEEK (see the user interface
              section).

       re2c:define:YYSKIP
              Defines generic API primitive YYSKIP  (see  the  user  interface
              section).

       re2c:define:YYBACKUP
              Defines  generic  API primitive YYBACKUP (see the user interface
              section).

       re2c:define:YYBACKUPCTX
              Defines generic API primitive YYBACKUPCTX (see the  user  inter-
              face section).

       re2c:define:YYRESTORE
              Defines  generic API primitive YYRESTORE (see the user interface
              section).

       re2c:define:YYRESTORECTX
              Defines generic API primitive YYRESTORECTX (see the user  inter-
              face section).

       re2c:define:YYRESTORETAG
              Defines  generic API primitive YYRESTORETAG (see the user inter-
              face section).

       re2c:define:YYSHIFT
              Defines generic API primitive YYSHIFT (see  the  user  interface
              section).

       re2c:define:YYSHIFTMTAG
              Defines  generic  API primitive YYSHIFTMTAG (see the user inter-
              face section).

       re2c:define:YYSHIFTSTAG
              Defines generic API primitive YYSHIFTSTAG (see the  user  inter-
              face section).

       re2c:define:YYSTAGN
              Defines  generic  API  primitive YYSTAGN (see the user interface
              section).

       re2c:define:YYSTAGP
              Defines generic API primitive YYSTAGP (see  the  user  interface
              section).

       re2c:define:YYMTAGN
              Defines  generic  API  primitive YYMTAGN (see the user interface
              section).

       re2c:define:YYMTAGP
              Defines generic API primitive YYMTAGP (see  the  user  interface
              section).

       re2c:flags:T, re2c:flags:tags
              Same as -T --tags command-line option.

       re2c:flags:P, re2c:flags:posix-captures
              Same as -P --posix-captures command-line option.

       re2c:tags:expression
              Allows  one  to  customize the way re2c addresses tag variables.
              By default re2c generates expressions of the form  yyt<N>.  This
              might  be inconvenient, for example if tag variables are defined
              as fields in a struct. Re2c recognizes placeholders of the form
              @@{tag} or @@ and replaces them with the actual tag name.  Sigil
              @@ can be  redefined  with  re2c:api:sigil  configuration.   For
              example,  setting  re2c:tags:expression  =  "p->@@";  results in
              expressions of the form p->yyt<N> in the generated code.

       re2c:tags:prefix
              Allows one to override the prefix of tag variables (defaults  to
              yyt).

       re2c:flags:lookahead
              Same as inverted --no-lookahead command-line option.

       re2c:flags:optimize-tags
              Same as inverted --no-optimize-tags command-line option.

       re2c:define:YYCONDTYPE
              Defines YYCONDTYPE (see the user interface section).

       re2c:define:YYGETCONDITION
              Defines  API  primitive  YYGETCONDITION  (see the user interface
              section).

       re2c:define:YYGETCONDITION:naked
              Allows one to override re2c:api:style for YYGETCONDITION.  Value
              0 corresponds to free-form API style.

       re2c:define:YYSETCONDITION
              Defines  API  primitive  YYSETCONDITION  (see the user interface
              section).

       re2c:define:YYSETCONDITION@cond
              Specifies the sigil used for argument substitution in  YYSETCON-
              DITION  definition. The default value is @@.  Overrides the more
              generic re2c:api:sigil configuration.

       re2c:define:YYSETCONDITION:naked
              Allows one to override re2c:api:style for YYSETCONDITION.  Value
              0 corresponds to free-form API style.

       re2c:cond:goto
              Allows one to customize the goto statements used with the short-
              cut :=> rules in conditions. The  default  value  is  goto  @@;.
              Placeholders are substituted with the condition name (see
              re2c:api:sigil and re2c:cond:goto@cond).

       re2c:cond:goto@cond
              Specifies  the  sigil  used   for   argument   substitution   in
              re2c:cond:goto  definition.  The default value is @@.  Overrides
              the more generic re2c:api:sigil configuration.

       re2c:cond:divider
              Defines the divider for condition blocks.  The default value  is
              /*  ***********************************  */.  Placeholders are
              substituted with the condition name (see re2c:api:sigil and
              re2c:cond:divider@cond).

       re2c:cond:divider@cond
              Specifies   the   sigil   used   for  argument  substitution  in
              re2c:cond:divider definition. The default value  is  @@.   Over-
              rides the more generic re2c:api:sigil configuration.

       re2c:condprefix
              Specifies  the  prefix  used  for condition labels.  The default
              value is yyc_.

       re2c:condenumprefix
              Specifies  the  prefix  used  for  condition  identifiers.   The
              default value is yyc.

       re2c:define:YYGETSTATE
              Defines  API  primitive  YYGETSTATE (see the user interface sec-
              tion).

       re2c:define:YYGETSTATE:naked
              Allows one to override re2c:api:style for YYGETSTATE.   Value  0
              corresponds to free-form API style.

       re2c:define:YYSETSTATE
              Defines  API  primitive  YYSETSTATE (see the user interface sec-
              tion).

       re2c:define:YYSETSTATE@state
              Specifies the sigil used for argument substitution in YYSETSTATE
              definition. The default value is @@.  Overrides the more generic
              re2c:api:sigil configuration.

       re2c:define:YYSETSTATE:naked
              Allows one to override re2c:api:style for YYSETSTATE.   Value  0
              corresponds to free-form API style.

       re2c:state:abort
              If  set  to  a  positive  integer value, changes the form of the
              YYGETSTATE switch: instead of using default case to jump to  the
              beginning of the lexer block, a -1 case is used, and the default
              case aborts the program.

       re2c:state:nextlabel
              With storable states, allows one to control whether the
              YYGETSTATE block is followed by a yyNext label (the default
              value is zero, which corresponds to no label). Instead of
              using yyNext it is possible to use re2c:startlabel to force
              the generation of a specific start label.  Instead of using
              labels it is often more convenient to generate YYGETSTATE code
              using /*!getstate:re2c*/.

       re2c:label:yyNext
              Allows one to change the name of the yyNext label.

       re2c:startlabel
              Controls the generation of start label for the next lexer block.
              The default value is zero, which means that the start  label  is
              generated only if it is used. An integer value greater than zero
              forces the generation of start label even if it is unused by the
              lexer.  A  string  value  also forces start label generation and
              sets the label name to the specified string.  This configuration
              applies  only  to  the current block (it is reset to default for
              the next block).

       re2c:flags:s, re2c:flags:nested-ifs
              Same as -s --nested-ifs command-line option.

       re2c:flags:b, re2c:flags:bit-vectors
              Same as -b --bit-vectors command-line option.

       re2c:variable:yybm
              Overrides the name of the yybm variable.

       re2c:yybm:hex
              Defaults to zero (a decimal bitmap table is generated).  If  set
              to nonzero, a hexadecimal table is generated.

       re2c:flags:g, re2c:flags:computed-gotos
              Same as -g --computed-gotos command-line option.

       re2c:cgoto:threshold
              With  -g  --computed-gotos  option this value specifies the com-
              plexity threshold that triggers the generation  of  jump  tables
              instead  of  nested if statements and bitmaps. The default value
              is 9.

       re2c:flags:case-ranges
              Same as --case-ranges command-line option.

       re2c:flags:e, re2c:flags:ecb
              Same as -e --ecb command-line option.

       re2c:flags:8, re2c:flags:utf-8
              Same as -8 --utf-8 command-line option.

       re2c:flags:w, re2c:flags:wide-chars
              Same as -w --wide-chars command-line option.

       re2c:flags:x, re2c:flags:utf-16
              Same as -x --utf-16 command-line option.

       re2c:flags:u, re2c:flags:unicode
              Same as -u --unicode command-line option.

       re2c:flags:encoding-policy
              Same as --encoding-policy command-line option.

       re2c:flags:empty-class
              Same as --empty-class command-line option.

       re2c:flags:case-insensitive
              Same as --case-insensitive command-line option.

       re2c:flags:case-inverted
              Same as --case-inverted command-line option.

       re2c:flags:i, re2c:flags:no-debug-info
              Same as -i --no-debug-info command-line option.

       re2c:indent:string
              Specifies the string to use for indentation.  The default  value
              is  "\t".   Indent string should contain only whitespace charac-
              ters.  To disable indentation entirely, set  this  configuration
              to empty string "".

       re2c:indent:top
              Specifies the minimum amount of indentation to use.  The default
              value is zero.  The value should be a non-negative integer  num-
              ber.

       re2c:labelprefix
              Allows  one  to  change  the  prefix  of  DFA state labels.  The
              default value is yy.

       re2c:yych:emit
              Set this to zero to suppress the generation of yych  definition.
              Defaults to 1 (the definition is generated).

       re2c:variable:yych
              Overrides the name of the yych variable.

       re2c:yych:conversion
              If  set  to nonzero, re2c automatically generates a cast to YYC-
              TYPE every time yych is read. Defaults to zero (no cast).

       re2c:variable:yyaccept
              Overrides the name of the yyaccept variable.

       re2c:variable:yytarget
              Overrides the name of the yytarget variable.

       re2c:variable:yystable
              Deprecated.

       re2c:variable:yyctable
              When both -c --conditions and -g  --computed-gotos  are  active,
              re2c  will use this variable to generate a static jump table for
              YYGETCONDITION.

       re2c:define:YYDEBUG
              Defines YYDEBUG (see the user interface section).

       re2c:flags:d, re2c:flags:debug-output
              Same as -d --debug-output command-line option.

       re2c:flags:dfa-minimization
              Same as --dfa-minimization command-line option.

       re2c:flags:eager-skip
              Same as --eager-skip command-line option.
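
       To make the interplay of re2c:api:style, the sigil and the
       re2c:define:* configurations more concrete, here is a sketch of a few
       free-form definitions.  The variable name cursor is hypothetical, and
       only some of the primitives are shown; rules and the remaining
       definitions are omitted, so this is a fragment rather than a complete
       block.

```
/*!re2c
re2c:api:style = free-form;
re2c:define:YYCTYPE     = char;
re2c:define:YYPEEK      = "*cursor";
re2c:define:YYSKIP      = "++cursor;";
re2c:define:YYSTAGP     = "@@{tag} = cursor;";
re2c:define:YYSHIFTSTAG = "@@{tag} += @@{shift};";
*/
```

       With the default sigil @@, each @@{name} placeholder is replaced by
       the corresponding argument when re2c emits the primitive.  Since
       YYSTAGP takes a single argument, its definition could also use the
       shorthand form "@@ = cursor;".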

REGULAR EXPRESSIONS
       re2c uses the following syntax for regular expressions:

       o "foo" case-sensitive string literal

       o 'foo' case-insensitive string literal

       o [a-xyz], [^a-xyz] character class (possibly negated)

       o . any character except newline

       o R \ S difference of character classes R and S

       o R* zero or more occurrences of R

       o R+ one or more occurrences of R

       o R? optional R

       o R{n} repetition of R exactly n times

       o R{n,} repetition of R at least n times

       o R{n,m} repetition of R from n to m times

       o (R) just R; parentheses  are  used  to  override  precedence  or  for
         POSIX-style submatch

       o R S concatenation: R followed by S

       o R | S alternative: R or S

       o R / S lookahead: R followed by S, but S is not consumed

       o name the regular expression defined as name (or literal string "name"
         in Flex compatibility mode)

       o {name} the regular expression defined as name in  Flex  compatibility
         mode

       o @stag  an s-tag: saves the last input position at which @stag matches
         in a variable named stag

       o #mtag an m-tag: saves all input positions at which #mtag matches in a
         variable named mtag

       Character  classes and string literals may contain the following escape
       sequences: \a, \b, \f, \n, \r, \t, \v, \\, octal escapes \ooo and hexa-
       decimal escapes \xhh, \uhhhh and \Uhhhhhhhh.
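
       The constructs above combine as in the following definitions, written
       in the name = regexp; form.  The names are made up for illustration,
       and the fragment is meant to sit inside a re2c block alongside rules
       that use it.

```
digit     = [0-9];
octet     = digit{1,3};
ipv4      = octet ("." octet){3};
hexnum    = '0x' [0-9a-fA-F]+;
consonant = [a-z] \ [aeiou];
eol       = "\r"? "\n";
```

       Here octet uses bounded repetition, ipv4 uses grouping with an exact
       repetition count, hexnum starts with a case-insensitive literal,
       consonant is a difference of two character classes, and eol combines
       an optional literal with concatenation.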

EOF HANDLING
       Re2c provides several ways to handle the end-of-input situation. Which
       one to use depends on the complexity of the regular expressions,
       performance considerations, the need for input buffering and various
       other factors. EOF handling is probably the most complex part of the
       re2c user interface: it requires some understanding of how the
       generated lexer works. In return it allows the user to customize the
       lexer for a particular environment and avoid the unnecessary overhead
       of generic methods when a simpler method is sufficient. Roughly
       speaking, there are four main methods:

       o using sentinel symbol (simple and efficient, but limited)

       o bounds checking with padding (generic, but complex)

       o EOF  rule:  a  combination  of  sentinel  symbol  and bounds checking
         (generic and simple, can be more or less efficient than bounds check-
         ing with padding depending on the grammar)

       o using generic API (user-defined, so may be incorrect ;])

   Using sentinel symbol
       This is the simplest and the most efficient method. It is applicable in
       cases when the input is small enough to fit into  a  contiguous  memory
       buffer and there is a natural "sentinel" symbol --- a code unit that is
       not allowed by any of the regular expressions in grammar (except possi-
       bly  as  a  terminating  character).   Sentinel symbol never appears in
       well-formed input, therefore it can be appended at the end of input and
       used  as  a stop signal by the lexer. A good example of such input is a
       null-terminated C-string, provided that the grammar does not allow  NUL
       in  the  middle  of lexemes. Sentinel method is very efficient, because
       the lexer does not need to perform any additional checks for the end of
       input  ---  it comes naturally as a part of processing the next charac-
       ter.  It is very important that the sentinel symbol is not  allowed  in
       the  middle of the rule --- otherwise on some inputs the lexer may read
       past the end of buffer and crash or cause memory corruption. Re2c veri-
       fies  this  automatically.   Use re2c:sentinel configuration to specify
       which sentinel symbol is used.

       Below  is  an  example  of   using   sentinel   method.   Configuration
       re2c:yyfill:enable  =  0;  suppresses generation of end-of-input checks
       and YYFILL calls.

          // re2c $INPUT -o $OUTPUT
          #include <assert.h>

          // expect a null-terminated string
          static int lex(const char *YYCURSOR)
          {
              int count = 0;
          loop:
              /*!re2c
              re2c:define:YYCTYPE = char;
              re2c:yyfill:enable = 0;

              *      { return -1; }
              [\x00] { return count; }
              [a-z]+ { ++count; goto loop; }
              [ ]+   { goto loop; }

              */
          }

          int main()
          {
              assert(lex("") == 0);
              assert(lex("one two three") == 3);
              assert(lex("f0ur") == -1);
              return 0;
          }


   Bounds checking with padding
       Bounds checking is a generic method: it can  be  used  with  any  input
       grammar.   The  basic  idea  is simple: we need to check for the end of
       input before reading the next input character. However, if  implemented
       in  a straightforward way, this would be quite inefficient: checking on
       each input character would cause a major slowdown. Re2c avoids slowdown
       by  generating checks only in certain key states of the lexer, and let-
       ting it run without checks in-between the key states.  More  precisely,
       re2c  computes  strongly  connected components (SCCs) of the underlying
       DFA (which roughly correspond to  loops),  and  generates  only  a  few
       checks  per  each  SCC (usually just one, but in general enough to make
       the SCC acyclic). The check is of the form (YYLIMIT -  YYCURSOR)  <  n,
       where  n  is  the  maximal length of a simple path in the corresponding
       SCC. If this condition is true, the lexer calls YYFILL(n),  which  must
       either  supply  at  least  n input characters or not return. When the
       lexer continues after the check, it is certain that the next n  charac-
       ters can be read safely without checks.
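
       The shape of this check can be pictured with a small hand-written
       function.  This is only a sketch: ensure and fill_stub are hypothetical
       names, and fill_stub stands in for a YYFILL that never has more input
       to offer, whereas a real YYFILL would typically refill the buffer and
       adjust the pointers or not return at all.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for YYFILL in a lexer without buffer refilling: it cannot
   supply more input, so it reports failure by returning 0. */
static int fill_stub(size_t need)
{
    (void)need;
    return 0;
}

/* Mirrors the (YYLIMIT - YYCURSOR) < n check: returns 1 if at least n
   characters can be read from cursor without further checks, and 0 if
   the fill routine had to be called and could not supply them. */
static int ensure(const char *cursor, const char *limit, size_t n)
{
    if ((size_t)(limit - cursor) < n) {
        return fill_stub(n);
    }
    return 1;
}
```

       With an 8-character buffer and n equal to 3, the check passes at
       offsets 0 through 5 and falls through to the fill routine at offset 6
       and beyond.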

       This approach reduces the number of checks significantly (and makes the
       lexer much faster as a result), but it has a downside. Since the  lexer
       checks  for  multiple  characters at once, it may end up in a situation
       when there are a few remaining input characters (fewer than n)
       corresponding to a short path in the SCC, but the lexer cannot proceed
       because of the check, and YYFILL cannot supply more characters because
       it is the end of input. To solve this problem, re2c requires that
       additional padding consisting of fake characters is appended at the
       end of input. The length of padding should be YYMAXFILL, which equals
       the maximum n parameter to YYFILL and must be generated by re2c using
       the /*!max:re2c*/ directive. The fake characters should not form a
       valid lexeme suffix, otherwise the lexer may be fooled into matching a
       fake lexeme. Usually it is a good idea to use NUL characters for
       padding.
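
       The effect of padding on the check can be sketched with plain offsets.
       The helper names below are hypothetical, and the code only models the
       (YYLIMIT - YYCURSOR) < n comparison; it does not run a lexer.

```c
#include <assert.h>
#include <stddef.h>

/* Nonzero if the (limit - cursor) < n check would trigger YYFILL. */
static int fill_needed(size_t cursor, size_t limit, size_t n)
{
    return limit - cursor < n;
}

/* Counts the (position, n) pairs, with the position inside the real input
   (0..len) and 1 <= n <= maxfill, for which YYFILL would be triggered when
   `padding` extra characters follow the input. */
static size_t count_fill_triggers(size_t len, size_t maxfill, size_t padding)
{
    size_t triggers = 0;
    for (size_t cur = 0; cur <= len; ++cur) {
        for (size_t n = 1; n <= maxfill; ++n) {
            if (fill_needed(cur, len + padding, n)) {
                ++triggers;
            }
        }
    }
    return triggers;
}
```

       For an input of length 5 and a maximum n of 3, padding of 3 characters
       gives no triggers inside the real input, while no padding gives 6.
       This matches the remark that YYFILL is reached only once the lexer is
       already inside the padding.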

       Below  is  an  example of using bounds checking with padding. Note that
       the grammar rule for single-quoted strings allows arbitrary symbols  in
       the  middle  of lexeme, so there is no natural sentinel in the grammar.
       Strings like "aha\0ha" are perfectly valid, but ill-formed strings like
       "aha\0 are also possible and shouldn't crash the lexer. In this example
       we do not use buffer  refilling,  therefore  YYFILL  definition  simply
       returns  an error. Note that YYFILL will only be called after the lexer
       reaches padding, because only then will the check condition  be  satis-
       fied.

          // re2c $INPUT -o $OUTPUT
          #include <assert.h>
          #include <stdlib.h>
          #include <string.h>

          /*!max:re2c*/

          // expect YYMAXFILL-padded string
          static int lex(const char *str, unsigned int len)
          {
              const char *YYCURSOR = str, *YYLIMIT = str + len + YYMAXFILL;
              int count = 0;

          loop:
              /*!re2c
              re2c:api:style = free-form;
              re2c:define:YYCTYPE = char;
              re2c:define:YYFILL = "return -1;";

              *                           { return -1; }
              [\x00]                      { return YYCURSOR == YYLIMIT ? count : -1; }
              ['] ([^'\\] | [\\][^])* ['] { ++count; goto loop; }
              [ ]+                        { goto loop; }

              */
          }

          // make a copy of the string with YYMAXFILL zeroes at the end
          static void test(const char *str, unsigned int len, int res)
          {
              char *s = (char*) malloc(len + YYMAXFILL);
              memcpy(s, str, len);
              memset(s + len, 0, YYMAXFILL);
              int r = lex(s, len);
              free(s);
              assert(r == res);
          }

          #define TEST(s, r) test(s, sizeof(s) - 1, r)
          int main()
          {
              TEST("", 0);
              TEST("'qu\0tes' 'are' 'fine: \\'' ", 3);
              TEST("'unterminated\\'", -1);
              return 0;
          }


   EOF rule
       EOF  rule $ was introduced in version 1.2. It is a hybrid approach that
       tries to take the best of both worlds: simplicity and efficiency of the
       sentinel method combined with the generality of bounds-checking method.
       The idea is to appoint an arbitrary symbol to be the sentinel, and only
       perform  further  bounds  checking if the sentinel symbol matches (more
       precisely, if the symbol class that contains it matches). The check  is
       of  the  form YYLIMIT <= YYCURSOR.  If this condition is not satisfied,
       then the sentinel is just an ordinary input  character  and  the  lexer
       continues.  Otherwise  this  is  a  real  sentinel, and the lexer calls
       YYFILL(). If YYFILL returns zero, the lexer assumes that  it  has  more
       input  and tries to re-match. Otherwise YYFILL returns non-zero and the
       lexer knows that it has reached the end of input. At this  point  there
       are three possibilities. First, it might have already matched a shorter
       lexeme --- in this case it just rolls back to the last accepting state.
       Second, it might have consumed some characters, but failed to match ---
       in this case it falls back to default rule *. Finally, it might  be  in
       the initial state --- in this (and only this!) case it matches EOF rule
       $.
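
       The core of this check can be sketched by hand as follows. This is not
       re2c  output:  count_words  and  SENTINEL  are hypothetical names, and
       YYFILL and the fallback logic described above are left out. The bounds
       check is performed only when the sentinel symbol is read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel symbol; it may also occur inside the input. */
enum { SENTINEL = 0 };

/* Counts runs of non-space characters in buf[0..len).
 * The caller must store SENTINEL at buf[len]. */
static int count_words(const char *buf, size_t len)
{
    const char *cur = buf, *lim = buf + len;
    int count = 0, in_word = 0;

    assert(*lim == SENTINEL);
    for (;;) {
        const char c = *cur;
        if (c == SENTINEL && lim <= cur) {
            break; /* real sentinel: end of input */
        }
        /* otherwise the sentinel is an ordinary input character */
        if (c == ' ') {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            ++count;
        }
        ++cur;
    }
    return count;
}
```

       A zero byte in the middle of the input is treated as part  of  a  word,
       because the condition lim <= cur does not hold at that position.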

       Below is an example of using EOF rule. Configuration re2c:yyfill:enable
       = 0; suppresses generation of YYFILL calls (but not the bounds checks).

          // re2c $INPUT -o $OUTPUT
          #include <assert.h>

          // expect a null-terminated string
          static int lex(const char *str, unsigned int len)
          {
              const char *YYCURSOR = str, *YYLIMIT = str + len, *YYMARKER;
              int count = 0;

          loop:
              /*!re2c
              re2c:define:YYCTYPE = char;
              re2c:yyfill:enable = 0;
              re2c:eof = 0;

              *                           { return -1; }
              $                           { return count; }
              ['] ([^'\\] | [\\][^])* ['] { ++count; goto loop; }
              [ ]+                        { goto loop; }

              */
          }

          #define TEST(s, r) assert(lex(s, sizeof(s) - 1) == r)
          int main()
          {
              TEST("", 0);
              TEST("'qu\0tes' 'are' 'fine: \\'' ", 3);
              TEST("'unterminated\\'", -1);
              return 0;
          }


   Using generic API
       Generic  API  can be used with any of the above methods. It also allows
       one to use a user-defined method by placing EOF checks in  one  of  the
       basic  primitives.   Usually  this  is either YYSKIP (the check is per-
       formed when advancing to the next  input  character),  or  YYPEEK  (the
       check  is performed when reading the next input character). The result-
       ing methods are inefficient, as they check  on  each  input  character.
       However,  they can be useful in cases when the input cannot be buffered
       or padded and does not contain a sentinel character  at  the  end.  One
       should  be  cautious  when  using such ad-hoc methods, as it is easy to
       overlook some corner cases and come up with a  method  that  only  par-
       tially  works.  Also  it  should  be  noted  that not everything can be
       expressed via generic API: for example, it is impossible to reimplement
       the way EOF rule works (in particular, it is impossible to re-match the
       character after successful YYFILL).

       Below is an example of using YYSKIP to perform bounds checking  without
       padding.  YYFILL generation is suppressed using re2c:yyfill:enable = 0;
       configuration. Note that if the grammar were more complex, this method
       might not work when two rules overlap and the EOF check fails after  a
       shorter lexeme has already been matched (in our example there  are  no
       overlapping rules, so this does not happen).

          // re2c $INPUT -o $OUTPUT
          #include <assert.h>
          #include <stdlib.h>
          #include <string.h>

          // expect a string without terminating null
          static int lex(const char *str, unsigned int len)
          {
              const char *cur = str, *lim = str + len, *mar;
              int count = 0;

          loop:
              /*!re2c
              re2c:yyfill:enable = 0;
              re2c:eof = 0;
              re2c:flags:input = custom;
              re2c:api:style = free-form;
              re2c:define:YYCTYPE    = char;
              re2c:define:YYLESSTHAN = "cur >= lim";
              re2c:define:YYPEEK     = "cur < lim ? *cur : 0";  // fake null
              re2c:define:YYSKIP     = "++cur;";
              re2c:define:YYBACKUP   = "mar = cur;";
              re2c:define:YYRESTORE  = "cur = mar;";

              *                           { return -1; }
              $                           { return count; }
              ['] ([^'\\] | [\\][^])* ['] { ++count; goto loop; }
              [ ]+                        { goto loop; }

              */
          }

          // make a copy of the string without terminating null
          static void test(const char *str, unsigned int len, int res)
          {
              char *s = (char*) malloc(len);
              memcpy(s, str, len);
              int r = lex(s, len);
              free(s);
              assert(r == res);
          }

          #define TEST(s, r) test(s, sizeof(s) - 1, r)
          int main()
          {
              TEST("", 0);
              TEST("'qu\0tes' 'are' 'fine: \\'' ", 3);
              TEST("'unterminated\\'", -1);
              return 0;
          }


BUFFER REFILLING
       The need for buffering arises when the input cannot be mapped in memory
       all at once: either it is too large, or it comes in a streaming fashion
       (like  reading  from a socket). The usual technique in such cases is to
       allocate a fixed-sized memory buffer and process input in  chunks  that
       fit  into  the buffer. When the current chunk is processed, it is moved
       out and new data is moved in. In practice it is somewhat more  complex,
       because  lexer state consists not of a single input position, but a set
       of interrelated positions:

       o cursor: the next input character to be read (YYCURSOR in default  API
         or YYSKIP/YYPEEK in generic API)

       o limit: the position after the last available input character (YYLIMIT
         in default API, implicitly handled by YYLESSTHAN in generic API)

       o marker: the position of the most recent match, if  any  (YYMARKER  in
         default API or YYBACKUP/YYRESTORE in generic API)

       o token:  the  start of the current lexeme (implicit in re2c API, as it
         is not needed for the normal lexer operation and can be  defined  and
         updated by the user)

       o context  marker: the position of the trailing context (YYCTXMARKER in
         default API or YYBACKUPCTX/YYRESTORECTX in generic API)

       o tag variables: submatch positions (defined with  /*!stags:re2c*/  and
         /*!mtags:re2c*/  directives  and  YYSTAGP/YYSTAGN/YYMTAGP/YYMTAGN  in
         generic API)

       Not all these are used in every case, but if used, they must be updated
       by  YYFILL.  All  active positions are contained in the segment between
       token and cursor, therefore everything between buffer start  and  token
       can  be  discarded,  the  segment  from token and up to limit should be
       moved to the beginning of buffer, and the free space at the end of buf-
       fer  should be filled with new data.  In order to avoid frequent YYFILL
       calls it is best to fill in as many input characters as possible  (even
       though fewer characters might suffice to resume the lexer). The details
       of YYFILL implementation are slightly different depending on which  EOF
       handling  method is used: the case of EOF rule is somewhat simpler than
       the case  of  bounds-checking  with  padding.  Also  note  that  if  -f
       --storable-state option is used, YYFILL has slightly different  seman-
       tics (described in the section about storable state).
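
       The shift described above can be sketched in plain C as follows.  This
       is  a  simplified  illustration,  not  re2c  output:  the Buffer type,
       the shift_and_refill function and the 8-byte buffer are  hypothetical,
       new  data comes from memory rather than from a file, and tag variables
       and EOF handling are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lexer input state with a deliberately tiny buffer. */
typedef struct {
    char buf[8];
    char *tok, *mar, *cur, *lim;
} Buffer;

/* Discards everything before the token, moves the segment from token
 * up to limit to the start of the buffer, and appends as many of the
 * avail bytes at src as fit. Returns the number of bytes appended. */
static size_t shift_and_refill(Buffer *b, const char *src, size_t avail)
{
    const size_t shift = (size_t)(b->tok - b->buf); /* discardable prefix */
    const size_t used = (size_t)(b->lim - b->tok);  /* bytes that must survive */
    const size_t space = sizeof(b->buf) - used;     /* free space after the shift */
    const size_t n = avail < space ? avail : space;

    assert(b->buf <= b->tok && b->tok <= b->cur && b->cur <= b->lim);

    memmove(b->buf, b->tok, used);
    /* every position moves by the same distance */
    b->tok -= shift;
    b->mar -= shift;
    b->cur -= shift;
    b->lim -= shift;

    memcpy(b->lim, src, n);
    b->lim += n;
    return n;
}
```

       The function appends as much data as fits rather than the  minimum  the
       lexer needs, in line with the advice above to avoid frequent refills.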

   YYFILL with EOF rule
       If EOF rule is used, YYFILL is a function-like primitive  that  accepts
       no  arguments and returns a value which is checked against zero. YYFILL
       invocation is triggered by condition YYLIMIT <= YYCURSOR in default API
       and  YYLESSTHAN()  in  generic  API. A non-zero return value means that
       YYFILL has failed. A successful YYFILL call must supply  at  least  one
       character  and adjust input positions accordingly. Limit must always be
       set to one after the last input position in buffer, and  the  character
       at the limit position must be the sentinel symbol specified by re2c:eof
       configuration. The pictures below show the relative locations of  input
       positions  in  buffer  before and after YYFILL call (sentinel symbol is
       marked with #, and the second picture shows the case when there is  not
       enough input to fill the whole buffer).

                         <-- shift -->
                       >-A------------B---------C-------------D#-----------E->
                       buffer       token    marker         limit,
                                                            cursor
          >-A------------B---------C-------------D------------E#->
                       buffer,  marker        cursor        limit
                       token

                         <-- shift -->
                       >-A------------B---------C-------------D#--E (EOF)
                       buffer       token    marker         limit,
                                                            cursor
          >-A------------B---------C-------------D---E#........
                       buffer,  marker       cursor limit
                       token

       Here is an example of a program that reads a file named input in chunks
       of 4096 bytes and uses EOF rule.

          // re2c $INPUT -o $OUTPUT
          #include <assert.h>
          #include <stdio.h>
          #include <string.h>

          #define SIZE 4096

          typedef struct {
              FILE *file;
              char buf[SIZE + 1], *lim, *cur, *mar, *tok;
              int eof;
          } Input;

          static int fill(Input *in)
          {
              if (in->eof) {
                  return 1;
              }
              const size_t free = in->tok - in->buf;
              if (free < 1) {
                  return 2;
              }
              memmove(in->buf, in->tok, in->lim - in->tok);
              in->lim -= free;
              in->cur -= free;
              in->mar -= free;
              in->tok -= free;
              in->lim += fread(in->lim, 1, free, in->file);
              in->lim[0] = 0;
              in->eof |= in->lim < in->buf + SIZE;
              return 0;
          }

          static void init(Input *in, FILE *file)
          {
              in->file = file;
              in->cur = in->mar = in->tok = in->lim = in->buf + SIZE;
              in->eof = 0;
              fill(in);
          }

          static int lex(Input *in)
          {
              int count = 0;
          loop:
              in->tok = in->cur;
              /*!re2c
              re2c:eof = 0;
              re2c:api:style = free-form;
              re2c:define:YYCTYPE  = char;
              re2c:define:YYCURSOR = in->cur;
              re2c:define:YYMARKER = in->mar;
              re2c:define:YYLIMIT  = in->lim;
              re2c:define:YYFILL   = "fill(in) == 0";

              *                           { return -1; }
              $                           { return count; }
              ['] ([^'\\] | [\\][^])* ['] { ++count; goto loop; }
              [ ]+                        { goto loop; }

              */
          }

          int main()
          {
              const char *fname = "input";
              const char str[] = "'qu\0tes' 'are' 'fine: \\'' ";
              FILE *f;
              Input in;

              // prepare input file: a few times the size of the buffer,
              // containing strings with zeroes and escaped quotes
              f = fopen(fname, "w");
              for (int i = 0; i < SIZE; ++i) {
                  fwrite(str, 1, sizeof(str) - 1, f);
              }
              fclose(f);

              f = fopen(fname, "r");
              init(&in, f);
              assert(lex(&in) == SIZE * 3);
              fclose(f);

              remove(fname);
              return 0;
          }


   YYFILL with padding
       In the default case (when EOF rule is  not  used)  YYFILL  is  a  func-
       tion-like  primitive that accepts a single argument and does not return
       any value.  YYFILL invocation is  triggered  by  condition  (YYLIMIT  -
       YYCURSOR)  <  n  in  default  API and YYLESSTHAN(n) in generic API. The
       argument passed to YYFILL is the minimal number of characters that must
       be  supplied. If it fails to do so, YYFILL must not return to the lexer
       (for that reason it is best implemented as a macro  that  returns  from
       the calling function on failure).  In case of a successful YYFILL invo-
       cation the limit position must be set either  to  one  after  the  last
       input  position  in buffer, or to the end of YYMAXFILL padding (in case
       YYFILL has successfully read at least n characters, but not  enough  to
       fill the entire buffer). The pictures below show the relative locations
       of input positions in buffer before and after YYFILL invocation (YYMAX-
       FILL padding on the second picture is marked with # symbols).

                         <-- shift -->                 <-- need -->
                       >-A------------B---------C-----D-------E---F--------G->
                       buffer       token    marker cursor  limit

          >-A------------B---------C-----D-------E---F--------G->
                       buffer,  marker cursor               limit
                       token

                         <-- shift -->                 <-- need -->
                       >-A------------B---------C-----D-------E-F        (EOF)
                       buffer       token    marker cursor  limit

          >-A------------B---------C-----D-------E-F###############
                       buffer,  marker cursor                   limit
                       token                        <- YYMAXFILL ->

       Here is an example of a program that reads a file named input in chunks
       of 4096 bytes and uses bounds-checking with padding.

          // re2c $INPUT -o $OUTPUT
          #include <assert.h>
          #include <stdio.h>
          #include <string.h>

          /*!max:re2c*/
          #define SIZE 4096

          typedef struct {
              FILE *file;
              char buf[SIZE + YYMAXFILL], *lim, *cur, *mar, *tok;
              int eof;
          } Input;

          static int fill(Input *in, size_t need)
          {
              if (in->eof) {
                  return 1;
              }
              const size_t free = in->tok - in->buf;
              if (free < need) {
                  return 2;
              }
              memmove(in->buf, in->tok, in->lim - in->tok);
              in->lim -= free;
              in->cur -= free;
              in->mar -= free;
              in->tok -= free;
              in->lim += fread(in->lim, 1, free, in->file);
              if (in->lim < in->buf + SIZE) {
                  in->eof = 1;
                  memset(in->lim, 0, YYMAXFILL);
                  in->lim += YYMAXFILL;
              }
              return 0;
          }

          static void init(Input *in, FILE *file)
          {
              in->file = file;
              in->cur = in->mar = in->tok = in->lim = in->buf + SIZE;
              in->eof = 0;
              fill(in, 1);
          }

          static int lex(Input *in)
          {
              int count = 0;
          loop:
              in->tok = in->cur;
              /*!re2c
              re2c:api:style = free-form;
              re2c:define:YYCTYPE  = char;
              re2c:define:YYCURSOR = in->cur;
              re2c:define:YYMARKER = in->mar;
              re2c:define:YYLIMIT  = in->lim;
              re2c:define:YYFILL   = "if (fill(in, @@) != 0) return -1;";

              *                           { return -1; }
              [\x00]                      { return (YYMAXFILL == in->lim - in->tok) ? count : -1; }
              ['] ([^'\\] | [\\][^])* ['] { ++count; goto loop; }
              [ ]+                        { goto loop; }

              */
          }

          int main()
          {
              const char *fname = "input";
              const char str[] = "'qu\0tes' 'are' 'fine: \\'' ";
              FILE *f;
              Input in;

              // prepare input file: a few times the size of the buffer,
              // containing strings with zeroes and escaped quotes
              f = fopen(fname, "w");
              for (int i = 0; i < SIZE; ++i) {
                  fwrite(str, 1, sizeof(str) - 1, f);
              }
              fclose(f);

              f = fopen(fname, "r");
              init(&in, f);
              assert(lex(&in) == SIZE * 3);
              fclose(f);

              remove(fname);
              return 0;
          }


INCLUDE FILES
       Re2c allows one to include other files using directive  /*!include:re2c
       FILE  */, where FILE is the name of file to be included. Re2c looks for
       included files in the directory of the including file  and  in  include
       locations,  which can be specified with -I option.  Re2c include direc-
       tive works in the same way as C/C++ #include: the contents of FILE  are
       copy-pasted  verbatim in place of the directive. Include files may have
       further includes of their own.  Re2c provides some  predefined  include
       files  that  can  be found in the include/ subdirectory of the project.
       These files contain definitions that can be useful  to  other  projects
       (such as Unicode categories) and form something like a standard library
       for re2c.  Here is an example:

   Include file (definitions.h)
          typedef enum { OK, FAIL } Result;

          /*!re2c
              number = [1-9][0-9]*;
          */


   Input file
          // re2c $INPUT -o $OUTPUT -i
          #include <assert.h>
          /*!include:re2c "definitions.h" */

          Result lex(const char *YYCURSOR)
          {
              /*!re2c
              re2c:define:YYCTYPE = char;
              re2c:yyfill:enable = 0;

              number { return OK; }
              *      { return FAIL; }
              */
          }

          int main()
          {
              assert(lex("123") == OK);
              return 0;
          }


HEADER FILES
       Re2c allows one to generate header file from the input .re  file  using
       option  -t,  --type-header  or configuration re2c:flags:type-header and
       directives  /*!header:re2c:on*/  and  /*!header:re2c:off*/.  The  first
       directive  marks the beginning of header file, and the second directive
       marks the end of it. Everything between these directives  is  processed
       by re2c, and the generated code is written to the file specified by the
       -t --type-header option (or stdout if this option was not used).  Auto-
       generated  header file may be needed in cases when re2c is used to gen-
       erate definitions of constants, variables and structs that must be vis-
       ible from other translation units.

       Here is an example of generating a header file that contains definition
       of the lexer state with tag variables (the number of variables depends on
       the regular grammar and is unknown to the programmer).

   Input file
          // re2c $INPUT -o $OUTPUT -i --type-header src/lexer/lexer.h
          #include <assert.h>
          #include "src/lexer/lexer.h" // generated by re2c

          /*!header:re2c:on*/

          typedef struct {
              const char *str, *cur, *mar;
              /*!stags:re2c format = "const char *@@{tag}; "; */
          } LexerState;

          /*!header:re2c:off*/

          int lex(LexerState *st)
          {
              /*!re2c
              re2c:flags:type-header = "src/lexer/lexer.h";
              re2c:yyfill:enable = 0;
              re2c:flags:tags = 1;
              re2c:define:YYCTYPE  = char;
              re2c:define:YYCURSOR = "st->cur";
              re2c:define:YYMARKER = "st->mar";
              re2c:tags:expression = "st->@@{tag}";

              [x]{1,4} / [x]{3,5} { return 0; } // ambiguous trailing context
              *                   { return 1; }
              */
          }

          int main()
          {
              LexerState st;
              st.str = st.cur = "xxxxxxxx";
              assert(lex(&st) == 0 && st.cur - st.str == 4);
              return 0;
          }


   Header file
          /* Generated by re2c */


          typedef struct {
              const char *str, *cur, *mar;
              const char *yyt1; const char *yyt2; const char *yyt3;
          } LexerState;



SUBMATCH EXTRACTION
       Re2c has two options for submatch extraction.

       The  first option is -T --tags. With this option one can use standalone
       tags of the form @stag and #mtag, where stag  and  mtag  are  arbitrary
       user-defined  names.  Tags  can  be  used  anywhere inside of a regular
       expression; semantically they are just position markers.  Tags  of  the
       form  @stag are called s-tags: they denote a single submatch value (the
       last input position where this tag matched). Tags of the form #mtag are
       called  m-tags: they denote multiple submatch values (the whole history
       of repetitions of this tag).  All tags should be defined by the user as
       variables  with the corresponding names. With standalone tags re2c uses
       leftmost greedy disambiguation: submatch positions  correspond  to  the
       leftmost matching path through the regular expression.

       The  second  option  is -P --posix-captures: it enables POSIX-compliant
       capturing groups. In  this  mode  parentheses  in  regular  expressions
       denote the beginning and the end of capturing groups; the whole regular
       expression is group number zero. The number of groups for the  matching
       rule  is stored in a variable yynmatch, and submatch results are stored
       in yypmatch array. Both yynmatch and yypmatch should be defined by  the
       user,  and yypmatch size must be at least [yynmatch * 2]. Re2c provides
       a directive /*!maxnmatch:re2c*/ that defines  YYMAXNMATCH:  a  constant
       equal  to the maximal value of yynmatch among all rules. Note that re2c
       implements POSIX-compliant disambiguation: each  subexpression  matches
       as  long  as possible, and subexpressions that start earlier in regular
       expression have priority over those starting  later.  Capturing  groups
       are  translated  into  s-tags under the hood, therefore we use the word
       "tag" to describe them as well.

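       The layout of submatch results can be sketched with  a  small  helper.
       It  assumes,  as  in  the  POSIX  example later in this section, that
       group i occupies yypmatch[2 * i] and yypmatch[2 * i + 1]. The function
       group_text  is  hypothetical,  and  treating a NULL position as "group
       did not match" is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the text of capturing group i into out
 * as a null-terminated string. Returns the length of the text, or -1
 * if the group does not exist, did not match (NULL position, an
 * assumption of this sketch), or does not fit into out. */
static int group_text(const char **pmatch, unsigned nmatch, unsigned i,
                      char *out, size_t outsz)
{
    const char *s, *e;
    size_t len;

    assert(pmatch != NULL && out != NULL);
    if (i >= nmatch) {
        return -1;
    }
    s = pmatch[2 * i];     /* start of group i */
    e = pmatch[2 * i + 1]; /* one past the end of group i */
    if (s == NULL || e == NULL) {
        return -1;
    }
    len = (size_t)(e - s);
    if (len + 1 > outsz) {
        return -1;
    }
    memcpy(out, s, len);
    out[len] = '\0';
    return (int)len;
}
```

       In a semantic action the helper would be  called  with  yypmatch  and
       yynmatch; group 0 yields the whole matched lexeme.
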
       With both -P --posix-captures and -T --tags options re2c uses the efficient
       submatch  extraction  algorithm  described  in the Tagged Deterministic
       Finite Automata with Lookahead paper. The overhead on submatch  extrac-
       tion  in  the generated lexer grows with the number of tags --- if this
       number is moderate, the overhead is barely  noticeable.  In  the  lexer
       tags are implemented using a number of tag variables generated by re2c.
       There is no one-to-one correspondence between tag variables and tags: a
       single  variable  may  be  reused  for  different tags, and one tag may
       require multiple variables to hold all its ambiguous values. Eventually
       ambiguity  is  resolved,  and only one final variable per tag survives.
       When a rule matches, all its tags are set to the values of  the  corre-
       sponding  tag  variables.  The exact number of tag variables is unknown
       to the user; this number is determined by re2c. However, tag  variables
       should  be defined by the user as a part of the lexer state and updated
       by YYFILL,  therefore  re2c  provides  directives  /*!stags:re2c*/  and
       /*!mtags:re2c*/  that can be used to declare, initialize and manipulate
       tag variables. These directives have two optional configurations:  for-
       mat  =  "@@";  (specifies the template where @@ is substituted with the
       name of each tag variable), and separator = ""; (specifies the piece of
       code used to join the generated pieces for different tag variables).

       S-tags support the following operations:

       o save  input  position to an s-tag: t = YYCURSOR with default API or a
         user-defined operation YYSTAGP(t) with generic API

       o save default value to an s-tag: t  =  NULL  with  default  API  or  a
         user-defined operation YYSTAGN(t) with generic API

       o copy one s-tag to another: t1 = t2

       M-tags support the following operations:

       o append  input  position  to  an  m-tag: a user-defined operation YYM-
         TAGP(t) with both default and generic API

       o append default value to an m-tag: a user-defined operation YYMTAGN(t)
         with both default and generic API

       o copy one m-tag to another: t1 = t2

       S-tags  can  be  implemented  as  scalar  values (pointers or offsets).
       M-tags need a more complex representation, as  they  need  to  store  a
       sequence  of  tag values. The most naive and inefficient representation
       of an m-tag is a list (array, vector) of tag values; a  more  efficient
       representation  is  to store all m-tags in a prefix-tree represented as
       array of nodes (v, p), where v is tag value and p is a pointer to  par-
       ent node.
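
       The prefix-tree representation can be sketched in plain C as  follows.
       The  names  Node,  Tree,  mtag_append, mtag_history and the fixed size
       MAX_NODES are hypothetical; a real lexer would plug  such  an  append
       operation into its YYMTAGP and YYMTAGN definitions.

```c
#include <assert.h>
#include <stddef.h>

enum { ROOT = -1, MAX_NODES = 64 };

/* A node (v, p): tag value and index of the parent node. */
typedef struct { const char *v; int p; } Node;
typedef struct { Node nodes[MAX_NODES]; int count; } Tree;

/* Appends value v (NULL for the default value) to the history of the
 * m-tag *t. Returns 0 on success and -1 if the tree is full. */
static int mtag_append(Tree *tree, int *t, const char *v)
{
    assert(tree != NULL && t != NULL);
    if (tree->count >= MAX_NODES) {
        return -1;
    }
    tree->nodes[tree->count].v = v;
    tree->nodes[tree->count].p = *t;
    *t = tree->count++;
    return 0;
}

/* Writes the history of m-tag t into out, oldest value first.
 * Returns the number of values, or -1 if there are more than max. */
static int mtag_history(const Tree *tree, int t, const char **out, int max)
{
    int n = 0, i, k;

    for (i = t; i != ROOT; i = tree->nodes[i].p) {
        ++n;
    }
    if (n > max) {
        return -1;
    }
    k = n;
    for (i = t; i != ROOT; i = tree->nodes[i].p) {
        out[--k] = tree->nodes[i].v; /* walk to the root, fill backwards */
    }
    return n;
}
```

       Copying one m-tag to another is a plain integer assignment, after which
       both tags share the common prefix of their histories in the tree.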

       Here is an example of using s-tags to parse an IPv4 address.

          // re2c $INPUT -o $OUTPUT
          #include <assert.h>
          #include <stdint.h>

          static uint32_t num(const char *s, const char *e)
          {
              uint32_t n = 0;
              for (; s < e; ++s) n = n * 10 + (*s - '0');
              return n;
          }

          static const uint64_t ERROR = ~(uint64_t)0; // all bits set, for any width of long

          static uint64_t lex(const char *YYCURSOR)
          {
              const char *YYMARKER, *o1, *o2, *o3, *o4;
              /*!stags:re2c format = 'const char *@@;'; */

              /*!re2c
              re2c:yyfill:enable = 0;
              re2c:flags:tags = 1;
              re2c:define:YYCTYPE = char;

              octet = [0-9] | [1-9][0-9] | [1][0-9][0-9] | [2][0-4][0-9] | [2][5][0-5];
              dot = [.];
              end = [\x00];

              @o1 octet dot @o2 octet dot @o3 octet dot @o4 octet end {
                  return num(o4, YYCURSOR - 1)
                      + (num(o3, o4 - 1) << 8)
                      + (num(o2, o3 - 1) << 16)
                      + (num(o1, o2 - 1) << 24);
              }
              * { return ERROR; }
              */
          }

          int main()
          {
              assert(lex("1.2.3.4") == 0x01020304);
              assert(lex("127.0.0.1") == 0x7f000001);
              assert(lex("255.255.255.255") == 0xffffffff);
              assert(lex("1.2.3.") == ERROR);
              assert(lex("1.2.3.256") == ERROR);
              return 0;
          }


       Here  is  an  example  of using POSIX capturing groups to parse an IPv4
       address.

          // re2c $INPUT -o $OUTPUT
          #include <assert.h>
          #include <stdint.h>

          static uint32_t num(const char *s, const char *e)
          {
              uint32_t n = 0;
              for (; s < e; ++s) n = n * 10 + (*s - '0');
              return n;
          }

          /*!maxnmatch:re2c*/
          static const uint64_t ERROR = ~(uint64_t)0; // all bits set, for any width of long

          static uint64_t lex(const char *YYCURSOR)
          {
              const char *YYMARKER;
              const char *yypmatch[YYMAXNMATCH * 2];
              uint32_t yynmatch;
              /*!stags:re2c format = 'const char *@@;'; */

              /*!re2c
              re2c:yyfill:enable = 0;
              re2c:flags:posix-captures = 1;
              re2c:define:YYCTYPE = char;

              octet = [0-9] | [1-9][0-9] | [1][0-9][0-9] | [2][0-4][0-9] | [2][5][0-5];
              dot = [.];
              end = [\x00];

              (octet) dot (octet) dot (octet) dot (octet) end {
                  assert(yynmatch == 5);
                  return num(yypmatch[8], yypmatch[9])
                      + (num(yypmatch[6], yypmatch[7]) << 8)
                      + (num(yypmatch[4], yypmatch[5]) << 16)
                      + (num(yypmatch[2], yypmatch[3]) << 24);
              }
              * { return ERROR; }
              */
          }

          int main()
          {
              assert(lex("1.2.3.4") == 0x01020304);
              assert(lex("127.0.0.1") == 0x7f000001);
              assert(lex("255.255.255.255") == 0xffffffff);
              assert(lex("1.2.3.") == ERROR);
              assert(lex("1.2.3.256") == ERROR);
              return 0;
          }


       Here is an example of  using  m-tags  to  parse  a  semicolon-separated
       sequence  of  words  (C++).  Tag variables are stored in a tree that is
       packed in a vector.

          // re2c $INPUT -o $OUTPUT
          #include <assert.h>
          #include <vector>
          #include <string>

          static const int ROOT = -1;

          struct Mtag {
              int pred;
              const char *tag;
          };

          typedef std::vector<Mtag> MtagTree;
          typedef std::vector<std::string> Words;

          static void mtag(int *pt, const char *t, MtagTree *tree)
          {
              Mtag m = {*pt, t};
              *pt = (int)tree->size();
              tree->push_back(m);
          }

          static void unfold(const MtagTree &tree, int x, int y, Words &words)
          {
              if (x == ROOT) return;
              unfold(tree, tree[x].pred, tree[y].pred, words);
              const char *px = tree[x].tag, *py = tree[y].tag;
              words.push_back(std::string(px, py - px));
          }

          #define YYMTAGP(t) mtag(&t, YYCURSOR, &tree)
          #define YYMTAGN(t) mtag(&t, NULL,     &tree)
          static bool lex(const char *YYCURSOR, Words &words)
          {
              const char *YYMARKER;
              /*!mtags:re2c format = "int @@ = ROOT;"; */
              MtagTree tree;
              int x, y;

              /*!re2c
              re2c:yyfill:enable = 0;
              re2c:flags:tags = 1;
              re2c:define:YYCTYPE = char;

              (#x [a-z]+ #y [;])+ {
                  words.clear();
                  unfold(tree, x, y, words);
                  return true;
              }
              * { return false; }
              */
          }

          int main()
          {
              Words w;
              assert(lex("one;two;three;", w) && w == Words({"one", "two", "three"}));
              return 0;
          }


STORABLE STATE
       With the -f --storable-state option, re2c generates a lexer that can
       store its current state, return to the caller, and later resume
       operation exactly where it left off. The default mode of operation in
       re2c is a "pull" model, in which the lexer "pulls" more input whenever
       it needs it. This may be unacceptable when the input becomes available
       piece by piece (for example, if the lexer is invoked by the parser, or
       if the lexer program communicates via a socket protocol with some other
       program that must wait for a reply from the lexer before it transmits
       the next message). The storable state feature is intended exactly for
       such cases: it allows one to generate lexers that work in a "push"
       model. When the lexer needs more input, it stores its state and returns
       to the caller. Later, when more input becomes available, the caller
       resumes the lexer exactly where it stopped. A few changes are necessary
       compared to the "pull" model:

       o Define YYGETSTATE() and YYSETSTATE(state) primitives.

       o Define yych, yyaccept and state variables as part of the persistent
         lexer state. The state variable should be initialized to -1.

       o YYFILL should return to the outer program instead of trying to supply
         more input. The return code should indicate that the lexer needs more
         input.

       o The outer program should recognize situations when the lexer needs
         more input and respond appropriately.

       o Use the /*!getstate:re2c*/ directive if it is necessary to execute
         any code before entering the lexer.

       o Use the configurations state:abort and state:nextlabel to further
         tweak the generated code.

       Here is an example of a "push"-model lexer that simulates reading
       packets from a socket (the test driver writes them to a file that
       stands in for a pipe). Each packet is a sequence of lowercase letters
       terminated by a semicolon. When the lexer reaches the end of the
       buffered input, YYFILL returns WAITING to the caller. The caller
       "sends" the next packet, if any remain, refills the buffer and resumes
       the lexer. If there is still no input and the lexer is in the initial
       state, the lexer terminates successfully with END: this is the only
       case when the EOF rule matches. Abnormal termination happens in case of
       a syntax error (BAD_PACKET) or in case the buffer is too small to hold
       a lexeme (BIG_PACKET), for example if one of the packets exceeds the
       buffer size. Note that a lexer whose lexemes are greedy (for example,
       words separated by whitespace) may call YYFILL twice before
       terminating. The first time YYFILL is called when the lexer expects a
       continuation of the current lexeme. If YYFILL fails, the lexer knows
       that it has reached the end of the current lexeme and executes the
       corresponding semantic action. The action jumps to the beginning of the
       loop, the lexer enters the initial state and calls YYFILL once more. If
       it fails again, the lexer matches the EOF rule.

   Example
          // re2c $INPUT -o $OUTPUT -f
          #include <assert.h>
          #include <stdio.h>
          #include <string.h>

          #define DEBUG    0
          #define LOG(...) if (DEBUG) fprintf(stderr, __VA_ARGS__);
          #define BUFSIZE  10

          typedef struct {
              FILE *file;
              char buf[BUFSIZE + 1], *lim, *cur, *mar, *tok;
              unsigned yyaccept;
              int state;
          } Input;

          static void init(Input *in, FILE *f)
          {
              in->file = f;
              in->cur = in->mar = in->tok = in->lim = in->buf + BUFSIZE;
              in->lim[0] = 0; // append sentinel symbol
              in->yyaccept = 0;
              in->state = -1;
          }

          typedef enum {END, READY, WAITING, BAD_PACKET, BIG_PACKET} Status;

          static Status fill(Input *in)
          {
              const size_t shift = in->tok - in->buf;
              const size_t free = BUFSIZE - (in->lim - in->tok);

              if (free < 1) return BIG_PACKET;

              memmove(in->buf, in->tok, BUFSIZE - shift);
              in->lim -= shift;
              in->cur -= shift;
              in->mar -= shift;
              in->tok -= shift;

              const size_t read = fread(in->lim, 1, free, in->file);
              in->lim += read;
              in->lim[0] = 0; // append sentinel symbol

              return READY;
          }

          static Status lex(Input *in, unsigned int *recv)
          {
              char yych;
              /*!getstate:re2c*/
          loop:
              in->tok = in->cur;
              /*!re2c
                  re2c:eof = 0;
                  re2c:api:style = free-form;
                  re2c:define:YYCTYPE    = "char";
                  re2c:define:YYCURSOR   = "in->cur";
                  re2c:define:YYMARKER   = "in->mar";
                  re2c:define:YYLIMIT    = "in->lim";
                  re2c:define:YYGETSTATE = "in->state";
                  re2c:define:YYSETSTATE = "in->state = @@;";
                  re2c:define:YYFILL     = "return WAITING;";

                  packet = [a-z]+[;];

                  *      { return BAD_PACKET; }
                  $      { return END; }
                  packet { *recv = *recv + 1; goto loop; }
              */
          }

          void test(const char **packets, Status status)
          {
              const char *fname = "pipe";
              FILE *fw = fopen(fname, "w");
              FILE *fr = fopen(fname, "r");
              setvbuf(fw, NULL, _IONBF, 0);
              setvbuf(fr, NULL, _IONBF, 0);

              Input in;
              init(&in, fr);
              Status st;
              unsigned int send = 0, recv = 0;

              for (;;) {
                  st = lex(&in, &recv);
                  if (st == END) {
                      LOG("done: got %u packets\n", recv);
                      break;
                  } else if (st == WAITING) {
                      LOG("waiting...\n");
                      if (*packets) {
                          LOG("sent packet %u\n", send);
                          fprintf(fw, "%s", *packets++);
                          ++send;
                      }
                      st = fill(&in);
                      LOG("queue: '%s'\n", in.buf);
                      if (st == BIG_PACKET) {
                          LOG("error: packet too big\n");
                          break;
                      }
                      assert(st == READY);
                  } else {
                      assert(st == BAD_PACKET);
                      LOG("error: ill-formed packet\n");
                      break;
                  }
              }

              LOG("\n");
              assert(st == status);
              if (st == END) assert(recv == send);

              fclose(fw);
              fclose(fr);
              remove(fname);
          }

          int main()
          {
              const char *packets1[] = {0};
              const char *packets2[] = {"zero;", "one;", "two;", "three;", "four;", 0};
              const char *packets3[] = {"zer0;", 0};
              const char *packets4[] = {"goooooooooogle;", 0};

              test(packets1, END);
              test(packets2, END);
              test(packets3, BAD_PACKET);
              test(packets4, BIG_PACKET);

              return 0;
          }
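
       The essence of the "push" model can also be illustrated without re2c.
       The following hand-written C sketch is not generated code: the names
       PushLexer, push_init, push_feed, push_at_boundary and the status codes
       are hypothetical and exist only for this illustration. It keeps the
       lexer state in a structure (initialized to -1, as required above),
       accepts the same packet syntax as the example (lowercase letters
       terminated by a semicolon) and returns to the caller whenever a chunk
       of input is exhausted, so that a packet may be split across chunks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes, for illustration only (not part of re2c). */
typedef enum { NEED_MORE, BAD_INPUT } PushStatus;

/* Persistent lexer state that survives between calls. */
typedef struct {
    int state;        /* -1: not started, 0: between packets, 1: inside a packet */
    unsigned packets; /* number of complete packets seen so far */
} PushLexer;

static void push_init(PushLexer *lx)
{
    lx->state = -1;
    lx->packets = 0;
}

/* Consume one chunk of input and return to the caller. The caller calls
 * this function again when the next chunk becomes available. */
static PushStatus push_feed(PushLexer *lx, const char *chunk, size_t len)
{
    assert(lx != NULL && (chunk != NULL || len == 0));
    if (lx->state == -1) lx->state = 0; /* first entry into the lexer */

    for (size_t i = 0; i < len; ++i) {
        char c = chunk[i];
        if (c >= 'a' && c <= 'z') {
            lx->state = 1;              /* inside a packet */
        } else if (c == ';' && lx->state == 1) {
            lx->state = 0;              /* packet complete */
            ++lx->packets;
        } else {
            return BAD_INPUT;           /* syntax error */
        }
    }
    return NEED_MORE;                   /* chunk exhausted, state is saved */
}

/* Nonzero if the input may end here without truncating a packet. */
static int push_at_boundary(const PushLexer *lx)
{
    return lx->state != 1;
}
```

       A caller would feed chunks as they arrive and, once the input ends,
       use push_at_boundary to decide whether the end of input is premature.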


REUSABLE BLOCKS
       Reuse mode is enabled with the -r --reusable option. In this mode re2c
       allows one to reuse definitions, configurations and rules specified by
       a /*!rules:re2c*/ block in subsequent /*!use:re2c*/ blocks. As of
       re2c-1.2 it is possible to mix such blocks with normal /*!re2c*/
       blocks; prior to that re2c expected a single rules-block followed by
       use-blocks (normal blocks were disallowed). Use-blocks can have
       additional definitions, configurations and rules: they are merged with
       those specified by the rules-block. A very common use case for the -r
       --reusable option is a lexer that supports multiple input encodings:
       lexer rules are defined once and reused multiple times with
       encoding-specific configurations, such as re2c:flags:utf-8.

       Below is an example of a multi-encoding lexer: it reads a phrase with
       Unicode math symbols and accepts input either in UTF-8 or in UTF-32.
       Note that the --input-encoding utf8 option allows us to write
       UTF-8-encoded symbols in the regular expressions; without this option
       re2c would parse them as a plain ASCII byte sequence (and we would have
       to use hexadecimal escape sequences).

   Example
          // re2c $INPUT -o $OUTPUT -r --input-encoding utf8
          #include <assert.h>
          #include <stdint.h>

          /*!rules:re2c
              re2c:yyfill:enable = 0;

              "∀x ∃y: p(x, y)" { return 0; }
              *                { return 1; }
          */

          static int lex_utf8(const uint8_t *YYCURSOR)
          {
              const uint8_t *YYMARKER;
              /*!use:re2c
              re2c:define:YYCTYPE = uint8_t;
              re2c:flags:8 = 1;
              */
          }

          static int lex_utf32(const uint32_t *YYCURSOR)
          {
              const uint32_t *YYMARKER;
              /*!use:re2c
              re2c:define:YYCTYPE = uint32_t;
              re2c:flags:8 = 0;
              re2c:flags:u = 1;
              */
          }

          int main()
          {
              static const uint8_t s8[] = // UTF-8
                  { 0xe2, 0x88, 0x80, 0x78, 0x20, 0xe2, 0x88, 0x83, 0x79
                  , 0x3a, 0x20, 0x70, 0x28, 0x78, 0x2c, 0x20, 0x79, 0x29 };

              static const uint32_t s32[] = // UTF32
                  { 0x00002200, 0x00000078, 0x00000020, 0x00002203
                  , 0x00000079, 0x0000003a, 0x00000020, 0x00000070
                  , 0x00000028, 0x00000078, 0x0000002c, 0x00000020
                  , 0x00000079, 0x00000029 };

              assert(lex_utf8(s8) == 0);
              assert(lex_utf32(s32) == 0);
              return 0;
          }
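
       To see that the two arrays in the example hold the same phrase, one
       can decode the UTF-8 bytes and compare the result with the UTF-32 code
       units. The sketch below is an illustration only: the helper names
       utf8_decode and same_text are hypothetical, and the decoder assumes
       well-formed, complete UTF-8 input and performs no validation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from well-formed UTF-8 at p into *cp.
 * Returns the number of one-byte code units consumed (1 to 4).
 * No validation is done: the input is assumed to be well-formed. */
static size_t utf8_decode(const uint8_t *p, uint32_t *cp)
{
    if (p[0] < 0x80) {
        *cp = p[0];
        return 1;
    }
    if ((p[0] & 0xE0) == 0xC0) {
        *cp = ((uint32_t)(p[0] & 0x1F) << 6)
            | (uint32_t)(p[1] & 0x3F);
        return 2;
    }
    if ((p[0] & 0xF0) == 0xE0) {
        *cp = ((uint32_t)(p[0] & 0x0F) << 12)
            | ((uint32_t)(p[1] & 0x3F) << 6)
            | (uint32_t)(p[2] & 0x3F);
        return 3;
    }
    *cp = ((uint32_t)(p[0] & 0x07) << 18)
        | ((uint32_t)(p[1] & 0x3F) << 12)
        | ((uint32_t)(p[2] & 0x3F) << 6)
        | (uint32_t)(p[3] & 0x3F);
    return 4;
}

/* Returns 1 if the UTF-8 buffer (n8 bytes) decodes exactly to the
 * UTF-32 buffer (n32 code units), and 0 otherwise. */
static int same_text(const uint8_t *s8, size_t n8,
                     const uint32_t *s32, size_t n32)
{
    size_t i = 0, j = 0;
    assert(s8 != NULL && s32 != NULL);
    while (i < n8 && j < n32) {
        uint32_t cp;
        i += utf8_decode(s8 + i, &cp);
        if (cp != s32[j++]) return 0;
    }
    return i == n8 && j == n32;
}
```

       Applied to s8 and s32 from the example, same_text reports that the 18
       UTF-8 bytes and the 14 UTF-32 code units encode the same 14 code
       points.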



ENCODING SUPPORT
       re2c supports the following encodings: ASCII  (default),  EBCDIC  (-e),
       UCS-2  (-w), UTF-16 (-x), UTF-32 (-u) and UTF-8 (-8).  See also inplace
       configuration re2c:flags.

       The following concepts should be clarified when talking about
       encodings. A code point is an abstract number that represents a single
       symbol. A code unit is the smallest unit of memory used in the encoded
       text (it corresponds to one character in the input stream). One or more
       code units may be needed to represent a single code point, depending on
       the encoding. In a fixed-length encoding, each code point is
       represented with an equal number of code units. In a variable-length
       encoding, different code points can be represented with different
       numbers of code units.

       o ASCII is a fixed-length encoding. Its code space includes 0x100  code
         points,  from 0 to 0xFF. A code point is represented with exactly one
         1-byte code unit, which has the same value as  the  code  point.  The
         size of YYCTYPE must be 1 byte.

       o EBCDIC is a fixed-length encoding. Its code space includes 0x100 code
         points, from 0 to 0xFF. A code point is represented with exactly  one
         1-byte  code  unit,  which  has the same value as the code point. The
         size of YYCTYPE must be 1 byte.

       o UCS-2 is a fixed-length encoding. Its  code  space  includes  0x10000
         code  points,  from  0  to 0xFFFF. One code point is represented with
         exactly one 2-byte code unit, which has the same value  as  the  code
         point. The size of YYCTYPE must be 2 bytes.

       o UTF-16  is  a  variable-length  encoding. Its code space includes all
         Unicode code points, from 0 to 0xD7FF and from  0xE000  to  0x10FFFF.
         One  code point is represented with one or two 2-byte code units. The
         size of YYCTYPE must be 2 bytes.

       o UTF-32 is a fixed-length encoding. Its code space includes  all  Uni-
         code  code  points, from 0 to 0xD7FF and from 0xE000 to 0x10FFFF. One
         code point is represented with exactly one 4-byte code unit. The size
         of YYCTYPE must be 4 bytes.

       o UTF-8 is a variable-length encoding. Its code space includes all Uni-
         code code points, from 0 to 0xD7FF and from 0xE000 to  0x10FFFF.  One
         code point is represented with a sequence of one, two, three, or four
         1-byte code units. The size of YYCTYPE must be 1 byte.
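
       The difference between fixed-length and variable-length encodings can
       be made concrete with a small C sketch that encodes one code point and
       reports how many code units were needed. This is an illustration and
       not part of re2c: the function names is_valid_code_point, utf8_encode,
       utf16_encode and utf32_encode are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A valid Unicode code point: at most 0x10FFFF and not a surrogate. */
static int is_valid_code_point(uint32_t cp)
{
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

/* UTF-8 (variable-length): store 1 to 4 one-byte code units in out
 * and return their number. */
static size_t utf8_encode(uint32_t cp, uint8_t out[4])
{
    assert(is_valid_code_point(cp));
    if (cp < 0x80) {
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (uint8_t)(0xF0 | (cp >> 18));
    out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (uint8_t)(0x80 | (cp & 0x3F));
    return 4;
}

/* UTF-16 (variable-length): store 1 or 2 two-byte code units in out
 * and return their number. */
static size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(is_valid_code_point(cp));
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 | (cp >> 10));   /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF)); /* low surrogate */
    return 2;
}

/* UTF-32 (fixed-length): always exactly one four-byte code unit. */
static size_t utf32_encode(uint32_t cp, uint32_t out[1])
{
    assert(is_valid_code_point(cp));
    out[0] = cp;
    return 1;
}
```

       For example, the code point 0x2200 takes three code units in UTF-8 but
       a single code unit in UTF-16 and in UTF-32, while 0x1F600 takes four
       code units in UTF-8 and two in UTF-16.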

       In Unicode, values in the range 0xD800 to 0xDFFF (surrogates) are not
       valid Unicode code points. Any encoded sequence of code units that
       would map to a Unicode code point in the range 0xD800-0xDFFF is
       ill-formed. The user can control how re2c treats such ill-formed
       sequences with the --encoding-policy <policy> switch.

       For some encodings, there are code units that never occur  in  a  valid
       encoded  stream  (e.g.,  0xFF  byte in UTF-8). If the generated scanner
       must check for invalid input, the only correct way to do so is  to  use
       the  default  rule (*). Note that the full range rule ([^]) won't catch
       invalid code units when a variable-length encoding is used  ([^]  means
       "any  valid code point", whereas the default rule (*) means "any possi-
       ble code unit").
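
       As an illustration of code units that never occur in a valid stream,
       the following C sketch lists the byte values that cannot appear
       anywhere in well-formed UTF-8: 0xC0, 0xC1 and 0xF5 to 0xFF. The
       function names are hypothetical and the code is not part of re2c. In a
       generated lexer such bytes are caught by the default rule. Note that
       this check alone does not validate UTF-8, because a sequence built
       from individually possible bytes can still be ill-formed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if byte b can never occur in well-formed UTF-8. */
static int utf8_impossible_unit(uint8_t b)
{
    return b == 0xC0 || b == 0xC1 || b >= 0xF5;
}

/* Return the index of the first byte that can never occur in
 * well-formed UTF-8, or len if there is no such byte. */
static size_t utf8_find_impossible_unit(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; ++i) {
        if (utf8_impossible_unit(buf[i])) return i;
    }
    return len;
}
```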

START CONDITIONS
       Conditions are enabled with -c --conditions.  This option allows one to
       encode multiple interrelated lexers within the same re2c block.

       Each lexer corresponds to a single condition. It starts with a label
       of the form yyc_name, where name is the condition name and the yyc_
       prefix can be adjusted with configuration re2c:condprefix. Different
       lexers are separated with a comment
       /* *********************************** */ which can be adjusted with
       configuration re2c:cond:divider.

       Furthermore, each condition has a unique identifier of the form
       yycname, where name is the condition name and the yyc prefix can be
       adjusted with configuration re2c:condenumprefix. Identifiers have the
       type YYCONDTYPE and should be generated with the /*!types:re2c*/
       directive or the -t --type-header option. Users shouldn't define these
       identifiers manually, as the order of conditions is not specified.

       Before all conditions re2c generates entry code that checks the current
       condition identifier and transfers control flow to the start label of
       the active condition. After matching some rule of this condition, the
       lexer may either transfer control flow back to the entry code (after
       executing the associated action and optionally setting another
       condition with =>), or use the :=> shortcut and transition directly to
       the start label of another condition (skipping the action and the entry
       code). Configuration re2c:cond:goto allows one to change the default
       behavior.
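
       The control flow described above can be pictured with a hand-written C
       sketch. It is not the literal output of re2c, only an approximation of
       its shape: the enum stands in for the identifiers that re2c would
       generate, and the function parse_num with its three conditions (init,
       bin, dec) is hypothetical. The entry code is a switch on the current
       condition, each condition starts at a label, and the two kinds of
       transition are marked in the comments. Overflow is not checked.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the condition identifiers that re2c would generate. */
enum YYCONDTYPE { yycinit, yycbin, yycdec };

/* Parse a binary ("0b...") or decimal number, return -1 on error. */
static long parse_num(const char *s)
{
    enum YYCONDTYPE c = yycinit;
    long u = 0;

    assert(s != NULL);

    /* Entry code: dispatch on the current condition identifier. */
dispatch:
    switch (c) {
    case yycinit: goto yyc_init;
    case yycbin:  goto yyc_bin;
    case yycdec:  goto yyc_dec;
    }
    return -1;

yyc_init:
    if (s[0] == '0' && s[1] == 'b' && (s[2] == '0' || s[2] == '1')) {
        s += 2;
        goto yyc_bin;   /* like ":=> bin": jump directly to the label */
    }
    if (*s >= '1' && *s <= '9') {
        c = yycdec;     /* like "=> dec": set the condition ... */
        goto dispatch;  /* ... and go back through the entry code */
    }
    return -1;

yyc_bin:
    if (*s == '0' || *s == '1') {
        u = u * 2 + (*s++ - '0');
        goto yyc_bin;
    }
    return *s == '\0' ? u : -1;

yyc_dec:
    if (*s >= '0' && *s <= '9') {
        u = u * 10 + (*s++ - '0');
        goto yyc_dec;
    }
    return *s == '\0' ? u : -1;
}
```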

       Syntactically each rule must be preceded with a list of comma-separated
       condition names or a wildcard * enclosed in angle  brackets  <  and  >.
       Wildcard  means "any condition" and is semantically equivalent to list-
       ing all condition names.  Here regexp is a regular expression,  default
       refers to the default rule *, and action is a block of code.

       o <conditions-or-wildcard>  regexp-or-default                 action

       o <conditions-or-wildcard>  regexp-or-default  =>  condition  action

       o <conditions-or-wildcard>  regexp-or-default  :=> condition

       Rules with an exclamation mark ! in front of the condition list have a
       special meaning: they have no regular expression, and the associated
       action is merged as entry code into the actions of normal rules. This
       might be a convenient place to perform a routine task that is common to
       all rules.

       o <!conditions-or-wildcard>  action

       Another  special  form  of rules with an empty condition list <> and no
       regular expression allows one to specify an "entry condition" that  can
       be  used to execute code before entering the lexer.  It is semantically
       equivalent to a condition with number zero, name 0 and an empty regular
       expression.

       o <>                 action

       o <>  =>  condition  action

       o <>  :=> condition

   Example
          // re2c $INPUT -o $OUTPUT -ci
          #include <stdint.h>
          #include <limits.h>
          #include <assert.h>

          static const uint64_t ERROR = ~0lu;
          /*!types:re2c*/

          template<int BASE> static void adddgt(uint64_t &u, unsigned int d)
          {
              u = u * BASE + d;
              if (u > UINT32_MAX) u = ERROR;
          }

          static uint64_t parse_u32(const char *s)
          {
              const char *YYMARKER;
              int c = yycinit;
              uint64_t u = 0;

              /*!re2c
              re2c:yyfill:enable = 0;
              re2c:api:style = free-form;
              re2c:define:YYCTYPE = char;
              re2c:define:YYCURSOR = s;
              re2c:define:YYGETCONDITION = "c";
              re2c:define:YYSETCONDITION = "c = @@;";

              <*> * { return ERROR; }

              <init> '0b' / [01]        :=> bin
              <init> "0"                :=> oct
              <init> "" / [1-9]         :=> dec
              <init> '0x' / [0-9a-fA-F] :=> hex

              <bin, oct, dec, hex> "\x00" { return u; }

              <bin> [01]  { adddgt<2> (u, s[-1] - '0');      goto yyc_bin; }
              <oct> [0-7] { adddgt<8> (u, s[-1] - '0');      goto yyc_oct; }
              <dec> [0-9] { adddgt<10>(u, s[-1] - '0');      goto yyc_dec; }
              <hex> [0-9] { adddgt<16>(u, s[-1] - '0');      goto yyc_hex; }
              <hex> [a-f] { adddgt<16>(u, s[-1] - 'a' + 10); goto yyc_hex; }
              <hex> [A-F] { adddgt<16>(u, s[-1] - 'A' + 10); goto yyc_hex; }
              */
          }

          int main()
          {
              assert(parse_u32("1234567890") == 1234567890);
              assert(parse_u32("0b1101") == 13);
              assert(parse_u32("0x7Fe") == 2046);
              assert(parse_u32("0644") == 420);
              assert(parse_u32("9999999999") == ERROR);
              assert(parse_u32("") == ERROR);
              return 0;
          }


SKELETON PROGRAMS
       With the -S, --skeleton option, re2c ignores all non-re2c code and gen-
       erates a self-contained C program that can be further compiled and exe-
       cuted. The program consists of lexer code and input data. For each con-
       structed DFA (block or condition) re2c generates a standalone lexer and
       two files: an .input file with strings derived from the DFA and a .keys
       file with expected match results. The program runs each  lexer  on  the
       corresponding  .input  file and compares results with the expectations.
       Skeleton programs are very useful for a number of reasons:

       o They can check correctness of various re2c optimizations (the data is
         generated  early  in the process, before any DFA transformations have
         taken place).

       o Generating a set of input data with good coverage may be  useful  for
         both testing and benchmarking.

       o Generating self-contained executable programs allows one to get mini-
         mized test cases (the original code may be large or  have  a  lot  of
         dependencies).

       The difficulty with generating input data is that for all but the most
       trivial cases the number of possible input strings is too large (even
       if the string length is limited). Re2c solves this difficulty by
       generating sufficiently many strings to cover almost all DFA
       transitions. It uses the following algorithm. First, it constructs a
       skeleton of the DFA. For encodings with 1-byte code unit size (such as
       ASCII, UTF-8 and EBCDIC) the skeleton is just an exact copy of the
       original DFA. For encodings with multibyte code units the skeleton is a
       copy of the DFA with certain transitions omitted: namely, re2c takes at
       most 256 code units for each disjoint continuous range that corresponds
       to a DFA transition. The chosen values are evenly distributed and
       include range bounds. Instead of trying to cover all possible paths in
       the skeleton (which is infeasible) re2c generates sufficiently many
       paths to cover all skeleton transitions, and thus trigger the
       corresponding conditional jumps in the lexer. The algorithm
       implementation is limited by ~1Gb of transitions and consumes a
       constant amount of memory (re2c writes data to file as soon as it is
       generated).
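
       The selection of code units for one transition range can be sketched
       in C as follows. The text above only states that at most 256 values
       are taken, that they are evenly distributed and that they include the
       range bounds; the exact formula used by re2c is not given here, so the
       function sample_range below is a hypothetical illustration of one way
       to satisfy these three properties.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MAX_SAMPLES = 256 };

/* Fill out[] with at most MAX_SAMPLES code units from the range
 * [lo, hi], evenly spaced and including both bounds.
 * Returns the number of values written. */
static size_t sample_range(uint32_t lo, uint32_t hi, uint32_t out[MAX_SAMPLES])
{
    assert(lo <= hi);
    const uint64_t width = (uint64_t)hi - lo; /* number of values minus one */

    if (width < MAX_SAMPLES) {
        /* Small range: take every code unit. */
        for (uint64_t i = 0; i <= width; ++i) {
            out[i] = lo + (uint32_t)i;
        }
        return (size_t)width + 1;
    }

    /* Large range: MAX_SAMPLES points, the first is lo, the last is hi. */
    for (uint64_t i = 0; i < MAX_SAMPLES; ++i) {
        out[i] = lo + (uint32_t)(width * i / (MAX_SAMPLES - 1));
    }
    return MAX_SAMPLES;
}
```

       With this formula a range that fits into 256 values is copied whole,
       which is consistent with the skeleton being an exact copy of the DFA
       for encodings with 1-byte code units.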

VISUALIZATION AND DEBUG
       With  the  -D, --emit-dot option, re2c does not generate code. Instead,
       it dumps the generated DFA in DOT format.  One can convert this dump to
       an  image of the DFA using Graphviz or another library.  Note that this
       option shows the final DFA after it has gone through a number of  opti-
       mizations  and transformations. Earlier stages can be dumped with vari-
       ous debug options, such as --dump-nfa,  --dump-dfa-raw  etc.  (see  the
       full list of options).


ATTRIBUTES
       See attributes(7) for descriptions of the following attributes:


       +---------------+-----------------------+
       |ATTRIBUTE TYPE |   ATTRIBUTE VALUE     |
       +---------------+-----------------------+
       |Availability   | developer/parser/re2c |
       +---------------+-----------------------+
       |Stability      | Uncommitted           |
       +---------------+-----------------------+

SEE ALSO
       You can find more information about re2c at the official website:
       http://re2c.org. Similar programs are flex(1), lex(1) and quex
       (http://quex.sourceforge.net).

AUTHORS
       Re2c was originally written by Peter Bumbulis in 1993. Since then it
       has been developed and maintained by multiple volunteers; most notably,
       Brian Young, Marcus Boerger, Dan Nuffer and Ulya Trofimovich.



NOTES
       Source code for open source software components in Oracle Solaris can
       be found at
       https://www.oracle.com/downloads/opensource/solaris-source-code-downloads.html.

       This software was built from source available at
       https://github.com/oracle/solaris-userland. The original community
       source was downloaded from
       https://github.com/skvadrik/re2c/releases/download/2.0.3/re2c-2.0.3.tar.xz.

       Further information about this software can be found on the open source
       community website at http://re2c.org/.



                                                                       RE2C(1)