pcretest - man pages section 1: User Commands

User Commands                                         PCRETEST(1)



NAME
     pcretest  -  a  program  for testing Perl-compatible regular
     expressions.

SYNOPSIS

     pcretest [options] [input file [output file]]

     pcretest was written as a test program for the PCRE  regular
     expression  library  itself,  but  it  can  also be used for
     experimenting  with  regular  expressions.   This   document
     describes  the  features of the test program; for details of
     the regular expressions themselves, see the pcrepattern doc-
     umentation.  For  details of the PCRE library function calls
     and their options, see the pcreapi documentation. The  input
     for  pcretest  is  a sequence of regular expression patterns
     and strings to be matched, as described  below.  The  output
     shows  the result of each match. Options on the command line
     and the patterns control PCRE options and  exactly  what  is
     output.

COMMAND LINE OPTIONS

     -b        Behave  as  if  each pattern has the /B (show byte
               code) modifier; the internal form is output  after
               compilation.

     -C        Output the version number of the PCRE library, and
               all available information about the optional  fea-
               tures that are included, and then exit.

     -d        Behave as if each pattern has the /D (debug) modi-
               fier; the internal form and information about  the
               compiled  pattern  is output after compilation; -d
               is equivalent to -b -i.

     -dfa      Behave as if each data line contains the \D escape
               sequence;  this  causes  the  alternative matching
               function, pcre_dfa_exec(), to be used  instead  of
               the  standard pcre_exec() function (more detail is
               given below).

     -help     Output a brief summary of these options and  then
               exit.

     -i        Behave  as  if  each  pattern has the /I modifier;
               information about the compiled  pattern  is  given
               after compilation.

     -M        Behave as if each data line contains the \M escape
               sequence; this causes PCRE to discover the minimum
               MATCH_LIMIT  and MATCH_LIMIT_RECURSION settings by
               calling pcre_exec() repeatedly with different
               limits.

SunOS 5.11                Last change:                          1

     -m        Output  the size of each compiled pattern after it
               has been compiled. This is equivalent to adding /M
               to each regular expression.

     -o osize  Set  the  number  of elements in the output vector
               that  is  used   when   calling   pcre_exec()   or
               pcre_dfa_exec()  to be osize. The default value is
               45, which is enough for  14  capturing  subexpres-
               sions  for pcre_exec() or 22 different matches for
               pcre_dfa_exec(). The vector size  can  be  changed
               for  individual  matching calls by including \O in
               the data line (see below).
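
     The default figures follow from how the two functions use the
     vector. As the pcreapi documentation describes, pcre_exec()
     returns offset pairs only in the first two-thirds of the
     vector, and pair 0 holds the whole match; pcre_dfa_exec() uses
     one pair per match. A minimal Python sketch of that arithmetic
     (the function names are illustrative and are not part of
     pcretest or PCRE):

```python
def max_captures_exec(osize):
    """Capturing subexpressions that fit in a pcre_exec() vector."""
    usable = osize * 2 // 3      # only the first two-thirds hold offsets
    pairs = usable // 2          # each substring needs a start/end pair
    return pairs - 1             # pair 0 is the whole match


def max_matches_dfa(osize):
    """Matches that fit in a pcre_dfa_exec() vector."""
    return osize // 2            # one start/end pair per match


print(max_captures_exec(45), max_matches_dfa(45))   # 14 22
```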

     -p        Behave as if each pattern has the /P modifier; the
               POSIX  wrapper  API  is used to call PCRE. None of
               the other options has any effect when -p is set.

     -q        Do not output the version number  of  pcretest  at
               the start of execution.

     -S size   On Unix-like systems, set the size of the run-time
               stack to size megabytes.

     -s or -s+ Behave as if each pattern has the /S modifier;  in
               other  words, force each pattern to be studied. If
               -s+ is used, the  PCRE_STUDY_JIT_COMPILE  flag  is
               passed to pcre_study(), causing just-in-time opti-
               mization to be set up if it is available.  If  the
               /I  or /D option is present on a pattern (request-
               ing output about the compiled  pattern),  informa-
               tion  about the result of studying is not included
               when studying is caused only by -s and neither  -i
               nor -d is present on the command line. This behav-
               iour means that the output from tests that are run
               with  and  without  -s should be identical, except
               when options that  output  information  about  the
               actual running of a match are set. The -M, -t, and
               -tm  options,   which   give   information   about
               resources  used,  are  likely to produce different
               output with and without -s. Output may also differ
               if  the /C option is present on an individual pat-
               tern. This uses callouts to  trace  the   matching
               process, and this may be different between studied
               and non-studied patterns. If the pattern  contains
               (*MARK)  items  there may also be differences, for
               the same reason. The -s command line option can be
               overridden for specific patterns that should never
               be studied (see the /S pattern modifier below).

     -t        Run each compile, study, and match many times with
               a  timer, and output resulting time per compile or
               match (in milliseconds). Do not set  -m  with  -t,
               because  you  will then get the size output a zil-
               lion times, and the timing will be distorted.  You
               can control the number of iterations that are used
               for timing by following -t with  a  number  (as  a
               separate  item  on the command line). For example,
               "-t 1000" would iterate 1000 times. The default is
               to iterate 500000 times.

     -tm       This  is  like  -t  except  that it times only the
               matching phase, not the compile or study phases.
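
     The idea behind -tm, repeating only the match and reporting
     the average time per call, can be sketched as follows. This
     is only an illustration: it uses Python's re module rather
     than PCRE, the helper name is hypothetical, and the iteration
     count is far smaller than the pcretest default.

```python
import re
import time


def time_per_match_ms(pattern, subject, iterations=1000):
    """Average time of one search, in milliseconds."""
    compiled = re.compile(pattern)   # compiled once, outside the timed loop
    start = time.perf_counter()
    for _ in range(iterations):
        compiled.search(subject)
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / iterations


print(time_per_match_ms(r"^abc(\d+)", "abc123"))
```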

DESCRIPTION

     If pcretest is given two filename arguments, it  reads  from
     the  first and writes to the second. If it is given only one
     filename argument, it reads from that  file  and  writes  to
     stdout. Otherwise, it reads from stdin and writes to stdout,
     and prompts for each line of input, using  "re>"  to  prompt
     for  regular  expressions,  and  "data>"  to prompt for data
     lines.

     When pcretest is built, a configuration option  can  specify
     that  it should be linked with the libreadline library. When
     this is done, if the input is from a terminal,  it  is  read
     using  the  readline()  function. This provides line-editing
     and history facilities. The output  from  the  -help  option
     states whether or not readline() will be used.

     The  program handles any number of sets of input on a single
     input file. Each set starts with a regular  expression,  and
     continues  with  any  number  of  data  lines  to be matched
     against the pattern.

     Each data line is matched separately and  independently.  If
     you  want  to  do multi-line matches, you have to use the \n
     escape sequence (or \r or \r\n, etc., depending on the  new-
     line  setting)  in a single line of input to encode the new-
     line sequences. There is no limit  on  the  length  of  data
     lines;  the  input buffer is automatically extended if it is
     too small.

     An empty line signals the end of the data  lines,  at  which
     point  a new regular expression is read. The regular expres-
     sions are given enclosed in any non-alphanumeric  delimiters
     other than backslash, for example:

       /(a|bc)x+yz/

     White  space  before  the  initial  delimiter  is ignored. A
     regular expression  may  be  continued  over  several  input
     lines,  in  which  case  the newline characters are included
     within it. It is possible to include  the  delimiter  within
     the pattern by escaping it, for example

       /abc\/def/

     If  you do so, the escape and the delimiter form part of the
     pattern, but since delimiters are  always  non-alphanumeric,
     this does not affect its interpretation.  If the terminating
     delimiter is immediately followed by a backslash, for  exam-
     ple,

       /abc/\

     then a backslash is added to the end of the pattern. This is
     done to provide a way of testing the  error  condition  that
     arises if a pattern finishes with a backslash, because

       /abc\/

     is  interpreted  as  the first line of a pattern that starts
     with "abc/", causing pcretest to read the  next  line  as  a
     continuation of the regular expression.
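
     The delimiter rules above can be summarized in a short Python
     sketch. It is a simplified model, not the pcretest source: the
     function name is hypothetical, it handles a single input line,
     and it returns None where pcretest would go on to read a
     continuation line.

```python
def split_pattern_line(line):
    """Split one input line into (pattern, modifiers), or return None
    if the closing delimiter has not been seen yet."""
    text = line.lstrip()                  # leading white space is ignored
    if not text:
        raise ValueError("empty line")
    delim = text[0]
    if delim.isalnum() or delim == "\\":
        raise ValueError("delimiter must be non-alphanumeric, not backslash")
    i = 1
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            i += 2                        # escape and next char stay in pattern
            continue
        if ch == delim:
            pattern, rest = text[1:i], text[i + 1:]
            if rest.startswith("\\"):     # delimiter immediately followed by
                pattern += "\\"           # a backslash: append it to the pattern
                rest = rest[1:]
            return pattern, rest.strip()
        i += 1
    return None                           # pattern continues on the next line


print(split_pattern_line(r"/abc\/def/"))
print(split_pattern_line("/abc\\/"))
```

     The first call yields the pattern with the escaped delimiter
     still in it; the second returns None because the only slash
     after "abc" is escaped.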

PATTERN MODIFIERS

     A  pattern may be followed by any number of modifiers, which
     are mostly single characters. Following  Perl  usage,  these
     are  referred  to  below as, for example, "the /i modifier",
     even though the delimiter of the pattern need not always  be
     a  slash, and no slash is used when writing modifiers. White
     space may appear between the final pattern delimiter and the
     first modifier, and between the modifiers themselves.

     The  /i,  /m,  /s,  and  /x modifiers set the PCRE_CASELESS,
     PCRE_MULTILINE,  PCRE_DOTALL,  or   PCRE_EXTENDED   options,
     respectively, when pcre_compile() is called. These four mod-
     ifier letters have the same effect as they do in  Perl.  For
     example:

       /caseless/i

     The  following  table shows additional modifiers for setting
     PCRE compile-time options that do not correspond to anything
     in Perl:

       /8              PCRE_UTF8
       /?              PCRE_NO_UTF8_CHECK
       /A              PCRE_ANCHORED
       /C              PCRE_AUTO_CALLOUT
       /E              PCRE_DOLLAR_ENDONLY
       /f              PCRE_FIRSTLINE
       /J              PCRE_DUPNAMES
       /N              PCRE_NO_AUTO_CAPTURE
       /U              PCRE_UNGREEDY
       /W              PCRE_UCP
       /X              PCRE_EXTRA
       /Y              PCRE_NO_START_OPTIMIZE
       /<JS>           PCRE_JAVASCRIPT_COMPAT
       /<cr>           PCRE_NEWLINE_CR
       /<lf>           PCRE_NEWLINE_LF
       /<crlf>         PCRE_NEWLINE_CRLF
       /<anycrlf>      PCRE_NEWLINE_ANYCRLF
       /<any>          PCRE_NEWLINE_ANY
       /<bsr_anycrlf>  PCRE_BSR_ANYCRLF
       /<bsr_unicode>  PCRE_BSR_UNICODE

     The  modifiers  that are enclosed in angle brackets are lit-
     eral strings as shown, including the angle brackets, but the
     letters  within  can  be  in either case.  This example sets
     multiline matching with CRLF as the line ending sequence:

       /^abc/m<CRLF>

     As well as turning on the PCRE_UTF8 option, the /8  modifier
     also causes any non-printing characters in output strings to
     be printed using the \x{hh...} notation if  they  are  valid
     UTF-8  sequences. Full details of the PCRE options are given
     in the pcreapi documentation.

  Finding all matches in a string

     Searching for  all  possible  matches  within  each  subject
     string  can  be  requested  by  the /g or /G modifier. After
     finding a match, PCRE is called again to search the  remain-
     der  of the subject string. The difference between /g and /G
     is  that  the  former  uses  the  startoffset  argument   to
     pcre_exec()  to  start  searching  at a new point within the
     entire string (which is in effect what Perl  does),  whereas
     the  latter  passes over a shortened substring. This makes a
     difference to the matching process  if  the  pattern  begins
     with a lookbehind assertion (including \b or \B).
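
     The effect can be reproduced with any engine that
     distinguishes a start offset from a shortened subject. The
     sketch below uses Python's re module, not PCRE, and the two
     helper names are hypothetical; it also stops at an empty match
     instead of handling it. With a start offset, \B can still see
     the character before the search position; with a shortened
     string it cannot.

```python
import re


def find_all_g(pattern, subject):
    """Like /g: keep the whole subject and advance the start offset."""
    out, pos = [], 0
    while True:
        m = pattern.search(subject, pos)
        if m is None or m.end() == m.start():
            return out
        out.append(m.group(0))
        pos = m.end()


def find_all_G(pattern, subject):
    """Like /G: search a shortened copy of the subject each time."""
    out, rest = [], subject
    while True:
        m = pattern.search(rest)
        if m is None or m.end() == m.start():
            return out
        out.append(m.group(0))
        rest = rest[m.end():]


pat = re.compile(r"\Bi(\w\w)")
print(find_all_g(pat, "Mississippi"))   # ['iss', 'iss', 'ipp']
print(find_all_G(pat, "Mississippi"))   # ['iss', 'ipp']
```

     In the second call the remainder "issippi" starts with "i" at
     a word boundary, so \B fails there and the second "iss" is
     missed.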

     If any call to pcre_exec() in a /g or /G sequence matches an
     empty   string,   the   next   call   is   done   with   the
     PCRE_NOTEMPTY_ATSTART  and  PCRE_ANCHORED flags set in order
     to search for another, non-empty, match at the  same  point.
     If  this  second  match fails, the start offset is advanced,
     and the normal match is retried. This imitates the way  Perl
     handles such cases when using the /g modifier or the split()
     function. Normally, the start  offset  is  advanced  by  one
     character,  but if the newline convention recognizes CRLF as
     a newline, and the current character is CR followed  by  LF,
     an advance of two is used.

  Other modifiers

     There  are  yet  more  modifiers  for  controlling  the  way
     pcretest operates.

     The /+ modifier requests that as well as outputting the sub-
     string  that  matched the entire pattern, pcretest should in
     addition output the remainder of the subject string. This is
     useful  for tests where the subject contains multiple copies
     of the same substring. If the + modifier appears twice,  the
     same  action  is taken for captured substrings. In each case
     the remainder is output on the following line  with  a  plus
     character following the capture number. Note that this modi-
     fier must not immediately follow the /S modifier because /S+
     has another meaning.

     The  /=  modifier  requests that the values of all potential
     captured parentheses be output after a match by pcre_exec().
     By  default,  only those up to the highest one actually used
     in the match are output (corresponding to  the  return  code
     from  pcre_exec()). Values in the offsets vector correspond-
     ing to higher numbers should be set to  -1,  and  these  are
     output  as  "<unset>". This modifier gives a way of checking
     that this is happening.

     The /B modifier is a debugging  feature.  It  requests  that
     pcretest  output  a representation of the compiled byte code
     after compilation. Normally this information contains length
     and offset values; however, if /Z is also present, this data
     is replaced by spaces. This is a special feature for use  in
     the  automatic test scripts; it ensures that the same output
     is generated for different internal link sizes.

     The /D modifier is a PCRE debugging feature, and is  equiva-
     lent to /BI, that is, both the /B and the /I modifiers.

     The  /F  modifier  causes pcretest to flip the byte order of
     the fields in the compiled pattern that contain  2-byte  and
     4-byte  numbers. This facility is for testing the feature in
     PCRE that allows it to execute patterns that  were  compiled
     on  a  host with a different endianness. This feature is not
     available when the POSIX interface to PCRE  is  being  used,
     that is, when the /P pattern modifier is specified. See also
     the section about saving  and  reloading  compiled  patterns
     below.

     The  /I  modifier  requests that pcretest output information
     about the compiled pattern (whether it is  anchored,  has  a
     fixed  first  character, and so on). It does this by calling
     pcre_fullinfo() after compiling a pattern. If the pattern is
     studied, the results of that are also output.

     The  /K  modifier requests pcretest to show names from back-
     tracking control verbs  that  are  returned  from  calls  to
     pcre_exec(). It causes pcretest to create a pcre_extra block
     if  one  has  not  already  been  created  by  a   call   to
     pcre_study(),  and  to  set the PCRE_EXTRA_MARK flag and the
     mark field within it, every time that pcre_exec() is called.
     If  the  variable  that the mark field points to is non-NULL
     for a match, non-match, or partial  match,  pcretest  prints
     the string to which it points. For a match, this is shown on
     a line by itself, tagged with "MK:".  For a non-match it  is
     added to the message.

     The  /L  modifier must be followed directly by the name of a
     locale, for example,

       /pattern/Lfr_FR

     For this reason, it must be the  last  modifier.  The  given
     locale is set, pcre_maketables() is called to build a set of
     character tables for the locale, and this is then passed  to
     pcre_compile()  when compiling the regular expression. With-
     out an /L (or /T) modifier, NULL is  passed  as  the  tables
     pointer; that is, /L applies only to the expression on which
     it appears.

     The /M modifier causes the size of the memory block used to hold
     the compiled pattern to be output. This does not include the
     size of the pcre block; it is just the actual compiled data.
     If   the   pattern   is   successfully   studied   with  the
     PCRE_STUDY_JIT_COMPILE option, the size of the JIT  compiled
     code is also output.

     If  the  /S modifier appears once, it causes pcre_study() to
     be called after the expression has been  compiled,  and  the
     results  used  when the expression is matched. If /S appears
     twice, it suppresses studying,  even  if  it  was  requested
     externally by the -s command line option. This makes it pos-
     sible to specify that certain patterns are  always  studied,
     and others are never studied, independently of -s. This fea-
     ture is used in the test files in a few cases where the out-
     put is different when the pattern is studied.

     If the /S modifier is immediately followed by a + character,
     the   call   to    pcre_study()    is    made    with    the
     PCRE_STUDY_JIT_COMPILE option, requesting just-in-time opti-
     mization support if it is available. Note that there is also
     a  /+  modifier;  it  must not be given immediately after /S
     because this will be misinterpreted. If JIT studying is suc-
     cessful,  it  will automatically be used when pcre_exec() is
     run,  except  when   incompatible   run-time   options   are
     specified.  These  include  the  partial matching options; a
     complete list is given in  the  pcrejit  documentation.  See
     also  the  \J escape sequence below for a way of setting the
     size of the JIT stack.

     The /T modifier must be  followed  by  a  single  digit.  It
     causes  a  specific  set  of built-in character tables to be
     passed to pcre_compile(). It is used in  the  standard  PCRE
     tests  to  check  behaviour with different character tables.
     The digit specifies the tables as follows:

       0   the default ASCII tables, as distributed in
             pcre_chartables.c.dist
       1   a set of tables defining ISO 8859 characters

     In table 1, some characters whose codes are greater than 128
     are identified as letters, digits, spaces, etc.

  Using the POSIX wrapper API

     The  /P  modifier causes pcretest to call PCRE via the POSIX
     wrapper API rather than its native API. When /P is set,  the
     following modifiers set options for the regcomp() function:

       /i    REG_ICASE
       /m    REG_NEWLINE
       /N    REG_NOSUB
       /s    REG_DOTALL     )
       /U    REG_UNGREEDY   ) These options are not part of
       /W    REG_UCP        )   the POSIX standard
       /8    REG_UTF8       )

     The  /+  modifier  works as described above. All other modi-
     fiers are ignored.

DATA LINES

     Before each data line is passed to pcre_exec(), leading  and
     trailing  white space is removed, and it is then scanned for
     \ escapes. Some  of  these  are  pretty  esoteric  features,
     intended  for checking out some of the more complicated fea-
     tures of PCRE. If you are just  testing  "ordinary"  regular
     expressions,  you probably don't need any of these. The fol-
     lowing escapes are recognized:

       \a         alarm (BEL, \x07)
       \b         backspace (\x08)
       \e         escape (\x1b)
       \f         form feed (\x0c)
       \n         newline (\x0a)
       \qdd       set the PCRE_MATCH_LIMIT limit to dd
                    (any number of digits)
       \r         carriage return (\x0d)
       \t         tab (\x09)
       \v         vertical tab (\x0b)
       \nnn       octal character (up to 3 octal digits)
                    always a byte unless > 255 in UTF-8 mode
       \xhh       hexadecimal byte (up to 2 hex digits)
       \x{hh...}  hexadecimal character, any number of digits
                    in UTF-8 mode
       \A         pass the PCRE_ANCHORED option to pcre_exec()
                    or pcre_dfa_exec()
       \B         pass the PCRE_NOTBOL option to pcre_exec()
                    or pcre_dfa_exec()
       \Cdd       call pcre_copy_substring() for substring dd
                    after a successful match (number less than 32)
       \Cname     call pcre_copy_named_substring() for substring
                    "name" after a successful match (name
                    terminated by next non-alphanumeric character)
       \C+        show the current captured substrings at callout
                    time
       \C-        do not supply a callout function
       \C!n       return 1 instead of 0 when callout number n is
                    reached
       \C!n!m     return 1 instead of 0 when callout number n is
                    reached for the mth time
       \C*n       pass the number n (may be negative) as callout
                    data; this is used as the callout return value
       \D         use the pcre_dfa_exec() match function
       \F         only shortest match for pcre_dfa_exec()
       \Gdd       call pcre_get_substring() for substring dd
                    after a successful match (number less than 32)
       \Gname     call pcre_get_named_substring() for substring
                    "name" after a successful match (name
                    terminated by next non-alphanumeric character)
       \Jdd       set up a JIT stack of dd kilobytes maximum (any
                    number of digits)
       \L         call pcre_get_substringlist() after a
                    successful match
       \M         discover the minimum MATCH_LIMIT and
                    MATCH_LIMIT_RECURSION settings
       \N         pass the PCRE_NOTEMPTY option to pcre_exec()
                    or pcre_dfa_exec(); if used twice, pass the
                    PCRE_NOTEMPTY_ATSTART option
       \Odd       set the size of the output vector passed to
                    pcre_exec() to dd (any number of digits)
       \P         pass the PCRE_PARTIAL_SOFT option to pcre_exec()
                    or pcre_dfa_exec(); if used twice, pass the
                    PCRE_PARTIAL_HARD option
       \Qdd       set the PCRE_MATCH_LIMIT_RECURSION limit to dd
                    (any number of digits)
       \R         pass the PCRE_DFA_RESTART option to
                    pcre_dfa_exec()
       \S         output details of memory get/free calls during
                    matching
       \Y         pass the PCRE_NO_START_OPTIMIZE option to
                    pcre_exec() or pcre_dfa_exec()
       \Z         pass the PCRE_NOTEOL option to pcre_exec()
                    or pcre_dfa_exec()
       \?         pass the PCRE_NO_UTF8_CHECK option to
                    pcre_exec() or pcre_dfa_exec()
       \>dd       start the match at offset dd (optional "-"; then
                    any number of digits); this sets the
                    startoffset argument for pcre_exec() or
                    pcre_dfa_exec()
       \<cr>      pass the PCRE_NEWLINE_CR option to pcre_exec()
                    or pcre_dfa_exec()
       \<lf>      pass the PCRE_NEWLINE_LF option to pcre_exec()
                    or pcre_dfa_exec()
       \<crlf>    pass the PCRE_NEWLINE_CRLF option to pcre_exec()
                    or pcre_dfa_exec()
       \<anycrlf> pass the PCRE_NEWLINE_ANYCRLF option to
                    pcre_exec() or pcre_dfa_exec()
       \<any>     pass the PCRE_NEWLINE_ANY option to pcre_exec()
                    or pcre_dfa_exec()

     Note that \xhh always specifies  one  byte,  even  in  UTF-8
     mode;  this  makes  it  possible  to construct invalid UTF-8
     sequences for testing purposes. On the other hand, \x{hh} is
     interpreted  as  a UTF-8 character in UTF-8 mode, generating
     more than one byte if the value is greater  than  127.  When
     not  in  UTF-8  mode,  it generates one byte for values less
     than 256, and causes an error for greater values.

     The escapes that specify line ending sequences  are  literal
     strings,  exactly as shown. No more than one newline setting
     should be present in any data line.

     A backslash followed by anything else just escapes the  any-
     thing else. If the very last character is a backslash, it is
     ignored. This gives a way of passing an empty line as  data,
     since a real empty line terminates the data input.

     The  \J  escape  provides a way of setting the maximum stack
     size that is used by the just-in-time optimization code.  It
     is  ignored if JIT optimization is not being used. Providing
     a stack that is larger than the  default  32K  is  necessary
     only for very complicated patterns.

     If  \M is present, pcretest calls pcre_exec() several times,
     with   different   values    in    the    match_limit    and
     match_limit_recursion  fields  of the pcre_extra data struc-
     ture, until it finds the minimum numbers for each  parameter
     that  allow  pcre_exec()  to complete without error. Because
     this is testing a specific feature of the  normal  interpre-
     tive  pcre_exec() execution, the use of any JIT optimization
     that might have been set up by the /S+ modifier or  the  -s+
     option is disabled.

     The  match_limit  number is a measure of the amount of back-
     tracking that takes  place,  and  checking  it  out  can  be
     instructive.  For  most  simple matches, the number is quite
     small, but for patterns with very large numbers of  matching
     possibilities,   it  can  become  large  very  quickly  with
     increasing length of subject string. The  match_limit_recur-
     sion  number  is a measure of how much stack (or, if PCRE is
     compiled with NO_RECURSE, how much heap) memory is needed to
     complete the match attempt.

     When  \O is used, the value specified may be higher or lower
     than the  size  set  by  the  -o  command  line  option  (or
     defaulted to 45); \O applies only to the call of pcre_exec()
     for the line in which it appears.

     If the /P modifier was present on the pattern,  causing  the
     POSIX  wrapper  API  to  be  used,  the  only option-setting
     sequences that have any effect are \B, \N, and  \Z,  causing
     REG_NOTBOL,  REG_NOTEMPTY,  and REG_NOTEOL, respectively, to
     be passed to regexec().

     The use of \x{hh...} to represent UTF-8  characters  is  not
     dependent  on  the use of the /8 modifier on the pattern. It
     is recognized always. There may be any number of hexadecimal
     digits  inside  the  braces.  The  result is from one to six
     bytes, encoded according to the original UTF-8 rules of  RFC
     2279.  This  allows for values in the range 0 to 0x7FFFFFFF.
     Note that not all of those are valid Unicode code points, or
     indeed  valid  UTF-8 characters according to the later rules
     in RFC 3629.
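
     For reference, the one-to-six-byte encoding of RFC 2279 can be
     sketched in a few lines of Python. This illustrates the
     encoding scheme only and is not taken from pcretest; the
     function name is hypothetical.

```python
def encode_utf8_rfc2279(cp):
    """Encode a code point with the original one-to-six-byte UTF-8 scheme."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("value out of range")
    if cp < 0x80:
        return bytes([cp])               # plain ASCII, one byte
    # (exclusive upper limit, total bytes, lead-byte marker)
    layouts = ((0x800, 2, 0xC0), (0x10000, 3, 0xE0), (0x200000, 4, 0xF0),
               (0x4000000, 5, 0xF8), (0x80000000, 6, 0xFC))
    for limit, nbytes, lead in layouts:
        if cp < limit:
            break
    tail = []
    for _ in range(nbytes - 1):
        tail.append(0x80 | (cp & 0x3F))  # continuation byte, low 6 bits
        cp >>= 6
    return bytes([lead | cp] + tail[::-1])


print(encode_utf8_rfc2279(0x20AC).hex())      # e282ac
print(len(encode_utf8_rfc2279(0x7FFFFFFF)))   # 6
```

     Values above 0x10FFFF produce five-byte and six-byte sequences,
     which RFC 3629 no longer accepts as valid UTF-8.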

THE ALTERNATIVE MATCHING FUNCTION

     By default, pcretest uses the standard PCRE  matching  func-
     tion, pcre_exec(), to match each data line. From release 6.0,
     PCRE   supports   an    alternative    matching    function,
     pcre_dfa_exec(), which operates in a different way, and  has
     some restrictions. The differences between the two functions
     are described in the pcrematching documentation.

     If  a  data  line contains the \D escape sequence, or if the
     command line  contains  the  -dfa  option,  the  alternative
     matching function is called.  This function finds all possi-
     ble matches at a given point. If,  however,  the  \F  escape
     sequence  is  present  in  the data line, it stops after the
     first match is found. This is always the  shortest  possible
     match.
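
     What "all possible matches at a given point" means can be
     illustrated with a brute-force Python sketch. It uses the re
     module and tries every end position, which is not how
     pcre_dfa_exec() works internally; the helper name is
     hypothetical.

```python
import re


def all_matches_at(pattern, subject, start=0):
    """Every match of pattern that begins at start, longest first."""
    compiled = re.compile(pattern)
    found = []
    for end in range(len(subject), start - 1, -1):
        # fullmatch with pos/endpos asks: does subject[start:end] match?
        if compiled.fullmatch(subject, start, end):
            found.append(subject[start:end])
    return found


matches = all_matches_at(r"cat(er(pillar)?)?", "caterpillar")
print(matches)       # ['caterpillar', 'cater', 'cat']
print(matches[-1])   # 'cat', the shortest match
```

     The last element corresponds to the single result reported
     when \F is present.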

DEFAULT OUTPUT FROM PCRETEST

     This  section  describes the output when the normal matching
     function, pcre_exec(), is being used.

     When a match succeeds, pcretest outputs the list of captured
     substrings  that pcre_exec() returns, starting with number 0
     for the string that matched the whole pattern. Otherwise, it
     outputs  "No  match"  when the return is PCRE_ERROR_NOMATCH,
     and "Partial match:" followed by the partially matching sub-
     string  when  pcre_exec()  returns PCRE_ERROR_PARTIAL. (Note
     that this is the entire substring that was inspected  during
     the  partial  match;  it  may  include characters before the
     actual match start if a lookbehind assertion, \K, \b, or  \B
     was  involved.)  For  any other return, pcretest outputs the
     PCRE negative error number and a short  descriptive  phrase.
     If the error is a failed UTF-8 string check, the byte offset
     of the start of the failing character and  the  reason  code
     are also output, provided that the size of the output vector
     is at least two.  Here  is  an  example  of  an  interactive
     pcretest run.

       $ pcretest
       PCRE version 8.13 2011-04-30

         re> /^abc(\d+)/
       data> abc123
        0: abc123
        1: 123
       data> xyz
       No match

     Unset capturing substrings that are not followed by one that
     is set are not returned by pcre_exec(), and are not shown by
     pcretest.  In the following example, there are two capturing
     substrings, but when the first data  line  is  matched,  the
     second,  unset  substring  is not shown. An "internal" unset
     substring is shown as "<unset>",  as  for  the  second  data
     line.

         re> /(a)|(b)/
       data> a
        0: a
        1: a
       data> b
        0: b
        1: <unset>
        2: b
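
     The trimming rule can be imitated outside pcretest. The
     sketch below uses Python's re module, which is not PCRE and
     always reports every group, so the helper (format_captures,
     a hypothetical name) drops trailing unset groups itself and
     prints internal ones as "<unset>".

```python
import re

def format_captures(match):
    """List substrings roughly the way pcretest does (illustrative only)."""
    groups = [match.group(0), *match.groups()]
    # Python keeps trailing unset groups; drop them as pcre_exec() would.
    while groups[-1] is None:
        groups.pop()
    return [f"{i:2d}: {'<unset>' if g is None else g}"
            for i, g in enumerate(groups)]

for subject in ("a", "b"):
    print("\n".join(format_captures(re.match(r"(a)|(b)", subject))))
```

     Group 0 is always set after a successful match, so the loop
     that trims trailing groups always terminates.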

     If the strings contain any non-printing characters, they are
     output as \xhh escapes, or as \x{...} escapes if the /8
     modifier was present on the pattern. See below for the
     definition of non-printing characters. If the pattern has the
     /+ modifier, the output for substring 0 is followed by the
     rest of the subject string, identified by "0+" like this:

         re> /cat/+
       data> cataract
        0: cat
        0+ aract
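
     The "0+" line is simply the part of the subject that follows
     the end of the overall match. A minimal sketch, using
     Python's re module as a stand-in for PCRE (match_with_rest is
     a hypothetical helper, not part of pcretest):

```python
import re

def match_with_rest(pattern, subject):
    """Return the whole match and the remainder of the subject after it."""
    m = re.search(pattern, subject)
    if m is None:
        return ["No match"]
    # Everything after the end offset of substring 0 is the "rest".
    return [f" 0: {m.group(0)}", f" 0+ {subject[m.end():]}"]

print("\n".join(match_with_rest("cat", "cataract")))
```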

     If the pattern has the /g or /G  modifier,  the  results  of
     successive  matching  attempts  are output in sequence, like
     this:

         re> /\Bi(\w\w)/g
       data> Mississippi
        0: iss
        1: ss
        0: iss
        1: ss
        0: ipp
        1: pp
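
     The same sequence of matches can be reproduced as a rough
     cross-check with Python's re.finditer(). This is an analogy
     only: Python's engine is not PCRE, and its handling of empty
     matches may differ from /g and /G. The helper name
     successive_matches is an assumption made for illustration.

```python
import re

def successive_matches(pattern, subject):
    """Return (whole match, captured groups) for each successive match."""
    return [(m.group(0), m.groups()) for m in re.finditer(pattern, subject)]

for whole, groups in successive_matches(r"\Bi(\w\w)", "Mississippi"):
    print(f" 0: {whole}")
    for number, text in enumerate(groups, start=1):
        print(f"{number:2d}: {text}")
```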

     "No match" is output only if the first match attempt  fails.
     Here  is  an example of a failure message (the offset 4 that
     is specified by \>4 is past the end of the subject string):

         re> /xyz/
       data> xyz\>4
       Error -24 (bad offset value)

     If any of the sequences \C, \G, or \L are present in a  data
     line  that is successfully matched, the substrings extracted
     by the convenience functions are output  with  C,  G,  or  L
     after the string number instead of a colon. This is in addi-
     tion to the normal full list. The string  length  (that  is,
     the  return from the extraction function) is given in paren-
     theses after each string for \C and \G.

     Note that whereas patterns can be continued over several
     lines (a plain ">" prompt is used for continuations), data
     lines may not. However, newlines can be included in data by
     means of the \n escape (or \r, \r\n, etc., depending on the
     newline sequence setting).

OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION

     When the alternative matching function, pcre_dfa_exec(),  is
     used (by means of the \D escape sequence or the -dfa command
     line option), the output consists  of  a  list  of  all  the
     matches  that  start at the first point in the subject where
     there is at least one match. For example:

         re> /(tang|tangerine|tan)/
       data> yellow tangerine\D
        0: tangerine
        1: tang
        2: tan

     (Using the normal matching function on this data finds  only
     "tang".)  The  longest matching string is always given first
     (and numbered zero). After a PCRE_ERROR_PARTIAL return,  the
     output is "Partial match:", followed by the partially match-
     ing substring. (Note that this is the entire substring  that
     was inspected during the partial match; it may include char-
     acters before the actual match start if a lookbehind  asser-
     tion, \K, \b, or \B was involved.)

     If  /g  is  present  on  the pattern, the search for further
     matches resumes at the end of the longest match.  For  exam-
     ple:

         re> /(tang|tangerine|tan)/g
       data> yellow tangerine and tangy sultana\D
        0: tangerine
        1: tang
        2: tan
        0: tang
        1: tan
        0: tan
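
     The grouping of this output can be modelled with a small
     simulation. The sketch below is not a DFA and does not call
     pcre_dfa_exec(); it handles only a list of non-empty literal
     alternatives, and both function names are hypothetical. It
     reports every alternative that matches at the first position
     where any of them matches, longest first, and with
     repeat=True it resumes at the end of the longest match, as
     described for /g.

```python
def matches_at(alternatives, subject, start):
    """All literal alternatives matching at `start`, longest first."""
    found = {alt for alt in alternatives if subject.startswith(alt, start)}
    return sorted(found, key=len, reverse=True)

def dfa_style_matches(alternatives, subject, repeat=False):
    """Group the matches per starting point; repeat=True imitates /g."""
    results, pos = [], 0
    while pos <= len(subject):
        found = matches_at(alternatives, subject, pos)
        if not found:
            pos += 1
            continue
        results.append(found)
        if not repeat:
            break
        # Resume at the end of the longest match (always advance).
        pos += max(len(found[0]), 1)
    return results

words = ["tang", "tangerine", "tan"]
for group in dfa_style_matches(words, "yellow tangerine and tangy sultana", repeat=True):
    for number, text in enumerate(group):
        print(f"{number:2d}: {text}")
```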

     Since  the matching function does not support substring cap-
     ture, the escape sequences that are concerned with  captured
     substrings are not relevant.

RESTARTING AFTER A PARTIAL MATCH

     When   the  alternative  matching  function  has  given  the
     PCRE_ERROR_PARTIAL return, indicating that the subject  par-
     tially  matched  the pattern, you can restart the match with
     additional subject data by means of the \R escape  sequence.
     For example:

         re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
       data> 23ja\P\D
       Partial match: 23ja
       data> n05\R\D
        0: n05

     For  further  information  about  partial  matching, see the
     pcrepartial documentation.

CALLOUTS

     If the pattern contains  any  callout  requests,  pcretest's
     callout  function is called during matching. This works with
     both matching functions. By  default,  the  called  function
     displays the callout number, the start and current positions
     in the text at the callout time, and the next  pattern  item
     to be tested. For example, the output

       --->pqrabcdef
         0    ^  ^     \d

     indicates that callout number 0 occurred for a match attempt
     starting at the fourth character of the subject string, when
     the  pointer  was  at the seventh character of the data, and
     when the next pattern item was \d. Just  one  circumflex  is
     output if the start and current positions are the same.

     Callouts  numbered 255 are assumed to be automatic callouts,
     inserted as a result of the /C  pattern  modifier.  In  this
     case,  instead  of showing the callout number, the offset in
     the pattern, preceded by a plus, is output. For example:

         re> /\d?[A-E]\*/C
       data> E*
       --->E*
        +0 ^      \d?
        +3 ^      [A-E]
        +8 ^^     \*
       +10 ^ ^
        0: E*
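
     To make the caret columns easier to read, the following
     sketch rebuilds the lines of the automatic callout example
     above from the start and current offsets. The layout rules
     (a four-column label, carets under the subject, a four-space
     gap after the subject) are inferred from that example rather
     than taken from pcretest's source, the subject is assumed to
     contain only printing characters, and callout_lines is a
     hypothetical helper.

```python
def callout_lines(subject, start, current, next_item, offset=None, number=0):
    """Render one callout report (layout inferred from the examples)."""
    # Automatic callouts show "+offset"; others show the callout number.
    label = f"{offset:+3d} " if offset is not None else f"{number:3d} "
    carets = " " * start + "^"
    if current > start:
        # A second caret marks the current position when it differs.
        carets += " " * (current - start - 1) + "^"
    # Pad past the end of the subject, then leave a four-space gap.
    padding = " " * (len(subject) - current + 4)
    return ["--->" + subject, (label + carets + padding + next_item).rstrip()]

steps = [(0, 0, r"\d?", 0), (0, 0, "[A-E]", 3), (0, 1, r"\*", 8), (0, 2, "", 10)]
print("--->E*")
for start, current, item, offset in steps:
    print(callout_lines("E*", start, current, item, offset=offset)[1])
```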

     If a pattern contains (*MARK) items, an additional  line  is
     output  whenever  a  change  of latest mark is passed to the
     callout function. For example:

         re> /a(*MARK:X)bc/C
       data> abc
       --->abc
        +0 ^       a
        +1 ^^      (*MARK:X)
       +10 ^^      b
       Latest Mark: X
       +11 ^ ^     c
       +12 ^  ^
        0: abc

     The mark changes between matching "a" and "b", but stays the
     same  for  the rest of the match, so nothing more is output.
     If, as a result of backtracking, the mark reverts  to  being
     unset, the text "<unset>" is output.

     The  callout  function  in  pcretest  returns zero (carry on
     matching) by default, but you can use a \C item  in  a  data
     line  (as  described above) to change this and other parame-
     ters of the callout.

     Inserting callouts can be helpful  when  using  pcretest  to
     check  complicated regular expressions. For further informa-
     tion about callouts, see the pcrecallout documentation.

NON-PRINTING CHARACTERS

     When pcretest is outputting text in the compiled version of
     a pattern, bytes other than 32-126 are always treated as
     non-printing characters and are therefore shown as hex
     escapes.

     When pcretest is outputting text that is a matched part of a
     subject string, it behaves in the same way, unless a
     different locale has been set for the pattern (using the /L
     modifier). In this case, the isprint() function is used to
     distinguish printing and non-printing characters.
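
     The default rule (no locale set) can be sketched as follows.
     The two-digit lowercase \xhh form used here is an assumption
     about the exact escape spelling, and show_bytes is a
     hypothetical helper, not a pcretest function.

```python
def show_bytes(data: bytes) -> str:
    """Show bytes 32-126 literally and every other byte as a hex escape."""
    return "".join(
        chr(b) if 32 <= b <= 126 else f"\\x{b:02x}"
        for b in data
    )

print(show_bytes(b"a\tb\n"))
```

     A locale-aware variant would replace the range test with a
     call equivalent to isprint() for the selected locale.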

SAVING AND RELOADING COMPILED PATTERNS

     The  facilities  described in this section are not available
     when the POSIX interface to PCRE is  being  used,  that  is,
     when the /P pattern modifier is specified.

     When  the  POSIX  interface  is  not  in  use, you can cause
     pcretest to write a compiled pattern to a file, by following
     the modifiers with > and a file name.  For example:

       /pattern/im >/some/file

     See  the pcreprecompile documentation for a discussion about
     saving and re-using compiled patterns.   Note  that  if  the
     pattern  was successfully studied with JIT optimization, the
     JIT data cannot be saved.

     The data that is written is binary. The  first  eight  bytes
     are  the length of the compiled pattern data followed by the
     length of the optional study  data,  each  written  as  four
     bytes  in big-endian order (most significant byte first). If
     there is no study data (either the pattern was not  studied,
     or  studying  did not return any data), the second length is
     zero. The lengths are followed by an exact copy of the  com-
     piled  pattern.  If  there  is  additional  study data, this
     (excluding any JIT data) follows immediately after the  com-
     piled  pattern.  After writing the file, pcretest expects to
     read a new pattern.
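
     The layout described above (two four-byte big-endian
     lengths, then the compiled pattern, then any study data) can
     be unpacked as shown in this sketch. The function name is
     hypothetical and the sample bytes are synthetic placeholders,
     not a real compiled pattern.

```python
import struct

def split_saved_pattern(blob: bytes):
    """Split a saved file image into (compiled pattern, study data)."""
    if len(blob) < 8:
        raise ValueError("missing 8-byte length header")
    # Two unsigned 32-bit lengths, most significant byte first.
    pattern_len, study_len = struct.unpack(">II", blob[:8])
    pattern = blob[8:8 + pattern_len]
    study = blob[8 + pattern_len:8 + pattern_len + study_len]
    if len(pattern) != pattern_len or len(study) != study_len:
        raise ValueError("file is shorter than its header claims")
    return pattern, study

# Synthetic image: 3 bytes of "pattern" and 2 bytes of "study data".
sample = struct.pack(">II", 3, 2) + b"abcde"
print(split_saved_pattern(sample))
```

     A second length of zero yields empty study data, which
     corresponds to the "No study data" case.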

     A saved pattern can be reloaded into pcretest by  specifying
     < and a file name instead of a pattern. The name of the file
     must not contain a < character, as otherwise  pcretest  will
     interpret  the  line as a pattern delimited by < characters.
     For example:

        re> </some/file
       Compiled pattern loaded from /some/file
       No study data

     If the pattern was previously studied with the JIT optimiza-
     tion,  the JIT information cannot be saved and restored, and
     so is lost. When the pattern has been loaded, pcretest  pro-
     ceeds to read data lines in the usual way.

     You  can copy a file written by pcretest to a different host
     and reload it there, even if the new host has opposite endi-
     anness  to  the  one  on which the pattern was compiled. For
     example, you can compile on an i86  machine  and  run  on  a
     SPARC machine.

     File names for saving and reloading can be absolute or rela-
     tive, but note that the shell facility of expanding  a  file
     name that starts with a tilde (~) is not available.

     The ability to save and reload files in pcretest is intended
     for testing and experimentation. It is not intended for pro-
     duction  use because only a single pattern can be written to
     a file. Furthermore, there is no facility for supplying cus-
     tom character tables for use with a reloaded pattern. If the
     original pattern was compiled with custom tables, an attempt
     to match a subject string using a reloaded pattern is likely
     to cause pcretest to crash.  Finally, if you attempt to load
     a  file  that  is  not  in the correct format, the result is
     undefined.


ATTRIBUTES
     See  attributes(5)  for  descriptions   of   the   following
     attributes:

     +---------------+------------------+
     |ATTRIBUTE TYPE | ATTRIBUTE VALUE  |
     +---------------+------------------+
     |Availability   | library/pcre     |
     +---------------+------------------+
     |Stability      | Uncommitted      |
     +---------------+------------------+

SEE ALSO

     pcre(3), pcreapi(3), pcrecallout(3), pcrejit(3),
     pcrematching(3), pcrepartial(3), pcrepattern(3),
     pcreprecompile(3).

AUTHOR

     Philip Hazel
     University Computing Service
     Cambridge CB2 3QH, England.

REVISION

     Last updated: 02 December 2011
     Copyright (c) 1997-2011 University of Cambridge.



NOTES
     This software was built from source available at
     https://java.net/projects/solaris-userland. The original
     community source was downloaded from
     http://sourceforge.net/projects/pcre/files/pcre/8.21/pcre-8.21.tar.gz

     Further  information about this software can be found on the
     open source community website at http://pcre.org/.