man pages section 1: User Commands

Exit Print View

Updated: July 2014
 
 

awk (1g)

Name

awk - pattern scanning and processing language

Synopsis

gawk  [  POSIX or GNU style options ] -f program-file [ -- ]
file ...
gawk [ POSIX or GNU style options ] [ -- ] program-text file
...

pgawk  [ POSIX or GNU style options ] -f program-file [ -- ]
file ...
pgawk [ POSIX or GNU style options ]  [  --  ]  program-text
file ...

Description




Utility Commands                                          GAWK(1)



NAME
     gawk - pattern scanning and processing language

SYNOPSIS
     gawk  [  POSIX or GNU style options ] -f program-file [ -- ]
     file ...
     gawk [ POSIX or GNU style options ] [ -- ] program-text file
     ...

     pgawk  [ POSIX or GNU style options ] -f program-file [ -- ]
     file ...
     pgawk [ POSIX or GNU style options ]  [  --  ]  program-text
     file ...

DESCRIPTION
     Gawk is the GNU Project's implementation of the AWK program-
     ming language.  It conforms to the definition  of  the  lan-
     guage in the POSIX 1003.1 Standard.  This version in turn is
     based on the description in The AWK Programming Language, by
     Aho, Kernighan, and Weinberger, with the additional features
     found in the System V Release 4 version of UNIX  awk.   Gawk
     also  provides more recent Bell Laboratories awk extensions,
     and a number of GNU-specific extensions.

     Pgawk is the profiling version of gawk.  It is identical  in
     every way to gawk, except that programs run more slowly, and
     it automatically produces an execution profile in  the  file
     awkprof.out when done.  See the --profile option, below.

     The command line consists of options to gawk itself, the AWK
     program text (if not supplied via the -f or --file options),
     and  values  to  be made available in the ARGC and ARGV pre-
     defined AWK variables.

OPTION FORMAT
     Gawk options may be  either  traditional  POSIX  one  letter
     options,  or  GNU-style  long  options.  POSIX options start
     with a single "-", while long options start with "--".  Long
     options  are provided for both GNU-specific features and for
     POSIX-mandated features.

     Following the POSIX standard, gawk-specific options are sup-
     plied  via  arguments to the -W option.  Multiple -W options
     may be supplied Each -W  option  has  a  corresponding  long
     option,  as  detailed  below.  Arguments to long options are
     either joined with the option by an = sign, with  no  inter-
     vening  spaces,  or they may be provided in the next command
     line argument.  Long options may be abbreviated, as long  as
     the abbreviation remains unique.

OPTIONS




Free Software FoundatLast change: Apr 20 2010                   1






Utility Commands                                          GAWK(1)



     Gawk accepts the following options, listed by frequency.

     -F fs
     --field-separator fs
          Use  fs for the input field separator (the value of the
          FS predefined variable).

     -v var=val
     --assign var=val
          Assign the value val to the variable var, before execu-
          tion  of  the program begins.  Such variable values are
          available to the BEGIN block of an AWK program.

     -f program-file
     --file program-file
          Read the AWK program source from the file program-file,
          instead  of from the first command line argument.  Mul-
          tiple -f (or --file) options may be used.

     -mf NNN
     -mr NNN
          Set various memory limits to the value NNN.  The f flag
          sets  the maximum number of fields, and the r flag sets
          the maximum record size.  These two flags  and  the  -m
          option  are from an earlier version of the Bell Labora-
          tories research version of UNIX awk.  They are  ignored
          by  gawk,  since gawk has no pre-defined limits.  (Cur-
          rent versions of the Bell Laboratories  awk  no  longer
          accept them.)

     -O
     --optimize
          Enable  optimizations  upon the internal representation
          of the program.  Currently, this includes  just  simple
          constant-folding.  The  gawk  maintainer  hopes  to add
          additional optimizations over time.

     -W compat
     -W traditional
     --compat
     --traditional
          Run in compatibility mode.  In compatibility mode, gawk
          behaves  identically  to UNIX awk; none of the GNU-spe-
          cific extensions are recognized.  The use  of  --tradi-
          tional  is  preferred  over  the  other  forms  of this
          option.  See GNU EXTENSIONS, below, for  more  informa-
          tion.

     -W copyleft
     -W copyright
     --copyleft
     --copyright



Free Software FoundatLast change: Apr 20 2010                   2






Utility Commands                                          GAWK(1)



          Print  the  short version of the GNU copyright informa-
          tion message on the standard output and  exit  success-
          fully.

     -W dump-variables[=file]
     --dump-variables[=file]
          Print  a  sorted  list of global variables, their types
          and final values to file.  If no file is provided, gawk
          uses a file named awkvars.out in the current directory.
          Having a list of all the global variables is a good way
          to look for typographical errors in your programs.  You
          would also use this option if you have a large  program
          with  a  lot of functions, and you want to be sure that
          your functions don't inadvertently use global variables
          that  you  meant  to be local.  (This is a particularly
          easy mistake to make with simple variable names like i,
          j, and so on.)

     -W exec file
     --exec file
          Similar  to -f, however, this is option is the last one
          processed.  This should be used with #!  scripts,  par-
          ticularly  for  CGI  applications,  to avoid passing in
          options or source code (!) on the command line  from  a
          URL.    This   option  disables  command-line  variable
          assignments.

     -W gen-po
     --gen-po
          Scan and parse the AWK program, and generate a GNU  .po
          format  file  on  standard  output with entries for all
          localizable strings in the program.  The program itself
          is  not executed.  See the GNU gettext distribution for
          more information on .po files.

     -W help
     -W usage
     --help
     --usage
          Print a  relatively  short  summary  of  the  available
          options  on  the  standard output.  (Per the GNU Coding
          Standards, these options cause an immediate, successful
          exit.)

     -W lint[=value]
     --lint[=value]
          Provide  warnings  about constructs that are dubious or
          non-portable to other  AWK  implementations.   With  an
          optional  argument of fatal, lint warnings become fatal
          errors.  This may be drastic, but  its  use  will  cer-
          tainly  encourage  the  development of cleaner AWK pro-
          grams.  With an  optional  argument  of  invalid,  only



Free Software FoundatLast change: Apr 20 2010                   3






Utility Commands                                          GAWK(1)



          warnings  about  things  that  are actually invalid are
          issued. (This is not fully implemented yet.)

     -W lint-old
     --lint-old
          Provide warnings about constructs that are not portable
          to the original version of Unix awk.

     -W non-decimal-data
     --non-decimal-data
          Recognize  octal  and hexadecimal values in input data.
          Use this option with great caution!

     -W posix
     --posix
          This turns on compatibility mode,  with  the  following
          additional restrictions:

          o \x escape sequences are not recognized.

          o Only space and tab act as field separators when FS is
            set to a single space, newline does not.

          o You cannot continue lines after ?  and :.

          o The synonym func for the keyword function is not rec-
            ognized.

          o The operators ** and **= cannot be used in place of ^
            and ^=.

          o The fflush() function is not available.

     -W profile[=prof_file]
     --profile[=prof_file]
          Send profiling  data  to  prof_file.   The  default  is
          awkprof.out.  When run with gawk, the profile is just a
          "pretty printed" version of the program.  When run with
          pgawk,  the  profile  contains execution counts of each
          statement in the program in the left margin  and  func-
          tion call counts for each user-defined function.

     -W re-interval
     --re-interval
          Enable  the  use  of  interval  expressions  in regular
          expression matching (see Regular  Expressions,  below).
          Interval  expressions  were not traditionally available
          in the AWK language.  The POSIX standard added them, to
          make  awk  and  egrep consistent with each other.  How-
          ever, their use is likely to break old AWK programs, so
          gawk only provides them if they are requested with this
          option, or when --posix is specified.



Free Software FoundatLast change: Apr 20 2010                   4






Utility Commands                                          GAWK(1)



     -W source program-text
     --source program-text
          Use program-text as  AWK  program  source  code.   This
          option allows the easy intermixing of library functions
          (used via the -f and --file options) with  source  code
          entered  on the command line.  It is intended primarily
          for medium to large AWK programs used in shell scripts.

     -W use-lc-numeric
     --use-lc-numeric
          This  forces  gawk  to  use  the locale's decimal point
          character when parsing input data.  Although the  POSIX
          standard  requires this behavior, and gawk does so when
          --posix is in effect, the default is to  follow  tradi-
          tional  behavior and use a period as the decimal point,
          even in locales where the period  is  not  the  decimal
          point  character.   This  option  overrides the default
          behavior, without the full draconian strictness of  the
          --posix option.

     -W version
     --version
          Print  version  information for this particular copy of
          gawk on the standard output.  This is useful mainly for
          knowing  if  the current copy of gawk on your system is
          up to date with respect to whatever the  Free  Software
          Foundation  is  distributing.  This is also useful when
          reporting bugs.  (Per the GNU Coding  Standards,  these
          options cause an immediate, successful exit.)

     --   Signal the end of options. This is useful to allow fur-
          ther arguments to the AWK program itself to start  with
          a  "-".   This  provides  consistency with the argument
          parsing convention used by most other POSIX programs.

     In compatibility mode, any  other  options  are  flagged  as
     invalid, but are otherwise ignored.  In normal operation, as
     long as program text has been supplied, unknown options  are
     passed  on to the AWK program in the ARGV array for process-
     ing.  This is particularly useful for running  AWK  programs
     via the "#!" executable interpreter mechanism.

AWK PROGRAM EXECUTION
     An  AWK  program  consists  of  a sequence of pattern-action
     statements and optional function definitions.

          pattern   { action statements }
          function name(parameter list) { statements }

     Gawk first reads the program source from the program-file(s)
     if  specified, from arguments to --source, or from the first
     non-option  argument  on  the  command  line.   The  -f  and



Free Software FoundatLast change: Apr 20 2010                   5






Utility Commands                                          GAWK(1)



     --source  options  may be used multiple times on the command
     line.  Gawk reads the program text as if  all  the  program-
     files  and  command  line source texts had been concatenated
     together.  This is useful  for  building  libraries  of  AWK
     functions,  without  having  to include them in each new AWK
     program that uses them.  It also provides the ability to mix
     library functions with command line programs.

     The  environment variable AWKPATH specifies a search path to
     use when finding source files named with the -f option.   If
     this   variable   does   not  exist,  the  default  path  is
     ".:/usr/local/share/awk".  (The actual directory  may  vary,
     depending upon how gawk was built and installed.)  If a file
     name given to the -f option contains  a  "/"  character,  no
     path search is performed.

     Gawk  executes  AWK programs in the following order.  First,
     all variable assignments specified via  the  -v  option  are
     performed.  Next, gawk compiles the program into an internal
     form.  Then, gawk executes the code in  the  BEGIN  block(s)
     (if  any),  and then proceeds to read each file named in the
     ARGV array.  If there are no  files  named  on  the  command
     line, gawk reads the standard input.

     If a filename on the command line has the form var=val it is
     treated as a variable assignment.  The variable var will  be
     assigned  the  value  val.   (This  happens  after any BEGIN
     block(s) have been run.)  Command line  variable  assignment
     is most useful for dynamically assigning values to the vari-
     ables AWK uses to control how input is  broken  into  fields
     and  records.   It  is  also useful for controlling state if
     multiple passes are needed over a single data file.

     If the value of a particular element of ARGV is empty  (""),
     gawk skips over it.

     For  each  record  in  the  input,  gawk  tests to see if it
     matches any pattern in the AWK program.   For  each  pattern
     that  the record matches, the associated action is executed.
     The patterns are tested in the order they occur in the  pro-
     gram.

     Finally, after all the input is exhausted, gawk executes the
     code in the END block(s) (if any).

VARIABLES, RECORDS AND FIELDS
     AWK variables are dynamic; they  come  into  existence  when
     they are first used.  Their values are either floating-point
     numbers or strings, or both, depending  upon  how  they  are
     used.  AWK also has one dimensional arrays; arrays with mul-
     tiple dimensions  may  be  simulated.   Several  pre-defined
     variables  are set as a program runs; these are described as



Free Software FoundatLast change: Apr 20 2010                   6






Utility Commands                                          GAWK(1)



     needed and summarized below.

  Records
     Normally, records are separated by newline characters.   You
     can control how records are separated by assigning values to
     the built-in variable RS.  If RS is  any  single  character,
     that  character separates records.  Otherwise, RS is a regu-
     lar expression.  Text in the input that matches this regular
     expression  separates the record.  However, in compatibility
     mode, only the first character of its string value  is  used
     for  separating  records.   If RS is set to the null string,
     then records are separated by blank lines.  When RS  is  set
     to  the  null string, the newline character always acts as a
     field separator, in addition to whatever value FS may  have.

  Fields
     As  each  input  record is read, gawk splits the record into
     fields, using the value of the FS variable as the field sep-
     arator.   If  FS is a single character, fields are separated
     by that character.  If FS is  the  null  string,  then  each
     individual  character  becomes a separate field.  Otherwise,
     FS is expected to be a full regular expression.  In the spe-
     cial case that FS is a single space, fields are separated by
     runs of spaces and/or tabs and/or newlines.   (But  see  the
     section  POSIX  COMPATIBILITY,  below).   NOTE: The value of
     IGNORECASE (see below) also affects  how  fields  are  split
     when  FS  is a regular expression, and how records are sepa-
     rated when RS is a regular expression.

     If the FIELDWIDTHS variable is set to a space separated list
     of  numbers, each field is expected to have fixed width, and
     gawk splits up the record using the specified  widths.   The
     value  of  FS is ignored.  Assigning a new value to FS over-
     rides the use  of  FIELDWIDTHS,  and  restores  the  default
     behavior.

     Each  field  in  the  input  record may be referenced by its
     position, $1, $2, and  so  on.   $0  is  the  whole  record.
     Fields need not be referenced by constants:

          n = 5
          print $n

     prints the fifth field in the input record.

     The  variable NF is set to the total number of fields in the
     input record.

     References to non-existent fields (i.e.  fields  after  $NF)
     produce  the null-string.  However, assigning to a non-exis-
     tent field (e.g., $(NF+2) = 5) increases the  value  of  NF,
     creates any intervening fields with the null string as their



Free Software FoundatLast change: Apr 20 2010                   7






Utility Commands                                          GAWK(1)



     value, and causes the value of $0 to be recomputed, with the
     fields  being  separated by the value of OFS.  References to
     negative numbered fields cause a fatal error.   Decrementing
     NF  causes  the  values  of  fields past the new value to be
     lost, and the value of $0 to be recomputed, with the  fields
     being separated by the value of OFS.

     Assigning  a  value  to  an  existing field causes the whole
     record to be rebuilt  when  $0  is  referenced.   Similarly,
     assigning  a  value  to  $0 causes the record to be resplit,
     creating new values for the fields.

  Built-in Variables
     Gawk's built-in variables are:

     ARGC        The number of command line arguments  (does  not
                 include options to gawk, or the program source).

     ARGIND      The index in ARGV of the current file being pro-
                 cessed.

     ARGV        Array  of  command line arguments.  The array is
                 indexed from 0 to ARGC - 1.  Dynamically  chang-
                 ing  the  contents of ARGV can control the files
                 used for data.

     BINMODE     On non-POSIX systems, specifies use of  "binary"
                 mode  for all file I/O.  Numeric values of 1, 2,
                 or 3, specify that input files, output files, or
                 all  files, respectively, should use binary I/O.
                 String values of "r", or "w" specify that  input
                 files, or output files, respectively, should use
                 binary I/O.  String values of "rw" or "wr" spec-
                 ify  that  all files should use binary I/O.  Any
                 other string value is treated as "rw", but  gen-
                 erates a warning message.

     CONVFMT     The  conversion  format  for numbers, "%.6g", by
                 default.

     ENVIRON     An array containing the values  of  the  current
                 environment.   The array is indexed by the envi-
                 ronment variables, each element being the  value
                 of that variable (e.g., ENVIRON["HOME"] might be
                 /home/arnold).  Changing  this  array  does  not
                 affect  the  environment  seen by programs which
                 gawk spawns  via  redirection  or  the  system()
                 function.

     ERRNO       If  a  system  error occurs either doing a redi-
                 rection for getline, during a read for  getline,
                 or  during  a close(), then ERRNO will contain a



Free Software FoundatLast change: Apr 20 2010                   8






Utility Commands                                          GAWK(1)



                 string describing the error.  The value is  sub-
                 ject to translation in non-English locales.

     FIELDWIDTHS A  white-space  separated  list  of fieldwidths.
                 When set, gawk parses the input into  fields  of
                 fixed  width,  instead of using the value of the
                 FS variable as the field separator.

     FILENAME    The name of the current input file.  If no files
                 are  specified on the command line, the value of
                 FILENAME is "-".  However, FILENAME is undefined
                 inside  the BEGIN block (unless set by getline).

     FNR         The input record number  in  the  current  input
                 file.

     FS          The  input  field separator, a space by default.
                 See Fields, above.

     IGNORECASE  Controls the  case-sensitivity  of  all  regular
                 expression and string operations.  If IGNORECASE
                 has a non-zero value,  then  string  comparisons
                 and  pattern  matching in rules, field splitting
                 with FS,  record  separating  with  RS,  regular
                 expression  matching with ~ and !~, and the gen-
                 sub(), gsub(), index(),  match(),  split(),  and
                 sub()  built-in  functions  all ignore case when
                 doing  regular  expression  operations.    NOTE:
                 Array  subscripting  is  not affected.  However,
                 the asort() and asorti() functions are affected.
                 Thus,  if  IGNORECASE is not equal to zero, /aB/
                 matches all of the strings "ab", "aB", "Ab", and
                 "AB".   As  with  all AWK variables, the initial
                 value of IGNORECASE  is  zero,  so  all  regular
                 expression  and  string  operations are normally
                 case-sensitive.  Under Unix, the full ISO 8859-1
                 Latin-1  character  set  is  used  when ignoring
                 case.  As of gawk 3.1.4, the case  equivalencies
                 are fully locale-aware, based on the C <ctype.h>
                 facilities such as isalpha(), and toupper().

     LINT        Provides dynamic control of  the  --lint  option
                 from  within  an  AWK  program.  When true, gawk
                 prints lint warnings. When false, it  does  not.
                 When  assigned  the  string  value "fatal", lint
                 warnings  become  fatal  errors,  exactly   like
                 --lint=fatal.   Any other true value just prints
                 warnings.

     NF          The  number  of  fields  in  the  current  input
                 record.




Free Software FoundatLast change: Apr 20 2010                   9






Utility Commands                                          GAWK(1)



     NR          The total number of input records seen so far.

     OFMT        The   output  format  for  numbers,  "%.6g",  by
                 default.

     OFS         The output field separator, a space by  default.

     ORS         The  output  record separator, by default a new-
                 line.

     PROCINFO    The elements of this  array  provide  access  to
                 information  about  the running AWK program.  On
                 some systems,  there  may  be  elements  in  the
                 array,  "group1"  through  "groupn"  for some n,
                 which is the number of supplementary groups that
                 the  process  has.   Use the in operator to test
                 for these elements.  The following elements  are
                 guaranteed to be available:

                 PROCINFO["egid"]    the  value of the getegid(2)
                                     system call.

                 PROCINFO["euid"]    the value of the  geteuid(2)
                                     system call.

                 PROCINFO["FS"]      "FS" if field splitting with
                                     FS is in effect, or  "FIELD-
                                     WIDTHS"  if  field splitting
                                     with   FIELDWIDTHS   is   in
                                     effect.

                 PROCINFO["gid"]     the  value  of the getgid(2)
                                     system call.

                 PROCINFO["pgrpid"]  the process group ID of  the
                                     current process.

                 PROCINFO["pid"]     the  process  ID of the cur-
                                     rent process.

                 PROCINFO["ppid"]    the parent process ID of the
                                     current process.

                 PROCINFO["uid"]     the  value  of the getuid(2)
                                     system call.

                 PROCINFO["version"] the version of  gawk.   This
                                     is  available  from  version
                                     3.1.4 and later.

     RS          The input record separator, by  default  a  new-
                 line.



Free Software FoundatLast change: Apr 20 2010                  10






Utility Commands                                          GAWK(1)



     RT          The  record  terminator.   Gawk  sets  RT to the
                 input text that matched the character or regular
                 expression specified by RS.

     RSTART      The  index  of  the  first  character matched by
                 match(); 0 if  no  match.   (This  implies  that
                 character indices start at one.)

     RLENGTH     The  length of the string matched by match(); -1
                 if no match.

     SUBSEP      The character used  to  separate  multiple  sub-
                 scripts in array elements, by default "\034".

     TEXTDOMAIN  The text domain of the AWK program; used to find
                 the localized  translations  for  the  program's
                 strings.

  Arrays
     Arrays  are  subscripted  with  an expression between square
     brackets ([ and ]).  If the expression is an expression list
     (expr,  expr ...)  then the array subscript is a string con-
     sisting of the concatenation of the (string) value  of  each
     expression,  separated  by the value of the SUBSEP variable.
     This facility  is  used  to  simulate  multiply  dimensioned
     arrays.  For example:

          i = "A"; j = "B"; k = "C"
          x[i, j, k] = "hello, world\n"

     assigns  the  string  "hello, world\n" to the element of the
     array x which is indexed by the string  "A\034B\034C".   All
     arrays  in  AWK are associative, i.e. indexed by string val-
     ues.

     The special operator in may be used to test if an array  has
     an index consisting of a particular value.

          if (val in array)
               print array[val]

     If the array has multiple subscripts, use (i, j) in array.

     The  in  construct may also be used in a for loop to iterate
     over all the elements of an array.

     An element may be deleted from an  array  using  the  delete
     statement.   The delete statement may also be used to delete
     the entire contents of an  array,  just  by  specifying  the
     array name without a subscript.





Free Software FoundatLast change: Apr 20 2010                  11






Utility Commands                                          GAWK(1)



  Variable Typing And Conversion
     Variables  and  fields  may  be (floating point) numbers, or
     strings, or both.  How the value of  a  variable  is  inter-
     preted  depends  upon  its  context.   If  used in a numeric
     expression, it will be treated as a number;  if  used  as  a
     string it will be treated as a string.

     To  force a variable to be treated as a number, add 0 to it;
     to force it to be treated as a string, concatenate  it  with
     the null string.

     When  a string must be converted to a number, the conversion
     is accomplished using strtod(3).  A number is converted to a
     string  by using the value of CONVFMT as a format string for
     sprintf(3), with the numeric value of the  variable  as  the
     argument.   However,  even  though  all  numbers  in AWK are
     floating-point, integral  values  are  always  converted  as
     integers.  Thus, given

          CONVFMT = "%2.2f"
          a = 12
          b = a ""

     the variable b has a string value of "12" and not "12.00".

     When  operating in POSIX mode (such as with the --posix com-
     mand line option), beware that locale settings may interfere
     with  the way decimal numbers are treated: the decimal sepa-
     rator of the numbers you are feeding to gawk must conform to
     what your locale would expect, be it a comma (,) or a period
     (.).

     Gawk performs comparisons as follows: If two  variables  are
     numeric,  they  are  compared  numerically.  If one value is
     numeric and the other has a string value that is a  "numeric
     string," then comparisons are also done numerically.  Other-
     wise, the numeric value is  converted  to  a  string  and  a
     string  comparison  is performed.  Two strings are compared,
     of course, as strings.

     Note that string constants, such as "57",  are  not  numeric
     strings,  they  are  string constants.  The idea of "numeric
     string" only applies to  fields,  getline  input,  FILENAME,
     ARGV elements, ENVIRON elements and the elements of an array
     created by split() that are numeric strings.  The basic idea
     is that user input, and only user input, that looks numeric,
     should be treated that way.

     Uninitialized variables have the numeric  value  0  and  the
     string value "" (the null, or empty, string).





Free Software FoundatLast change: Apr 20 2010                  12






Utility Commands                                          GAWK(1)



  Octal and Hexadecimal Constants
     Starting  with  version  3.1  of  gawk , you may use C-style
     octal and hexadecimal constants in your AWK  program  source
     code.   For example, the octal value 011 is equal to decimal
     9, and the hexadecimal value 0x11 is equal to decimal 17.

  String Constants
     String constants in AWK are sequences of characters enclosed
     between  double  quotes (").  Within strings, certain escape
     sequences are recognized, as in C.  These are:

     \\   A literal backslash.

     \a   The "alert" character; usually the ASCII BEL character.

     \b   backspace.

     \f   form-feed.

     \n   newline.

     \r   carriage return.

     \t   horizontal tab.

     \v   vertical tab.

     \xhex digits
          The  character represented by the string of hexadecimal
          digits following the \x.  As in ANSI C,  all  following
          hexadecimal  digits  are  considered part of the escape
          sequence.  (This feature should tell us something about
          language  design  by  committee.)   E.g., "\x1B" is the
          ASCII ESC (escape) character.

     \ddd The character represented by the  1-,  2-,  or  3-digit
          sequence  of  octal  digits.  E.g., "\033" is the ASCII
          ESC (escape) character.

     \c   The literal character c.

     The escape sequences may also be used inside constant  regu-
     lar  expressions  (e.g.,  /[ \t\f\n\r\v]/ matches whitespace
     characters).

     In compatibility mode, the characters represented  by  octal
     and  hexadecimal escape sequences are treated literally when
     used in regular  expression  constants.   Thus,  /a\52b/  is
     equivalent to /a\*b/.

PATTERNS AND ACTIONS
     AWK  is  a line-oriented language.  The pattern comes first,



Free Software FoundatLast change: Apr 20 2010                  13






Utility Commands                                          GAWK(1)



     and then the action.  Action statements are  enclosed  in  {
     and }.  Either the pattern may be missing, or the action may
     be missing, but, of course, not both.   If  the  pattern  is
     missing,  the  action is executed for every single record of
     input.  A missing action is equivalent to

          { print }

     which prints the entire record.

     Comments begin with the "#" character,  and  continue  until
     the  end  of  the line.  Blank lines may be used to separate
     statements.  Normally, a statement ends with a newline, how-
     ever,  this is not the case for lines ending in a ",", {, ?,
     :, &&, or ||.  Lines ending in do or else  also  have  their
     statements  automatically  continued  on the following line.
     In other cases, a line can be continued by ending it with  a
     "\", in which case the newline will be ignored.

     Multiple  statements  may  be  put on one line by separating
     them with a ";".  This applies to both the statements within
     the  action  part of a pattern-action pair (the usual case),
     and to the pattern-action statements themselves.

  Patterns
     AWK patterns may be one of the following:

          BEGIN
          END
          /regular expression/
          relational expression
          pattern && pattern
          pattern || pattern
          pattern ? pattern : pattern
          (pattern)
          ! pattern
          pattern1, pattern2

     BEGIN and END are two special kinds of  patterns  which  are
     not tested against the input.  The action parts of all BEGIN
     patterns are merged as if all the statements had been  writ-
     ten  in  a single BEGIN block.  They are executed before any
     of the input is read.  Similarly, all  the  END  blocks  are
     merged,  and  executed  when  all the input is exhausted (or
     when an exit statement is executed).  BEGIN and END patterns
     cannot  be  combined  with other patterns in pattern expres-
     sions.  BEGIN and END patterns cannot  have  missing  action
     parts.

     For  /regular expression/ patterns, the associated statement
     is executed for each input record that matches  the  regular
     expression.   Regular  expressions  are the same as those in



Free Software FoundatLast change: Apr 20 2010                  14






Utility Commands                                          GAWK(1)



     egrep(1), and are summarized below.

     A relational expression may use any of the operators defined
     below  in  the  section  on  actions.   These generally test
     whether certain fields match certain regular expressions.

     The &&, ||, and !  operators are logical  AND,  logical  OR,
     and  logical NOT, respectively, as in C.  They do short-cir-
     cuit evaluation, also as in C, and are  used  for  combining
     more  primitive  pattern expressions.  As in most languages,
     parentheses may be used to change the order of evaluation.

     The ?: operator is like the same  operator  in  C.   If  the
     first  pattern  is true then the pattern used for testing is
     the second pattern, otherwise it is the third.  Only one  of
     the second and third patterns is evaluated.

     The  pattern1,  pattern2  form  of an expression is called a
     range pattern.  It matches all input records starting with a
     record  that matches pattern1, and continuing until a record
     that matches pattern2, inclusive.  It does not combine  with
     any other sort of pattern expression.

  Regular Expressions
     Regular  expressions  are  the extended kind found in egrep.
     They are composed of characters as follows:

     c          matches the non-metacharacter c.

     \c         matches the literal character c.

     .          matches any character including newline.

     ^          matches the beginning of a string.

     $          matches the end of a string.

     [abc...]   character list, matches  any  of  the  characters
                abc....

     [^abc...]  negated  character  list,  matches  any character
                except abc....

     r1|r2      alternation: matches either r1 or r2.

     r1r2       concatenation: matches r1, and then r2.

     r+         matches one or more r's.

     r*         matches zero or more r's.

     r?         matches zero or one r's.



Free Software FoundatLast change: Apr 20 2010                  15






Utility Commands                                          GAWK(1)



     (r)        grouping: matches r.

     r{n}
     r{n,}
     r{n,m}     One or two numbers inside braces denote an inter-
                val  expression.   If  there is one number in the
                braces, the preceding  regular  expression  r  is
                repeated n times.  If there are two numbers sepa-
                rated by a comma, r is repeated n to m times.  If
                there  is  one number followed by a comma, then r
                is repeated at least n times.
                Interval expressions are only available if either
                --posix or --re-interval is specified on the com-
                mand line.

     \y         matches the empty string at either the  beginning
                or the end of a word.

     \B         matches the empty string within a word.

     \<         matches  the  empty  string at the beginning of a
                word.

     \>         matches the empty string at the end of a word.

     \w         matches any word-constituent  character  (letter,
                digit, or underscore).

     \W         matches  any  character  that  is  not  word-con-
                stituent.

     \`         matches the empty string at the  beginning  of  a
                buffer (string).

     \'         matches  the empty string at the end of a buffer.

     The escape sequences that are valid in string constants (see
     below) are also valid in regular expressions.

     Character  classes  are  a  feature  introduced in the POSIX
     standard.  A character  class  is  a  special  notation  for
     describing   lists   of  characters  that  have  a  specific
     attribute, but where the actual  characters  themselves  can
     vary  from  country  to country and/or from character set to
     character set.  For example, the notion of what is an alpha-
     betic character differs in the USA and in France.

     A  character  class  is  only  valid in a regular expression
     inside the brackets of a character list.  Character  classes
     consist  of  [:,  a keyword denoting the class, and :].  The
     character classes defined by the POSIX standard are:




Free Software FoundatLast change: Apr 20 2010                  16






Utility Commands                                          GAWK(1)



     [:alnum:]  Alphanumeric characters.

     [:alpha:]  Alphabetic characters.

     [:blank:]  Space or tab characters.

     [:cntrl:]  Control characters.

     [:digit:]  Numeric characters.

     [:graph:]  Characters that are both printable  and  visible.
                (A  space is printable, but not visible, while an
                a is both.)

     [:lower:]  Lower-case alphabetic characters.

     [:print:]  Printable characters  (characters  that  are  not
                control characters.)

     [:punct:]  Punctuation  characters  (characters that are not
                letter,  digits,  control  characters,  or  space
                characters).

     [:space:]  Space  characters  (such as space, tab, and form-
                feed, to name a few).

     [:upper:]  Upper-case alphabetic characters.

     [:xdigit:] Characters that are hexadecimal digits.

     For example, before the POSIX standard,  to  match  alphanu-
     meric characters, you would have had to write /[A-Za-z0-9]/.
     If your character set had other alphabetic characters in it,
     this  would  not  match them, and if your character set col-
     lated differently from ASCII, this might not even match  the
     ASCII  alphanumeric  characters.   With  the POSIX character
     classes, you can write /[[:alnum:]]/, and this  matches  the
     alphabetic  and numeric characters in your character set, no
     matter what it is.

     Two additional special sequences  can  appear  in  character
     lists.   These  apply to non-ASCII character sets, which can
     have single symbols (called  collating  elements)  that  are
     represented with more than one character, as well as several
     characters that are equivalent for  collating,  or  sorting,
     purposes.   (E.g.,  in  French,  a  plain  "e"  and a grave-
     accented "`" are equivalent.)

     Collating Symbols
          A collating symbol is a multi-character collating  ele-
          ment  enclosed  in [.  and .].  For example, if ch is a
          collating  element,  then  [[.ch.]]    is   a   regular



Free Software FoundatLast change: Apr 20 2010                  17






Utility Commands                                          GAWK(1)



          expression  that  matches this collating element, while
          [ch] is a regular expression that matches either  c  or
          h.

     Equivalence Classes
          An  equivalence  class  is a locale-specific name for a
          list of characters that are equivalent.   The  name  is
          enclosed  in  [= and =].  For example, the name e might
          be used to represent all of "e," "'," and "`."  In this
          case,  [[=e=]] is a regular expression that matches any
          of e, ', or `.

     These features are very  valuable  in  non-English  speaking
     locales.   The  library functions that gawk uses for regular
     expression matching currently only recognize POSIX character
     classes;  they do not recognize collating symbols or equiva-
     lence classes.

     The \y, \B, \<, \>, \w, \W, \`, and \'  operators  are  spe-
     cific  to  gawk;  they are extensions based on facilities in
     the GNU regular expression libraries.

     The various command line options control how gawk interprets
     characters in regular expressions.

     No options
          In the default case, gawk provide all the facilities of
          POSIX regular expressions and the GNU  regular  expres-
          sion  operators  described  above.   However,  interval
          expressions are not supported.

     --posix
          Only POSIX regular expressions are supported,  the  GNU
          operators are not special.  (E.g., \w matches a literal
          w).  Interval expressions are allowed.

     --traditional
          Traditional Unix awk regular expressions  are  matched.
          The GNU operators are not special, interval expressions
          are not available, and neither are the POSIX  character
          classes  ([[:alnum:]] and so on).  Characters described
          by octal and hexadecimal escape sequences  are  treated
          literally,  even  if  they represent regular expression
          metacharacters.

     --re-interval
          Allow interval expressions in regular expressions, even
          if --traditional has been provided.

  Actions
     Action  statements  are enclosed in braces, { and }.  Action
     statements consist of the usual assignment, conditional, and



Free Software FoundatLast change: Apr 20 2010                  18






Utility Commands                                          GAWK(1)



     looping  statements found in most languages.  The operators,
     control statements, and  input/output  statements  available
     are patterned after those in C.

  Operators
     The operators in AWK, in order of decreasing precedence, are

     (...)       Grouping

     $           Field reference.

     ++ --       Increment and decrement, both prefix  and  post-
                 fix.

     ^           Exponentiation (** may also be used, and **= for
                 the assignment operator).

     + - !       Unary plus, unary minus, and logical negation.

     * / %       Multiplication, division, and modulus.

     + -         Addition and subtraction.

     space       String concatenation.

     | |&        Piped I/O for getline, print, and printf.

     < > <= >= != ==
                 The regular relational operators.

     ~ !~        Regular expression match, negated match.   NOTE:
                 Do not use a constant regular expression (/foo/)
                 on the left-hand side of a ~ or  !~.   Only  use
                 one  on  the  right-hand  side.   The expression
                 /foo/ ~ exp has  the  same  meaning  as  (($0  ~
                 /foo/)  ~  exp).   This  is usually not what was
                 intended.

     in          Array membership.

     &&          Logical AND.

     ||          Logical OR.

     ?:          The C conditional expression.  This has the form
                 expr1  ?  expr2  : expr3.  If expr1 is true, the
                 value of the expression is expr2,  otherwise  it
                 is expr3.  Only one of expr2 and expr3 is evalu-
                 ated.

     = += -= *= /= %= ^=
                 Assignment.  Both  absolute  assignment  (var  =



Free Software FoundatLast change: Apr 20 2010                  19






Utility Commands                                          GAWK(1)



                 value) and operator-assignment (the other forms)
                 are supported.

  Control Statements
     The control statements are as follows:

          if (condition) statement [ else statement ]
          while (condition) statement
          do statement while (condition)
          for (expr1; expr2; expr3) statement
          for (var in array) statement
          break
          continue
          delete array[index]
          delete array
          exit [ expression ]
          { statements }

  I/O Statements
     The input/output statements are as follows:

     close(file [, how])   Close file, pipe or  co-process.   The
                           optional  how should only be used when
                           closing one end of a two-way pipe to a
                           co-process.    It  must  be  a  string
                           value, either "to" or "from".

     getline               Set $0 from next input record; set NF,
                           NR, FNR.

     getline <file         Set  $0  from next record of file; set
                           NF.

     getline var           Set var from next  input  record;  set
                           NR, FNR.

     getline var <file     Set var from next record of file.

     command | getline [var]
                           Run  command  piping the output either
                           into $0 or var, as above.

     command |& getline [var]
                           Run command as a co-process piping the
                           output  either  into  $0  or  var,  as
                           above.  Co-processes are a gawk exten-
                           sion.   (command can also be a socket.
                           See the subsection Special File Names,
                           below.)

     next                  Stop   processing  the  current  input
                           record.  The next input record is read



Free Software FoundatLast change: Apr 20 2010                  20






Utility Commands                                          GAWK(1)



                           and  processing  starts  over with the
                           first pattern in the AWK program.   If
                           the  end of the input data is reached,
                           the END block(s),  if  any,  are  exe-
                           cuted.

     nextfile              Stop   processing  the  current  input
                           file.   The  next  input  record  read
                           comes from the next input file.  FILE-
                           NAME and ARGIND are  updated,  FNR  is
                           reset to 1, and processing starts over
                           with the first pattern in the AWK pro-
                           gram.  If the end of the input data is
                           reached, the END block(s), if any, are
                           executed.

     print                 Prints the current record.  The output
                           record is terminated with the value of
                           the ORS variable.

     print expr-list       Prints  expressions.   Each expression
                           is separated by the value of  the  OFS
                           variable.  The output record is termi-
                           nated with the value of the ORS  vari-
                           able.

     print expr-list >file Prints   expressions  on  file.   Each
                           expression is separated by  the  value
                           of   the  OFS  variable.   The  output
                           record is terminated with the value of
                           the ORS variable.

     printf fmt, expr-list Format and print.

     printf fmt, expr-list >file
                           Format and print on file.

     system(cmd-line)      Execute   the  command  cmd-line,  and
                           return the exit status.  (This may not
                           be available on non-POSIX systems.)

     fflush([file])        Flush  any buffers associated with the
                           open output file  or  pipe  file.   If
                           file  is missing, then standard output
                           is  flushed.   If  file  is  the  null
                           string, then all open output files and
                           pipes have their buffers flushed.

     Additional output redirections are  allowed  for  print  and
     printf.

     print ... >> file



Free Software FoundatLast change: Apr 20 2010                  21






Utility Commands                                          GAWK(1)



          Appends output to the file.

     print ... | command
          Writes on a pipe.

     print ... |& command
          Sends  data  to  a co-process or socket.  (See also the
          subsection Special File Names, below.)

     The getline command returns 1 on success, 0 on end of  file,
     and  -1 on an error.  Upon an error, ERRNO contains a string
     describing the problem.

     NOTE: Failure in opening a two-way socket will result  in  a
     non-fatal  error  being returned to the calling function. If
     using a pipe, co-process, or  socket  to  getline,  or  from
     print  or printf within a loop, you must use close() to cre-
     ate new instances of the command or socket.   AWK  does  not
     automatically  close  pipes,  sockets,  or co-processes when
     they return EOF.

  The printf Statement
     The AWK versions of the printf statement and sprintf() func-
     tion  (see below) accept the following conversion specifica-
     tion formats:

     %c      An ASCII character.  If the argument used for %c  is
             numeric,  it  is treated as a character and printed.
             Otherwise, the argument is assumed to be  a  string,
             and  the  only  first  character  of  that string is
             printed.

     %d, %i  A decimal number (the integer part).

     %e, %E  A   floating    point    number    of    the    form
             [-]d.dddddde[+-]dd.  The %E format uses E instead of
             e.

     %f, %F  A floating point number of the  form  [-]ddd.dddddd.
             If  the  system library supports it, %F is available
             as well. This is like %f, but uses  capital  letters
             for special "not a number" and "infinity" values. If
             %F is not available, gawk uses %f.

     %g, %G  Use %e or %f conversion, whichever is shorter,  with
             nonsignificant zeros suppressed.  The %G format uses
             %E instead of %e.

     %o      An unsigned octal number (also an integer).

     %u      An unsigned decimal number (again, an integer).




Free Software FoundatLast change: Apr 20 2010                  22






Utility Commands                                          GAWK(1)



     %s      A character string.

     %x, %X  An unsigned hexadecimal number (an integer).  The %X
             format uses ABCDEF instead of abcdef.

     %%      A single % character; no argument is converted.

     Optional,  additional  parameters  may lie between the % and
     the control letter:

     count$
          Use the count'th argument at this point in the  format-
          ting.   This  is  called  a positional specifier and is
          intended primarily for use in  translated  versions  of
          format strings, not in the original text of an AWK pro-
          gram.  It is a gawk extension.

     -    The expression  should  be  left-justified  within  its
          field.

     space
          For  numeric conversions, prefix positive values with a
          space, and negative values with a minus sign.

     +    The plus sign, used  before  the  width  modifier  (see
          below),  says  to always supply a sign for numeric con-
          versions, even if the data to be formatted is positive.
          The + overrides the space modifier.

     #    Use  an  "alternate  form" for certain control letters.
          For %o, supply a leading zero.  For %x, and %X,  supply
          a  leading  0x or 0X for a nonzero result.  For %e, %E,
          %f and %F, the result always contains a decimal  point.
          For %g, and %G, trailing zeros are not removed from the
          result.

     0    A leading 0 (zero) acts as a flag, that indicates  out-
          put  should  be  padded  with zeroes instead of spaces.
          This applies only to the numeric output formats.   This
          flag  only  has an effect when the field width is wider
          than the value to be printed.

     width
          The field should be padded to this width.  The field is
          normally  padded  with  spaces.  If the 0 flag has been
          used, it is padded with zeroes.

     .prec
          A number that  specifies  the  precision  to  use  when
          printing.   For  the  %e,  %E, %f and %F, formats, this
          specifies the number of digits you want printed to  the
          right  of  the  decimal  point.   For  the  %g,  and %G



Free Software FoundatLast change: Apr 20 2010                  23






Utility Commands                                          GAWK(1)



          formats, it specifies the maximum number of significant
          digits.  For the %d, %o, %i, %u, %x, and %X formats, it
          specifies the minimum number of digits to  print.   For
          %s,  it specifies the maximum number of characters from
          the string that should be printed.

     The dynamic width  and  prec  capabilities  of  the  ANSI  C
     printf() routines are supported.  A * in place of either the
     width or prec specifications causes their values to be taken
     from  the  argument  list  to printf or sprintf().  To use a
     positional specifier with a dynamic width or precision, sup-
     ply  the count$ after the * in the format string.  For exam-
     ple, "%3$*2$.*1$s".

  Special File Names
     When doing I/O redirection from either print or printf  into
     a  file, or via getline from a file, gawk recognizes certain
     special filenames internally.  These filenames allow  access
     to  open  file  descriptors  inherited  from  gawk's  parent
     process (usually the shell).  These file names may  also  be
     used  on the command line to name data files.  The filenames
     are:

     /dev/stdin  The standard input.

     /dev/stdout The standard output.

     /dev/stderr The standard error output.

     /dev/fd/n   The file associated with the open file  descrip-
                 tor n.

     These are particularly useful for error messages.  For exam-
     ple:

          print "You blew it!" > "/dev/stderr"

     whereas you would otherwise have to use

          print "You blew it!" | "cat 1>&2"

     The following special filenames may be used with the |&  co-
     process operator for creating TCP/IP network connections.

     /inet/tcp/lport/rhost/rport  File  for  TCP/IP connection on
                                  local port lport to remote host
                                  rhost  on  remote  port  rport.
                                  Use a port of  0  to  have  the
                                  system pick a port.

     /inet/udp/lport/rhost/rport  Similar, but use UDP/IP instead
                                  of TCP/IP.



Free Software FoundatLast change: Apr 20 2010                  24






Utility Commands                                          GAWK(1)



     /inet/raw/lport/rhost/rport  Reserved for future use.

     Other special filenames provide access to information  about
     the running gawk process.  These filenames are now obsolete.
     Use the PROCINFO array to obtain the information  they  pro-
     vide.  The filenames are:

     /dev/pid    Reading  this file returns the process ID of the
                 current process, in decimal, terminated  with  a
                 newline.

     /dev/ppid   Reading  this file returns the parent process ID
                 of the current process, in  decimal,  terminated
                 with a newline.

     /dev/pgrpid Reading  this  file returns the process group ID
                 of the current process, in  decimal,  terminated
                 with a newline.

     /dev/user   Reading this file returns a single record termi-
                 nated with a newline.  The fields are  separated
                 with  spaces.   $1 is the value of the getuid(2)
                 system call, $2 is the value of  the  geteuid(2)
                 system  call,  $3  is the value of the getgid(2)
                 system call, and $4 is the value  of  the  gete-
                 gid(2) system call.  If there are any additional
                 fields, they are the group IDs returned by  get-
                 groups(2).  Multiple groups may not be supported
                 on all systems.

  Numeric Functions
     AWK has the following built-in arithmetic functions:

     atan2(y, x)   Returns the arctangent of y/x in radians.

     cos(expr)     Returns the cosine of expr, which is in  radi-
                   ans.

     exp(expr)     The exponential function.

     int(expr)     Truncates to integer.

     log(expr)     The natural logarithm function.

     rand()        Returns  a  random  number N, between 0 and 1,
                   such that 0 <= N < 1.

     sin(expr)     Returns the sine of expr, which is in radians.

     sqrt(expr)    The square root function.

     srand([expr]) Uses  expr as a new seed for the random number



Free Software FoundatLast change: Apr 20 2010                  25






Utility Commands                                          GAWK(1)



                   generator.  If no expr is provided,  the  time
                   of  day is used.  The return value is the pre-
                   vious seed for the random number generator.

  String Functions
     Gawk has the following built-in string functions:

     asort(s [, d])          Returns the number  of  elements  in
                             the source array s.  The contents of
                             s are  sorted  using  gawk's  normal
                             rules  for comparing values, and the
                             indices of the sorted  values  of  s
                             are  replaced  with sequential inte-
                             gers  starting  with   1.   If   the
                             optional   destination  array  d  is
                             specified, then s  is  first  dupli-
                             cated  into d, and then d is sorted,
                             leaving the indices  of  the  source
                             array s unchanged.

     asorti(s [, d])         Returns  the  number  of elements in
                             the source array s.  The behavior is
                             the  same as that of asort(), except
                             that the array indices are used  for
                             sorting, not the array values.  When
                             done, the array is  indexed  numeri-
                             cally,  and  the values are those of
                             the original indices.  The  original
                             values are lost; thus provide a sec-
                             ond array if you  wish  to  preserve
                             the original.

     gensub(r, s, h [, t])   Search   the  target  string  t  for
                             matches of the regular expression r.
                             If h is a string beginning with g or
                             G, then replace  all  matches  of  r
                             with  s.   Otherwise,  h is a number
                             indicating  which  match  of  r   to
                             replace.   If  t is not supplied, $0
                             is   used   instead.    Within   the
                             replacement text s, the sequence \n,
                             where n is a digit from 1 to 9,  may
                             be  used  to  indicate just the text
                             that matched the n'th  parenthesized
                             subexpression.  The sequence \0 rep-
                             resents the entire matched text,  as
                             does  the character &.  Unlike sub()
                             and gsub(), the modified  string  is
                             returned  as the result of the func-
                             tion, and the original target string
                             is not changed.




Free Software FoundatLast change: Apr 20 2010                  26






Utility Commands                                          GAWK(1)



     gsub(r, s [, t])        For each substring matching the reg-
                             ular expression r in the  string  t,
                             substitute  the string s, and return
                             the number of substitutions.   If  t
                             is  not  supplied,  use $0.  An & in
                             the  replacement  text  is  replaced
                             with  the  text  that  was  actually
                             matched.  Use \& to get a literal &.
                             (This  must  be  typed as "\\&"; see
                             GAWK: Effective AWK Programming  for
                             a fuller discussion of the rules for
                             &'s and backslashes in the  replace-
                             ment text of sub(), gsub(), and gen-
                             sub().)

     index(s, t)             Returns the index of the string t in
                             the  string  s,  or  0  if  t is not
                             present.  (This implies that charac-
                             ter indices start at one.)

     length([s])             Returns  the length of the string s,
                             or the length of $0 if s is not sup-
                             plied.  Starting with version 3.1.5,
                             as a non-standard extension, with an
                             array argument, length() returns the
                             number of elements in the array.

     match(s, r [, a])       Returns the position in s where  the
                             regular expression r occurs, or 0 if
                             r is not present, and sets the  val-
                             ues  of  RSTART  and  RLENGTH.  Note
                             that the argument order is the  same
                             as for the ~ operator: str ~ re.  If
                             array a is provided,  a  is  cleared
                             and  then  elements  1 through n are
                             filled with the portions of  s  that
                             match  the  corresponding  parenthe-
                             sized subexpression in r.  The  0'th
                             element of a contains the portion of
                             s  matched  by  the  entire  regular
                             expression   r.    Subscripts   a[n,
                             "start"], and a[n, "length"] provide
                             the starting index in the string and
                             length respectively, of each  match-
                             ing substring.

     split(s, a [, r])       Splits the string s into the array a
                             on the  regular  expression  r,  and
                             returns  the number of fields.  If r
                             is omitted, FS is used instead.  The
                             array a is cleared first.  Splitting
                             behaves   identically    to    field



Free Software FoundatLast change: Apr 20 2010                  27






Utility Commands                                          GAWK(1)



                             splitting, described above.

     sprintf(fmt, expr-list) Prints  expr-list  according to fmt,
                             and returns the resulting string.

     strtonum(str)           Examines  str,   and   returns   its
                             numeric value.  If str begins with a
                             leading 0, strtonum()  assumes  that
                             str  is  an  octal  number.   If str
                             begins with a leading 0x or 0X, str-
                             tonum()  assumes that str is a hexa-
                             decimal number.

     sub(r, s [, t])         Just like gsub(), but only the first
                             matching substring is replaced.

     substr(s, i [, n])      Returns the at most n-character sub-
                             string of s starting at i.  If n  is
                             omitted, the rest of s is used.

     tolower(str)            Returns  a  copy  of the string str,
                             with all the  upper-case  characters
                             in  str  translated  to their corre-
                             sponding  lower-case   counterparts.
                             Non-alphabetic  characters  are left
                             unchanged.

     toupper(str)            Returns a copy of  the  string  str,
                             with  all  the lower-case characters
                             in str translated  to  their  corre-
                             sponding   upper-case  counterparts.
                             Non-alphabetic characters  are  left
                             unchanged.

     As  of  version  3.1.5, gawk is multibyte aware.  This means
     that index(), length(), substr() and  match()  all  work  in
     terms of characters, not bytes.

  Time Functions
     Since  one of the primary uses of AWK programs is processing
     log files that contain time stamp information, gawk provides
     the  following  functions for obtaining time stamps and for-
     matting them.

     mktime(datespec)
               Turns datespec into a time stamp of the same  form
               as  returned  by  systime().   The  datespec  is a
               string of the form YYYY MM DD HH MM SS[ DST].  The
               contents  of  the  string are six or seven numbers
               representing respectively the full year  including
               century,  the  month  from 1 to 12, the day of the
               month from 1 to 31, the hour of the day from 0  to



Free Software FoundatLast change: Apr 20 2010                  28






Utility Commands                                          GAWK(1)



               23, the minute from 0 to 59, and the second from 0
               to 60, and an optional daylight saving flag.   The
               values  of  these  numbers  need not be within the
               ranges specified; for example, an hour of -1 means
               1 hour before midnight.  The origin-zero Gregorian
               calendar is assumed, with year 0 preceding year  1
               and year -1 preceding year 0.  The time is assumed
               to be in the local timezone.  If the daylight sav-
               ing  flag  is  positive, the time is assumed to be
               daylight saving time; if zero, the time is assumed
               to   be   standard  time;  and  if  negative  (the
               default), mktime() attempts to  determine  whether
               daylight  saving  time is in effect for the speci-
               fied time.  If datespec does  not  contain  enough
               elements or if the resulting time is out of range,
               mktime() returns -1.

     strftime([format [, timestamp[, utc-flag]]])
               Formats timestamp according to  the  specification
               in format.  If utc-flag is present and is non-zero
               or non-null, the result is in UTC,  otherwise  the
               result  is in local time.  The timestamp should be
               of the same form as  returned  by  systime().   If
               timestamp  is  missing, the current time of day is
               used.  If format  is  missing,  a  default  format
               equivalent  to the output of date(1) is used.  See
               the specification for the strftime()  function  in
               ANSI C for the format conversions that are guaran-
               teed to be available.

     systime() Returns the current time of day as the  number  of
               seconds  since  the Epoch (1970-01-01 00:00:00 UTC
               on POSIX systems).

  Bit Manipulations Functions
     Starting with version 3.1 of gawk, the following bit manipu-
     lation  functions  are  available.   They work by converting
     double-precision floating point values  to  uintmax_t  inte-
     gers,  doing  the  operation, and then converting the result
     back to floating point.  The functions are:

     and(v1, v2)         Return the bitwise  AND  of  the  values
                         provided by v1 and v2.

     compl(val)          Return the bitwise complement of val.

     lshift(val, count)  Return the value of val, shifted left by
                         count bits.

     or(v1, v2)          Return the bitwise OR of the values pro-
                         vided by v1 and v2.




Free Software FoundatLast change: Apr 20 2010                  29






Utility Commands                                          GAWK(1)



     rshift(val, count)  Return  the  value of val, shifted right
                         by count bits.

     xor(v1, v2)         Return the bitwise  XOR  of  the  values
                         provided by v1 and v2.

  Internationalization Functions
     Starting  with  version 3.1 of gawk, the following functions
     may be used from within your  AWK  program  for  translating
     strings  at run-time.  For full details, see GAWK: Effective
     AWK Programming.

     bindtextdomain(directory [, domain])
          Specifies the directory where gawk looks  for  the  .mo
          files, in case they will not or cannot be placed in the
          ``standard''  locations  (e.g.,  during  testing).   It
          returns the directory where domain is ``bound.''
          The  default  domain  is  the  value of TEXTDOMAIN.  If
          directory is the null  string  (""),  then  bindtextdo-
          main()  returns  the  current  binding  for  the  given
          domain.

     dcgettext(string [, domain [, category]])
          Returns the translation of string in text domain domain
          for  locale  category  category.  The default value for
          domain is the current value of TEXTDOMAIN.  The default
          value for category is "LC_MESSAGES".
          If you supply a value for category, it must be a string
          equal to one of the known locale  categories  described
          in GAWK: Effective AWK Programming.  You must also sup-
          ply a text domain.  Use TEXTDOMAIN if you want  to  use
          the current domain.

     dcngettext(string1 , string2 , number [, domain [, cate-
          gory]])
          Returns the plural form used for number of the transla-
          tion  of  string1 and string2 in text domain domain for
          locale category category.  The default value for domain
          is  the current value of TEXTDOMAIN.  The default value
          for category is "LC_MESSAGES".
          If you supply a value for category, it must be a string
          equal  to  one of the known locale categories described
          in GAWK: Effective AWK Programming.  You must also sup-
          ply  a  text domain.  Use TEXTDOMAIN if you want to use
          the current domain.

USER-DEFINED FUNCTIONS
     Functions in AWK are defined as follows:

          function name(parameter list) { statements }





Free Software FoundatLast change: Apr 20 2010                  30






Utility Commands                                          GAWK(1)



     Functions are executed when  they  are  called  from  within
     expressions  in  either patterns or actions.  Actual parame-
     ters supplied in the function call are used  to  instantiate
     the  formal parameters declared in the function.  Arrays are
     passed by reference, other variables are passed by value.

     Since functions were not originally part  of  the  AWK  lan-
     guage,  the  provision for local variables is rather clumsy:
     They are declared as extra parameters in the parameter list.
     The  convention  is  to  separate  local variables from real
     parameters by extra spaces in the parameter list.  For exam-
     ple:

          function  f(p, q,     a, b)   # a and b are local
          {
               ...
          }

          /abc/     { ... ; f(1, 2) ; ... }

     The left parenthesis in a function call is required to imme-
     diately follow the function name,  without  any  intervening
     white  space.   This  avoids  a syntactic ambiguity with the
     concatenation operator.  This restriction does not apply  to
     the built-in functions listed above.

     Functions  may  call each other and may be recursive.  Func-
     tion parameters used as local variables are  initialized  to
     the  null  string  and the number zero upon function invoca-
     tion.

     Use return expr to return a  value  from  a  function.   The
     return value is undefined if no value is provided, or if the
     function returns by "falling off" the end.

     If --lint has been provided, gawk warns about calls to unde-
     fined  functions  at  parse  time,  instead  of at run time.
     Calling an undefined function at run time is a fatal  error.

     The word func may be used in place of function.

DYNAMICALLY LOADING NEW FUNCTIONS
     Beginning  with version 3.1 of gawk, you can dynamically add
     new built-in functions to the running gawk interpreter.  The
     full  details  are beyond the scope of this manual page; see
     GAWK: Effective AWK Programming for the details.

     extension(object, function)
             Dynamically link the shared  object  file  named  by
             object,  and invoke function in that object, to per-
             form initialization.  These should both be  provided
             as strings.  Returns the value returned by function.



Free Software FoundatLast change: Apr 20 2010                  31






Utility Commands                                          GAWK(1)



     This function is provided and documented in GAWK:  Effective
     AWK Programming, but everything about this feature is likely
     to change eventually.  We STRONGLY recommend that you do not
     use  this  feature  for  anything that you aren't willing to
     redo.

SIGNALS
     pgawk accepts two signals.  SIGUSR1 causes it to dump a pro-
     file  and  function call stack to the profile file, which is
     either awkprof.out, or whatever  file  was  named  with  the
     --profile  option.  It then continues to run.  SIGHUP causes
     pgawk to dump the profile and function call stack  and  then
     exit.

EXAMPLES
     Print and sort the login names of all users:

          BEGIN     { FS = ":" }
               { print $1 | "sort" }

     Count lines in a file:

               { nlines++ }
          END  { print nlines }

     Precede each line by its number in the file:

          { print FNR, $0 }

     Concatenate and line number (a variation on a theme):

          { print NR, $0 }
     Run an external command for particular lines of data:

          tail -f access_log |
          awk '/myhome.html/ { system("nmap " $1 ">> logdir/myhome.html") }'

INTERNATIONALIZATION
     String  constants  are  sequences  of characters enclosed in
     double quotes.  In non-English speaking environments, it  is
     possible  to  mark  strings  in the AWK program as requiring
     translation to the native natural language. Such strings are
     marked  in  the AWK program with a leading underscore ("_").
     For example,

          gawk 'BEGIN { print "hello, world" }'

     always prints hello, world.  But,

          gawk 'BEGIN { print _"hello, world" }'

     might print bonjour, monde in France.



Free Software FoundatLast change: Apr 20 2010                  32






Utility Commands                                          GAWK(1)



     There are several steps involved in producing and running  a
     localizable AWK program.

     1.  Add  a  BEGIN action to assign a value to the TEXTDOMAIN
         variable to set the text domain  to  a  name  associated
         with your program.

         BEGIN { TEXTDOMAIN = "myprog" }

     This  allows  gawk to find the .mo file associated with your
     program.  Without this step, gawk  uses  the  messages  text
     domain,  which likely does not contain translations for your
     program.

     2.  Mark all strings that should be translated with  leading
         underscores.

     3.  If  necessary,  use  the  dcgettext() and/or bindtextdo-
         main() functions in your program, as appropriate.

     4.  Run gawk --gen-po -f myprog.awk > myprog.po to  generate
         a .po file for your program.

     5.  Provide  appropriate translations, and build and install
         the corresponding .mo files.

     The internationalization  features  are  described  in  full
     detail in GAWK: Effective AWK Programming.

POSIX COMPATIBILITY
     A  primary  goal  for  gawk  is compatibility with the POSIX
     standard, as well as with the latest version  of  UNIX  awk.
     To  this  end,  gawk incorporates the following user visible
     features which are not described in the AWK  book,  but  are
     part of the Bell Laboratories version of awk, and are in the
     POSIX standard.

     The book indicates that  command  line  variable  assignment
     happens  when  awk  would  otherwise  open the argument as a
     file, which is after the BEGIN block is executed.   However,
     in earlier implementations, when such an assignment appeared
     before any file names, the assignment  would  happen  before
     the  BEGIN  block  was  run.  Applications came to depend on
     this "feature."  When awk was changed to match its  documen-
     tation, the -v option for assigning variables before program
     execution  was  added  to  accommodate   applications   that
     depended  upon  the  old behavior.  (This feature was agreed
     upon by both the Bell Laboratories and the GNU  developers.)

     The  -W  option for implementation specific features is from
     the POSIX standard.




Free Software FoundatLast change: Apr 20 2010                  33






Utility Commands                                          GAWK(1)



     When processing arguments, gawk uses the special option "--"
     to  signal  the end of arguments.  In compatibility mode, it
     warns about but otherwise  ignores  undefined  options.   In
     normal  operation,  such  arguments are passed on to the AWK
     program for it to process.

     The AWK book does not define the return  value  of  srand().
     The  POSIX  standard has it return the seed it was using, to
     allow keeping track of random number  sequences.   Therefore
     srand() in gawk also returns its current seed.

     Other new features are: The use of multiple -f options (from
     MKS awk); the ENVIRON array; the \a, and \v escape sequences
     (done  originally in gawk and fed back into the Bell Labora-
     tories version); the tolower() and toupper() built-in  func-
     tions  (from  the Bell Laboratories version); and the ANSI C
     conversion specifications in printf (done first in the  Bell
     Laboratories version).

HISTORICAL FEATURES
     There  are  two  features  of historical AWK implementations
     that gawk supports.  First,  it  is  possible  to  call  the
     length()  built-in  function  not only with no argument, but
     even without parentheses!  Thus,

          a = length     # Holy Algol 60, Batman!

     is the same as either of

          a = length()
          a = length($0)

     This feature is marked as "deprecated" in  the  POSIX  stan-
     dard,  and  gawk issues a warning about its use if --lint is
     specified on the command line.

     The other feature is the use of either the continue  or  the
     break  statements  outside  the  body of a while, for, or do
     loop.  Traditional AWK  implementations  have  treated  such
     usage  as  equivalent  to the next statement.  Gawk supports
     this usage if --traditional has been specified.

GNU EXTENSIONS
     Gawk has a number of extensions  to  POSIX  awk.   They  are
     described  in  this  section.   All the extensions described
     here can be disabled by invoking gawk with the --traditional
     or --posix options.

     The  following  features  of gawk are not available in POSIX
     awk.

     o No path search is performed for files  named  via  the  -f



Free Software FoundatLast change: Apr 20 2010                  34






Utility Commands                                          GAWK(1)



       option.  Therefore the AWKPATH environment variable is not
       special.

     o The \x escape sequence.  (Disabled with --posix.)

     o The fflush() function.  (Disabled with --posix.)

     o The ability to continue lines after ?  and  :.   (Disabled
       with --posix.)

     o Octal and hexadecimal constants in AWK programs.

     o The  ARGIND, BINMODE, ERRNO, LINT, RT and TEXTDOMAIN vari-
       ables are not special.

     o The IGNORECASE  variable  and  its  side-effects  are  not
       available.

     o The  FIELDWIDTHS variable and fixed-width field splitting.

     o The PROCINFO array is not available.

     o The use of RS as a regular expression.

     o The special file names available for I/O  redirection  are
       not recognized.

     o The |& operator for creating co-processes.

     o The  ability  to split out individual characters using the
       null string as the value of FS, and as the third  argument
       to split().

     o The optional second argument to the close() function.

     o The optional third argument to the match() function.

     o The  ability  to use positional specifiers with printf and
       sprintf().

     o The ability to pass an array to length().

     o The use of delete array to delete the entire  contents  of
       an array.

     o The  use  of nextfile to abandon processing of the current
       input file.

     o The and(), asort(), asorti(),  bindtextdomain(),  compl(),
       dcgettext(),  dcngettext(),  gensub(), lshift(), mktime(),
       or(),  rshift(),  strftime(),  strtonum(),  systime()  and
       xor() functions.



Free Software FoundatLast change: Apr 20 2010                  35






Utility Commands                                          GAWK(1)



     o Localizable strings.

     o Adding  new built-in functions dynamically with the exten-
       sion() function.

     The AWK book does not define the return value of the close()
     function.   Gawk's close() returns the value from fclose(3),
     or pclose(3), when closing an output file or  pipe,  respec-
     tively.   It  returns the process's exit status when closing
     an input pipe.  The return value is -1 if  the  named  file,
     pipe or co-process was not opened with a redirection.

     When  gawk  is invoked with the --traditional option, if the
     fs argument to the -F option is "t", then FS is set  to  the
     tab  character.   Note  that  typing  gawk  -F\t ...  simply
     causes the shell to quote the "t," and does not pass "\t" to
     the -F option.  Since this is a rather ugly special case, it
     is not the default behavior.  This behavior  also  does  not
     occur  if  --posix  has been specified.  To really get a tab
     character as the field separator, it is best to  use  single
     quotes: gawk -F'\t' ....

     If gawk is configured with the --enable-switch option to the
     configure command, then it accepts  an  additional  control-
     flow statement:
          switch (expression) {
          case value|regex : statement
          ...
          [ default: statement ]
          }

     If  gawk  is configured with the --disable-directories-fatal
     option, then it will silently skip directories named on  the
     command line.  Otherwise, it will do so only if invoked with
     the --traditional option.

ENVIRONMENT VARIABLES
     The AWKPATH environment variable can be used  to  provide  a
     list  of  directories  that  gawk  searches when looking for
     files named via the -f and --file options.

     For socket communication, two special environment  variables
     can   be   used   to   control   the   number   of   retries
     (GAWK_SOCK_RETRIES),  and  the  interval   between   retries
     (GAWK_MSEC_SLEEP).  The interval is in milliseconds. On sys-
     tems that do not support usleep(3), the value is rounded  up
     to an integral number of seconds.

     If  POSIXLY_CORRECT  exists  in  the  environment, then gawk
     behaves exactly as if --posix had been specified on the com-
     mand  line.   If  --lint  has  been specified, gawk issues a
     warning message to this effect.



Free Software FoundatLast change: Apr 20 2010                  36






Utility Commands                                          GAWK(1)



EXIT STATUS
     If the exit statement is used with a value, then gawk  exits
     with the numeric value given to it.

     Otherwise,  if there were no problems during execution, gawk
     exits with the value of the C constant  EXIT_SUCCESS.   This
     is usually zero.

     If  an error occurs, gawk exits with the value of the C con-
     stant EXIT_FAILURE.  This is usually one.

     If gawk exits because of a fatal error, the exit  status  is
     2.   On  non-POSIX  systems,  this  value  may  be mapped to
     EXIT_FAILURE.


ATTRIBUTES
     See  attributes(5)  for  descriptions   of   the   following
     attributes:

     +---------------+------------------+
     |ATTRIBUTE TYPE | ATTRIBUTE VALUE  |
     +---------------+------------------+
     |Availability   | text/gawk        |
     +---------------+------------------+
     |Stability      | Volatile         |
     +---------------+------------------+
SEE ALSO
     egrep(1),   getpid(2),  getppid(2),  getpgrp(2),  getuid(2),
     geteuid(2), getgid(2), getegid(2), getgroups(2)

     The AWK  Programming  Language,  Alfred  V.  Aho,  Brian  W.
     Kernighan,  Peter J. Weinberger, Addison-Wesley, 1988.  ISBN
     0-201-07981-X.

     GAWK: Effective AWK Programming, Edition 3.0,  published  by
     the  Free Software Foundation, 2001.  The current version of
     this     document      is      available      online      at
     http://www.gnu.org/software/gawk/manual.

BUGS
     The  -F option is not necessary given the command line vari-
     able assignment feature; it remains only for backwards  com-
     patibility.

     Syntactically  invalid  single  character  programs  tend to
     overflow the parse stack, generating a rather unhelpful mes-
     sage.   Such programs are surprisingly difficult to diagnose
     in the completely general case, and  the  effort  to  do  so
     really is not worth it.





Free Software FoundatLast change: Apr 20 2010                  37






Utility Commands                                          GAWK(1)



AUTHORS
     The  original  version  of  UNIX awk was designed and imple-
     mented by Alfred Aho, Peter Weinberger, and Brian  Kernighan
     of Bell Laboratories.  Brian Kernighan continues to maintain
     and enhance it.

     Paul Rubin and Jay Fenlason, of the  Free  Software  Founda-
     tion, wrote gawk, to be compatible with the original version
     of awk distributed in Seventh Edition UNIX.  John Woods con-
     tributed a number of bug fixes.  David Trueman, with contri-
     butions from Arnold Robbins, made gawk compatible  with  the
     new  version  of  UNIX  awk.   Arnold Robbins is the current
     maintainer.

     The initial DOS port was  done  by  Conrad  Kwok  and  Scott
     Garfinkle.  Scott Deifik is the current DOS maintainer.  Pat
     Rankin did the port to VMS, and Michal  Jaegermann  did  the
     port  to the Atari ST.  The port to OS/2 was done by Kai Uwe
     Rommel, with contributions and help from  Darrel  Hankerson.
     Andreas Buening now maintains the OS/2 port.  Fred Fish sup-
     plied support for the Amiga, and Martin Brown  provided  the
     BeOS  port.   Stephen  Davies  provided  the original Tandem
     port, and Matthew  Woehlke  provided  changes  for  Tandem's
     POSIX-compliant systems.  Ralf Wildenhues now maintains that
     port.

     See the README file in the  gawk  distribution  for  current
     information  about maintainers and which ports are currently
     supported.

VERSION INFORMATION
     This man page documents gawk, version 3.1.8.

BUG REPORTS
     If you find a bug in gawk, please send  electronic  mail  to
     bug-gawk@gnu.org.   Please include your operating system and
     its revision, the version of  gawk  (from  gawk  --version),
     what  C  compiler you used to compile it, and a test program
     and data that are as small as possible for  reproducing  the
     problem.

     Before sending a bug report, please do the following things.
     First, verify that you have  the  latest  version  of  gawk.
     Many  bugs  (usually subtle ones) are fixed at each release,
     and if yours is out of date, the problem  may  already  have
     been  solved.  Second, please see if setting the environment
     variable LC_ALL to LC_ALL=C causes things to behave  as  you
     expect.  If  so,  it's  a  locale  issue, and may or may not
     really be a bug.  Finally, please read this man page and the
     reference manual carefully to be sure that what you think is
     a bug really is, instead of just a quirk in the language.




Free Software FoundatLast change: Apr 20 2010                  38






Utility Commands                                          GAWK(1)



     Whatever you do, do NOT post a bug report in  comp.lang.awk.
     While  the gawk developers occasionally read this newsgroup,
     posting bug reports there is an  unreliable  way  to  report
     bugs.   Instead,  please  use  the electronic mail addresses
     given above.

     If you're using a GNU/Linux system or BSD-based system,  you
     may  wish  to submit a bug report to the vendor of your dis-
     tribution.  That's fine, but please send a copy to the offi-
     cial  email address as well, since there's no guarantee that
     the bug will be forwarded to the gawk maintainer.

ACKNOWLEDGEMENTS
     Brian  Kernighan  of  Bell  Laboratories  provided  valuable
     assistance during testing and debugging.  We thank him.

COPYING PERMISSIONS
     Copyright  (C)  1989,  1991,  1992,  1993, 1994, 1995, 1996,
     1997, 1998, 1999, 2001, 2002, 2003, 2004, 2005, 2007,  2009,
     2010 Free Software Foundation, Inc.

     Permission is granted to make and distribute verbatim copies
     of this manual page provided the copyright notice  and  this
     permission notice are preserved on all copies.

     Permission  is  granted to copy and distribute modified ver-
     sions of this manual page under the conditions for  verbatim
     copying,  provided that the entire resulting derived work is
     distributed under the terms of a permission notice identical
     to this one.

     Permission is granted to copy and distribute translations of
     this manual page into another language, under the above con-
     ditions  for  modified versions, except that this permission
     notice may be stated in a translation approved by the  Foun-
     dation.



NOTES
     This   software   was   built   from   source  available  at
     https://java.net/projects/solaris-userland.   The   original
     community       source       was       downloaded       from
     http://ftp.gnu.org/gnu/gawk/gawk-3.1.8.tar.bz2

     Further information about this software can be found on  the
     open  source  community  website at http://www.gnu.org/soft-
     ware/gawk/.







Free Software FoundatLast change: Apr 20 2010                  39