man pages section 3: Extended Library Functions, Volume 1

Updated: Wednesday, July 27, 2022
 
 

erl_scan (3erl)

Name

erl_scan - The Erlang token scanner.

Description

erl_scan(3)                Erlang Module Definition                erl_scan(3)



NAME
       erl_scan - The Erlang token scanner.

DESCRIPTION
       This  module  contains  functions  for tokenizing (scanning) characters
       into Erlang tokens.

DATA TYPES
       category() = atom()

       error_description() = term()

       error_info() =
           {erl_anno:location(), module(), error_description()}

       option() =
           return | return_white_spaces | return_comments | text |
           {reserved_word_fun, resword_fun()}

       options() = option() | [option()]

       symbol() = atom() | float() | integer() | string()

       resword_fun() = fun((atom()) -> boolean())

       token() =
           {category(), Anno :: erl_anno:anno(), symbol()} |
           {category(), Anno :: erl_anno:anno()}

       tokens() = [token()]

       tokens_result() =
           {ok, Tokens :: tokens(), EndLocation :: erl_anno:location()} |
           {eof, EndLocation :: erl_anno:location()} |
           {error,
            ErrorInfo :: error_info(),
            EndLocation :: erl_anno:location()}

EXPORTS
       category(Token) -> category()

              Types:

                 Token = token()

              Returns the category of Token.

       column(Token) -> erl_anno:column() | undefined

              Types:

                 Token = token()

              Returns the column of Token's collection of annotations.

       end_location(Token) -> erl_anno:location() | undefined

              Types:

                 Token = token()

              Returns the end location of the text of  Token's  collection  of
              annotations. If there is no text, undefined is returned.

       format_error(ErrorDescriptor) -> string()

              Types:

                 ErrorDescriptor = error_description()

              Takes an ErrorDescriptor and returns a string that describes the
              error or warning. This function is usually called implicitly
              when an ErrorInfo structure is processed (see section ERROR
              INFORMATION).

       line(Token) -> erl_anno:line()

              Types:

                 Token = token()

              Returns the line of Token's collection of annotations.

       location(Token) -> erl_anno:location()

              Types:

                 Token = token()

              Returns the location of Token's collection of annotations.

       reserved_word(Atom :: atom()) -> boolean()

              Returns true if Atom  is  an  Erlang  reserved  word,  otherwise
              false.
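
              As a small illustration, the following sketch filters a list of
              atoms down to the reserved words. The module name resword_demo
              and the sample atoms are chosen for this example only.

```erlang
-module(resword_demo).
-export([reserved/1]).

%% Keep only the atoms that the scanner treats as reserved words.
reserved(Atoms) ->
    [A || A <- Atoms, erl_scan:reserved_word(A)].
```

              For example, resword_demo:reserved(['case', foo, 'end', ok])
              returns ['case', 'end'].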

       string(String) -> Return

       string(String, StartLocation) -> Return

       string(String, StartLocation, Options) -> Return

              Types:

                 String = string()
                 Options = options()
                 Return =
                     {ok, Tokens :: tokens(), EndLocation} |
                     {error, ErrorInfo :: error_info(), ErrorLocation}
                 StartLocation = EndLocation = ErrorLocation =
                     erl_anno:location()

              Takes the list of characters String and tries to scan (tokenize)
              them. Returns one of the following:

                {ok, Tokens, EndLocation}:
                  Tokens are the Erlang tokens from String. EndLocation is the
                  first location after the last token.

                {error, ErrorInfo, ErrorLocation}:
                  An error occurred. ErrorLocation is the first location after
                  the erroneous token.

              string(String)   is   equivalent   to   string(String,  1),  and
              string(String, StartLocation) is  equivalent  to  string(String,
              StartLocation, []).
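
              A minimal sketch of the simplest call, string/1, which starts
              at line 1 with no options. The module name scan_basic and the
              input string are illustrative assumptions.

```erlang
-module(scan_basic).
-export([run/0]).

%% Scan one complete expression, starting at line 1 with default options.
run() ->
    {ok, Tokens, EndLocation} = erl_scan:string("X = foo(42)."),
    {Tokens, EndLocation}.
```

              Because the start location is a line, every annotation and the
              end location are lines as well. The call scan_basic:run()
              returns:

              {[{var,1,'X'},{'=',1},{atom,1,foo},{'(',1},
                {integer,1,42},{')',1},{dot,1}], 1}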

              StartLocation  indicates  the  initial  location  when  scanning
              starts. If StartLocation  is  a  line,  Anno,  EndLocation,  and
              ErrorLocation  are  lines.  If StartLocation is a pair of a line
              and a column, Anno takes the form of  an  opaque  compound  data
              type,  and EndLocation and ErrorLocation are pairs of a line and
              a column. The token annotations contain  information  about  the
              column  and the line where the token begins, as well as the text
              of the token (if option text is specified), all of which can  be
              accessed by calling column/1, line/1, location/1, and text/1.
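
              The following sketch shows one way to read the annotations when
              scanning starts at a line and column pair with option text. The
              module name scan_location and the input are assumptions made
              for the example. The annotation is treated as opaque and only
              read through the accessor functions.

```erlang
-module(scan_location).
-export([run/0]).

run() ->
    %% Start at line 1, column 1 and keep the token text in the annotation.
    {ok, [_Foo, Bar | _], _End} = erl_scan:string("foo bar.", {1, 1}, [text]),
    %% Inspect the second token (the atom bar) through the accessors only.
    {erl_scan:line(Bar),
     erl_scan:column(Bar),
     erl_scan:location(Bar),
     erl_scan:text(Bar),
     erl_scan:end_location(Bar)}.
```

              Here scan_location:run() returns {1, 5, {1,5}, "bar", {1,8}}:
              the atom bar begins at column 5 and its text ends just before
              column 8.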

              A token is a tuple containing information about syntactic
              category, the token annotations, and the terminal symbol. For
              punctuation characters (such as ; and |) and reserved words, the
              category and the symbol coincide, and the token is represented
              by a two-tuple. Three-tuples have one of the following forms:

                * {atom, Anno, atom()}

                * {char, Anno, char()}

                * {comment, Anno, string()}

                * {float, Anno, float()}

                * {integer, Anno, integer()}

                * {string, Anno, string()}

                * {var, Anno, atom()}

                * {white_space, Anno, string()}
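
              To make the two token shapes concrete, the sketch below reports
              the tuple size, category, and symbol of each token in a short
              input. The module name scan_shapes and the input string are
              illustrative assumptions.

```erlang
-module(scan_shapes).
-export([run/0]).

run() ->
    {ok, Tokens, _End} = erl_scan:string("if X -> 3.14; true -> $a end."),
    %% For two-tuples, category/1 and symbol/1 return the same atom.
    [{tuple_size(T), erl_scan:category(T), erl_scan:symbol(T)} || T <- Tokens].
```

              Reserved words and punctuation come back as two-tuples, so
              scan_shapes:run() returns:

              [{2,'if','if'},{3,var,'X'},{2,'->','->'},{3,float,3.14},
               {2,';',';'},{3,atom,true},{2,'->','->'},{3,char,97},
               {2,'end','end'},{2,dot,dot}]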

              Valid options:

                {reserved_word_fun, resword_fun()}:
                  A callback function that is called when the scanner has
                  found an unquoted atom. If the function returns true, the
                  unquoted atom itself becomes the category of the token. If
                  the function returns false, atom becomes the category of the
                  token.

                return_comments:
                  Return comment tokens.

                return_white_spaces:
                  Return white space tokens. By convention, a newline
                  character, if present, is always the first character of the
                  text (there cannot be more than one newline in a white space
                  token).

                return:
                  Short for [return_comments, return_white_spaces].

                text:
                  Include the token text in the token annotation. The text  is
                  the part of the input corresponding to the token.
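
              The options can be combined as in the following sketch. The
              module name scan_options, the inputs, and the extra reserved
              word unless are hypothetical and exist only for this example.

```erlang
-module(scan_options).
-export([run/0]).

run() ->
    %% 'return' is short for [return_comments, return_white_spaces].
    {ok, Toks1, _} = erl_scan:string("a % note\nb.", 1, [return]),

    %% Hypothetical extension: treat 'unless' as a reserved word in
    %% addition to the standard ones. The fun is used in place of the
    %% default check, so it delegates to erl_scan:reserved_word/1 for
    %% every other atom.
    IsReserved = fun(A) -> A =:= unless orelse erl_scan:reserved_word(A) end,
    {ok, Toks2, _} = erl_scan:string("unless X end.", 1,
                                     [{reserved_word_fun, IsReserved}]),

    %% Report only the categories, leaving annotations aside.
    {[erl_scan:category(T) || T <- Toks1],
     [erl_scan:category(T) || T <- Toks2]}.
```

              With these inputs scan_options:run() returns
              {[atom,white_space,comment,white_space,atom,dot],
               [unless,var,'end',dot]}.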

       symbol(Token) -> symbol()

              Types:

                 Token = token()

              Returns the symbol of Token.

       text(Token) -> erl_anno:text() | undefined

              Types:

                 Token = token()

              Returns  the text of Token's collection of annotations. If there
              is no text, undefined is returned.

       tokens(Continuation, CharSpec, StartLocation) -> Return

       tokens(Continuation, CharSpec, StartLocation, Options) -> Return

              Types:

                 Continuation = return_cont() | []
                 CharSpec = char_spec()
                 StartLocation = erl_anno:location()
                 Options = options()
                 Return =
                     {done,
                      Result :: tokens_result(),
                      LeftOverChars :: char_spec()} |
                     {more, Continuation1 :: return_cont()}
                 char_spec() = string() | eof
                 return_cont()
                   An opaque continuation.

              This is the re-entrant scanner,  which  scans  characters  until
              either  a dot ('.' followed by a white space) or eof is reached.
              It returns:

                {done, Result, LeftOverChars}:
                  Indicates that there is  sufficient  input  data  to  get  a
                  result. Result is:

                  {ok, Tokens, EndLocation}:
                    The  scanning was successful. Tokens is the list of tokens
                    including dot.

                  {eof, EndLocation}:
                    End of file was encountered before any more tokens.

                  {error, ErrorInfo, EndLocation}:
                    An error occurred. LeftOverChars is the remaining
                    characters of the input data, starting from EndLocation.

                {more, Continuation1}:
                  More  data  is  required  for building a term. Continuation1
                  must be passed in a new call to tokens/3,4 when more data is
                  available.

              The  CharSpec  eof signals end of file. LeftOverChars then takes
              the value eof as well.

              tokens(Continuation, CharSpec, StartLocation) is  equivalent  to
              tokens(Continuation, CharSpec, StartLocation, []).

              For a description of the options, see string/3.
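
              A minimal sketch of a driver loop for the re-entrant scanner,
              assuming the input arrives as a list of string chunks. The
              module name scan_reentrant, the function scan_all/1, and the
              chunking are assumptions made for this example. Each scan
              starts with the continuation [], and leftover characters are
              fed back in before the next chunk.

```erlang
-module(scan_reentrant).
-export([scan_all/1]).

%% Scan a list of string chunks into a list of token lists, one per
%% dot-terminated form. eof is appended to mark the end of the input.
scan_all(Chunks) ->
    loop([], Chunks ++ [eof], 1, []).

loop(Cont, [Chunk | Rest], Loc, Acc) ->
    case erl_scan:tokens(Cont, Chunk, Loc) of
        {more, Cont1} ->
            %% Not enough input yet: pass the continuation along.
            loop(Cont1, Rest, Loc, Acc);
        {done, {ok, Tokens, EndLoc}, LeftOver} ->
            %% One form is complete: restart with [] and the leftovers.
            loop([], [LeftOver | Rest], EndLoc, [Tokens | Acc]);
        {done, {eof, _EndLoc}, _LeftOver} ->
            lists:reverse(Acc);
        {done, {error, ErrorInfo, _EndLoc}, _LeftOver} ->
            {error, ErrorInfo}
    end.
```

              For example, scan_reentrant:scan_all(["foo(", "bar). ", "ok.\n"])
              returns two token lists, one for foo(bar). and one for ok.,
              even though the first form is split across two chunks.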

ERROR INFORMATION
       ErrorInfo is the standard ErrorInfo structure that is returned from all
       I/O modules. The format is as follows:

       {ErrorLocation, Module, ErrorDescriptor}

       A string describing the error is obtained with the following call:

       Module:format_error(ErrorDescriptor)
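
       As an illustration, the sketch below provokes a scan error with an
       unterminated string and formats the resulting ErrorInfo. The module
       name scan_error and the input are assumptions made for this example.

```erlang
-module(scan_error).
-export([run/0]).

run() ->
    %% The string literal is never closed, so scanning fails.
    {error, {ErrorLocation, Module, Descriptor}, _EndLocation} =
        erl_scan:string("\"abc"),
    %% Ask the reporting module to describe its own error descriptor.
    Message = lists:flatten(Module:format_error(Descriptor)),
    {ErrorLocation, Module, Message}.
```

       Here Module is erl_scan and ErrorLocation is line 1. The exact
       wording of the message is up to the reporting module.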

NOTES
       The continuation of the first call to the re-entrant input functions
       must be []. For a complete description of how the re-entrant input
       scheme works, see Armstrong, Virding and Williams: 'Concurrent
       Programming in Erlang', Chapter 13.

SEE ALSO
       erl_anno(3), erl_parse(3), io(3)



Ericsson AB                       stdlib 3.17                      erl_scan(3)