string - man pages section 3: Extended Library Functions, Volume 1

string(3)                  Erlang Module Definition                  string(3)



NAME
       string - String processing functions.

DESCRIPTION
       This module provides functions for string processing.

       A  string in this module is represented by unicode:chardata(), that is,
       a list of codepoints, binaries  with  UTF-8-encoded  codepoints  (UTF-8
       binaries), or a mix of the two.

       "abcd"               is a valid string
       <<"abcd">>           is a valid string
       ["abcd"]             is a valid string
       <<"abc..åäö"/utf8>>  is a valid string
       <<"abc..åäö">>       is NOT a valid string,
                            but a binary with Latin-1-encoded codepoints
       [<<"abc">>, "..åäö"] is a valid string
       [atom]               is NOT a valid string

       This module operates on grapheme clusters. A grapheme cluster is a
       user-perceived character, which can be represented by several
       codepoints.

       "å"  [229] or [97, 778]
       "e̊"  [101, 778]

       The string length of "ß↑e̊" is 3, even though it is represented by the
       codepoints [223,8593,101,778] or the UTF-8 binary
       <<195,159,226,134,145,101,204,138>>.

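       The distinction between codepoints, grapheme clusters, and bytes can be
       checked in the shell. This is a minimal sketch that reuses the
       codepoints listed above; erlang:length/1 and byte_size/1 are the
       ordinary BIFs, not functions of this module.

       ```erlang
       1> erlang:length([223,8593,101,778]).
       4
       2> string:length([223,8593,101,778]).
       3
       3> byte_size(<<195,159,226,134,145,101,204,138>>).
       8
       ```
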
       Grapheme clusters for codepoints of class prepend and non-modern (or
       decomposed) Hangul are not handled, for performance reasons, in find/3,
       replace/3, split/2, split/3 and trim/3.

       Splitting and appending strings is to be done on grapheme cluster
       borders. There is no verification that the results of appending strings
       are valid or normalized.

       Most  of  the  functions expect all input to be normalized to one form,
       see for example unicode:characters_to_nfc_list/1.
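
       As an illustration of why normalization matters, the following sketch
       compares the decomposed and precomposed forms of the same character
       (codepoints [97, 778] and [229]). The comparisons are written as
       boolean expressions so that the output does not depend on how the
       shell prints character lists.

       ```erlang
       1> unicode:characters_to_nfc_list([97,778]) =:= [229].
       true
       2> string:equal([97,778], [229]).
       false
       3> string:equal([97,778], [229], false, nfc).
       true
       ```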

       Language or locale specific handling of input is not considered in  any
       function.

       The  functions  can crash for non-valid input strings. For example, the
       functions expect UTF-8 binaries but not all functions verify  that  all
       binaries are encoded correctly.

       Unless  otherwise  specified  the  return value type is the same as the
       input type. That is, binary input returns  binary  output,  list  input
       returns a list output, and mixed input can return a mixed output.

       1> string:trim("  sarah  ").
       "sarah"
       2> string:trim(<<"  sarah  ">>).
       <<"sarah">>
       3> string:lexemes("foo bar", " ").
       ["foo","bar"]
       4> string:lexemes(<<"foo bar">>, " ").
       [<<"foo">>,<<"bar">>]

       This module was reworked in Erlang/OTP 20 to handle unicode:chardata()
       and operate on grapheme clusters. The old functions that only work on
       Latin-1 lists as input are still available but should not be used;
       they will be deprecated in a future release.

DATA TYPES
       direction() = leading | trailing

       grapheme_cluster() = char() | [char()]

              A user-perceived character, consisting  of  one  or  more  code-
              points.

EXPORTS
       casefold(String :: unicode:chardata()) -> unicode:chardata()

              Converts  String  to a case-agnostic comparable string. Function
              casefold/1 is preferred over lowercase/1 when two strings are to
              be compared for equality. See also equal/4.

              Example:

              1> string:casefold("Ω and ẞ SHARP S").
              "ω and ss sharp s"

       chomp(String :: unicode:chardata()) -> unicode:chardata()

              Returns a string where any trailing \n or \r\n have been removed
              from String.

              Example:

              1> string:chomp(<<"\nHello\n\n">>).
              <<"\nHello">>
              2> string:chomp("\nHello\r\r\n").
              "\nHello\r"

       equal(A, B) -> boolean()

       equal(A, B, IgnoreCase) -> boolean()

       equal(A, B, IgnoreCase, Norm) -> boolean()

              Types:

                 A = B = unicode:chardata()
                 IgnoreCase = boolean()
                 Norm = none | nfc | nfd | nfkc | nfkd

              Returns true if A and B are equal, otherwise false.

              If IgnoreCase is true the function does casefolding on  the  fly
              before the equality test.

              If  Norm  is  not none the function applies normalization on the
              fly before the equality test. There are four  available  normal-
              ization forms: nfc, nfd, nfkc, and nfkd.

              By default, IgnoreCase is false and Norm is none.

              Example:

              1> string:equal("åäö", <<"åäö"/utf8>>).
              true
              2> string:equal("åäö", unicode:characters_to_nfd_binary("åäö")).
              false
              3> string:equal("åäö", unicode:characters_to_nfd_binary("ÅÄÖ"), true, nfc).
              true
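
              A smaller sketch of the IgnoreCase argument alone, using
              ASCII input so that normalization plays no role. Note that
              the two arguments may be of different types (list and
              binary):

              ```erlang
              1> string:equal("Hello", <<"HELLO">>).
              false
              2> string:equal("Hello", <<"HELLO">>, true).
              true
              ```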

       find(String, SearchPattern) -> unicode:chardata() | nomatch

       find(String, SearchPattern, Dir) -> unicode:chardata() | nomatch

              Types:

                 String = SearchPattern = unicode:chardata()
                 Dir = direction()

              Removes  anything before SearchPattern in String and returns the
              remainder of the string  or  nomatch  if  SearchPattern  is  not
              found.  Dir,  which  can  be leading or trailing, indicates from
              which direction characters are to be searched.

              By default, Dir is leading.

              Example:

              1> string:find("ab..cd..ef", ".").
              "..cd..ef"
              2> string:find(<<"ab..cd..ef">>, "..", trailing).
              <<"..ef">>
              3> string:find(<<"ab..cd..ef">>, "x", leading).
              nomatch
              4> string:find("ab..cd..ef", "x", trailing).
              nomatch
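
              Because nomatch is returned when SearchPattern is absent,
              find/2 can also serve as a containment test. A minimal
              sketch:

              ```erlang
              1> string:find("ab..cd..ef", "cd") =/= nomatch.
              true
              2> string:find("ab..cd..ef", "xy") =/= nomatch.
              false
              ```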

       is_empty(String :: unicode:chardata()) -> boolean()

              Returns true if String is the empty string, otherwise false.

              Example:

              1> string:is_empty("foo").
              false
              2> string:is_empty(["",<<>>]).
              true

       length(String :: unicode:chardata()) -> integer() >= 0

              Returns the number of grapheme clusters in String.

              Example:

              1> string:length("ß↑e̊").
              3
              2> string:length(<<195,159,226,134,145,101,204,138>>).
              3

       lexemes(String :: unicode:chardata(),
               SeparatorList :: [grapheme_cluster()]) ->
                  [unicode:chardata()]

              Returns a list of lexemes in String, separated by  the  grapheme
              clusters in SeparatorList.

              Notice that, as shown in this example, two or more adjacent
              separator grapheme clusters in String are treated as one. That
              is, there are no empty strings in the resulting list of lexemes.
              See also split/3, which returns empty strings.

              Notice that [$\r,$\n] is one grapheme cluster.

              Example:

              1> string:lexemes("abc de̊fxxghix jkl\r\nfoo", "x e" ++ [[$\r,$\n]]).
              ["abc","de̊f","ghi","jkl","foo"]
              2> string:lexemes(<<"abc de̊fxxghix jkl\r\nfoo"/utf8>>, "x e" ++ [$\r,$\n]).
              [<<"abc">>,<<"de̊f"/utf8>>,<<"ghi">>,<<"jkl\r\nfoo">>]

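              To make the difference from split/3 concrete, the following
              sketch runs both functions on the same input with two
              adjacent separators. lexemes/2 drops the empty string
              between them, whereas split/3 keeps it:

              ```erlang
              1> string:lexemes("a,,b", ",").
              ["a","b"]
              2> string:split("a,,b", ",", all).
              ["a",[],"b"]
              ```
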
       lowercase(String :: unicode:chardata()) -> unicode:chardata()

              Converts String to lowercase.

              Notice that function casefold/1 should be used when converting a
              string to be tested for equality.

              Example:

              1> string:lowercase(string:uppercase("Michał")).
              "michał"

       next_codepoint(String :: unicode:chardata()) ->
                         maybe_improper_list(char(), unicode:chardata()) |
                         {error, unicode:chardata()}

              Returns  the first codepoint in String and the rest of String in
              the tail. Returns an empty list if String is empty or an {error,
              String} tuple if the next byte is invalid.

              Example:

              1> string:next_codepoint(unicode:characters_to_binary("e̊fg")).
              [101|<<"̊fg"/utf8>>]

       next_grapheme(String :: unicode:chardata()) ->
                        maybe_improper_list(grapheme_cluster(),
                                            unicode:chardata()) |
                        {error, unicode:chardata()}

              Returns  the  first  grapheme  cluster in String and the rest of
              String in the tail. Returns an empty list if String is empty  or
              an {error, String} tuple if the next byte is invalid.

              Example:

              1> string:next_grapheme(unicode:characters_to_binary("e̊fg")).
              ["e̊"|<<"fg">>]

       nth_lexeme(String, N, SeparatorList) -> unicode:chardata()

              Types:

                 String = unicode:chardata()
                 N = integer() >= 0
                 SeparatorList = [grapheme_cluster()]

              Returns  lexeme  number N in String, where lexemes are separated
              by the grapheme clusters in SeparatorList.

              Example:

              1> string:nth_lexeme("abc.de̊f.ghiejkl", 3, ".e").
              "ghi"

       pad(String, Length) -> unicode:charlist()

       pad(String, Length, Dir) -> unicode:charlist()

       pad(String, Length, Dir, Char) -> unicode:charlist()

              Types:

                 String = unicode:chardata()
                 Length = integer()
                 Dir = direction() | both
                 Char = grapheme_cluster()

              Pads String to Length with grapheme cluster Char. Dir, which can
              be  leading,  trailing,  or  both,  indicates  where the padding
              should be added.

              By default, Char is $\s and Dir is trailing.

              Example:

              1> string:pad(<<"He̊llö"/utf8>>, 8).
              [<<72,101,204,138,108,108,195,182>>,32,32,32]
              2> io:format("'~ts'~n",[string:pad("He̊llö", 8, leading)]).
              '   He̊llö'
              3> io:format("'~ts'~n",[string:pad("He̊llö", 8, both)]).
              ' He̊llö  '

       prefix(String :: unicode:chardata(), Prefix :: unicode:chardata()) ->
                 nomatch | unicode:chardata()

              If Prefix is the prefix of String, removes it  and  returns  the
              remainder of String, otherwise returns nomatch.

              Example:

              1> string:prefix(<<"prefix of string">>, "pre").
              <<"fix of string">>
              2> string:prefix("pre", "prefix").
              nomatch

       replace(String, SearchPattern, Replacement) ->
                  [unicode:chardata()]

       replace(String, SearchPattern, Replacement, Where) ->
                  [unicode:chardata()]

              Types:

                 String = SearchPattern = Replacement = unicode:chardata()
                 Where = direction() | all

              Replaces SearchPattern in String with Replacement. Where, which
              defaults to leading, indicates whether the leading, the
              trailing, or all occurrences of SearchPattern are to be
              replaced.

              Can be implemented as:

              lists:join(Replacement, split(String, SearchPattern, Where)).

              Example:

              1> string:replace(<<"ab..cd..ef">>, "..", "*").
              [<<"ab">>,"*",<<"cd..ef">>]
              2> string:replace(<<"ab..cd..ef">>, "..", "*", all).
              [<<"ab">>,"*",<<"cd">>,"*",<<"ef">>]

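              As the examples show, the result is a list of parts rather
              than a flat string. One way to flatten it, sketched here
              with unicode:characters_to_list/1 and
              unicode:characters_to_binary/1 from the unicode module:

              ```erlang
              1> unicode:characters_to_list(string:replace("ab..cd..ef", "..", "*", all)).
              "ab*cd*ef"
              2> unicode:characters_to_binary(string:replace(<<"ab..cd..ef">>, "..", "*", all)).
              <<"ab*cd*ef">>
              ```
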
       reverse(String :: unicode:chardata()) -> [grapheme_cluster()]

              Returns the reverse list of the grapheme clusters in String.

              Example:

              1> Reverse = string:reverse(unicode:characters_to_nfd_binary("ÅÄÖ")).
              [[79,776],[65,776],[65,778]]
              2> io:format("~ts~n",[Reverse]).
              ÖÄÅ

       slice(String, Start) -> Slice

       slice(String, Start, Length) -> Slice

              Types:

                 String = unicode:chardata()
                 Start = integer() >= 0
                 Length = infinity | integer() >= 0
                 Slice = unicode:chardata()

              Returns  a  substring of String of at most Length grapheme clus-
              ters, starting at position Start.

              By default, Length is infinity.

              Example:

              1> string:slice(<<"He̊llö Wörld"/utf8>>, 4).
              <<"ö Wörld"/utf8>>
              2> string:slice(["He̊llö ", <<"Wörld"/utf8>>], 4,4).
              "ö Wö"
              3> string:slice(["He̊llö ", <<"Wörld"/utf8>>], 4,50).
              "ö Wörld"

       split(String, SearchPattern) -> [unicode:chardata()]

       split(String, SearchPattern, Where) -> [unicode:chardata()]

              Types:

                 String = SearchPattern = unicode:chardata()
                 Where = direction() | all

              Splits String where SearchPattern is encountered and returns the
              remaining parts. Where, which defaults to leading, indicates
              whether the leading, the trailing, or all occurrences of
              SearchPattern split String.

              Example:

              0> string:split("ab..bc..cd", "..").
              ["ab","bc..cd"]
              1> string:split(<<"ab..bc..cd">>, "..", trailing).
              [<<"ab..bc">>,<<"cd">>]
              2> string:split(<<"ab..bc....cd">>, "..", all).
              [<<"ab">>,<<"bc">>,<<>>,<<"cd">>]

       take(String, Characters) -> {Leading, Trailing}

       take(String, Characters, Complement) -> {Leading, Trailing}

       take(String, Characters, Complement, Dir) -> {Leading, Trailing}

              Types:

                 String = unicode:chardata()
                 Characters = [grapheme_cluster()]
                 Complement = boolean()
                 Dir = direction()
                 Leading = Trailing = unicode:chardata()

              Takes characters from String as long as the characters are
              members of set Characters or the complement of set Characters.
              Dir, which can be leading or trailing, indicates from which
              direction characters are to be taken.

              By default, Complement is false and Dir is leading.

              Example:

              5> string:take("abc0z123", lists:seq($a,$z)).
              {"abc","0z123"}
              6> string:take(<<"abc0z123">>, lists:seq($0,$9), true, leading).
              {<<"abc">>,<<"0z123">>}
              7> string:take("abc0z123", lists:seq($0,$9), false, trailing).
              {"abc0z","123"}
              8> string:take(<<"abc0z123">>, lists:seq($a,$z), true, trailing).
              {<<"abc0z">>,<<"123">>}

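              A typical use is peeling a run of digits off the front of a
              string before converting it. A minimal sketch; the input
              string is only an illustration:

              ```erlang
              1> {Digits, Rest} = string:take("2024-05-17", lists:seq($0,$9)).
              {"2024","-05-17"}
              2> list_to_integer(Digits).
              2024
              ```
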
       titlecase(String :: unicode:chardata()) -> unicode:chardata()

              Converts String to titlecase.

              Example:

              1> string:titlecase("ß is a SHARP s").
              "Ss is a SHARP s"

       to_float(String) -> {Float, Rest} | {error, Reason}

              Types:

                 String = unicode:chardata()
                 Float = float()
                 Rest = unicode:chardata()
                 Reason = no_float | badarg

              Argument String is expected to start with a valid text
              representation of a float (the digits are ASCII values).
              Remaining characters in the string after the float are returned
              in Rest.

              Example:

              > {F1,Fs} = string:to_float("1.0-1.0e-1"),
              > {F2,[]} = string:to_float(Fs),
              > F1+F2.
              0.9
              > string:to_float("3/2=1.5").
              {error,no_float}
              > string:to_float("-1.5eX").
              {-1.5,"eX"}

       to_integer(String) -> {Int, Rest} | {error, Reason}

              Types:

                 String = unicode:chardata()
                 Int = integer()
                 Rest = unicode:chardata()
                 Reason = no_integer | badarg

              Argument String is expected to start with a valid text
              representation of an integer (the digits are ASCII values).
              Remaining characters in the string after the integer are
              returned in Rest.

              Example:

              > {I1,Is} = string:to_integer("33+22"),
              > {I2,[]} = string:to_integer(Is),
              > I1-I2.
              11
              > string:to_integer("0.5").
              {0,".5"}
              > string:to_integer("x=2").
              {error,no_integer}
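
              Since trailing characters are returned rather than rejected,
              callers that need the whole string to be an integer must
              check Rest themselves. The following is a sketch of one way
              to do that; the module name parse_example, the function
              parse_int/1, and the reason trailing_data are hypothetical
              and not part of this module.

              ```erlang
              -module(parse_example).
              -export([parse_int/1]).

              %% Hypothetical helper: succeeds only if the whole of String
              %% is consumed by string:to_integer/1.
              parse_int(String) ->
                  case string:to_integer(String) of
                      {Int, Rest} when is_integer(Int) ->
                          %% Rest is "" or <<>> when nothing is left over.
                          case string:is_empty(Rest) of
                              true  -> {ok, Int};
                              false -> {error, trailing_data}
                          end;
                      {error, Reason} ->
                          {error, Reason}
                  end.
              ```

              With this sketch, parse_int("42") and parse_int(<<"42">>)
              both return {ok,42}, parse_int("42abc") returns
              {error,trailing_data}, and parse_int("x=2") returns
              {error,no_integer}.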

       to_graphemes(String :: unicode:chardata()) -> [grapheme_cluster()]

              Converts String to a list of grapheme clusters.

              Example:

              1> string:to_graphemes("ß↑e̊").
              [223,8593,[101,778]]
              2> string:to_graphemes(<<"ß↑e̊"/utf8>>).
              [223,8593,[101,778]]

       trim(String) -> unicode:chardata()

       trim(String, Dir) -> unicode:chardata()

       trim(String, Dir, Characters) -> unicode:chardata()

              Types:

                 String = unicode:chardata()
                 Dir = direction() | both
                 Characters = [grapheme_cluster()]

              Returns a string where leading, trailing, or both leading and
              trailing Characters have been removed. Dir, which can be
              leading, trailing, or both, indicates from which direction
              characters are to be removed.

              Default Characters is the set of nonbreakable  whitespace  code-
              points, defined as Pattern_White_Space in Unicode Standard Annex
              #31. By default, Dir is both.

              Notice that [$\r,$\n] is one grapheme cluster according  to  the
              Unicode Standard.

              Example:

              1> string:trim("\t Hello \n").
              "Hello"
              2> string:trim(<<"\t Hello \n">>, leading).
              <<"Hello \n">>
              3> string:trim(<<".Hello.\n">>, trailing, "\n.").
              <<".Hello">>

       uppercase(String :: unicode:chardata()) -> unicode:chardata()

              Converts String to uppercase.

              See also titlecase/1.

              Example:

              1> string:uppercase("Michał").
              "MICHAŁ"

OBSOLETE API FUNCTIONS
       The functions of the old API follow. These functions only work on a
       list of Latin-1 characters.

   Note:
       The functions are kept for backward compatibility, but are  not  recom-
       mended. They will be deprecated in a future release.

       Any undocumented functions in string are not to be used.


EXPORTS
       centre(String, Number) -> Centered

       centre(String, Number, Character) -> Centered

              Types:

                 String = Centered = string()
                 Number = integer() >= 0
                 Character = char()

              Returns  a  string,  where  String is centered in the string and
              surrounded by blanks or  Character.  The  resulting  string  has
              length Number.

              This function is obsolete. Use pad/3.

       chars(Character, Number) -> String

       chars(Character, Number, Tail) -> String

              Types:

                 Character = char()
                 Number = integer() >= 0
                 Tail = String = string()

              Returns  a  string  consisting  of  Number characters Character.
              Optionally, the string can end with string Tail.

              This function is obsolete. Use lists:duplicate/2.

       chr(String, Character) -> Index

              Types:

                 String = string()
                 Character = char()
                 Index = integer() >= 0

              Returns the index  of  the  first  occurrence  of  Character  in
              String. Returns 0 if Character does not occur.

              This function is obsolete. Use find/2.

       concat(String1, String2) -> String3

              Types:

                 String1 = String2 = String3 = string()

              Concatenates  String1  and String2 to form a new string String3,
              which is returned.

              This function is obsolete. Use [String1, String2] as Data  argu-
              ment,  and  call unicode:characters_to_list/2 or unicode:charac-
              ters_to_binary/2 to flatten the output.
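
              A sketch of the suggested replacement. For brevity it uses
              the one-argument forms unicode:characters_to_list/1 and
              unicode:characters_to_binary/1, which assume Unicode
              (UTF-8) input; the two-argument forms named above take the
              input encoding explicitly.

              ```erlang
              1> unicode:characters_to_list(["abc", <<"def">>]).
              "abcdef"
              2> unicode:characters_to_binary(["abc", <<"def">>]).
              <<"abcdef">>
              ```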

       copies(String, Number) -> Copies

              Types:

                 String = Copies = string()
                 Number = integer() >= 0

              Returns a string containing String repeated Number times.

              This function is obsolete. Use lists:duplicate/2.

       cspan(String, Chars) -> Length

              Types:

                 String = Chars = string()
                 Length = integer() >= 0

              Returns the length of the maximum  initial  segment  of  String,
              which consists entirely of characters not from Chars.

              This function is obsolete. Use take/3.

              Example:

              > string:cspan("\t    abcdef", " \t").
              0

       join(StringList, Separator) -> String

              Types:

                 StringList = [string()]
                 Separator = String = string()

              Returns  a  string  with the elements of StringList separated by
              the string in Separator.

              This function is obsolete. Use lists:join/2.

              Example:

              > string:join(["one", "two", "three"], ", ").
              "one, two, three"

       left(String, Number) -> Left

       left(String, Number, Character) -> Left

              Types:

                 String = Left = string()
                 Number = integer() >= 0
                 Character = char()

              Returns String with the length adjusted in accordance with
              Number. The left margin is fixed. If length(String) < Number,
              then String is padded with blanks or Character.

              This function is obsolete. Use pad/2 or pad/3.

              Example:

              > string:left("Hello",10,$.).
              "Hello....."

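              When migrating to pad/2,3,4, note that the result is not
              necessarily a flat string, so this sketch flattens it with
              unicode:characters_to_list/1. It also appears that the pad
              functions never shorten String, unlike left/2,3 and
              right/2,3, so a slice/3 call may be needed where truncation
              is expected.

              ```erlang
              1> unicode:characters_to_list(string:pad("Hello", 10, trailing, $.)).
              "Hello....."
              2> unicode:characters_to_list(string:pad("Hello", 10, leading, $.)).
              ".....Hello"
              3> unicode:characters_to_list(string:pad("Hello", 3)).
              "Hello"
              ```
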
       len(String) -> Length

              Types:

                 String = string()
                 Length = integer() >= 0

              Returns the number of characters in String.

              This function is obsolete. Use length/1.

       rchr(String, Character) -> Index

              Types:

                 String = string()
                 Character = char()
                 Index = integer() >= 0

              Returns the index of the last occurrence of Character in String.
              Returns 0 if Character does not occur.

              This function is obsolete. Use find/3.

       right(String, Number) -> Right

       right(String, Number, Character) -> Right

              Types:

                 String = Right = string()
                 Number = integer() >= 0
                 Character = char()

              Returns String with the length adjusted in accordance with
              Number. The right margin is fixed. If length(String) < Number,
              then String is padded with blanks or Character.

              This function is obsolete. Use pad/3.

              Example:

              > string:right("Hello", 10, $.).
              ".....Hello"

       rstr(String, SubString) -> Index

              Types:

                 String = SubString = string()
                 Index = integer() >= 0

              Returns  the  position  where  the  last occurrence of SubString
              begins in String. Returns 0  if  SubString  does  not  exist  in
              String.

              This function is obsolete. Use find/3.

              Example:

              > string:rstr(" Hello Hello World World ", "Hello World").
              8

       span(String, Chars) -> Length

              Types:

                 String = Chars = string()
                 Length = integer() >= 0

              Returns  the  length  of  the maximum initial segment of String,
              which consists entirely of characters from Chars.

              This function is obsolete. Use take/2.

              Example:

              > string:span("\t    abcdef", " \t").
              5

       str(String, SubString) -> Index

              Types:

                 String = SubString = string()
                 Index = integer() >= 0

              Returns the position where the  first  occurrence  of  SubString
              begins  in  String.  Returns  0  if  SubString does not exist in
              String.

              This function is obsolete. Use find/2.

              Example:

              > string:str(" Hello Hello World World ", "Hello World").
              8

       strip(String :: string()) -> string()

       strip(String, Direction) -> Stripped

       strip(String, Direction, Character) -> Stripped

              Types:

                 String = Stripped = string()
                 Direction = left | right | both
                 Character = char()

              Returns a string, where leading or trailing, or both, blanks  or
              a number of Character have been removed. Direction, which can be
              left, right, or both, indicates from which direction blanks  are
              to be removed. strip/1 is equivalent to strip(String, both).

              This function is obsolete. Use trim/3.

              Example:

              > string:strip("...Hello.....", both, $.).
              "Hello"

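              A sketch of the equivalent trim/3 calls. Note that trim/3
              takes a list of grapheme clusters ("." here) where strip/3
              takes a single character ($.), and that the directions are
              named leading and trailing instead of left and right:

              ```erlang
              1> string:trim("...Hello.....", both, ".").
              "Hello"
              2> string:trim("...Hello.....", leading, ".").
              "Hello....."
              ```
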
       sub_string(String, Start) -> SubString

       sub_string(String, Start, Stop) -> SubString

              Types:

                 String = SubString = string()
                 Start = Stop = integer() >= 1

              Returns a substring of String, starting at position Start to the
              end of the string, or to and including position Stop.

              This function is obsolete. Use slice/3.

              Example:

              > string:sub_string("Hello World", 4, 8).
              "lo Wo"

       substr(String, Start) -> SubString

       substr(String, Start, Length) -> SubString

              Types:

                 String = SubString = string()
                 Start = integer() >= 1
                 Length = integer() >= 0

              Returns a substring of String, starting at position  Start,  and
              ending at the end of the string or at length Length.

              This function is obsolete. Use slice/3.

              Example:

              > string:substr("Hello World", 4, 5).
              "lo Wo"

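              When migrating to slice/3, keep in mind that Start is
              1-based in substr/3 but 0-based in slice/3. A sketch of the
              same extraction with both functions:

              ```erlang
              1> string:substr("Hello World", 4, 5).
              "lo Wo"
              2> string:slice("Hello World", 3, 5).
              "lo Wo"
              ```
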
       sub_word(String, Number) -> Word

       sub_word(String, Number, Character) -> Word

              Types:

                 String = Word = string()
                 Number = integer()
                 Character = char()

              Returns the word in position Number of String. Words are
              separated by blanks or Character.

              This function is obsolete. Use nth_lexeme/3.

              Example:

              > string:sub_word(" Hello old boy !",3,$o).
              "ld b"

       to_lower(String) -> Result

       to_lower(Char) -> CharResult

       to_upper(String) -> Result

       to_upper(Char) -> CharResult

              Types:

                 String = Result = io_lib:latin1_string()
                 Char = CharResult = char()

              The specified string or character is case-converted. Notice that
              the supported character set is ISO/IEC 8859-1 (also called Latin
              1); all values outside this set are unchanged.

              This function is obsolete. Use lowercase/1, uppercase/1,
              titlecase/1, or casefold/1.

       tokens(String, SeparatorList) -> Tokens

              Types:

                 String = SeparatorList = string()
                 Tokens = [Token :: nonempty_string()]

              Returns  a list of tokens in String, separated by the characters
              in SeparatorList.

              Example:

              > string:tokens("abc defxxghix jkl", "x ").
              ["abc", "def", "ghi", "jkl"]

              Notice that, as shown in this example, two or more adjacent sep-
              arator  characters  in String are treated as one. That is, there
              are no empty strings in the resulting list of tokens.

              This function is obsolete. Use lexemes/2.

       words(String) -> Count

       words(String, Character) -> Count

              Types:

                 String = string()
                 Character = char()
                 Count = integer() >= 1

              Returns the number of words in String, separated  by  blanks  or
              Character.

              This function is obsolete. Use lexemes/2.

              Example:

              > string:words(" Hello old boy!", $o).
              4
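
              A sketch of the suggested replacement for this input:
              counting the lexemes with the BIF length/1. The separator
              is given as a list ("o") rather than a character ($o).

              ```erlang
              1> string:words(" Hello old boy!", $o).
              4
              2> length(string:lexemes(" Hello old boy!", "o")).
              4
              ```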

NOTES
       Some  of  the  general string functions can seem to overlap each other.
       The reason is that this string package is the combination of  two  ear-
       lier packages and all functions of both packages have been retained.



Ericsson AB                       stdlib 3.17                        string(3)