string(3) Erlang Module Definition string(3)
NAME
string - String processing functions.
DESCRIPTION
This module provides functions for string processing.
A string in this module is represented by unicode:chardata(), that is,
a list of codepoints, binaries with UTF-8-encoded codepoints (UTF-8
binaries), or a mix of the two.
"abcd" is a valid string
<<"abcd">> is a valid string
["abcd"] is a valid string
<<"abc.."/utf8>> is a valid string
<<"abc..">> is NOT a valid string,
but a binary with Latin-1-encoded codepoints
[<<"abc">>, ".."] is a valid string
[atom] is NOT a valid string
This module operates on grapheme clusters. A grapheme cluster is a
user-perceived character, which can be represented by several
codepoints.
"å" [229] or [97, 778]
"e̊" [101, 778]
The string length of "ß↑e̊" is 3, even though it is represented by the
codepoints [223,8593,101,778] or the UTF-8 binary
<<195,159,226,134,145,101,204,138>>.
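Here is a minimal sketch of the three different sizes of the same text. The module grapheme_demo and its function counts/1 are hypothetical and exist only for illustration. string:length/1 counts grapheme clusters, the BIF length/1 counts list elements (codepoints), and byte_size/1 of the UTF-8 encoding counts bytes.

```erlang
-module(grapheme_demo).
-export([counts/1]).

%% Hypothetical helper: for a string given as a flat list of codepoints,
%% return {GraphemeClusters, Codepoints, Utf8Bytes}.
counts(Str) when is_list(Str) ->
    Bin = unicode:characters_to_binary(Str),
    {string:length(Str), erlang:length(Str), byte_size(Bin)}.
```

For the string above, grapheme_demo:counts([223,8593,101,778]) is expected to return {3,4,8}.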
Grapheme clusters for codepoints of class prepend and for non-modern
(or decomposed) Hangul are not handled, for performance reasons, in
find/3, replace/3, split/2, split/3, and trim/3.
Splitting and appending strings is to be done on grapheme cluster
borders. There is no verification that the results of appending
strings are valid or normalized.
Most of the functions expect all input to be normalized to one form,
see for example unicode:characters_to_nfc_list/1.
Language- or locale-specific handling of input is not considered in
any function.
The functions can crash for invalid input strings. For example, the
functions expect UTF-8 binaries, but not all functions verify that all
binaries are encoded correctly.
Unless otherwise specified, the return value type is the same as the
input type. That is, binary input returns binary output, list input
returns list output, and mixed input can return mixed output.
1> string:trim(" sarah ").
"sarah"
2> string:trim(<<" sarah ">>).
<<"sarah">>
3> string:lexemes("foo bar", " ").
["foo","bar"]
4> string:lexemes(<<"foo bar">>, " ").
[<<"foo">>,<<"bar">>]
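Mixed input can produce mixed output, so code that needs one flat representation has to convert the result itself. The following is a minimal sketch that uses a hypothetical module chardata_demo. When a flat list is wanted instead, unicode:characters_to_list/1 can be used in the same way.

```erlang
-module(chardata_demo).
-export([trim_to_binary/1]).

%% Hypothetical helper: trim whitespace from any unicode:chardata() and
%% always return a flat UTF-8 binary, whatever mix of lists and binaries
%% string:trim/1 happens to return.
trim_to_binary(Data) ->
    unicode:characters_to_binary(string:trim(Data)).
```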
This module was reworked in Erlang/OTP 20 to handle unicode:chardata()
and to operate on grapheme clusters. The old functions, which only work
on Latin-1 lists as input, are still available but should not be used;
they will be deprecated in a future release.
DATA TYPES
direction() = leading | trailing
grapheme_cluster() = char() | [char()]
A user-perceived character, consisting of one or more
codepoints.
EXPORTS
casefold(String :: unicode:chardata()) -> unicode:chardata()
Converts String to a case-agnostic comparable string. Function
casefold/1 is preferred over lowercase/1 when two strings are to
be compared for equality. See also equal/4.
Example:
1> string:casefold("ω and ẞ SHARP S").
"ω and ss sharp s"
chomp(String :: unicode:chardata()) -> unicode:chardata()
Returns a string where any trailing \n or \r\n have been removed
from String.
Example:
182> string:chomp(<<"\nHello\n\n">>).
<<"\nHello">>
183> string:chomp("\nHello\r\r\n").
"\nHello\r"
equal(A, B) -> boolean()
equal(A, B, IgnoreCase) -> boolean()
equal(A, B, IgnoreCase, Norm) -> boolean()
Types:
A = B = unicode:chardata()
IgnoreCase = boolean()
Norm = none | nfc | nfd | nfkc | nfkd
Returns true if A and B are equal, otherwise false.
If IgnoreCase is true, the function does casefolding on the fly
before the equality test.
If Norm is not none, the function applies normalization on the
fly before the equality test. There are four available
normalization forms: nfc, nfd, nfkc, and nfkd.
By default, IgnoreCase is false and Norm is none.
Example:
1> string:equal("åäö", <<"åäö"/utf8>>).
true
2> string:equal("åäö", unicode:characters_to_nfd_binary("åäö")).
false
3> string:equal("åäö", unicode:characters_to_nfd_binary("ÅÄÖ"), true, nfc).
true
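The short sketch below illustrates both options. The module and function names are hypothetical. same_ignoring_case/2 casefolds both arguments before comparing them, which is the comparison casefold/1 is meant for. compare_forms/0 compares a precomposed "Å" with a decomposed one, first codepoint by codepoint and then after NFC normalization.

```erlang
-module(equal_demo).
-export([same_ignoring_case/2, compare_forms/0]).

%% Hypothetical helper: casefold both sides, then compare. string:equal/3
%% with IgnoreCase = true does the same casefolding on the fly.
same_ignoring_case(A, B) ->
    string:equal(string:casefold(A), string:casefold(B)).

%% "Å" written two ways: one precomposed codepoint (U+00C5), or "A"
%% followed by COMBINING RING ABOVE (U+030A).
compare_forms() ->
    Composed = [16#C5],
    Decomposed = [$A, 16#30A],
    {string:equal(Composed, Decomposed),               % codepoint by codepoint
     string:equal(Composed, Decomposed, false, nfc)}.  % both sides NFC first
```

compare_forms() is expected to return {false,true}.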
find(String, SearchPattern) -> unicode:chardata() | nomatch
find(String, SearchPattern, Dir) -> unicode:chardata() | nomatch
Types:
String = SearchPattern = unicode:chardata()
Dir = direction()
Removes anything before SearchPattern in String and returns the
remainder of the string or nomatch if SearchPattern is not
found. Dir, which can be leading or trailing, indicates from
which direction characters are to be searched.
By default, Dir is leading.
Example:
1> string:find("ab..cd..ef", ".").
"..cd..ef"
2> string:find(<<"ab..cd..ef">>, "..", trailing).
<<"..ef">>
3> string:find(<<"ab..cd..ef">>, "x", leading).
nomatch
4> string:find("ab..cd..ef", "x", trailing).
nomatch
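A typical use of find/3 with trailing is extracting a file-name extension. The following is a minimal sketch. extension/1 is a hypothetical helper and is not part of the string module.

```erlang
-module(find_demo).
-export([extension/1]).

%% Hypothetical helper: return the text after the last "." in a file name,
%% or "" when there is no dot.
extension(Name) ->
    case string:find(Name, ".", trailing) of
        nomatch -> "";
        Rest -> string:slice(Rest, 1)  % drop the "." itself
    end.
```

extension("archive.tar.gz") is expected to return "gz". A binary argument gives a binary result.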
is_empty(String :: unicode:chardata()) -> boolean()
Returns true if String is the empty string, otherwise false.
Example:
1> string:is_empty("foo").
false
2> string:is_empty(["",<<>>]).
true
length(String :: unicode:chardata()) -> integer() >= 0
Returns the number of grapheme clusters in String.
Example:
1> string:length("ß↑e̊").
3
2> string:length(<<195,159,226,134,145,101,204,138>>).
3
lexemes(String :: unicode:chardata(),
SeparatorList :: [grapheme_cluster()]) ->
[unicode:chardata()]
Returns a list of lexemes in String, separated by the grapheme
clusters in SeparatorList.
Notice that, as shown in the example below, two or more adjacent
separator grapheme clusters in String are treated as one. That is,
there are no empty strings in the resulting list of lexemes. See
also split/3, which returns empty strings.
Notice that [$\r,$\n] is one grapheme cluster.
Example:
1> string:lexemes("abc de̊fxxghix jkl\r\nfoo", "x e" ++ [[$\r,$\n]]).
["abc","de̊f","ghi","jkl","foo"]
2> string:lexemes(<<"abc de̊fxxghix jkl\r\nfoo"/utf8>>, "x e" ++ [$\r,$\n]).
[<<"abc">>,<<"de̊f"/utf8>>,<<"ghi">>,<<"jkl\r\nfoo">>]
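The difference between lexemes/2 and split/3 on adjacent separators can be seen side by side. The following is a minimal sketch that uses a hypothetical module lexemes_demo.

```erlang
-module(lexemes_demo).
-export([compare/1]).

%% Split the same input with lexemes/2 (adjacent separators collapse) and
%% with split/3 using 'all' (an empty string appears between adjacent
%% separators).
compare(Str) ->
    {string:lexemes(Str, ","), string:split(Str, ",", all)}.
```

compare("a,,b") is expected to return {["a","b"],["a","","b"]}.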
lowercase(String :: unicode:chardata()) -> unicode:chardata()
Converts String to lowercase.
Notice that function casefold/1 should be used when converting a
string to be tested for equality.
Example:
2> string:lowercase(string:uppercase("Michał")).
"michał"
next_codepoint(String :: unicode:chardata()) ->
maybe_improper_list(char(), unicode:chardata()) |
{error, unicode:chardata()}
Returns the first codepoint in String and the rest of String in
the tail. Returns an empty list if String is empty or an {error,
String} tuple if the next byte is invalid.
Example:
1> string:next_codepoint(unicode:characters_to_binary("e̊fg")).
[101|<<"̊fg"/utf8>>]
next_grapheme(String :: unicode:chardata()) ->
maybe_improper_list(grapheme_cluster(),
unicode:chardata()) |
{error, unicode:chardata()}
Returns the first grapheme cluster in String and the rest of
String in the tail. Returns an empty list if String is empty or
an {error, String} tuple if the next byte is invalid.
Example:
1> string:next_grapheme(unicode:characters_to_binary("e̊fg")).
["e̊"|<<"fg">>]
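next_grapheme/1 can drive a simple loop over a string without flattening it first, and next_codepoint/1 works the same way. As a sketch, the hypothetical function grapheme_walk:count/1 below counts grapheme clusters with this approach. For valid input, string:length/1 already does this.

```erlang
-module(grapheme_walk).
-export([count/1]).

%% Hypothetical helper: count grapheme clusters by repeatedly calling
%% string:next_grapheme/1; returns {error, Rest} on invalid input.
count(Str) ->
    count(string:next_grapheme(Str), 0).

count([], N) ->
    N;
count([_Grapheme | Rest], N) ->
    count(string:next_grapheme(Rest), N + 1);
count({error, _Rest} = Error, _N) ->
    Error.
```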
nth_lexeme(String, N, SeparatorList) -> unicode:chardata()
Types:
String = unicode:chardata()
N = integer() >= 0
SeparatorList = [grapheme_cluster()]
Returns lexeme number N in String, where lexemes are separated
by the grapheme clusters in SeparatorList.
Example:
1> string:nth_lexeme("abc.de̊f.ghiejkl", 3, ".e").
"ghi"
pad(String, Length) -> unicode:charlist()
pad(String, Length, Dir) -> unicode:charlist()
pad(String, Length, Dir, Char) -> unicode:charlist()
Types:
String = unicode:chardata()
Length = integer()
Dir = direction() | both
Char = grapheme_cluster()
Pads String to Length with grapheme cluster Char. Dir, which can
be leading, trailing, or both, indicates where the padding
should be added.
By default, Char is $\s and Dir is trailing.
Example:
1> string:pad(<<"He̊llö"/utf8>>, 8).
[<<72,101,204,138,108,108,195,182>>,32,32,32]
2> io:format("'~ts'~n",[string:pad("He̊llö", 8, leading)]).
'   He̊llö'
3> io:format("'~ts'~n",[string:pad("He̊llö", 8, both)]).
' He̊llö  '
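pad/2,3,4 returns a unicode:charlist() rather than a flat string. A common pattern is therefore to pad several columns and flatten once at the end. The following is a minimal sketch. The module pad_demo and its row/2 function are hypothetical.

```erlang
-module(pad_demo).
-export([row/2]).

%% Hypothetical helper: format one table row with a left-aligned name
%% column (10 grapheme clusters) and a right-aligned quantity column (5).
%% The padded pieces are nested lists, so the row is flattened at the end.
row(Name, Qty) when is_integer(Qty) ->
    unicode:characters_to_list([string:pad(Name, 10),
                                string:pad(integer_to_list(Qty), 5, leading)]).
```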
prefix(String :: unicode:chardata(), Prefix :: unicode:chardata()) ->
nomatch | unicode:chardata()
If Prefix is the prefix of String, removes it and returns the
remainder of String, otherwise returns nomatch.
Example:
1> string:prefix(<<"prefix of string">>, "pre").
<<"fix of string">>
2> string:prefix("pre", "prefix").
nomatch
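prefix/2 is well suited to removing an optional prefix while leaving other input untouched. The following is a minimal sketch with strip_scheme/1 as a hypothetical helper.

```erlang
-module(prefix_demo).
-export([strip_scheme/1]).

%% Hypothetical helper: drop a leading "https://" if present, otherwise
%% return the input unchanged.
strip_scheme(Url) ->
    case string:prefix(Url, "https://") of
        nomatch -> Url;
        Rest -> Rest
    end.
```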
replace(String, SearchPattern, Replacement) ->
[unicode:chardata()]
replace(String, SearchPattern, Replacement, Where) ->
[unicode:chardata()]
Types:
String = SearchPattern = Replacement = unicode:chardata()
Where = direction() | all
Replaces SearchPattern in String with Replacement. Where,
default leading, indicates whether the leading, the trailing, or
all occurrences of SearchPattern are to be replaced.
Can be implemented as:
lists:join(Replacement, string:split(String, SearchPattern, Where)).
Example:
1> string:replace(<<"ab..cd..ef">>, "..", "*").
[<<"ab">>,"*",<<"cd..ef">>]
2> string:replace(<<"ab..cd..ef">>, "..", "*", all).
[<<"ab">>,"*",<<"cd">>,"*",<<"ef">>]
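replace/3,4 returns a list of pieces rather than one string, so callers typically flatten the result. The sketch below uses a hypothetical module replace_demo. It flattens the result of replace/4 with all, and it also spells out the lists:join/2 formulation given above so that the two can be compared.

```erlang
-module(replace_demo).
-export([replace_all/3, via_split/3]).

%% replace/4 returns a list of chardata pieces; flatten it into one binary.
replace_all(Str, Pattern, Replacement) ->
    unicode:characters_to_binary(string:replace(Str, Pattern, Replacement, all)).

%% The same result built from split/3 and lists:join/2.
via_split(Str, Pattern, Replacement) ->
    unicode:characters_to_binary(
      lists:join(Replacement, string:split(Str, Pattern, all))).
```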
reverse(String :: unicode:chardata()) -> [grapheme_cluster()]
Returns the reverse list of the grapheme clusters in String.
Example:
1> Reverse = string:reverse(unicode:characters_to_nfd_binary("ÅÄÖ")).
[[79,776],[65,776],[65,778]]
2> io:format("~ts~n",[Reverse]).
ÖÄÅ
slice(String, Start) -> Slice
slice(String, Start, Length) -> Slice
Types:
String = unicode:chardata()
Start = integer() >= 0
Length = infinity | integer() >= 0
Slice = unicode:chardata()
Returns a substring of String of at most Length grapheme
clusters, starting at position Start (the first grapheme cluster
is at position 0).
By default, Length is infinity.
Example:
1> string:slice(<<"He̊llö Wörld"/utf8>>, 4).
<<"ö Wörld"/utf8>>
2> string:slice(["He̊llö ", <<"Wörld"/utf8>>], 4,4).
"ö Wö"
3> string:slice(["He̊llö ", <<"Wörld"/utf8>>], 4,50).
"ö Wörld"
split(String, SearchPattern) -> [unicode:chardata()]
split(String, SearchPattern, Where) -> [unicode:chardata()]
Types:
String = SearchPattern = unicode:chardata()
Where = direction() | all
Splits String where SearchPattern is encountered and returns the
resulting parts. Where, default leading, indicates whether the
leading, the trailing, or all occurrences of SearchPattern
split String.
Example:
0> string:split("ab..bc..cd", "..").
["ab","bc..cd"]
1> string:split(<<"ab..bc..cd">>, "..", trailing).
[<<"ab..bc">>,<<"cd">>]
2> string:split(<<"ab..bc....cd">>, "..", all).
[<<"ab">>,<<"bc">>,<<>>,<<"cd">>]
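split/2 splits only at the first occurrence. This makes it a good fit for key-value lines in which the value may itself contain the separator. The following is a minimal sketch. parse_kv/1 and its return values are hypothetical.

```erlang
-module(split_demo).
-export([parse_kv/1]).

%% Hypothetical helper: split "key = value" at the first "=" and trim
%% whitespace around both parts; returns error when there is no "=".
parse_kv(Line) ->
    case string:split(Line, "=") of
        [Key, Value] -> {ok, string:trim(Key), string:trim(Value)};
        [_NoSeparator] -> error
    end.
```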
take(String, Characters) -> {Leading, Trailing}
take(String, Characters, Complement) -> {Leading, Trailing}
take(String, Characters, Complement, Dir) -> {Leading, Trailing}
Types:
String = unicode:chardata()
Characters = [grapheme_cluster()]
Complement = boolean()
Dir = direction()
Leading = Trailing = unicode:chardata()
Takes characters from String as long as the characters are
members of set Characters or, if Complement is true, of the
complement of set Characters. Dir, which can be leading or
trailing, indicates from which direction characters are to be
taken.
By default, Complement is false and Dir is leading.
Example:
5> string:take("abc0z123", lists:seq($a,$z)).
{"abc","0z123"}
6> string:take(<<"abc0z123">>, lists:seq($0,$9), true, leading).
{<<"abc">>,<<"0z123">>}
7> string:take("abc0z123", lists:seq($0,$9), false, trailing).
{"abc0z","123"}
8> string:take(<<"abc0z123">>, lists:seq($a,$z), true, trailing).
{<<"abc0z">>,<<"123">>}
titlecase(String :: unicode:chardata()) -> unicode:chardata()
Converts String to titlecase.
Example:
1> string:titlecase("ß is a SHARP s").
"Ss is a SHARP s"
to_float(String) -> {Float, Rest} | {error, Reason}
Types:
String = unicode:chardata()
Float = float()
Rest = unicode:chardata()
Reason = no_float | badarg
Argument String is expected to start with a valid text
representation of a float (the digits are ASCII values). Remaining
characters in the string after the float are returned in Rest.
Example:
> {F1,Fs} = string:to_float("1.0-1.0e-1"),
> {F2,[]} = string:to_float(Fs),
> F1+F2.
0.9
> string:to_float("3/2=1.5").
{error,no_float}
> string:to_float("-1.5eX").
{-1.5,"eX"}
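As the "3/2=1.5" example shows, to_float/1 does not accept a plain integer. The following is a minimal sketch of a parser that accepts either form by falling back to to_integer/1. The module and function names are hypothetical.

```erlang
-module(number_demo).
-export([to_number/1]).

%% Hypothetical helper: parse a leading float, or a leading integer when
%% the text is not in float syntax.
to_number(Str) ->
    case string:to_float(Str) of
        {F, Rest} when is_float(F) -> {F, Rest};
        {error, _} -> string:to_integer(Str)
    end.
```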
to_integer(String) -> {Int, Rest} | {error, Reason}
Types:
String = unicode:chardata()
Int = integer()
Rest = unicode:chardata()
Reason = no_integer | badarg
Argument String is expected to start with a valid text
representation of an integer (the digits are ASCII values).
Remaining characters in the string after the integer are returned
in Rest.
Example:
> {I1,Is} = string:to_integer("33+22"),
> {I2,[]} = string:to_integer(Is),
> I1-I2.
11
> string:to_integer("0.5").
{0,".5"}
> string:to_integer("x=2").
{error,no_integer}
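to_integer/1 parses only a prefix and returns the rest, so it is up to the caller to check that Rest is empty. The following is a minimal sketch of strict parsing of a comma-separated list, using lexemes/2, trim/1, and is_empty/1. The module int_list_demo and its return values are hypothetical.

```erlang
-module(int_list_demo).
-export([parse_ints/1]).

%% Hypothetical helper: parse a comma-separated list such as "1, 22,3x".
%% Each field must be an integer with nothing after it.
parse_ints(Csv) ->
    [parse_int(Field) || Field <- string:lexemes(Csv, ",")].

parse_int(Field) ->
    case string:to_integer(string:trim(Field)) of
        {Int, Rest} when is_integer(Int) ->
            case string:is_empty(Rest) of
                true -> {ok, Int};
                false -> {error, {trailing, Rest}}
            end;
        {error, Reason} ->
            {error, Reason}
    end.
```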
to_graphemes(String :: unicode:chardata()) -> [grapheme_cluster()]
Converts String to a list of grapheme clusters.
Example:
1> string:to_graphemes("ß↑e̊").
[223,8593,[101,778]]
2> string:to_graphemes(<<"ß↑e̊"/utf8>>).
[223,8593,[101,778]]
trim(String) -> unicode:chardata()
trim(String, Dir) -> unicode:chardata()
trim(String, Dir, Characters) -> unicode:chardata()
Types:
String = unicode:chardata()
Dir = direction() | both
Characters = [grapheme_cluster()]
Returns a string, where leading or trailing, or both, Characters
have been removed. Dir, which can be leading, trailing, or both,
indicates from which direction characters are to be removed.
Default Characters is the set of nonbreakable whitespace
codepoints, defined as Pattern_White_Space in Unicode Standard Annex
#31. By default, Dir is both.
Notice that [$\r,$\n] is one grapheme cluster according to the
Unicode Standard.
Example:
1> string:trim("\t Hello \n").
"Hello"
2> string:trim(<<"\t Hello \n">>, leading).
<<"Hello \n">>
3> string:trim(<<".Hello.\n">>, trailing, "\n.").
<<".Hello">>
uppercase(String :: unicode:chardata()) -> unicode:chardata()
Converts String to uppercase.
See also titlecase/1.
Example:
1> string:uppercase("Michał").
"MICHAŁ"
OBSOLETE API FUNCTIONS
This section describes the functions of the old API. These functions
only work on lists of Latin-1 characters.
Note:
The functions are kept for backward compatibility, but are not
recommended. They will be deprecated in a future release.
Any undocumented functions in string are not to be used.
EXPORTS
centre(String, Number) -> Centered
centre(String, Number, Character) -> Centered
Types:
String = Centered = string()
Number = integer() >= 0
Character = char()
Returns a string of length Number, in which String is centered
and surrounded by blanks or Character.
This function is obsolete. Use pad/3.
chars(Character, Number) -> String
chars(Character, Number, Tail) -> String
Types:
Character = char()
Number = integer() >= 0
Tail = String = string()
Returns a string consisting of Number characters Character.
Optionally, the string can end with string Tail.
This function is obsolete. Use lists:duplicate/2.
chr(String, Character) -> Index
Types:
String = string()
Character = char()
Index = integer() >= 0
Returns the index of the first occurrence of Character in
String. Returns 0 if Character does not occur.
This function is obsolete. Use find/2.
concat(String1, String2) -> String3
Types:
String1 = String2 = String3 = string()
Concatenates String1 and String2 to form a new string String3,
which is returned.
This function is obsolete. Use [String1, String2] as Data
argument, and call unicode:characters_to_list/2 or
unicode:characters_to_binary/2 to flatten the output.
copies(String, Number) -> Copies
Types:
String = Copies = string()
Number = integer() >= 0
Returns a string containing String repeated Number times.
This function is obsolete. Use lists:duplicate/2.
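For chars/2 and copies/2, the replacement with lists:duplicate/2 behaves slightly differently in each case. Duplicating a character already yields a flat string. Duplicating a string yields a list of strings that must be flattened. The following is a minimal sketch in a hypothetical module dup_demo.

```erlang
-module(dup_demo).
-export([chars/2, copies/2]).

%% Replacement for string:chars(Char, N): a list of N copies of Char is
%% already a flat string.
chars(Char, N) ->
    lists:duplicate(N, Char).

%% Replacement for string:copies(String, N): lists:duplicate/2 yields a
%% list of strings, so it is flattened with lists:append/1.
copies(String, N) ->
    lists:append(lists:duplicate(N, String)).
```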
cspan(String, Chars) -> Length
Types:
String = Chars = string()
Length = integer() >= 0
Returns the length of the maximum initial segment of String,
which consists entirely of characters not from Chars.
This function is obsolete. Use take/3.
Example:
> string:cspan("\t abcdef", " \t").
0
join(StringList, Separator) -> String
Types:
StringList = [string()]
Separator = String = string()
Returns a string with the elements of StringList separated by
the string in Separator.
This function is obsolete. Use lists:join/2.
Example:
> string:join(["one", "two", "three"], ", ").
"one, two, three"
left(String, Number) -> Left
left(String, Number, Character) -> Left
Types:
String = Left = string()
Number = integer() >= 0
Character = char()
Returns String with the length adjusted in accordance with
Number. The left margin is fixed. If length(String) < Number, then
String is padded with blanks or Character.
This function is obsolete. Use pad/2 or pad/3.
Example:
> string:left("Hello",10,$.).
"Hello....."
len(String) -> Length
Types:
String = string()
Length = integer() >= 0
Returns the number of characters in String.
This function is obsolete. Use length/1.
rchr(String, Character) -> Index
Types:
String = string()
Character = char()
Index = integer() >= 0
Returns the index of the last occurrence of Character in String.
Returns 0 if Character does not occur.
This function is obsolete. Use find/3.
right(String, Number) -> Right
right(String, Number, Character) -> Right
Types:
String = Right = string()
Number = integer() >= 0
Character = char()
Returns String with the length adjusted in accordance with
Number. The right margin is fixed. If length(String) < Number, then
String is padded with blanks or Character.
This function is obsolete. Use pad/3.
Example:
> string:right("Hello", 10, $.).
".....Hello"
rstr(String, SubString) -> Index
Types:
String = SubString = string()
Index = integer() >= 0
Returns the position where the last occurrence of SubString
begins in String. Returns 0 if SubString does not exist in
String.
This function is obsolete. Use find/3.
Example:
> string:rstr(" Hello Hello World World ", "Hello World").
8
span(String, Chars) -> Length
Types:
String = Chars = string()
Length = integer() >= 0
Returns the length of the maximum initial segment of String,
which consists entirely of characters from Chars.
This function is obsolete. Use take/2.
Example:
> string:span("\t    abcdef", " \t").
5
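span/2 and cspan/2 map onto take/2 and take/3: the result is the length of the Leading part that take returns. The following is a minimal sketch with a hypothetical module span_demo. The length is counted in grapheme clusters, so, for example, "\r\n" counts as one.

```erlang
-module(span_demo).
-export([span/2, cspan/2]).

%% Replacement for string:span/2: length of the leading run of
%% characters that are members of Chars.
span(String, Chars) ->
    {Leading, _Rest} = string:take(String, Chars),
    string:length(Leading).

%% Replacement for string:cspan/2: take/3 with Complement = true takes
%% characters that are NOT members of Chars.
cspan(String, Chars) ->
    {Leading, _Rest} = string:take(String, Chars, true),
    string:length(Leading).
```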
str(String, SubString) -> Index
Types:
String = SubString = string()
Index = integer() >= 0
Returns the position where the first occurrence of SubString
begins in String. Returns 0 if SubString does not exist in
String.
This function is obsolete. Use find/2.
Example:
> string:str(" Hello Hello World World ", "Hello World").
8
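str/2 and rstr/2 return 1-based positions, whereas find/2,3 returns the remainder of the string. A position can be recovered from the two lengths, as in the sketch below. The sketch uses a hypothetical module index_demo. Positions are counted in grapheme clusters, which should match the old functions for plain Latin-1 text that contains no "\r\n" pairs. chr/2 and rchr/2 correspond to calling the same helpers with a one-character SubString.

```erlang
-module(index_demo).
-export([str/2, rstr/2]).

%% Replacements for string:str/2 and string:rstr/2 built on find/2,3.
%% Positions are 1-based and 0 means "not found", as in the old API.
str(String, SubString) ->
    position(String, string:find(String, SubString)).

rstr(String, SubString) ->
    position(String, string:find(String, SubString, trailing)).

position(_String, nomatch) ->
    0;
position(String, Rest) ->
    string:length(String) - string:length(Rest) + 1.
```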
strip(String :: string()) -> string()
strip(String, Direction) -> Stripped
strip(String, Direction, Character) -> Stripped
Types:
String = Stripped = string()
Direction = left | right | both
Character = char()
Returns a string, where leading or trailing, or both, blanks or
a number of Character have been removed. Direction, which can be
left, right, or both, indicates from which direction blanks are
to be removed. strip/1 is equivalent to strip(String, both).
This function is obsolete. Use trim/3.
Example:
> string:strip("...Hello.....", both, $.).
"Hello"
sub_string(String, Start) -> SubString
sub_string(String, Start, Stop) -> SubString
Types:
String = SubString = string()
Start = Stop = integer() >= 1
Returns a substring of String, starting at position Start to the
end of the string, or to and including position Stop.
This function is obsolete. Use slice/3.
Example:
> string:sub_string("Hello World", 4, 8).
"lo Wo"
substr(String, Start) -> SubString
substr(String, Start, Length) -> SubString
Types:
String = SubString = string()
Start = integer() >= 1
Length = integer() >= 0
Returns a substring of String, starting at position Start, and
ending at the end of the string or at length Length.
This function is obsolete. Use slice/3.
Example:
> string:substr("Hello World", 4, 5).
"lo Wo"
sub_word(String, Number) -> Word
sub_word(String, Number, Character) -> Word
Types:
String = Word = string()
Number = integer()
Character = char()
Returns the word in position Number of String. Words are
separated by blanks or Character.
This function is obsolete. Use nth_lexeme/3.
Example:
> string:sub_word(" Hello old boy !",3,$o).
"ld b"
to_lower(String) -> Result
to_lower(Char) -> CharResult
to_upper(String) -> Result
to_upper(Char) -> CharResult
Types:
String = Result = io_lib:latin1_string()
Char = CharResult = char()
The specified string or character is case-converted. Notice that
the supported character set is ISO/IEC 8859-1 (also called Latin
1); all values outside this set are unchanged.
This function is obsolete. Use lowercase/1, uppercase/1,
titlecase/1, or casefold/1.
tokens(String, SeparatorList) -> Tokens
Types:
String = SeparatorList = string()
Tokens = [Token :: nonempty_string()]
Returns a list of tokens in String, separated by the characters
in SeparatorList.
Example:
> string:tokens("abc defxxghix jkl", "x ").
["abc", "def", "ghi", "jkl"]
Notice that, as shown in this example, two or more adjacent
separator characters in String are treated as one. That is, there
are no empty strings in the resulting list of tokens.
This function is obsolete. Use lexemes/2.
words(String) -> Count
words(String, Character) -> Count
Types:
String = string()
Character = char()
Count = integer() >= 1
Returns the number of words in String, separated by blanks or
Character.
This function is obsolete. Use lexemes/2.
Example:
> string:words(" Hello old boy!", $o).
4
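Several of the other replacements named above need a small adapter to reproduce the old results. The following is a minimal sketch, collected in a hypothetical module old_api_demo:

- join/2 flattens the result of lists:join/2.
- left/3 and right/3 combine slice/2,3 with pad/4. The old functions also cut strings that are too long, which pad/4 does not do.
- substr/3 and sub_string/3 shift the old 1-based positions to the 0-based ones used by slice/3.
- tokens/2 and words/2 use lexemes/2.

Edge cases, such as empty input, are not guaranteed to behave exactly like the old functions.

```erlang
-module(old_api_demo).
-export([join/2, left/3, right/3, substr/3, sub_string/3, tokens/2, words/2]).

%% join/2: lists:join/2 returns a nested list, so flatten it.
join(StringList, Separator) ->
    lists:append(lists:join(Separator, StringList)).

%% left/3 and right/3: pad/4 does not shorten its input, so first cut the
%% string to N grapheme clusters with slice/2,3, then pad and flatten.
left(Str, N, Char) ->
    Kept = string:slice(Str, 0, N),
    unicode:characters_to_list(string:pad(Kept, N, trailing, Char)).

right(Str, N, Char) ->
    Kept = string:slice(Str, max(string:length(Str) - N, 0)),
    unicode:characters_to_list(string:pad(Kept, N, leading, Char)).

%% substr/3 and sub_string/3 count from 1; slice/3 counts from 0 and
%% takes a length.
substr(Str, Start, Length) ->
    string:slice(Str, Start - 1, Length).

sub_string(Str, Start, Stop) ->
    string:slice(Str, Start - 1, Stop - Start + 1).

%% tokens/2 and words/2: lexemes/2 also collapses adjacent separators.
tokens(Str, SeparatorList) ->
    string:lexemes(Str, SeparatorList).

words(Str, Char) ->
    length(string:lexemes(Str, [Char])).
```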
NOTES
Some of the general string functions can seem to overlap each other.
The reason is that this string package is the combination of two
earlier packages, and all functions of both packages have been retained.
Ericsson AB stdlib 3.17 string(3)