Class StringUtil
- java.lang.Object
  - javacardx.framework.string.StringUtil
public final class StringUtil extends Object
This class provides methods for handling UTF-8 encoded character sequences (strings). It also provides methods to convert UTF-8 encoded character strings to and from other character encodings, such as UCS-2, the GSM 7-bit character set and UTF-16. Support for character encodings other than UTF-8 is optional. Proprietary extensions to the supported character encodings must be identified by values starting from PROP_ENCODING_EXT. An implementation must throw a StringException with reason StringException.UNSUPPORTED_ENCODING if a requested character encoding is not supported.

UTF-8 encodes each of the code points in the Unicode character set using one to four 8-bit bytes. Unicode code points are the numerical values that make up the Unicode code space. The Unicode Standard, version 4.0, is available from the Unicode Consortium at http://www.unicode.org. The UTF-8 transformation format is specified by RFC 3629.
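The one-to-four-byte rule can be illustrated with a small plain-Java sketch that encodes a single code point the way RFC 3629 prescribes. The class `Utf8EncodeSketch` and its methods are hypothetical and not part of `javacardx.framework.string`. The sketch also relies on `int`, which a Java Card platform without integer support could not use.

```java
/** Hypothetical helper (not the Java Card API): encodes one code point as UTF-8. */
final class Utf8EncodeSketch {
    /** Writes the UTF-8 form of codePoint at dst[dstOffset] and returns its length (1 to 4). */
    static int encode(int codePoint, byte[] dst, int dstOffset) {
        if (codePoint < 0 || codePoint > 0x10FFFF
                || (codePoint >= 0xD800 && codePoint <= 0xDFFF)) {
            throw new IllegalArgumentException("not a Unicode scalar value");
        }
        if (codePoint < 0x80) {                       // 0xxxxxxx
            dst[dstOffset] = (byte) codePoint;
            return 1;
        }
        if (codePoint < 0x800) {                      // 110xxxxx 10xxxxxx
            dst[dstOffset] = (byte) (0xC0 | (codePoint >> 6));
            dst[dstOffset + 1] = (byte) (0x80 | (codePoint & 0x3F));
            return 2;
        }
        if (codePoint < 0x10000) {                    // 1110xxxx 10xxxxxx 10xxxxxx
            dst[dstOffset] = (byte) (0xE0 | (codePoint >> 12));
            dst[dstOffset + 1] = (byte) (0x80 | ((codePoint >> 6) & 0x3F));
            dst[dstOffset + 2] = (byte) (0x80 | (codePoint & 0x3F));
            return 3;
        }
        // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        dst[dstOffset] = (byte) (0xF0 | (codePoint >> 18));
        dst[dstOffset + 1] = (byte) (0x80 | ((codePoint >> 12) & 0x3F));
        dst[dstOffset + 2] = (byte) (0x80 | ((codePoint >> 6) & 0x3F));
        dst[dstOffset + 3] = (byte) (0x80 | (codePoint & 0x3F));
        return 4;
    }

    /** Convenience wrapper returning exactly the encoded bytes. */
    static byte[] toBytes(int codePoint) {
        byte[] buf = new byte[4];
        return java.util.Arrays.copyOf(buf, encode(codePoint, buf, 0));
    }
}
```

For example, U+20AC (the euro sign) encodes to the three bytes E2 82 AC.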
The encoded character sequences handled by this class are stored in byte arrays. Each string or character sequence is designated by a byte array, an offset in the byte array indicating the start of the character sequence, and a length indicating the length in bytes of the character sequence. If a designated character sequence is outside the bounds of its containing array, an ArrayIndexOutOfBoundsException is thrown (note: for readability reasons, these exceptions are assumed and not systematically documented in the methods of this class).
An index of a character or substring within a string is always relative to the beginning of the string, that is, relative to the offset of the string within the containing byte array. If a provided index of a character or substring within a designated character sequence is outside the bounds of the designated character sequence, an IndexOutOfBoundsException is thrown.

This class provides two categories of methods:
- Methods for dealing with plain byte sequences

  These methods do not check the (UTF-8) well-formedness of the byte sequences passed as parameters. In particular, they do not check whether the byte at a provided offset is the first byte of a valid UTF-8 byte sequence, or whether a provided length is equal to or greater than the length of the UTF-8 byte sequence starting at the provided offset (as can be determined by its first byte). They only treat UTF-8 byte sequences as plain byte sequences for the purpose of comparison, copying, truncating, and so on. As an example of such methods, see the indexOf method.

- Methods for dealing with Unicode code points (i.e., Unicode characters)

  These methods check the well-formedness of UTF-8 byte sequences. They either throw a StringException with reason StringException.INVALID_BYTE_SEQUENCE when encountering an ill-formed UTF-8 byte sequence (as an example of such methods, see the convertTo method), or they consider any byte sequence that is not well-formed (e.g., incomplete) within a designated text range as a code point for the purpose of counting, comparing or returning (as an example of such methods, see the codePointCount method).
To check whether a byte sequence is a well-formed UTF-8 sequence, the check method should be used.

Because Unicode case conversion may require locale-sensitive mappings, context-sensitive mappings, and 1:M character mappings, and in order to limit footprint, the case conversion supported by the methods toLowerCase, toUpperCase and compare is only available by default for the Basic Latin Unicode block (US-ASCII character set: U+0000 - U+007F). Other character blocks may be supported.

- Since:
  - Java Card 3.0.4
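To make the notion of well-formedness concrete, here is a plain-Java sketch of a strict validator that follows the UTF-8 syntax of RFC 3629: it rejects overlong forms, UTF-16 surrogate code points, values above U+10FFFF, stray continuation bytes and truncated sequences. `Utf8CheckSketch` is a hypothetical name, and whether the platform's check method applies exactly these rules is an assumption.

```java
/** Hypothetical strict UTF-8 validator (not the Java Card API). */
final class Utf8CheckSketch {
    static boolean check(byte[] aString, short offset, short length) {
        int i = offset;
        int end = offset + length;
        while (i < end) {
            int b0 = aString[i] & 0xFF;
            if (b0 < 0x80) {                 // single-byte (US-ASCII) character
                i++;
                continue;
            }
            int n;                           // total length of this sequence
            int lo = 0x80, hi = 0xBF;        // allowed range for the second byte
            if (b0 >= 0xC2 && b0 <= 0xDF) {
                n = 2;
            } else if (b0 >= 0xE0 && b0 <= 0xEF) {
                n = 3;
                if (b0 == 0xE0) lo = 0xA0;   // reject overlong 3-byte forms
                if (b0 == 0xED) hi = 0x9F;   // reject surrogates U+D800..U+DFFF
            } else if (b0 >= 0xF0 && b0 <= 0xF4) {
                n = 4;
                if (b0 == 0xF0) lo = 0x90;   // reject overlong 4-byte forms
                if (b0 == 0xF4) hi = 0x8F;   // reject values above U+10FFFF
            } else {
                return false;                // C0, C1, F5..FF or a stray continuation byte
            }
            if (i + n > end) return false;   // truncated sequence
            int b1 = aString[i + 1] & 0xFF;
            if (b1 < lo || b1 > hi) return false;
            for (int k = 2; k < n; k++) {
                if ((aString[i + k] & 0xC0) != 0x80) return false; // must be 10xxxxxx
            }
            i += n;
        }
        return true;
    }
}
```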
Field Summary

- static byte GSM_7 - The GSM Septet character encoding.
- static byte ISO_8859_1 - The ISO 8859-1 (Latin-1) character encoding.
- static byte PROP_ENCODING_EXT - Start of proprietary character encoding numbering.
- static byte UCS_2 - The UCS-2 character encoding.
- static byte UTF_16 - The UTF-16 character encoding.
- static byte UTF_16_BE - The UTF-16BE (Big Endian) character encoding.
- static byte UTF_16_LE - The UTF-16LE (Little Endian) character encoding.
- static byte UTF_8 - The UTF-8 character encoding.
Method Summary

- static boolean check(byte[] aString, short offset, short length) - Checks if the provided byte array contains a valid UTF-8 encoded character or character sequence.
- static short codePointAt(byte[] aString, short offset, short length, short index, byte[] dstBuffer, short dstOffset) - Copies to the destination buffer the character (Unicode code point) at the specified index in the UTF-8 encoded character sequence designated by aString, offset and length.
- static short codePointBefore(byte[] aString, short offset, short length, short index, byte[] dstBuffer, short dstOffset) - Copies to the destination buffer the character (Unicode code point) before the specified index in the UTF-8 encoded character sequence designated by aString, offset and length.
- static short codePointCount(byte[] aString, short offset, short length) - Returns the number of characters (Unicode code points) in the UTF-8 encoded character sequence designated by aString, offset and length.
- static short compare(boolean ignoreCase, byte[] aString, short offset, short length, byte[] anotherString, short ooffset, short olength) - Compares two strings lexicographically, optionally ignoring case considerations.
- static short convertFrom(byte[] srcString, short srcOffset, short srcLength, byte[] dstString, short dstOffset, byte encoding) - Converts all of the characters of the provided source character string from the specified character encoding to UTF-8 and copies them to the provided destination array, starting at the provided dstOffset.
- static short convertTo(byte[] srcString, short srcOffset, short srcLength, byte[] dstString, short dstOffset, byte encoding) - Converts all of the characters of the provided UTF-8 encoded source character string to the specified character encoding and copies them to the provided destination array, starting at the provided dstOffset.
- static boolean endsWith(byte[] aString, short offset, short length, byte[] suffix, short soffset, short slength, short codePointCount) - Tests if the UTF-8 encoded character sequence designated by aString, offset and length ends with the first codePointCount characters of the character sequence designated by suffix, soffset and slength.
- static short indexOf(byte[] aString, short offset, short length, byte[] subString, short soffset, short slength) - Returns the index within the provided UTF-8 encoded character string of the first occurrence of the specified substring.
- static short offsetByCodePoints(byte[] aString, short offset, short length, short index, short codePointOffset) - Returns the byte index within the UTF-8 encoded character sequence designated by aString, offset and length that is offset from the given index by codePointOffset code points.
- static boolean parseBoolean(byte[] aString, short offset, short length) - Parses the string argument as a boolean.
- static short parseLongInteger(byte[] aString, short offset, short length, short[] integer, short ioffset) - Parses the provided UTF-8 encoded character sequence into an (up to) 64-bit signed integer.
- static short parseShortInteger(byte[] aString, short offset, short length) - Parses the provided UTF-8 encoded character sequence into an (up to) 16-bit signed (short) integer.
- static short replace(byte[] srcString, short srcOffset, short srcLength, byte[] oldSubstring, short oOffset, short oLength, byte[] newSubstring, short nOffset, short nLength, byte[] dstString, short dstOffset) - Copies to the destination byte array the string resulting from replacing all occurrences of the old substring in the provided source string with the new substring.
- static boolean startsWith(byte[] aString, short offset, short length, byte[] prefix, short poffset, short plength, short codePointCount) - Tests if the UTF-8 encoded character sequence designated by aString, offset and length starts with the first codePointCount characters of the character sequence designated by prefix, poffset and plength.
- static short substring(byte[] srcString, short srcOffset, short srcLength, short codePointBeginIndex, short codePointEndIndex, byte[] dstString, short dstOffset) - Copies to the destination byte array the specified substring of the designated source string.
- static short toLowerCase(byte[] srcString, short srcOffset, short srcLength, byte[] dstString, short dstOffset) - Converts to lower case and copies all of the characters from the provided source UTF-8 encoded character string to the provided destination array, starting at the provided dstOffset.
- static short toUpperCase(byte[] srcString, short srcOffset, short srcLength, byte[] dstString, short dstOffset) - Converts to upper case and copies all of the characters from the provided source UTF-8 encoded character string to the provided destination array, starting at the provided dstOffset.
- static short trim(byte[] srcString, short srcOffset, short srcLength, byte[] dstString, short dstOffset) - Removes white space from both ends of the provided UTF-8 encoded character string and copies the resulting character sequence to the provided destination array, starting at the provided dstOffset.
- static short valueOf(boolean b, byte[] dstString, short dstOffset) - Copies the UTF-8 encoded character string representation of the boolean argument into the provided destination array, starting at the provided offset.
- static short valueOf(short[] l, byte[] dstString, short dstOffset) - Copies the UTF-8 encoded, signed decimal string representation of the (up to) 64-bit signed integer argument, provided as an array of short integers, into the provided destination array, starting at the provided offset.
- static short valueOf(short i, byte[] dstString, short dstOffset) - Copies the UTF-8 encoded, signed decimal string representation of the (up to) 16-bit signed (short) argument into the provided destination array, starting at the provided offset.
Field Detail

UTF_8
public static final byte UTF_8
The UTF-8 character encoding. This character encoding is used for internal representation and handling of character strings.
- See Also:
  - Constant Field Values

UTF_16
public static final byte UTF_16
The UTF-16 character encoding. This character encoding may optionally be supported for conversion to and from UTF-8 when doing input or output.
- See Also:
  - Constant Field Values

UTF_16_LE
public static final byte UTF_16_LE
The UTF-16LE (Little Endian) character encoding. This character encoding may optionally be supported for conversion to and from UTF-8 when doing input or output.
- See Also:
  - Constant Field Values

UTF_16_BE
public static final byte UTF_16_BE
The UTF-16BE (Big Endian) character encoding. This character encoding may optionally be supported for conversion to and from UTF-8 when doing input or output.
- See Also:
  - Constant Field Values

UCS_2
public static final byte UCS_2
The UCS-2 character encoding. This character encoding may optionally be supported for conversion to and from UTF-8 when doing input or output.
- See Also:
  - Constant Field Values

GSM_7
public static final byte GSM_7
The GSM Septet character encoding. This character encoding may optionally be supported for conversion to and from UTF-8 when doing input or output.
- See Also:
  - Constant Field Values

ISO_8859_1
public static final byte ISO_8859_1
The ISO 8859-1 (Latin-1) character encoding. This character encoding may optionally be supported for conversion to and from UTF-8 when doing input or output.
- See Also:
  - Constant Field Values

PROP_ENCODING_EXT
public static final byte PROP_ENCODING_EXT
Start of proprietary character encoding numbering.
- See Also:
  - Constant Field Values
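Because support for the optional encodings varies between cards, it can help to compute off-card the bytes that a conversion would be expected to produce. The sketch below uses the standard JDK `java.nio.charset` API, not the Java Card API, and `ConvertPreviewSketch` is a hypothetical name. Like the documented convertTo method, it fails on malformed input instead of substituting, but it throws `IllegalArgumentException` rather than `StringException`. The JDK has no GSM 7-bit charset, and whether a given card emits exactly these bytes (for example, a byte-order mark for `UTF_16`) is an assumption to verify.

```java
/** Hypothetical off-card preview of an encoding conversion (plain JDK, not Java Card). */
final class ConvertPreviewSketch {
    /**
     * Decodes src[off, off + len) as UTF-8 and re-encodes it in the target charset.
     * Freshly created JDK decoders and encoders REPORT malformed input and
     * unmappable characters by default, so nothing is silently replaced.
     */
    static byte[] convertTo(byte[] src, int off, int len, java.nio.charset.Charset target) {
        try {
            java.nio.CharBuffer chars = java.nio.charset.StandardCharsets.UTF_8.newDecoder()
                    .decode(java.nio.ByteBuffer.wrap(src, off, len));
            java.nio.ByteBuffer out = target.newEncoder().encode(chars);
            byte[] result = new byte[out.remaining()];
            out.get(result);
            return result;
        } catch (java.nio.charset.CharacterCodingException e) {
            // Malformed input loosely corresponds to StringException.INVALID_BYTE_SEQUENCE.
            throw new IllegalArgumentException("cannot convert", e);
        }
    }
}
```

For the UTF-8 input C3 A9 E2 82 AC ("é€"), `UTF_16BE` yields 00 E9 20 AC and `UTF_16LE` yields E9 00 AC 20.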
Method Detail

codePointCount
public static short codePointCount(byte[] aString, short offset, short length)
Returns the number of characters (Unicode code points) in the UTF-8 encoded character sequence designated by aString, offset and length. Ill-formed or incomplete byte sequences within the text range count as one code point each.
- Parameters:
  - aString - the byte array containing the UTF-8 encoded character sequence.
  - offset - the starting offset of the character sequence in the byte array.
  - length - the length (in bytes) of the contained character sequence.
- Returns:
  - the number of characters (Unicode code points) in the designated UTF-8 encoded character sequence.
- Throws:
  - NullPointerException - if aString is null.
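The rule that each ill-formed or incomplete sequence counts as one code point leaves open how the offending bytes are grouped. The plain-Java sketch below assumes one possible grouping: a valid lead byte absorbs the continuation bytes that follow it, up to its declared length and the end of the range, and any other byte counts on its own. It does not detect overlong forms or surrogates. `CodePointCountSketch` is a hypothetical name, and a real implementation may group ill-formed bytes differently.

```java
/** Hypothetical sketch of codePointCount (not the Java Card API). */
final class CodePointCountSketch {
    /** Declared sequence length of a lead byte, or 1 if it cannot start a sequence. */
    static int declaredLength(int b) {
        if (b < 0x80) return 1;
        if (b >= 0xC2 && b <= 0xDF) return 2;
        if (b >= 0xE0 && b <= 0xEF) return 3;
        if (b >= 0xF0 && b <= 0xF4) return 4;
        return 1; // stray continuation byte or invalid lead byte
    }

    /** Absolute index just past the (possibly ill-formed) code point starting at pos. */
    static int next(byte[] a, int pos, int end) {
        int limit = Math.min(end, pos + declaredLength(a[pos] & 0xFF));
        int i = pos + 1;
        while (i < limit && (a[i] & 0xC0) == 0x80) i++; // absorb continuation bytes
        return i;
    }

    static short codePointCount(byte[] aString, short offset, short length) {
        int pos = offset;
        int end = offset + length;
        short count = 0;
        while (pos < end) {
            pos = next(aString, pos, end);
            count++;
        }
        return count;
    }
}
```

Under this grouping, E2 82 41 (a truncated euro sign followed by 'A') counts as two code points.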
codePointAt
public static short codePointAt(byte[] aString, short offset, short length, short index, byte[] dstBuffer, short dstOffset)
Copies to the destination buffer the character (Unicode code point) at the specified index in the UTF-8 encoded character sequence designated by aString, offset and length. index is a byte index relative to the offset of the designated character sequence within the byte array (that is, relative to offset). The code point copied to the destination array is UTF-8 encoded and is therefore one to four bytes long. Because ill-formed or incomplete byte sequences within the text range count as one code point each, they are returned as-is.
- Parameters:
  - aString - the byte array containing the UTF-8 encoded reference character sequence.
  - offset - the starting offset of the reference character sequence in aString.
  - length - the length (in bytes) of the reference character sequence.
  - index - the byte index (relative to offset) of the character to be returned.
  - dstBuffer - the byte array for copying the resulting character.
  - dstOffset - the starting offset in dstBuffer for copying the UTF-8 byte sequence of the character (Unicode code point) at the specified index.
- Returns:
  - the number of bytes copied.
- Throws:
  - IndexOutOfBoundsException - if index is negative or not less than length (the length of the designated character sequence).
  - NullPointerException - if aString or dstBuffer is null.
codePointBefore
public static short codePointBefore(byte[] aString, short offset, short length, short index, byte[] dstBuffer, short dstOffset)
Copies to the destination buffer the character (Unicode code point) before the specified index in the UTF-8 encoded character sequence designated by aString, offset and length. index is a byte index relative to the offset of the designated character sequence within the byte array (that is, relative to offset). The code point copied to the destination array is UTF-8 encoded and is therefore one to four bytes long. Because ill-formed or incomplete byte sequences within the text range count as one code point each, they are returned as-is.
- Parameters:
  - aString - the byte array containing the UTF-8 encoded character sequence.
  - offset - the starting offset of the reference character sequence in aString.
  - length - the length (in bytes) of the reference character sequence.
  - index - the byte index (relative to offset) following the character to be returned.
  - dstBuffer - the byte array for copying the resulting character.
  - dstOffset - the starting offset in dstBuffer for copying the UTF-8 byte sequence of the character (Unicode code point) before the specified index.
- Returns:
  - the number of bytes copied.
- Throws:
  - IndexOutOfBoundsException - if index is less than 1 or greater than length (the length of the designated character sequence).
  - NullPointerException - if aString or dstBuffer is null.
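codePointAt and codePointBefore differ only in the direction in which the code point boundary is found. The plain-Java sketch below adopts an assumed grouping rule for ill-formed bytes: moving forwards, a lead byte absorbs its continuation bytes; moving backwards, it backs up over at most three continuation bytes and accepts the candidate lead byte only if that sequence ends exactly at index, otherwise treating the preceding byte as a one-byte ill-formed code point. `System.arraycopy` stands in for `javacard.framework.Util.arrayCopy`, and `CodePointAtSketch` is a hypothetical name.

```java
/** Hypothetical sketch of codePointAt / codePointBefore (not the Java Card API). */
final class CodePointAtSketch {
    /** Declared sequence length of a lead byte, or 1 if it cannot start a sequence. */
    static int declaredLength(int b) {
        if (b < 0x80) return 1;
        if (b >= 0xC2 && b <= 0xDF) return 2;
        if (b >= 0xE0 && b <= 0xEF) return 3;
        if (b >= 0xF0 && b <= 0xF4) return 4;
        return 1;
    }

    /** Absolute index just past the code point starting at pos (never beyond end). */
    static int next(byte[] a, int pos, int end) {
        int limit = Math.min(end, pos + declaredLength(a[pos] & 0xFF));
        int i = pos + 1;
        while (i < limit && (a[i] & 0xC0) == 0x80) i++;
        return i;
    }

    static short codePointAt(byte[] aString, short offset, short length, short index,
                             byte[] dstBuffer, short dstOffset) {
        if (index < 0 || index >= length) throw new IndexOutOfBoundsException();
        int from = offset + index;
        int to = next(aString, from, offset + length);
        System.arraycopy(aString, from, dstBuffer, dstOffset, to - from);
        return (short) (to - from);
    }

    static short codePointBefore(byte[] aString, short offset, short length, short index,
                                 byte[] dstBuffer, short dstOffset) {
        if (index < 1 || index > length) throw new IndexOutOfBoundsException();
        int to = offset + index;
        int s = to - 1;
        // Back up over at most three continuation bytes to a candidate lead byte.
        while (s > offset && to - s < 4 && (aString[s] & 0xC0) == 0x80) s--;
        // Accept the candidate only if its sequence ends exactly at 'to';
        // otherwise the preceding byte alone is an ill-formed code point.
        int from = next(aString, s, to) == to ? s : to - 1;
        System.arraycopy(aString, from, dstBuffer, dstOffset, to - from);
        return (short) (to - from);
    }
}
```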
offsetByCodePoints
public static short offsetByCodePoints(byte[] aString, short offset, short length, short index, short codePointOffset)
Returns the byte index within the UTF-8 encoded character sequence designated by aString, offset and length that is offset from the given index by codePointOffset code points. Ill-formed or incomplete UTF-8 byte sequences within the text range given by index and codePointOffset count as one code point each.

This method can be used to extract a substring from a string. For example, to copy to buffer the substring of the string aString that begins at codePointBeginIndex and extends to the character at index codePointEndIndex - 1, one can call the following. Because the returned byte indexes are relative to offset, offset is added back before calling javacard.framework.Util.arrayCopy, whose return value (destination offset plus length) is here the number of bytes copied:

    short beginOffset = StringUtil.offsetByCodePoints(aString, offset, length, (short) 0, codePointBeginIndex);
    short endOffset = StringUtil.offsetByCodePoints(aString, offset, length, (short) 0, codePointEndIndex);
    short l = Util.arrayCopy(aString, (short) (offset + beginOffset), buffer, (short) 0, (short) (endOffset - beginOffset));

The copied substring thus has a length in code points (that is, a code point count) equal to codePointEndIndex - codePointBeginIndex.
- Parameters:
  - aString - the byte array containing the UTF-8 encoded character sequence.
  - offset - the starting offset of the reference character sequence in aString.
  - length - the length (in bytes) of the reference character sequence.
  - index - the byte index to be offset (relative to offset).
  - codePointOffset - the offset in code points.
- Returns:
  - the index within aString, relative to the beginning of the string contained in aString, that is, relative to offset.
- Throws:
  - IndexOutOfBoundsException - if index is negative or larger than length (the length of the designated character sequence), or if codePointOffset is positive and the substring starting with index has fewer than codePointOffset code points, or if codePointOffset is negative and the substring before index has fewer than the absolute value of codePointOffset code points.
  - NullPointerException - if aString is null.
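The stepping performed by offsetByCodePoints can be sketched in plain Java by walking forwards or backwards one code point at a time and failing when the range runs out. The grouping of ill-formed bytes follows the same assumed rule as the codePointAt sketch, and `OffsetByCodePointsSketch` is a hypothetical name, not the platform implementation.

```java
/** Hypothetical sketch of offsetByCodePoints (not the Java Card API). */
final class OffsetByCodePointsSketch {
    /** Declared sequence length of a lead byte, or 1 if it cannot start a sequence. */
    static int declaredLength(int b) {
        if (b < 0x80) return 1;
        if (b >= 0xC2 && b <= 0xDF) return 2;
        if (b >= 0xE0 && b <= 0xEF) return 3;
        if (b >= 0xF0 && b <= 0xF4) return 4;
        return 1;
    }

    /** Absolute index just past the code point that starts at pos. */
    static int next(byte[] a, int pos, int end) {
        int limit = Math.min(end, pos + declaredLength(a[pos] & 0xFF));
        int i = pos + 1;
        while (i < limit && (a[i] & 0xC0) == 0x80) i++;
        return i;
    }

    /** Absolute index of the start of the code point that ends just before pos. */
    static int prev(byte[] a, int pos, int start) {
        int s = pos - 1;
        while (s > start && pos - s < 4 && (a[s] & 0xC0) == 0x80) s--;
        return next(a, s, pos) == pos ? s : pos - 1;
    }

    static short offsetByCodePoints(byte[] aString, short offset, short length,
                                    short index, short codePointOffset) {
        if (index < 0 || index > length) throw new IndexOutOfBoundsException();
        int pos = offset + index;
        int end = offset + length;
        for (int n = 0; n < codePointOffset; n++) {      // move forwards
            if (pos >= end) throw new IndexOutOfBoundsException();
            pos = next(aString, pos, end);
        }
        for (int n = 0; n > codePointOffset; n--) {      // move backwards
            if (pos <= offset) throw new IndexOutOfBoundsException();
            pos = prev(aString, pos, offset);
        }
        return (short) (pos - offset);                   // relative to offset
    }
}
```

With the bytes 41 E2 82 AC 42 ("A€B"), moving two code points forward from index 0 gives 4, and moving two code points back from index 5 gives 1.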
compare
public static short compare(boolean ignoreCase, byte[] aString, short offset, short length, byte[] anotherString, short ooffset, short olength)
Compares two strings lexicographically, optionally ignoring case considerations. The comparison is based on the UTF-8 encoded Unicode value of each character in the strings. The character sequence designated by aString, offset and length is compared lexicographically to the character sequence designated by anotherString, ooffset and olength. The result is a negative number if the character sequence contained in aString lexicographically precedes the character sequence contained in anotherString. The result is a positive number if the character sequence contained in aString lexicographically follows the character sequence contained in anotherString. The result is zero if the two character sequences are equal.

This is the definition of lexicographic ordering. If two strings are different, then either they have different characters at some index that is a valid index for both strings, or their lengths are different, or both. If they have different characters at one or more index positions, let k be the smallest such index; then the string whose character at position k has the smaller value, as determined by using the < operator, lexicographically precedes the other string. In this case, compare returns the difference of the first mismatching bytes of the UTF-8 encoded representations of the two characters at position k. If there is no index position at which they differ, then the shorter string lexicographically precedes the longer string. In this case, compare returns the difference of the lengths of the strings.

When ignoring case considerations, this method behaves as if comparing (using the same algorithm as described above) normalized versions of the strings where case differences have been eliminated by calling toLowerCase(toUpperCase(string)) on both argument strings.
- Parameters:
  - ignoreCase - whether case must be ignored.
  - aString - the byte array containing the reference UTF-8 encoded character sequence.
  - offset - the starting offset of the reference character sequence in aString.
  - length - the length (in bytes) of the reference character sequence.
  - anotherString - the byte array containing the UTF-8 encoded character sequence to be compared.
  - ooffset - the starting offset in anotherString of the character sequence to be compared.
  - olength - the length (in bytes) of the character sequence to be compared.
- Returns:
  - the value 0 if the character sequence contained in anotherString is equal to the character sequence contained in aString; a value less than 0 if the character sequence contained in aString is lexicographically less than the character sequence contained in anotherString; and a value greater than 0 if the character sequence contained in aString is lexicographically greater than the character sequence contained in anotherString, optionally ignoring case considerations.
- Throws:
  - NullPointerException - if aString or anotherString is null.
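Because UTF-8 preserves code point order when its bytes are compared as unsigned values, and the first differing byte of two UTF-8 strings always lies within their first differing character, the algorithm above can be sketched as a byte-wise comparison. The sketch assumes unsigned byte comparison (Java's `byte` is signed, so bytes are masked with 0xFF) and measures the length difference in bytes; both choices, and the name `CompareSketch`, are assumptions. Case folding is limited to the Basic Latin block, the documented default, which is safe byte by byte because bytes 0x41 to 0x5A never occur inside multi-byte UTF-8 sequences.

```java
/** Hypothetical sketch of compare (not the Java Card API). */
final class CompareSketch {
    static short compare(boolean ignoreCase, byte[] aString, short offset, short length,
                         byte[] anotherString, short ooffset, short olength) {
        int n = Math.min(length, olength);
        for (int k = 0; k < n; k++) {
            int a = aString[offset + k] & 0xFF;          // unsigned byte value
            int b = anotherString[ooffset + k] & 0xFF;
            if (ignoreCase) {
                a = fold(a);
                b = fold(b);
            }
            if (a != b) return (short) (a - b);          // first mismatching byte decides
        }
        return (short) (length - olength);               // one is a prefix of the other
    }

    /** Basic Latin only: toLowerCase(toUpperCase(c)) maps 'A'..'Z' to 'a'..'z'. */
    private static int fold(int c) {
        return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
    }
}
```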
indexOf
public static short indexOf(byte[] aString, short offset, short length, byte[] subString, short soffset, short slength)
Returns the index within the provided UTF-8 encoded character string of the first occurrence of the specified substring. The number returned is the smallest value k for which:

    compare(false, aString, (short) (offset + k), slength, subString, soffset, slength) == 0

- Parameters:
  - aString - the byte array containing the reference UTF-8 encoded character sequence.
  - offset - the starting offset of the reference character sequence in aString.
  - length - the length (in bytes) of the reference character sequence.
  - subString - the byte array containing the UTF-8 encoded character sequence of the substring.
  - soffset - the starting offset in subString of the substring's character sequence.
  - slength - the length (in bytes) of the substring's character sequence.
- Returns:
  - if the substring designated by subString, soffset and slength occurs as a substring within the string designated by aString, offset and length, then the index (relative to offset) of the first byte of the first such substring is returned; if it does not occur as a substring, -1 is returned.
- Throws:
  - NullPointerException - if aString or subString is null.
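indexOf belongs to the plain byte sequence category, so a straightforward sketch is a naive byte-by-byte search. `IndexOfSketch` is a hypothetical name, and a real implementation may use a different search strategy. For a non-empty, well-formed substring, a match can never start in the middle of a multi-byte character, since such a substring never begins with a continuation byte.

```java
/** Hypothetical naive-search sketch of indexOf (not the Java Card API). */
final class IndexOfSketch {
    static short indexOf(byte[] aString, short offset, short length,
                         byte[] subString, short soffset, short slength) {
        outer:
        for (int k = 0; k + slength <= length; k++) {
            for (int j = 0; j < slength; j++) {
                if (aString[offset + k + j] != subString[soffset + j]) continue outer;
            }
            return (short) k;   // index relative to offset
        }
        return -1;              // no occurrence
    }
}
```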
replace
public static short replace(byte[] srcString, short srcOffset, short srcLength, byte[] oldSubstring, short oOffset, short oLength, byte[] newSubstring, short nOffset, short nLength, byte[] dstString, short dstOffset)
Copies to the destination byte array the string resulting from replacing all occurrences of the old substring in the provided source string with the new substring.

If the character sequence (substring) designated by oldSubstring, oOffset and oLength does not occur in the source character sequence designated by srcString, srcOffset and srcLength, then the source character sequence is copied as is to dstString, starting at dstOffset. Otherwise, a character sequence identical to the character sequence designated by srcString, srcOffset and srcLength is copied to dstString, starting at dstOffset, except that every occurrence of the substring designated by oldSubstring, oOffset and oLength is replaced by an occurrence of the substring designated by newSubstring, nOffset and nLength. The replacement proceeds from the beginning of the source string to the end.
- Parameters:
  - srcString - the byte array containing the source UTF-8 encoded character sequence.
  - srcOffset - the starting offset of the source character sequence in srcString.
  - srcLength - the length (in bytes) of the source character sequence.
  - oldSubstring - the byte array containing the UTF-8 encoded character sequence to be replaced.
  - oOffset - the starting offset of the replaced character sequence in oldSubstring.
  - oLength - the length (in bytes) of the replaced character sequence.
  - newSubstring - the byte array containing the replacement UTF-8 encoded character sequence.
  - nOffset - the starting offset of the replacement character sequence in newSubstring.
  - nLength - the length (in bytes) of the replacement character sequence.
  - dstString - the byte array for copying the resulting character sequence.
  - dstOffset - the starting offset in dstString for copying the resulting character sequence.
- Returns:
  - the number of bytes copied.
- Throws:
  - NullPointerException - if srcString, oldSubstring, newSubstring or dstString is null.
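The left-to-right replacement can be sketched as a single pass that copies source bytes to the destination and, whenever the old substring matches at the current position, writes the new substring instead and resumes after the match, so occurrences never overlap (replacing "aa" with "b" in "aaa" gives "ba"). The sketch treats an empty old substring as never occurring, which is an assumption because that case is not specified here. `ReplaceSketch` is a hypothetical name, and `System.arraycopy` stands in for `Util.arrayCopy`.

```java
/** Hypothetical single-pass sketch of replace (not the Java Card API). */
final class ReplaceSketch {
    static short replace(byte[] srcString, short srcOffset, short srcLength,
                         byte[] oldSubstring, short oOffset, short oLength,
                         byte[] newSubstring, short nOffset, short nLength,
                         byte[] dstString, short dstOffset) {
        int s = srcOffset;
        int end = srcOffset + srcLength;
        int d = dstOffset;
        while (s < end) {
            if (oLength > 0 && s + oLength <= end
                    && regionMatches(srcString, s, oldSubstring, oOffset, oLength)) {
                System.arraycopy(newSubstring, nOffset, dstString, d, nLength); // write replacement
                d += nLength;
                s += oLength;                        // resume after the match: no overlaps
            } else {
                dstString[d++] = srcString[s++];     // copy one unmatched byte
            }
        }
        return (short) (d - dstOffset);              // number of bytes copied
    }

    private static boolean regionMatches(byte[] a, int aOff, byte[] b, int bOff, int len) {
        for (int j = 0; j < len; j++) {
            if (a[aOff + j] != b[bOff + j]) return false;
        }
        return true;
    }
}
```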
toLowerCase
public static short toLowerCase(byte[] srcString, short srcOffset, short srcLength, byte[] dstString, short dstOffset)
Converts to lower case and copies all of the characters from the provided source UTF-8 encoded character string to the provided destination array, starting at the provided dstOffset. This method skips/ignores any unrecognized (ill-formed or incomplete) byte sequence.
- Parameters:
  - srcString - the byte array containing the source UTF-8 encoded character sequence.
  - srcOffset - the starting offset of the source character sequence in srcString.
  - srcLength - the length (in bytes) of the source character sequence.
  - dstString - the byte array for copying the resulting character sequence.
  - dstOffset - the starting offset in dstString for copying the resulting character sequence.
- Returns:
  - the number of bytes copied.
- Throws:
  - NullPointerException - if srcString or dstString is null.
- See Also:
  - toUpperCase(byte[], short, short, byte[], short)
toUpperCase
public static short toUpperCase(byte[] srcString, short srcOffset, short srcLength, byte[] dstString, short dstOffset)
Converts to upper case and copies all of the characters from the provided source UTF-8 encoded character string to the provided destination array, starting at the provided dstOffset. This method skips/ignores any unrecognized (ill-formed or incomplete) byte sequence.
- Parameters:
  - srcString - the byte array containing the source UTF-8 encoded character sequence.
  - srcOffset - the starting offset of the source character sequence in srcString.
  - srcLength - the length (in bytes) of the source character sequence.
  - dstString - the byte array for copying the resulting character sequence.
  - dstOffset - the starting offset in dstString for copying the resulting character sequence.
- Returns:
  - the number of bytes copied.
- Throws:
  - NullPointerException - if srcString or dstString is null.
- See Also:
  - toLowerCase(byte[], short, short, byte[], short)
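For the default Basic Latin support, both conversions can be sketched as a byte-wise mapping. Bytes 0x41 to 0x5A and 0x61 to 0x7A only ever encode ASCII letters in UTF-8, because every byte of a multi-byte sequence is 0x80 or above, so mapping them cannot corrupt other characters. This sketch copies all other bytes unchanged, including ill-formed ones; an implementation that instead drops unrecognized sequences (one reading of "skips/ignores") would return a smaller length, so that behavior is an assumption. `CaseSketch` is a hypothetical name.

```java
/** Hypothetical Basic Latin case conversion sketch (not the Java Card API). */
final class CaseSketch {
    static short toLowerCase(byte[] srcString, short srcOffset, short srcLength,
                             byte[] dstString, short dstOffset) {
        return map(srcString, srcOffset, srcLength, dstString, dstOffset, 'A', 'Z', 'a' - 'A');
    }

    static short toUpperCase(byte[] srcString, short srcOffset, short srcLength,
                             byte[] dstString, short dstOffset) {
        return map(srcString, srcOffset, srcLength, dstString, dstOffset, 'a', 'z', 'A' - 'a');
    }

    /** Shifts bytes in [from, to] by delta and copies every other byte unchanged. */
    private static short map(byte[] src, int srcOff, int len, byte[] dst, int dstOff,
                             char from, char to, int delta) {
        for (int i = 0; i < len; i++) {
            byte b = src[srcOff + i];
            dst[dstOff + i] = (b >= from && b <= to) ? (byte) (b + delta) : b;
        }
        return (short) len;
    }
}
```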
trim
public static short trim(byte[] srcString, short srcOffset, short srcLength, byte[] dstString, short dstOffset)
Removes white space from both ends of the provided UTF-8 encoded character string and copies the resulting character sequence to the provided destination array, starting at the provided dstOffset.

If the source string designated by srcString, srcOffset and srcLength represents an empty character sequence, or the first and last characters of the source character sequence both have codes greater than '\u0020' (the space character), then the source string is copied as is to dstString, starting at dstOffset.

Otherwise, if there is no character with a code greater than '\u0020' in the source string, then no character is copied and 0 is returned.

Otherwise, let k be the index of the first character in the source string whose code is greater than '\u0020', and let m be the index of the last character in the source string whose code is greater than '\u0020'. The substring of the source string that begins with the character at index k and ends with the character at index m is copied to dstString, starting at dstOffset.

This method may be used to trim white space from the beginning and end of a string; in fact, it trims all ASCII control characters as well.

Illegal byte sequences are considered non-white-space characters.
- Parameters:
  - srcString - the byte array containing the source UTF-8 encoded character sequence.
  - srcOffset - the starting offset of the source character sequence in srcString.
  - srcLength - the length (in bytes) of the source character sequence.
  - dstString - the byte array for copying the resulting character sequence.
  - dstOffset - the starting offset in dstString for copying the resulting character sequence.
- Returns:
  - the number of bytes copied.
- Throws:
  - NullPointerException - if srcString or dstString is null.
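Every character whose code is at most '\u0020' is a single byte in UTF-8, and every byte of a multi-byte or illegal sequence is above 0x20 when read as an unsigned value, so the trimming rule can be sketched byte-wise. `TrimSketch` is a hypothetical name, and `System.arraycopy` stands in for `Util.arrayCopy`.

```java
/** Hypothetical byte-wise sketch of trim (not the Java Card API). */
final class TrimSketch {
    static short trim(byte[] srcString, short srcOffset, short srcLength,
                      byte[] dstString, short dstOffset) {
        int k = srcOffset;                    // first kept byte
        int m = srcOffset + srcLength;        // one past the last kept byte
        // Compare as unsigned: bytes >= 0x80 (multi-byte or illegal sequences) are kept.
        while (k < m && (srcString[k] & 0xFF) <= 0x20) k++;
        while (m > k && (srcString[m - 1] & 0xFF) <= 0x20) m--;
        System.arraycopy(srcString, k, dstString, dstOffset, m - k);
        return (short) (m - k);
    }
}
```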
valueOf
public static short valueOf(boolean b, byte[] dstString, short dstOffset)
Copies the UTF-8 encoded character string representation of the boolean argument into the provided destination array, starting at the provided offset. If the argument is true, a string equal to "true" is copied; otherwise, a string equal to "false" is copied.
- Parameters:
  - b - a boolean.
  - dstString - the destination UTF-8 encoded character string, as a byte array.
  - dstOffset - the starting offset in the destination array.
- Returns:
  - the number of bytes copied.
- Throws:
  - NullPointerException - if dstString is null.
parseBoolean
public static boolean parseBoolean(byte[] aString, short offset, short length)
Parses the string argument as a boolean. The boolean returned represents the value true if the string argument is not null and is equal, ignoring case, to the string "true".
- Parameters:
  - aString - the byte array containing the UTF-8 encoded character sequence to be parsed.
  - offset - the starting offset of the character sequence in aString.
  - length - the length (in bytes) of the character sequence to be parsed.
- Returns:
  - the boolean value represented by the designated character sequence.
- Throws:
  - NullPointerException - if aString is null.
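Both boolean methods reduce to fixed ASCII byte strings. In the hypothetical plain-Java sketch below, parseBoolean accepts exactly the four characters "true" in any mix of case (using Basic Latin case folding) and returns false for everything else, including input with surrounding white space. `BooleanSketch` is not part of the Java Card API.

```java
/** Hypothetical sketch of valueOf(boolean, ...) and parseBoolean (not the Java Card API). */
final class BooleanSketch {
    private static final byte[] TRUE = {'t', 'r', 'u', 'e'};
    private static final byte[] FALSE = {'f', 'a', 'l', 's', 'e'};

    static short valueOf(boolean b, byte[] dstString, short dstOffset) {
        byte[] s = b ? TRUE : FALSE;
        System.arraycopy(s, 0, dstString, dstOffset, s.length);
        return (short) s.length;
    }

    static boolean parseBoolean(byte[] aString, short offset, short length) {
        if (length != TRUE.length) return false;
        for (int i = 0; i < length; i++) {
            int c = aString[offset + i];
            if (c >= 'A' && c <= 'Z') c += 'a' - 'A';   // ASCII-only case folding
            if (c != TRUE[i]) return false;
        }
        return true;
    }
}
```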
valueOf
public static short valueOf(short i, byte[] dstString, short dstOffset)
Copies the UTF-8 encoded, signed decimal string representation of the (up to) 16-bit signed (short) argument into the provided destination array, starting at the provided offset.
- Parameters:
  - i - a short.
  - dstString - the destination UTF-8 encoded character string, as a byte array.
  - dstOffset - the starting offset in the destination array.
- Returns:
  - the number of bytes copied.
- Throws:
  - NullPointerException - if dstString is null.
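A sketch of the decimal conversion that avoids temporary allocation: count the digits first, then write them starting from the least significant end. The value is widened to `int` so that -32768 can be negated; a Java Card implementation without `int` support would need a different approach. `ShortToDecimalSketch` is a hypothetical name.

```java
/** Hypothetical sketch of valueOf(short, ...) (not the Java Card API). */
final class ShortToDecimalSketch {
    static short valueOf(short i, byte[] dstString, short dstOffset) {
        int v = i;                          // widen so that -32768 can be negated
        int d = dstOffset;
        if (v < 0) {
            dstString[d++] = '-';
            v = -v;
        }
        int digits = 1;                     // a short has at most 5 decimal digits
        for (int t = v; t >= 10; t /= 10) digits++;
        for (int p = d + digits - 1; p >= d; p--) {  // fill from the last position backwards
            dstString[p] = (byte) ('0' + v % 10);
            v /= 10;
        }
        d += digits;
        return (short) (d - dstOffset);     // number of bytes copied
    }
}
```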
parseShortInteger
public static short parseShortInteger(byte[] aString, short offset, short length)
Parses the provided UTF-8 encoded character sequence into an (up to) 16-bit signed (short) integer. Accepts decimal and hexadecimal numbers given by the following grammar, where Sign(opt) denotes an optional sign:

    DecodableString:
        Sign(opt) DecimalNumeral
        Sign(opt) 0x HexDigits
        Sign(opt) 0X HexDigits
        Sign(opt) # HexDigits
    Sign:
        -

The sequence of characters following an (optional) negative sign and/or radix specifier ("0x", "0X", "#", or leading zero) is parsed as a (short) integer in the specified radix (10 or 16). This sequence of characters must represent a positive value or a StringException will be thrown with reason StringException.ILLEGAL_NUMBER_FORMAT. The result is negated if the first character of the specified character string is the minus sign. No whitespace characters are permitted in the character string.
- Parameters:
  - aString - the byte array containing the UTF-8 encoded character sequence to be parsed.
  - offset - the starting offset of the character sequence in aString.
  - length - the length (in bytes) of the character sequence to be parsed.
- Returns:
  - the short integer value represented by the designated character sequence.
- Throws:
  - StringException - if the designated character sequence does not contain a parsable (short) integer.
  - NullPointerException - if aString is null.
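A plain-Java sketch of the grammar above: an optional '-', an optional 0x, 0X or # radix prefix, then at least one digit of the selected radix. `NumberFormatException` stands in for `StringException` with reason `ILLEGAL_NUMBER_FORMAT`. Accepting a magnitude of 32768 when the sign is negative (so that "-32768" parses) is an assumption about how the range check interacts with negation. `ParseShortSketch` is a hypothetical name.

```java
/** Hypothetical sketch of parseShortInteger (not the Java Card API). */
final class ParseShortSketch {
    static short parseShortInteger(byte[] aString, short offset, short length) {
        int i = offset;
        int end = offset + length;
        boolean negative = false;
        if (i < end && aString[i] == '-') {          // Sign: only '-' is accepted
            negative = true;
            i++;
        }
        int radix = 10;
        if (i + 1 < end && aString[i] == '0' && (aString[i + 1] == 'x' || aString[i + 1] == 'X')) {
            radix = 16;
            i += 2;
        } else if (i < end && aString[i] == '#') {
            radix = 16;
            i++;
        }
        if (i == end) throw new NumberFormatException("no digits");
        int limit = negative ? 32768 : 32767;         // largest magnitude that fits in a short
        int value = 0;
        for (; i < end; i++) {
            int digit = digit(aString[i], radix);
            if (digit < 0) throw new NumberFormatException("invalid digit");
            value = value * radix + digit;
            if (value > limit) throw new NumberFormatException("out of range");
        }
        return (short) (negative ? -value : value);
    }

    /** Value of an ASCII digit in the given radix (10 or 16), or -1. */
    static int digit(byte b, int radix) {
        int d;
        if (b >= '0' && b <= '9') d = b - '0';
        else if (b >= 'a' && b <= 'f') d = b - 'a' + 10;
        else if (b >= 'A' && b <= 'F') d = b - 'A' + 10;
        else return -1;
        return d < radix ? d : -1;
    }
}
```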
valueOf
public static short valueOf(short[] l, byte[] dstString, short dstOffset)
Copies the UTF-8 encoded, signed decimal string representation of the (up to) 64-bit signed integer argument, provided as an array of short integers, into the provided destination array, starting at the provided offset.
- Parameters:
  - l - an array of short integers representing an (up to) 64-bit signed long integer; the most significant short integer is at index 0.
  - dstString - the destination UTF-8 encoded character string, as a byte array.
  - dstOffset - the starting offset in the destination array.
- Returns:
  - the number of bytes copied.
- Throws:
  - NullPointerException - if l or dstString is null.
-
parseLongInteger
public static short parseLongInteger(byte[] aString, short offset, short length, short[] integer, short ioffset)
Parses the provided UTF-8 encoded character sequence into a signed integer of up to 64 bits. Accepts decimal and hexadecimal numbers given by the following grammar:
DecodableString:
  Sign_opt DecimalNumeral
  Sign_opt 0x HexDigits
  Sign_opt 0X HexDigits
  Sign_opt # HexDigits
Sign:
  -
The sequence of characters following an (optional) negative sign and/or radix specifier ("0x", "0X", "#", or leading zero) is parsed as a (long) integer in the specified radix (10 or 16). This sequence of characters must represent a positive value or a StringException will be thrown with reason StringException.ILLEGAL_NUMBER_FORMAT. The result is negated if the first character of the specified character string is the minus sign. No whitespace characters are permitted in the character string.
- Parameters:
aString - the byte array containing the UTF-8 encoded character sequence to be parsed.
offset - the starting offset of the character sequence in aString.
length - the length (in bytes) of the character sequence to be parsed.
integer - the array of short integers to contain the value represented by the designated character sequence; the most significant short integer is at index 0.
ioffset - the starting offset in integer for copying the resulting short sequence.
- Returns:
the number of short integers written into the array, ignoring leading zero short values.
- Throws:
StringException - if the designated character sequence does not contain a parsable (long) integer.
NullPointerException - if aString or integer is null.
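The grammar above can be illustrated with a hedged Java SE sketch. Everything beyond the grammar is an assumption: ParseLongSketch is a hypothetical name, IllegalArgumentException stands in for StringException with reason ILLEGAL_NUMBER_FORMAT, a leading zero not followed by x or X is simply parsed as decimal, and the result is always written as four 16-bit words (most significant first) while the return value counts the words left after dropping leading zero words (minimum one).

```java
// Hypothetical plain Java SE sketch of the parseLongInteger grammar; not the Java Card implementation.
// IllegalArgumentException stands in for StringException(ILLEGAL_NUMBER_FORMAT).
final class ParseLongSketch {
    private ParseLongSketch() {}

    static short parseLongInteger(byte[] aString, short offset, short length,
                                  short[] integer, short ioffset) {
        int i = offset;
        int end = offset + length;
        boolean negative = i < end && aString[i] == '-'; // optional Sign
        if (negative) i++;
        int radix = 10;
        if (i + 1 < end && aString[i] == '0' && (aString[i + 1] == 'x' || aString[i + 1] == 'X')) {
            radix = 16; // "0x" or "0X" radix specifier
            i += 2;
        } else if (i < end && aString[i] == '#') {
            radix = 16; // "#" radix specifier
            i++;
        }
        if (i >= end) throw new IllegalArgumentException("no digits");
        StringBuilder digits = new StringBuilder(negative ? "-" : "");
        for (; i < end; i++) {
            int b = aString[i];
            // Only ASCII digits of the selected radix: whitespace, '+' or a second '-' are rejected.
            int d = digitValue(b);
            if (d < 0 || d >= radix) throw new IllegalArgumentException("bad digit");
            digits.append((char) b);
        }
        long v;
        try {
            v = Long.parseLong(digits.toString(), radix); // also rejects values that overflow 64 bits
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("out of range", e);
        }
        // Assumption: four words are written, most significant first; leading zero words are not counted.
        int leadingZeros = 0;
        boolean seenNonZero = false;
        for (int k = 0; k < 4; k++) {
            short w = (short) (v >>> (48 - 16 * k));
            integer[ioffset + k] = w;
            if (!seenNonZero && w == 0) leadingZeros++; else seenNonZero = true;
        }
        return (short) Math.max(1, 4 - leadingZeros);
    }

    private static int digitValue(int b) {
        if (b >= '0' && b <= '9') return b - '0';
        if (b >= 'a' && b <= 'f') return b - 'a' + 10;
        if (b >= 'A' && b <= 'F') return b - 'A' + 10;
        return -1;
    }
}
```

For instance, "#FF" returns 1 with the last word equal to 255, while "-0x10" returns 4 because the leading words of a negative value are 0xFFFF rather than zero.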
-
convertTo
public static short convertTo(byte[] srcString, short srcOffset, short srcLength, byte[] dstString, short dstOffset, byte encoding)
Converts all of the characters of the provided UTF-8 encoded source string to the specified character encoding and copies them to the provided destination array, starting at the provided dstOffset.
- Parameters:
srcString - the byte array containing the source UTF-8 encoded character sequence.
srcOffset - the starting offset of the source character sequence in srcString.
srcLength - the length (in bytes) of the source character sequence.
dstString - the byte array for copying the resulting character sequence.
dstOffset - the starting offset in dstString for copying the resulting character sequence.
encoding - the character encoding to be used.
- Returns:
the number of bytes copied.
- Throws:
StringException - with reason StringException.UNSUPPORTED_ENCODING if the requested character encoding is not supported.
StringException - with reason StringException.INVALID_BYTE_SEQUENCE if an invalid byte sequence is encountered.
NullPointerException - if srcString or dstString is null.
- See Also:
convertFrom(byte[], short, short, byte[], short, byte)
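To illustrate the conversion and the INVALID_BYTE_SEQUENCE case, here is a Java SE analogue that hard-wires big-endian UTF-16 as the target, since the encoding identifier constants are not listed in this section. It relies on the standard java.nio.charset decoder with CodingErrorAction.REPORT, so malformed UTF-8 raises an exception instead of being silently replaced; IllegalArgumentException stands in for StringException, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical Java SE analogue of convertTo with UTF-16BE as the fixed target encoding.
// IllegalArgumentException stands in for StringException(INVALID_BYTE_SEQUENCE).
final class ConvertToSketch {
    private ConvertToSketch() {}

    static short convertToUtf16be(byte[] srcString, short srcOffset, short srcLength,
                                  byte[] dstString, short dstOffset) {
        // Strict decoder: malformed or truncated UTF-8 raises an exception instead of being replaced.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        CharBuffer chars;
        try {
            chars = decoder.decode(ByteBuffer.wrap(srcString, srcOffset, srcLength));
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("invalid UTF-8 byte sequence", e);
        }
        // UTF_16BE writes no byte order mark: two bytes per BMP character, four per surrogate pair.
        byte[] encoded = chars.toString().getBytes(StandardCharsets.UTF_16BE);
        System.arraycopy(encoded, 0, dstString, dstOffset, encoded.length);
        return (short) encoded.length;
    }
}
```

A character outside the Basic Multilingual Plane, such as U+1F600 (four UTF-8 bytes), also becomes four bytes in UTF-16 because it is encoded as a surrogate pair, so the returned byte count depends on the target encoding and not only on the number of characters.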
-
convertFrom
public static short convertFrom(byte[] srcString, short srcOffset, short srcLength, byte[] dstString, short dstOffset, byte encoding)
Converts all of the characters of the provided source string from the specified character encoding to UTF-8 and copies them to the provided destination array, starting at the provided dstOffset.
- Parameters:
srcString - the byte array containing the source character sequence encoded in the character encoding designated by encoding.
srcOffset - the starting offset of the source character sequence in srcString.
srcLength - the length (in bytes) of the source character sequence.
dstString - the byte array for copying the UTF-8 encoded resulting character sequence.
dstOffset - the starting offset in dstString for copying the resulting character sequence.
encoding - the character encoding of the source character string.
- Returns:
the number of bytes copied.
- Throws:
StringException - with reason StringException.UNSUPPORTED_ENCODING if the requested character encoding is not supported.
StringException - with reason StringException.INVALID_BYTE_SEQUENCE if an invalid byte sequence is encountered.
NullPointerException - if srcString or dstString is null.
- See Also:
convertTo(byte[], short, short, byte[], short, byte)
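The reverse direction can be sketched by hand for UCS-2, which shows how each 16-bit code unit maps onto one, two or three UTF-8 bytes. This is a hypothetical Java SE sketch, not the Java Card implementation: it assumes big-endian UCS-2, treats an odd source length or a surrogate value (U+D800 to U+DFFF, which UCS-2 cannot represent as a character) as an invalid byte sequence, and uses IllegalArgumentException in place of StringException.

```java
// Hypothetical Java SE sketch of convertFrom for big-endian UCS-2 input.
// IllegalArgumentException stands in for StringException(INVALID_BYTE_SEQUENCE).
final class ConvertFromSketch {
    private ConvertFromSketch() {}

    static short convertFromUcs2(byte[] srcString, short srcOffset, short srcLength,
                                 byte[] dstString, short dstOffset) {
        if ((srcLength & 1) != 0) throw new IllegalArgumentException("odd UCS-2 length");
        int out = dstOffset;
        for (int i = srcOffset; i < srcOffset + srcLength; i += 2) {
            int c = ((srcString[i] & 0xFF) << 8) | (srcString[i + 1] & 0xFF); // one 16-bit code unit
            if (c >= 0xD800 && c <= 0xDFFF) throw new IllegalArgumentException("surrogate in UCS-2");
            if (c < 0x80) {                       // U+0000..U+007F: 0xxxxxxx
                dstString[out++] = (byte) c;
            } else if (c < 0x800) {               // U+0080..U+07FF: 110xxxxx 10xxxxxx
                dstString[out++] = (byte) (0xC0 | (c >> 6));
                dstString[out++] = (byte) (0x80 | (c & 0x3F));
            } else {                              // U+0800..U+FFFF: 1110xxxx 10xxxxxx 10xxxxxx
                dstString[out++] = (byte) (0xE0 | (c >> 12));
                dstString[out++] = (byte) (0x80 | ((c >> 6) & 0x3F));
                dstString[out++] = (byte) (0x80 | (c & 0x3F));
            }
        }
        return (short) (out - dstOffset); // number of UTF-8 bytes written
    }
}
```

For example, the UCS-2 bytes 00 41 00 E9 20 AC ("Aé€") become the six UTF-8 bytes 41 C3 A9 E2 82 AC.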
-
check
public static boolean check(byte[] aString, short offset, short length)
Checks whether the designated byte sequence is a valid UTF-8 encoded character or character sequence. As per UTF-8, a byte with a leading '0' bit is a single-byte code; a byte with two or more leading '1' bits is the first byte of a multi-byte sequence whose length is equal to the number of leading '1' bits; finally, a byte with a leading '10' bit sequence is a continuation byte of a multi-byte sequence.
- Parameters:
aString - the byte array containing the UTF-8 encoded character sequence to be checked.
offset - the starting offset of the character sequence in aString.
length - the length (in bytes) of the character sequence to be checked.
- Returns:
true if the byte sequence is a valid UTF-8 encoded character or character sequence, false otherwise.
- Throws:
NullPointerException - if aString is null.
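The leading-bit rule above can be sketched in plain Java SE as follows. It is a hypothetical illustration (Utf8CheckSketch is not part of the API) that checks only this structure, limited to sequences of one to four bytes; whether the real check also rejects overlong encodings, surrogate code points or values above U+10FFFF, all of which RFC 3629 forbids, is not stated in this section.

```java
// Hypothetical Java SE sketch of the structural UTF-8 rule described for check().
final class Utf8CheckSketch {
    private Utf8CheckSketch() {}

    static boolean check(byte[] aString, short offset, short length) {
        int i = offset;
        int end = offset + length;
        while (i < end) {
            int b = aString[i] & 0xFF;
            int n; // sequence length implied by the leading '1' bits of the first byte
            if ((b & 0x80) == 0x00) n = 1;       // 0xxxxxxx: single-byte code
            else if ((b & 0xE0) == 0xC0) n = 2;  // 110xxxxx
            else if ((b & 0xF0) == 0xE0) n = 3;  // 1110xxxx
            else if ((b & 0xF8) == 0xF0) n = 4;  // 11110xxx
            else return false;                   // stray 10xxxxxx, or more than four leading '1' bits
            if (i + n > end) return false;       // sequence cut off by the end of the range
            for (int k = 1; k < n; k++) {
                if ((aString[i + k] & 0xC0) != 0x80) return false; // continuation must be 10xxxxxx
            }
            i += n;
        }
        return true;
    }
}
```

Only the bytes inside the designated range are examined, so a valid sequence followed by unrelated bytes in the same array still checks as valid.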
-
startsWith
public static boolean startsWith(byte[] aString, short offset, short length, byte[] prefix, short poffset, short plength, short codePointCount)
Tests if the UTF-8 encoded character sequence designated by aString, offset and length starts with the first codePointCount characters of the character sequence designated by prefix, poffset and plength.
If codePointCount is negative, the whole prefix character sequence is considered, in which case calling this method is equivalent to calling arrayCompare as follows:
  return length >= plength && arrayCompare(aString, offset, prefix, poffset, plength) == 0;
Otherwise, if codePointCount is positive, calling this method is equivalent to calling arrayCompare as follows:
  short endOffset = StringUtil.offsetByCodePoints(prefix, poffset, plength, 0, codePointCount);
  return length >= endOffset && arrayCompare(aString, offset, prefix, poffset, endOffset) == 0;
- Parameters:
aString - the byte array containing the reference UTF-8 encoded character sequence.
offset - the starting offset of the reference character sequence in aString.
length - the length (in bytes) of the reference character sequence.
prefix - the byte array containing the prefixing UTF-8 encoded character sequence.
poffset - the starting offset in prefix of the prefixing character sequence.
plength - the length (in bytes) of the prefixing character sequence.
codePointCount - the number of code points to be used for testing.
- Returns:
true if the character sequence designated by prefix, poffset and plength is a prefix of the character sequence designated by aString, offset and length; false otherwise.
- Throws:
NullPointerException - if aString or prefix is null.
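The arrayCompare equivalence above can be exercised outside a card with this Java SE sketch. Arrays.equals over byte ranges (Java 9 and later) plays the role of arrayCompare(...) == 0, and codePointBytes is a hypothetical stand-in for offsetByCodePoints that counts each lead byte together with the 10xxxxxx continuation bytes following it; unlike the real method, it simply stops at the end of the prefix if codePointCount is too large.

```java
import java.util.Arrays;

// Hypothetical Java SE sketch of the startsWith equivalence; not the Java Card implementation.
final class StartsWithSketch {
    private StartsWithSketch() {}

    // Hypothetical stand-in for offsetByCodePoints(s, off, len, 0, count):
    // byte length of the first `count` code points, clamped to len.
    static int codePointBytes(byte[] s, int off, int len, int count) {
        int i = 0;
        for (int c = 0; c < count && i < len; c++) {
            i++; // first byte of the code point
            while (i < len && (s[off + i] & 0xC0) == 0x80) {
                i++; // continuation bytes (10xxxxxx) belong to the same code point
            }
        }
        return i;
    }

    static boolean startsWith(byte[] aString, short offset, short length,
                              byte[] prefix, short poffset, short plength, short codePointCount) {
        int n = codePointCount < 0 ? plength : codePointBytes(prefix, poffset, plength, codePointCount);
        // Arrays.equals over [from, to) ranges replaces arrayCompare(...) == 0.
        return length >= n && Arrays.equals(aString, offset, offset + n, prefix, poffset, poffset + n);
    }
}
```

For a reference string "héllo" and a prefix "hé!", the sketch returns false when codePointCount is negative (the '!' does not match) and true when codePointCount is 2.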
-
endsWith
public static boolean endsWith(byte[] aString, short offset, short length, byte[] suffix, short soffset, short slength, short codePointCount)
Tests if the UTF-8 encoded character sequence designated by aString, offset and length ends with the first codePointCount characters of the character sequence designated by suffix, soffset and slength.
If codePointCount is negative, the whole suffix character sequence is considered, in which case calling this method is equivalent to calling arrayCompare as follows:
  return length >= slength && arrayCompare(aString, (short) (offset + length - slength), suffix, soffset, slength) == 0;
Otherwise, if codePointCount is positive, calling this method is equivalent to calling arrayCompare as follows:
  short endOffset = StringUtil.offsetByCodePoints(suffix, soffset, slength, 0, codePointCount);
  return length >= endOffset && arrayCompare(aString, (short) (offset + length - endOffset), suffix, soffset, endOffset) == 0;
- Parameters:
aString - the byte array containing the reference UTF-8 encoded character sequence.
offset - the starting offset of the reference character sequence in aString.
length - the length (in bytes) of the reference character sequence.
suffix - the byte array containing the suffixing UTF-8 encoded character sequence.
soffset - the starting offset in suffix of the suffixing character sequence.
slength - the length (in bytes) of the suffixing character sequence.
codePointCount - the number of code points to be used for testing.
- Returns:
true if the character sequence designated by suffix, soffset and slength is a suffix of the character sequence designated by aString, offset and length; false otherwise.
- Throws:
NullPointerException - if aString or suffix is null.
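A Java SE sketch of the endsWith equivalence shows where the comparison window sits: the last n bytes of the reference sequence are compared with the first n bytes of suffix, where n covers codePointCount code points. Arrays.equals over byte ranges (Java 9 and later) stands in for arrayCompare(...) == 0, and codePointBytes is a hypothetical stand-in for offsetByCodePoints that stops at the end of the suffix if codePointCount is too large.

```java
import java.util.Arrays;

// Hypothetical Java SE sketch of the endsWith equivalence; not the Java Card implementation.
final class EndsWithSketch {
    private EndsWithSketch() {}

    // Hypothetical stand-in for offsetByCodePoints(s, off, len, 0, count):
    // byte length of the first `count` code points, clamped to len.
    static int codePointBytes(byte[] s, int off, int len, int count) {
        int i = 0;
        for (int c = 0; c < count && i < len; c++) {
            i++; // first byte of the code point
            while (i < len && (s[off + i] & 0xC0) == 0x80) {
                i++; // skip its continuation bytes (10xxxxxx)
            }
        }
        return i;
    }

    static boolean endsWith(byte[] aString, short offset, short length,
                            byte[] suffix, short soffset, short slength, short codePointCount) {
        int n = codePointCount < 0 ? slength : codePointBytes(suffix, soffset, slength, codePointCount);
        int tail = offset + length - n; // start of the last n bytes of the reference sequence
        return length >= n && Arrays.equals(aString, tail, tail + n, suffix, soffset, soffset + n);
    }
}
```

Because the first codePointCount characters of suffix are used, "café" ends with "éx" when codePointCount is 1, but not when codePointCount is negative.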
-
substring
public static short substring(byte[] srcString, short srcOffset, short srcLength, short codePointBeginIndex, short codePointEndIndex, byte[] dstString, short dstOffset)
Copies to the destination byte array the specified substring of the designated source string. The substring begins at the specified codePointBeginIndex and extends to the character at index codePointEndIndex - 1. Thus the length of the substring in code points (that is, its code point count) is codePointEndIndex - codePointBeginIndex.
Ill-formed or incomplete byte sequences within the text range count as one code point each.
If codePointEndIndex is negative, the whole remaining character sequence of the source string is considered.
- Parameters:
srcString - the byte array containing the source UTF-8 encoded character sequence.
srcOffset - the starting offset of the source character sequence in srcString.
srcLength - the length (in bytes) of the source character sequence.
codePointBeginIndex - the beginning index, in code points (relative to srcOffset), inclusive.
codePointEndIndex - the ending index, in code points (relative to srcOffset), exclusive.
dstString - the byte array for copying the resulting character sequence.
dstOffset - the starting offset in dstString for copying the resulting character sequence.
- Returns:
the number of bytes copied.
- Throws:
NullPointerException - if srcString or dstString is null.
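Below is a hedged Java SE sketch of the code-point walk that substring implies. SubstringSketch is hypothetical; its step helper encodes one reading of the rule that ill-formed or incomplete sequences count as one code point each (a lead byte plus whatever continuation bytes actually follow it, or a single stray byte), and out-of-range indices raise IndexOutOfBoundsException as the class description requires.

```java
// Hypothetical Java SE sketch of substring(); not the Java Card implementation.
final class SubstringSketch {
    private SubstringSketch() {}

    // Byte length of the code point starting at s[i]. Assumption: an incomplete sequence
    // is one code point made of the bytes actually present; a stray byte is one code point.
    private static int step(byte[] s, int i, int end) {
        int b = s[i] & 0xFF;
        int n;
        if (b < 0x80) n = 1;
        else if (b >= 0xC0 && b < 0xE0) n = 2;
        else if (b >= 0xE0 && b < 0xF0) n = 3;
        else if (b >= 0xF0 && b < 0xF8) n = 4;
        else n = 1; // stray continuation byte or invalid lead byte
        int j = i + 1;
        while (j < i + n && j < end && (s[j] & 0xC0) == 0x80) {
            j++;
        }
        return j - i;
    }

    static short substring(byte[] srcString, short srcOffset, short srcLength,
                           short codePointBeginIndex, short codePointEndIndex,
                           byte[] dstString, short dstOffset) {
        if (codePointBeginIndex < 0) throw new IndexOutOfBoundsException("codePointBeginIndex");
        int end = srcOffset + srcLength;
        int i = srcOffset;
        int cp = 0;
        while (cp < codePointBeginIndex) { // skip the code points before the substring
            if (i >= end) throw new IndexOutOfBoundsException("codePointBeginIndex");
            i += step(srcString, i, end);
            cp++;
        }
        int start = i;
        if (codePointEndIndex < 0) {
            i = end; // negative end index: take the rest of the source string
        } else {
            if (codePointEndIndex < codePointBeginIndex) throw new IndexOutOfBoundsException("codePointEndIndex");
            while (cp < codePointEndIndex) {
                if (i >= end) throw new IndexOutOfBoundsException("codePointEndIndex");
                i += step(srcString, i, end);
                cp++;
            }
        }
        System.arraycopy(srcString, start, dstString, dstOffset, i - start);
        return (short) (i - start);
    }
}
```

For "aé€b" (7 bytes), indices 1 to 3 copy the 5 bytes of "é€", and a negative end index with a begin index of 2 copies the 4 bytes of "€b".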
-
-