Class StringUtil
- java.lang.Object
  - javacardx.framework.string.StringUtil
public final class StringUtil extends Object
This class provides methods for handling UTF-8 encoded character sequences (strings). It also provides methods to convert UTF-8 encoded character strings to and from other character encodings, such as UCS-2, the GSM 7-bit character set and UTF-16. Support for character encodings other than UTF-8 is optional. Proprietary extensions to the supported character encodings must use encoding identifiers numbered starting from PROP_ENCODING_EXT. An implementation must throw a StringException with reason StringException.UNSUPPORTED_ENCODING if a requested character encoding is not supported.

UTF-8 encodes each of the code points in the Unicode character set using one to four 8-bit bytes. Unicode code points are the numerical values that make up the Unicode code space. The Unicode Standard, version 4.0, is available from the Unicode Consortium at http://www.unicode.org. The UTF-8 transformation format is specified by RFC 3629.
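Under RFC 3629, the first byte of a UTF-8 sequence determines how long the whole sequence is. The sketch below shows that rule in plain Java SE. It is not Java Card code, which on many platforms would avoid `int` arithmetic. The class `Utf8LeadByte` and its methods are hypothetical helpers and are not part of `StringUtil`.

```java
// Hypothetical helper (not part of the Java Card API): UTF-8 sequence
// lengths as defined by RFC 3629.
final class Utf8LeadByte {
    private Utf8LeadByte() {}

    /**
     * Returns the total length (1 to 4) of the UTF-8 sequence announced by
     * its first byte, or 0 if the byte can never start a well-formed sequence
     * (continuation bytes 0x80-0xBF, overlong leads 0xC0/0xC1, and 0xF5-0xFF).
     */
    static int sequenceLength(byte first) {
        int b = first & 0xFF;       // Java bytes are signed; work with 0..255
        if (b < 0x80) return 1;     // 0xxxxxxx: U+0000..U+007F
        if (b < 0xC2) return 0;     // 10xxxxxx continuation, or 0xC0/0xC1
        if (b < 0xE0) return 2;     // 110xxxxx: U+0080..U+07FF
        if (b < 0xF0) return 3;     // 1110xxxx: U+0800..U+FFFF
        if (b < 0xF5) return 4;     // 11110xxx: U+10000..U+10FFFF
        return 0;                   // 0xF5..0xFF never occur in UTF-8
    }

    /** Number of bytes UTF-8 needs for a code point in U+0000..U+10FFFF. */
    static int encodedLength(int codePoint) {
        if (codePoint < 0x80) return 1;
        if (codePoint < 0x800) return 2;
        if (codePoint < 0x10000) return 3;
        return 4;
    }
}
```

For example, U+20AC (the euro sign) is encoded as the three bytes E2 82 AC, and its first byte 0xE2 announces that length.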
The encoded character sequences handled by this class are stored in byte arrays. Each string or character sequence is designated by a byte array, an offset in the byte array indicating the start of the character sequence, and a length indicating the length in bytes of the character sequence. If a designated character sequence is outside the bounds of its containing array, an ArrayIndexOutOfBoundsException is thrown (note: for readability reasons, these exceptions are assumed and not systematically documented in the methods of this class).
An index of a character or substring within a string is always relative to the beginning of the string, that is, relative to the offset of the string within the containing byte array. If a provided index of a character or substring within a designated character sequence is outside the bounds of the designated character sequence, an IndexOutOfBoundsException is thrown.

This class provides two categories of methods:
- Methods for dealing with plain byte sequences
  These methods do not check the (UTF-8) well-formedness of the byte sequences passed as parameters. In particular, they do not check whether the byte at a provided offset is the first byte of a valid UTF-8 byte sequence, or whether a provided length is equal to or greater than the length of the UTF-8 byte sequence starting at the provided offset (as can be determined by its first byte). They treat UTF-8 byte sequences only as plain byte sequences for the purpose of comparison, copying, truncating, and so on. As an example of such methods, see the indexOf method.
- Methods for dealing with Unicode code points (i.e., Unicode characters)
  These methods check the well-formedness of UTF-8 byte sequences. They either throw a StringException with reason StringException.INVALID_BYTE_SEQUENCE when encountering an ill-formed UTF-8 byte sequence (as an example of such methods, see the convertTo method), or they consider any byte sequence that is not well-formed (e.g., incomplete) within a designated text range as a code point for the purpose of counting, comparing or returning (as an example of such methods, see the codePointCount method).
To verify the well-formedness of a UTF-8 byte sequence, the check method should be used.

Because Unicode case conversion may require locale-sensitive mappings, context-sensitive mappings and 1:M character mappings, and in order to limit footprint, the case conversion supported by the methods toLowerCase, toUpperCase and compare is by default only available for the Basic Latin Unicode block (US-ASCII character set: U+0000 - U+007F). Other character blocks may be supported.
- Since:
  Java Card 3.0.4
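The code-point methods either reject ill-formed input or tolerate it, so it is useful to spell out what well-formedness means in concrete terms. The following is a minimal Java SE sketch that assumes the RFC 3629 rules: no overlong forms, no encoded surrogates, and nothing above U+10FFFF. `Utf8Check.isWellFormed` is a hypothetical name, and the sketch is not claimed to match the exact behavior of `check`.

```java
// Hypothetical sketch of an RFC 3629 well-formedness test, similar in spirit
// to StringUtil.check; not the Java Card implementation.
final class Utf8Check {
    private Utf8Check() {}

    static boolean isWellFormed(byte[] s, int offset, int length) {
        int i = offset;
        int end = offset + length;
        while (i < end) {
            int b0 = s[i] & 0xFF;
            if (b0 < 0x80) {            // ASCII: single byte
                i++;
                continue;
            }
            int n;                      // total bytes in this sequence
            int lo = 0x80, hi = 0xBF;   // allowed range for the second byte
            if (b0 >= 0xC2 && b0 <= 0xDF) {
                n = 2;
            } else if (b0 >= 0xE0 && b0 <= 0xEF) {
                n = 3;
                if (b0 == 0xE0) lo = 0xA0;      // reject overlong 3-byte forms
                if (b0 == 0xED) hi = 0x9F;      // reject surrogates U+D800..U+DFFF
            } else if (b0 >= 0xF0 && b0 <= 0xF4) {
                n = 4;
                if (b0 == 0xF0) lo = 0x90;      // reject overlong 4-byte forms
                if (b0 == 0xF4) hi = 0x8F;      // reject code points above U+10FFFF
            } else {
                return false;                   // 0x80..0xC1 or 0xF5..0xFF
            }
            if (end - i < n) return false;      // truncated sequence
            int b1 = s[i + 1] & 0xFF;
            if (b1 < lo || b1 > hi) return false;
            for (int k = 2; k < n; k++) {       // remaining bytes must be 10xxxxxx
                if ((s[i + k] & 0xC0) != 0x80) return false;
            }
            i += n;
        }
        return true;
    }
}
```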
Field Summary

| Modifier and Type | Field | Description |
|---|---|---|
| static byte | GSM_7 | The GSM Septet character encoding. |
| static byte | ISO_8859_1 | The ISO 8859-1 (Latin-1) character encoding. |
| static byte | PROP_ENCODING_EXT | Start of proprietary character encoding numbering. |
| static byte | UCS_2 | The UCS-2 character encoding. |
| static byte | UTF_16 | The UTF-16 character encoding. |
| static byte | UTF_16_BE | The UTF-16BE (Big Endian) character encoding. |
| static byte | UTF_16_LE | The UTF-16LE (Little Endian) character encoding. |
| static byte | UTF_8 | The UTF-8 character encoding. |
Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| static boolean | check(byte[] aString, short offset, short length) | Checks if the provided byte array contains a valid UTF-8 encoded character or character sequence. |
| static short | codePointAt(byte[] aString, short offset, short length, short index, byte[] dstBuffer, short dstOffset) | Copies to the destination buffer the character (Unicode code point) at the specified index in the UTF-8 encoded character sequence designated by aString, offset and length. |
| static short | codePointBefore(byte[] aString, short offset, short length, short index, byte[] dstBuffer, short dstOffset) | Copies to the destination buffer the character (Unicode code point) before the specified index in the UTF-8 encoded character sequence designated by aString, offset and length. |
| static short | codePointCount(byte[] aString, short offset, short length) | Returns the number of characters (Unicode code points) in the UTF-8 encoded character sequence designated by aString, offset and length. |
| static short | compare(boolean ignoreCase, byte[] aString, short offset, short length, byte[] anotherString, short ooffset, short olength) | Compares two strings lexicographically, optionally ignoring case considerations. |
| static short | convertFrom(byte[] srcString, short srcOffset, short srcLength, byte[] dstString, short dstOffset, byte encoding) | Converts from the specified character encoding to the UTF-8 character encoding all of the characters from the provided source character string and copies them to the provided destination array, starting at the provided dstOffset. |
| static short | convertTo(byte[] srcString, short srcOffset, short srcLength, byte[] dstString, short dstOffset, byte encoding) | Converts to the specified character encoding all of the characters from the provided source UTF-8 encoded character string and copies them to the provided destination array, starting at the provided dstOffset. |
| static boolean | endsWith(byte[] aString, short offset, short length, byte[] suffix, short soffset, short slength, short codePointCount) | Tests if the UTF-8 encoded character sequence designated by aString, offset and length ends with the first codePointCount characters of the character sequence designated by suffix, soffset and slength. |
| static short | indexOf(byte[] aString, short offset, short length, byte[] subString, short soffset, short slength) | Returns the index within the provided UTF-8 encoded character string of the first occurrence of the specified substring. |
| static short | offsetByCodePoints(byte[] aString, short offset, short length, short index, short codePointOffset) | Returns the byte index within the UTF-8 encoded character sequence designated by aString, offset and length that is offset from the given index by codePointOffset code points. |
| static boolean | parseBoolean(byte[] aString, short offset, short length) | Parses the string argument as a boolean. |
| static short | parseLongInteger(byte[] aString, short offset, short length, short[] integer, short ioffset) | Parses the provided UTF-8 encoded character sequence into an (up to) 64-bit signed integer. |
| static short | parseShortInteger(byte[] aString, short offset, short length) | Parses the provided UTF-8 encoded character sequence into an (up to) 16-bit signed (short) integer. |
| static short | replace(byte[] srcString, short srcOffset, short srcLength, byte[] oldSubstring, short oOffset, short oLength, byte[] newSubstring, short nOffset, short nLength, byte[] dstString, short dstOffset) | Copies to the destination byte array the string resulting from replacing all occurrences of the old substring in the provided source string with the new substring. |
| static boolean | startsWith(byte[] aString, short offset, short length, byte[] prefix, short poffset, short plength, short codePointCount) | Tests if the UTF-8 encoded character sequence designated by aString, offset and length starts with the first codePointCount characters of the character sequence designated by prefix, poffset and plength. |
| static short | substring(byte[] srcString, short srcOffset, short srcLength, short codePointBeginIndex, short codePointEndIndex, byte[] dstString, short dstOffset) | Copies to the destination byte array the specified substring of the designated source string. |
| static short | toLowerCase(byte[] srcString, short srcOffset, short srcLength, byte[] dstString, short dstOffset) | Converts to lower case and copies all of the characters from the provided source UTF-8 encoded character string to the provided destination array, starting at the provided dstOffset. |
| static short | toUpperCase(byte[] srcString, short srcOffset, short srcLength, byte[] dstString, short dstOffset) | Converts to upper case and copies all of the characters from the provided source UTF-8 encoded character string to the provided destination array, starting at the provided dstOffset. |
| static short | trim(byte[] srcString, short srcOffset, short srcLength, byte[] dstString, short dstOffset) | Removes white space from both ends of the provided UTF-8 encoded character string and copies the resulting character sequence to the provided destination array, starting at the provided dstOffset. |
| static short | valueOf(boolean b, byte[] dstString, short dstOffset) | Copies the UTF-8 encoded character string representation of the boolean argument into the provided destination array, starting at the provided offset. |
| static short | valueOf(short[] l, byte[] dstString, short dstOffset) | Copies the UTF-8 encoded, signed decimal string representation of the (up to) 64-bit signed integer argument, provided as an array of short integers, into the provided destination array, starting at the provided offset. |
| static short | valueOf(short i, byte[] dstString, short dstOffset) | Copies the UTF-8 encoded, signed decimal string representation of the (up to) 16-bit signed (short) argument into the provided destination array, starting at the provided offset. |
Field Detail
UTF_8
public static final byte UTF_8
The UTF-8 character encoding. This character encoding is used for internal representation and handling of character strings.
- See Also: Constant Field Values

UTF_16
public static final byte UTF_16
The UTF-16 character encoding. This character encoding may be optionally supported for conversion to and from UTF-8 when doing input or output.
- See Also: Constant Field Values

UTF_16_LE
public static final byte UTF_16_LE
The UTF-16LE (Little Endian) character encoding. This character encoding may be optionally supported for conversion to and from UTF-8 when doing input or output.
- See Also: Constant Field Values

UTF_16_BE
public static final byte UTF_16_BE
The UTF-16BE (Big Endian) character encoding. This character encoding may be optionally supported for conversion to and from UTF-8 when doing input or output.
- See Also: Constant Field Values

UCS_2
public static final byte UCS_2
The UCS-2 character encoding. This character encoding may be optionally supported for conversion to and from UTF-8 when doing input or output.
- See Also: Constant Field Values

GSM_7
public static final byte GSM_7
The GSM Septet character encoding. This character encoding may be optionally supported for conversion to and from UTF-8 when doing input or output.
- See Also: Constant Field Values

ISO_8859_1
public static final byte ISO_8859_1
The ISO 8859-1 (Latin-1) character encoding. This character encoding may be optionally supported for conversion to and from UTF-8 when doing input or output.
- See Also: Constant Field Values

PROP_ENCODING_EXT
public static final byte PROP_ENCODING_EXT
Start of proprietary character encoding numbering.
- See Also: Constant Field Values

Method Detail
codePointCount
public static short codePointCount(byte[] aString, short offset, short length)
Returns the number of characters (Unicode code points) in the UTF-8 encoded character sequence designated by aString, offset and length. Ill-formed or incomplete byte sequences within the text range count as one code point each.
- Parameters:
  aString - the byte array containing the UTF-8 encoded character sequence.
  offset - the starting offset of the character sequence in the byte array.
  length - the length (in bytes) of the contained character sequence.
- Returns:
  the number of characters (Unicode code points) in the designated UTF-8 encoded character sequence.
- Throws:
  NullPointerException - if aString is null.

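The rule that each ill-formed or incomplete sequence counts as one code point does not fix how ill-formed input is split into units. The Java SE sketch below picks one plausible split: a valid lead byte together with as many continuation bytes as it announces forms one unit, stopping early at a non-continuation byte or at the end of the range, and every other byte is a one-byte unit. `Utf8Count` is a hypothetical class, and a conforming implementation may split ill-formed input differently.

```java
// Hypothetical sketch (not the Java Card implementation). Assumed segmentation:
// a valid lead byte plus as many continuation bytes as it announces (stopping
// early at a non-continuation byte or at the end of the range) forms one unit;
// any other byte forms a one-byte unit. Each unit counts as one code point.
final class Utf8Count {
    private Utf8Count() {}

    static int unitLength(byte[] s, int i, int end) {
        int b = s[i] & 0xFF;
        int expected = b < 0xC2 ? 1        // ASCII, stray continuation, 0xC0/0xC1
                     : b < 0xE0 ? 2
                     : b < 0xF0 ? 3
                     : b < 0xF5 ? 4
                     : 1;                  // 0xF5..0xFF
        int len = 1;
        while (len < expected && i + len < end && (s[i + len] & 0xC0) == 0x80) {
            len++;                         // absorb a 10xxxxxx continuation byte
        }
        return len;
    }

    static short codePointCount(byte[] s, short offset, short length) {
        int count = 0;
        int end = offset + length;
        for (int i = offset; i < end; i += unitLength(s, i, end)) {
            count++;
        }
        return (short) count;
    }
}
```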
codePointAt
public static short codePointAt(byte[] aString, short offset, short length, short index, byte[] dstBuffer, short dstOffset)
Copies to the destination buffer the character (Unicode code point) at the specified index in the UTF-8 encoded character sequence designated by aString, offset and length. index is a byte index relative to the offset of the designated character sequence within the byte array (that is, relative to offset). The resulting code point copied to the destination array is UTF-8 encoded and is therefore one to four bytes long. Because ill-formed or incomplete byte sequences within the text range count as one code point each, such sequences are returned as-is.
- Parameters:
  aString - the byte array containing the UTF-8 encoded reference character sequence.
  offset - the starting offset of the reference character sequence in aString.
  length - the length (in bytes) of the reference character sequence.
  index - the byte index (relative to offset) of the character to be returned.
  dstBuffer - the byte array for copying the resulting character.
  dstOffset - the starting offset in dstBuffer for copying the UTF-8 byte sequence of the character (Unicode code point) at the specified index.
- Returns:
  the number of bytes copied.
- Throws:
  IndexOutOfBoundsException - if index is negative or not less than the length of aString.
  NullPointerException - if aString or dstBuffer is null.

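Below is a Java SE sketch of the codePointAt contract. It assumes one segmentation of ill-formed input: a lead byte plus the continuation bytes it announces forms one unit, and any other byte is a one-byte unit. The unit that starts at `offset + index` is copied as-is, and its byte length is returned. `Utf8CodePointAt` is hypothetical and is not the Java Card implementation.

```java
// Hypothetical Java SE sketch of the codePointAt contract (not the Java Card
// implementation). Assumed segmentation as described in the lead-in.
final class Utf8CodePointAt {
    private Utf8CodePointAt() {}

    static short codePointAt(byte[] s, short offset, short length, short index,
                             byte[] dst, short dstOffset) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int start = offset + index;            // index is relative to offset
        int end = offset + length;
        int b = s[start] & 0xFF;
        int expected = b < 0xC2 ? 1 : b < 0xE0 ? 2 : b < 0xF0 ? 3 : b < 0xF5 ? 4 : 1;
        int n = 1;
        while (n < expected && start + n < end && (s[start + n] & 0xC0) == 0x80) {
            n++;                               // absorb a continuation byte
        }
        System.arraycopy(s, start, dst, dstOffset, n); // ill-formed units copied as-is
        return (short) n;
    }
}
```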
codePointBefore
public static short codePointBefore(byte[] aString, short offset, short length, short index, byte[] dstBuffer, short dstOffset)
Copies to the destination buffer the character (Unicode code point) before the specified index in the UTF-8 encoded character sequence designated by aString, offset and length. index is a byte index relative to the offset of the designated character sequence within the byte array (that is, relative to offset). The resulting code point copied to the destination array is UTF-8 encoded and is therefore one to four bytes long. Because ill-formed or incomplete byte sequences within the text range count as one code point each, such sequences are returned as-is.
- Parameters:
  aString - the byte array containing the UTF-8 encoded character sequence.
  offset - the starting offset of the reference character sequence in aString.
  length - the length (in bytes) of the reference character sequence.
  index - the byte index (relative to offset) following the character to be returned.
  dstBuffer - the byte array for copying the resulting character.
  dstOffset - the starting offset in dstBuffer for copying the UTF-8 byte sequence of the character (Unicode code point) before the specified index.
- Returns:
  the number of bytes copied.
- Throws:
  IndexOutOfBoundsException - if index is less than 1 or greater than the length of aString.
  NullPointerException - if aString or dstBuffer is null.

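Finding the character before a byte index requires a backward scan, because a UTF-8 continuation byte (10xxxxxx) never starts a character. The sketch below is plain Java SE with a hypothetical class `Utf8CodePointBefore`. It steps back over at most three continuation bytes and keeps the resulting candidate only if a forward scan from it ends exactly at `index`. Both the forward segmentation (a lead byte plus the continuation bytes it announces, otherwise a single byte) and the one-byte fallback for ill-formed input are assumptions.

```java
// Hypothetical Java SE sketch of the codePointBefore contract (not the Java Card
// implementation); segmentation and fallback rules are assumptions.
final class Utf8CodePointBefore {
    private Utf8CodePointBefore() {}

    // Forward unit length: a lead byte plus the continuation bytes it announces.
    static int unitLength(byte[] s, int i, int end) {
        int b = s[i] & 0xFF;
        int expected = b < 0xC2 ? 1 : b < 0xE0 ? 2 : b < 0xF0 ? 3 : b < 0xF5 ? 4 : 1;
        int len = 1;
        while (len < expected && i + len < end && (s[i + len] & 0xC0) == 0x80) len++;
        return len;
    }

    static short codePointBefore(byte[] s, short offset, short length, short index,
                                 byte[] dst, short dstOffset) {
        if (index < 1 || index > length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int end = offset + index;       // the character ends just before this byte
        int start = end - 1;
        // Step back over at most three continuation bytes to a candidate lead byte.
        while (start > offset && end - start < 4 && (s[start] & 0xC0) == 0x80) {
            start--;
        }
        // Keep the candidate only if its unit ends exactly at 'end'; otherwise
        // the byte just before 'end' is treated as a one-byte ill-formed unit.
        if (unitLength(s, start, offset + length) != end - start) {
            start = end - 1;
        }
        int n = end - start;
        System.arraycopy(s, start, dst, dstOffset, n);
        return (short) n;
    }
}
```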
offsetByCodePoints
public static short offsetByCodePoints(byte[] aString, short offset, short length, short index, short codePointOffset)
Returns the byte index within the UTF-8 encoded character sequence designated by aString, offset and length that is offset from the given index by codePointOffset code points. Ill-formed or incomplete UTF-8 byte sequences within the text range given by index and codePointOffset count as one code point each.

This method can be used to extract a substring from a string. For example, to copy to buffer the substring of the string aString that begins at codePointBeginIndex and extends to the character at index codePointEndIndex - 1, one can call:

    short beginOffset = StringUtil.offsetByCodePoints(aString, offset, length, (short) 0, codePointBeginIndex);
    short endOffset = StringUtil.offsetByCodePoints(aString, offset, length, (short) 0, codePointEndIndex);
    // The returned indices are relative to offset, so offset is added back for the copy.
    // Util.arrayCopy returns dstOffset + length, which equals the copied length here.
    short l = Util.arrayCopy(aString, (short) (offset + beginOffset), buffer, (short) 0, (short) (endOffset - beginOffset));

The copied substring thus has a length in code points (that is, a code point count) equal to codePointEndIndex - codePointBeginIndex.
- Parameters:
  aString - the byte array containing the UTF-8 encoded character sequence.
  offset - the starting offset of the reference character sequence in aString.
  length - the length (in bytes) of the reference character sequence.
  index - the byte index to be offset (relative to offset).
  codePointOffset - the offset in code points.
- Returns:
  the index within aString relative to the beginning of the string contained in aString, that is, relative to offset.
- Throws:
  IndexOutOfBoundsException - if index is negative or larger than the length of aString, or if codePointOffset is positive and the substring starting with index has fewer than codePointOffset code points, or if codePointOffset is negative and the substring before index has fewer than the absolute value of codePointOffset code points.
  NullPointerException - if aString is null.

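The Java SE sketch below walks forward from `index` one unit at a time and returns a result relative to `offset`. Its `substring` helper follows the substring recipe in the text and adds `offset` back before copying, because the returned indices are relative to `offset`. Both names are hypothetical. Negative `codePointOffset` values are left out on purpose, and the unit rule (a lead byte plus the continuation bytes it announces, otherwise a single byte) is an assumption.

```java
// Hypothetical Java SE sketch (not the Java Card implementation). Only
// non-negative codePointOffset values are handled here, for brevity.
final class Utf8Offsets {
    private Utf8Offsets() {}

    // Assumed unit: a lead byte plus the continuation bytes it announces.
    static int unitLength(byte[] s, int i, int end) {
        int b = s[i] & 0xFF;
        int expected = b < 0xC2 ? 1 : b < 0xE0 ? 2 : b < 0xF0 ? 3 : b < 0xF5 ? 4 : 1;
        int len = 1;
        while (len < expected && i + len < end && (s[i + len] & 0xC0) == 0x80) len++;
        return len;
    }

    static short offsetByCodePoints(byte[] s, short offset, short length,
                                    short index, short codePointOffset) {
        if (index < 0 || index > length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        if (codePointOffset < 0) {
            throw new IllegalArgumentException("negative offsets not covered by this sketch");
        }
        int end = offset + length;
        int pos = offset + index;
        for (int k = 0; k < codePointOffset; k++) {
            if (pos >= end) {
                throw new IndexOutOfBoundsException("fewer than " + codePointOffset + " code points");
            }
            pos += unitLength(s, pos, end);
        }
        return (short) (pos - offset);         // relative to offset
    }

    // Substring recipe: returned indices are relative, so offset is added back.
    static byte[] substring(byte[] s, short offset, short length, short cpBegin, short cpEnd) {
        short begin = offsetByCodePoints(s, offset, length, (short) 0, cpBegin);
        short endIdx = offsetByCodePoints(s, offset, length, (short) 0, cpEnd);
        byte[] out = new byte[endIdx - begin];
        System.arraycopy(s, offset + begin, out, 0, out.length);
        return out;
    }
}
```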
compare
public static short compare(boolean ignoreCase, byte[] aString, short offset, short length, byte[] anotherString, short ooffset, short olength)
Compares two strings lexicographically, optionally ignoring case considerations. The comparison is based on the UTF-8 encoded Unicode value of each character in the strings. The character sequence designated by aString, offset and length is compared lexicographically to the character sequence designated by anotherString, ooffset and olength. The result is a negative number if the character sequence contained in aString lexicographically precedes the character sequence contained in anotherString. The result is a positive number if the character sequence contained in aString lexicographically follows the character sequence contained in anotherString. The result is zero if the two character sequences are equal.

This is the definition of lexicographic ordering. If two strings are different, then either they have different characters at some index that is a valid index for both strings, or their lengths are different, or both. If they have different characters at one or more index positions, let k be the smallest such index; then the string whose character at position k has the smaller value, as determined by using the < operator, lexicographically precedes the other string. In this case, compare returns the difference of the first mismatching bytes of the UTF-8 encoded representations of the two characters at position k in the two strings. If there is no index position at which they differ, then the shorter string lexicographically precedes the longer string. In this case, compare returns the difference of the lengths of the strings.

When ignoring case considerations, this method behaves as if comparing (using the same algorithm as described above) normalized versions of the strings where case differences have been eliminated by calling toLowerCase(toUpperCase(string)) on both argument strings.
- Parameters:
  ignoreCase - whether case must be ignored.
  aString - the byte array containing the reference UTF-8 encoded character sequence.
  offset - the starting offset of the reference character sequence in aString.
  length - the length (in bytes) of the reference character sequence.
  anotherString - the byte array containing the UTF-8 encoded character sequence to be compared.
  ooffset - the starting offset in anotherString of the character sequence to be compared.
  olength - the length (in bytes) of the character sequence to be compared.
- Returns:
  the value 0 if the character sequence contained in anotherString is equal to the character sequence contained in aString; a value less than 0 if the character sequence contained in aString is lexicographically less than the character sequence contained in anotherString; and a value greater than 0 if the character sequence contained in aString is lexicographically greater than the character sequence contained in anotherString, optionally ignoring case considerations.
- Throws:
  NullPointerException - if aString or anotherString is null.

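Identical characters have identical UTF-8 encodings, and UTF-8 preserves code point order when bytes are compared as unsigned values. The character-level rule above can therefore be carried out as a byte-level scan, because the first differing byte lies inside the first differing character. The sketch below is plain Java SE, and `Utf8Compare` is hypothetical. It assumes that the length difference is measured in bytes and that case folding covers only US-ASCII, which matches the documented Basic Latin default.

```java
// Hypothetical Java SE sketch of the compare contract (not the Java Card code).
// Assumptions: unsigned byte comparison, byte-length difference, ASCII-only folding.
final class Utf8Compare {
    private Utf8Compare() {}

    private static int foldAscii(int b) {
        return (b >= 'A' && b <= 'Z') ? b + ('a' - 'A') : b;
    }

    static short compare(boolean ignoreCase, byte[] a, short aOff, short aLen,
                         byte[] b, short bOff, short bLen) {
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int x = a[aOff + i] & 0xFF;   // unsigned, so order matches code point order
            int y = b[bOff + i] & 0xFF;
            if (ignoreCase) {
                x = foldAscii(x);
                y = foldAscii(y);
            }
            if (x != y) {
                return (short) (x - y);   // first mismatching byte decides
            }
        }
        return (short) (aLen - bLen);     // common prefix: shorter string first
    }
}
```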
indexOf
public static short indexOf(byte[] aString, short offset, short length, byte[] subString, short soffset, short slength)
Returns the index within the provided UTF-8 encoded character string of the first occurrence of the specified substring. The number returned is the smallest value k for which:

    compare(false, aString, offset + k, slength, subString, soffset, slength) == 0

If no such value of k exists, then -1 is returned.
- Parameters:
  aString - the byte array containing the reference UTF-8 encoded character sequence.
  offset - the starting offset of the reference character sequence in aString.
  length - the length (in bytes) of the reference character sequence.
  subString - the byte array containing the UTF-8 encoded character sequence of the substring.
  soffset - the starting offset in subString of the substring's character sequence.
  slength - the length (in bytes) of the substring's character sequence.
- Returns:
  if the substring designated by subString, soffset and slength occurs as a substring within the string designated by aString, offset and length, then the index (relative to offset) of the first byte of the first such substring is returned; if it does not occur as a substring, -1 is returned.
- Throws:
  NullPointerException - if aString or subString is null.

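indexOf belongs to the plain byte-sequence methods, so a naive byte-wise search captures its contract. Here is a minimal Java SE sketch. `ByteIndexOf` is hypothetical, and returning 0 for an empty substring is an assumption.

```java
// Hypothetical Java SE sketch: a naive byte-wise search matching the documented
// contract (plain byte comparison, result relative to offset, -1 if absent).
final class ByteIndexOf {
    private ByteIndexOf() {}

    static short indexOf(byte[] s, short offset, short length,
                         byte[] sub, short soffset, short slength) {
        outer:
        for (int k = 0; k + slength <= length; k++) {
            for (int j = 0; j < slength; j++) {
                if (s[offset + k + j] != sub[soffset + j]) {
                    continue outer;      // mismatch: try the next start position
                }
            }
            return (short) k;            // all slength bytes matched at k
        }
        return -1;
    }
}
```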
replace
public static short replace(byte[] srcString, short srcOffset, short srcLength, byte[] oldSubstring, short oOffset, short oLength, byte[] newSubstring, short nOffset, short nLength, byte[] dstString, short dstOffset)
Copies to the destination byte array the string resulting from replacing all occurrences of the old substring in the provided source string with the new substring.

If the character sequence (substring) designated by oldSubstring, oOffset and oLength does not occur in the source character sequence designated by srcString, srcOffset and srcLength, then the source character sequence is copied as is to dstString, starting at dstOffset. Otherwise, a character sequence identical to the character sequence designated by srcString, srcOffset and srcLength is copied to dstString, starting at dstOffset, except that every occurrence of the substring designated by oldSubstring, oOffset and oLength is replaced by an occurrence of the substring designated by newSubstring, nOffset and nLength. The replacement proceeds from the beginning of the source string to the end.
- Parameters:
  srcString - the byte array containing the source UTF-8 encoded character sequence.
  srcOffset - the starting offset of the source character sequence in srcString.
  srcLength - the length (in bytes) of the source character sequence.
  oldSubstring - the byte array containing the UTF-8 encoded character sequence to be replaced.
  oOffset - the starting offset of the replaced character sequence in oldSubstring.
  oLength - the length (in bytes) of the replaced character sequence.
  newSubstring - the byte array containing the replacement UTF-8 encoded character sequence.
  nOffset - the starting offset of the replacement character sequence in newSubstring.
  nLength - the length (in bytes) of the replacement character sequence.
  dstString - the byte array for copying the resulting character sequence.
  dstOffset - the starting offset in dstString for copying the resulting character sequence.
- Returns:
  the number of bytes copied.
- Throws:
  NullPointerException - if srcString, oldSubstring, newSubstring or dstString is null.

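A minimal Java SE sketch of the left-to-right replacement described above. `ByteReplace` is hypothetical, and matching is byte-wise and non-overlapping. Rejecting an empty old substring is an assumption, since the text does not address that case.

```java
// Hypothetical Java SE sketch of the replace contract: a left-to-right,
// non-overlapping, byte-wise replacement.
final class ByteReplace {
    private ByteReplace() {}

    private static boolean regionMatches(byte[] a, int aOff, byte[] b, int bOff, int len) {
        for (int i = 0; i < len; i++) {
            if (a[aOff + i] != b[bOff + i]) return false;
        }
        return true;
    }

    static short replace(byte[] src, short srcOffset, short srcLength,
                         byte[] old, short oOffset, short oLength,
                         byte[] rep, short nOffset, short nLength,
                         byte[] dst, short dstOffset) {
        if (oLength == 0) throw new IllegalArgumentException("empty old substring");
        int i = 0;
        int d = dstOffset;
        while (i < srcLength) {
            if (i + oLength <= srcLength && regionMatches(src, srcOffset + i, old, oOffset, oLength)) {
                System.arraycopy(rep, nOffset, dst, d, nLength); // emit the replacement
                d += nLength;
                i += oLength;                                    // skip the matched bytes
            } else {
                dst[d++] = src[srcOffset + i++];                 // copy one source byte
            }
        }
        return (short) (d - dstOffset);
    }
}
```

With this scan, replacing "aa" by "b" in "aaa" yields "ba". The first match consumes two bytes, and the remaining single "a" is copied unchanged.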
toLowerCase
public static short toLowerCase(byte[] srcString, short srcOffset, short srcLength, byte[] dstString, short dstOffset)
Converts to lower case and copies all of the characters from the provided source UTF-8 encoded character string to the provided destination array, starting at the provided dstOffset. This method skips/ignores any unrecognized (ill-formed or incomplete) byte sequence.
- Parameters:
  srcString - the byte array containing the source UTF-8 encoded character sequence.
  srcOffset - the starting offset of the source character sequence in srcString.
  srcLength - the length (in bytes) of the source character sequence.
  dstString - the byte array for copying the resulting character sequence.
  dstOffset - the starting offset in dstString for copying the resulting character sequence.
- Returns:
  the number of bytes copied.
- Throws:
  NullPointerException - if srcString or dstString is null.
- See Also:
  toUpperCase(byte[], short, short, byte[], short)

toUpperCase
public static short toUpperCase(byte[] srcString, short srcOffset, short srcLength, byte[] dstString, short dstOffset)
Converts to upper case and copies all of the characters from the provided source UTF-8 encoded character string to the provided destination array, starting at the provided dstOffset. This method skips/ignores any unrecognized (ill-formed or incomplete) byte sequence.
- Parameters:
  srcString - the byte array containing the source UTF-8 encoded character sequence.
  srcOffset - the starting offset of the source character sequence in srcString.
  srcLength - the length (in bytes) of the source character sequence.
  dstString - the byte array for copying the resulting character sequence.
  dstOffset - the starting offset in dstString for copying the resulting character sequence.
- Returns:
  the number of bytes copied.
- Throws:
  NullPointerException - if srcString or dstString is null.
- See Also:
  toLowerCase(byte[], short, short, byte[], short)

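With only the default Basic Latin support, both case conversions come down to shifting ASCII letters by 0x20. The Java SE sketch below uses a hypothetical class `AsciiCase`. It assumes that all other bytes, including those of multi-byte and ill-formed sequences, are copied unchanged. The documented "skips/ignores" behavior could instead mean dropping ill-formed sequences, which this sketch does not attempt.

```java
// Hypothetical Java SE sketch of the default (Basic Latin only) case mapping.
// Assumption: every byte outside A-Z / a-z is copied unchanged.
final class AsciiCase {
    private AsciiCase() {}

    static short toLowerCase(byte[] src, short srcOffset, short srcLength, byte[] dst, short dstOffset) {
        return map(src, srcOffset, srcLength, dst, dstOffset, true);
    }

    static short toUpperCase(byte[] src, short srcOffset, short srcLength, byte[] dst, short dstOffset) {
        return map(src, srcOffset, srcLength, dst, dstOffset, false);
    }

    private static short map(byte[] src, short srcOffset, short srcLength,
                             byte[] dst, short dstOffset, boolean lower) {
        for (int i = 0; i < srcLength; i++) {
            byte b = src[srcOffset + i];   // bytes >= 0x80 are negative, so never in A-Z or a-z
            if (lower && b >= 'A' && b <= 'Z') {
                b += 'a' - 'A';
            } else if (!lower && b >= 'a' && b <= 'z') {
                b -= 'a' - 'A';
            }
            dst[dstOffset + i] = b;
        }
        return srcLength;
    }
}
```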
trim
public static short trim(byte[] srcString, short srcOffset, short srcLength, byte[] dstString, short dstOffset)
Removes white space from both ends of the provided UTF-8 encoded character string and copies the resulting character sequence to the provided destination array, starting at the provided dstOffset.

If the source string designated by srcString, srcOffset and srcLength represents an empty character sequence, or the first and last characters of the character sequence of the source string both have codes greater than '\u0020' (the space character), then the source string is copied as is to dstString, starting at dstOffset.

Otherwise, if there is no character with a code greater than '\u0020' in the source string, then no character is copied and 0 is returned.

Otherwise, let k be the index of the first character in the source string whose code is greater than '\u0020', and let m be the index of the last character in the source string whose code is greater than '\u0020'. The substring of the source string that begins with the character at index k and ends with the character at index m is copied to dstString, starting at dstOffset.

This method may be used to trim white space from the beginning and end of a string; in fact, it trims all ASCII control characters as well. Illegal byte sequences are considered non-white space.
- Parameters:
  srcString - the byte array containing the source UTF-8 encoded character sequence.
  srcOffset - the starting offset of the source character sequence in srcString.
  srcLength - the length (in bytes) of the source character sequence.
  dstString - the byte array for copying the resulting character sequence.
  dstOffset - the starting offset in dstString for copying the resulting character sequence.
- Returns:
  the number of bytes copied.
- Throws:
  NullPointerException - if srcString or dstString is null.

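The trim rule can be applied byte by byte. Every character up to U+0020 is a single byte, and every byte of a multi-byte or ill-formed sequence is at least 0x80, which agrees with treating illegal sequences as non-white space. A Java SE sketch with a hypothetical class name:

```java
// Hypothetical Java SE sketch of the trim contract: strips leading and trailing
// bytes <= 0x20 (space and ASCII control characters).
final class Utf8Trim {
    private Utf8Trim() {}

    static short trim(byte[] src, short srcOffset, short srcLength, byte[] dst, short dstOffset) {
        int k = 0;
        int m = srcLength - 1;
        while (k <= m && (src[srcOffset + k] & 0xFF) <= 0x20) k++;   // first non-white byte
        while (m >= k && (src[srcOffset + m] & 0xFF) <= 0x20) m--;   // last non-white byte
        int n = m - k + 1;                                            // 0 if all white space
        System.arraycopy(src, srcOffset + k, dst, dstOffset, n);
        return (short) n;
    }
}
```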
valueOf
public static short valueOf(boolean b, byte[] dstString, short dstOffset)
Copies the UTF-8 encoded character string representation of the boolean argument into the provided destination array, starting at the provided offset. If the argument is true, a string equal to "true" is copied; otherwise, a string equal to "false" is copied.
- Parameters:
  b - a boolean.
  dstString - the destination UTF-8 encoded character string, as a byte array.
  dstOffset - the starting offset in the destination array.
- Returns:
  the number of bytes copied.
- Throws:
  NullPointerException - if dstString is null.

parseBoolean
public static boolean parseBoolean(byte[] aString, short offset, short length)
Parses the string argument as a boolean. The boolean returned represents the value true if the string argument is not null and is equal, ignoring case, to the string "true".
- Parameters:
  aString - the byte array containing the UTF-8 encoded character sequence to be parsed.
  offset - the starting offset of the character sequence in aString.
  length - the length (in bytes) of the character sequence to be parsed.
- Returns:
  the boolean value represented by the designated character sequence.
- Throws:
  NullPointerException - if aString is null.

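A minimal Java SE sketch of parseBoolean that folds only ASCII letters before comparing with "true". The class name is hypothetical.

```java
// Hypothetical Java SE sketch of parseBoolean: true only for "true" in any
// ASCII letter case; every other sequence yields false.
final class Utf8Boolean {
    private Utf8Boolean() {}

    private static final byte[] TRUE = {'t', 'r', 'u', 'e'};

    static boolean parseBoolean(byte[] s, short offset, short length) {
        if (s == null) throw new NullPointerException("aString");  // as documented under Throws
        if (length != TRUE.length) return false;
        for (int i = 0; i < length; i++) {
            int b = s[offset + i];
            if (b >= 'A' && b <= 'Z') b += 'a' - 'A';   // ASCII-only case folding
            if (b != TRUE[i]) return false;
        }
        return true;
    }
}
```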
valueOf
public static short valueOf(short i, byte[] dstString, short dstOffset)
Copies the UTF-8 encoded, signed decimal string representation of the (up to) 16-bit signed (short) argument into the provided destination array, starting at the provided offset.
- Parameters:
  i - a short.
  dstString - the destination UTF-8 encoded character string, as a byte array.
  dstOffset - the starting offset in the destination array.
- Returns:
  the number of bytes copied.
- Throws:
  NullPointerException - if dstString is null.

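Formatting a short takes at most six bytes ("-32768"). The Java SE sketch below uses a hypothetical class. It widens to int before negating, because -(-32768) does not fit in a short. It also counts the digits first, so they can be written in place without a temporary String.

```java
// Hypothetical Java SE sketch of valueOf(short, byte[], short): writes ASCII
// (hence UTF-8) decimal digits directly into the destination array.
final class ShortToDecimal {
    private ShortToDecimal() {}

    static short valueOf(short i, byte[] dst, short dstOffset) {
        int v = i;                       // widen first: -(-32768) does not fit in a short
        int d = dstOffset;
        if (v < 0) {
            dst[d++] = '-';
            v = -v;
        }
        int digits = 1;                  // count digits so they can be written in place
        for (int t = v; t >= 10; t /= 10) digits++;
        for (int p = d + digits - 1; p >= d; p--) {
            dst[p] = (byte) ('0' + v % 10);   // fill from the least significant digit backwards
            v /= 10;
        }
        d += digits;
        return (short) (d - dstOffset);
    }
}
```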
parseShortInteger
public static short parseShortInteger(byte[] aString, short offset, short length)
Parses the provided UTF-8 encoded character sequence into an (up to) 16-bit signed (short) integer. Accepts decimal and hexadecimal numbers given by the following grammar:

    DecodableString:
        Sign_opt DecimalNumeral
        Sign_opt 0x HexDigits
        Sign_opt 0X HexDigits
        Sign_opt # HexDigits
    Sign:
        -

DecimalNumeral and HexDigits are defined in §3.10.1 of the Java Language Specification.

The sequence of characters following an (optional) negative sign and/or radix specifier ("0x", "0X", "#", or leading zero) is parsed as a (short) integer in the specified radix (10 or 16). This sequence of characters must represent a positive value or a StringException will be thrown with reason StringException.ILLEGAL_NUMBER_FORMAT. The result is negated if the first character of the specified character string is the minus sign. No whitespace characters are permitted in the character string.
- Parameters:
  aString - the byte array containing the UTF-8 encoded character sequence to be parsed.
  offset - the starting offset of the character sequence in aString.
  length - the length (in bytes) of the character sequence to be parsed.
- Returns:
  the short integer value represented by the designated character sequence.
- Throws:
  StringException - if the designated character sequence does not contain a parsable (short) integer.
  NullPointerException - if aString is null.

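The grammar above can be parsed in one pass: an optional sign, an optional radix prefix, then digits with an overflow check. This Java SE sketch is hypothetical. NumberFormatException stands in for StringException with reason ILLEGAL_NUMBER_FORMAT. Treating leading zeros as decimal and accepting "-32768" are assumptions that the text does not settle.

```java
// Hypothetical Java SE sketch of the parseShortInteger grammar (not the Java Card
// implementation). NumberFormatException stands in for StringException.
final class ShortDecoder {
    private ShortDecoder() {}

    static short parseShortInteger(byte[] s, short offset, short length) {
        int i = offset;
        int end = offset + length;
        boolean negative = false;
        if (i < end && s[i] == '-') {            // only '-' is a sign in this grammar
            negative = true;
            i++;
        }
        int radix = 10;
        if (end - i >= 2 && s[i] == '0' && (s[i + 1] == 'x' || s[i + 1] == 'X')) {
            radix = 16;
            i += 2;
        } else if (i < end && s[i] == '#') {
            radix = 16;
            i++;
        }
        if (i == end) throw new NumberFormatException("no digits");
        int limit = negative ? 32768 : 32767;    // magnitude allowed before negation
        int value = 0;
        for (; i < end; i++) {
            int digit = Character.digit(s[i], radix);  // -1 for '-', '+', spaces, non-ASCII
            if (digit < 0) throw new NumberFormatException("illegal character");
            value = value * radix + digit;
            if (value > limit) throw new NumberFormatException("out of range");
        }
        return (short) (negative ? -value : value);
    }
}
```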
valueOf
public static short valueOf(short[] l, byte[] dstString, short dstOffset)
Copies the UTF-8 encoded, signed decimal string representation of the (up to) 64-bit signed integer argument, provided as an array of short integers, into the provided destination array, starting at the provided offset.
- Parameters:
  l - an array of short integers representing an (up to) 64-bit signed long integer; the most significant short integer is at index 0.
  dstString - the destination UTF-8 encoded character string, as a byte array.
  dstOffset - the starting offset in the destination array.
- Returns:
  the number of bytes copied.
- Throws:
  NullPointerException - if l or dstString is null.

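The (up to) 64-bit value is passed as an array of shorts, most significant first. The Java SE sketch below, with a hypothetical class `ShortArrayLong`, shows that layout in both directions and adds a valueOf-style formatter built on it. Sign-extending from the first element when fewer than four shorts are supplied is an assumption.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical Java SE sketch of the short[] representation: most significant
// short at index 0 (not the Java Card implementation).
final class ShortArrayLong {
    private ShortArrayLong() {}

    /** Splits a long into four shorts, most significant first. */
    static short[] toShorts(long v) {
        short[] out = new short[4];
        for (int k = 3; k >= 0; k--) {
            out[k] = (short) v;   // keep the low 16 bits
            v >>= 16;             // arithmetic shift preserves the sign
        }
        return out;
    }

    /** Rebuilds a long from one to four shorts, most significant first. */
    static long fromShorts(short[] l) {
        long v = l[0];                                   // sign-extends the top short
        for (int k = 1; k < l.length; k++) {
            v = (v << 16) | (l[k] & 0xFFFFL);           // append 16 unsigned bits
        }
        return v;
    }

    /** valueOf-style formatting: signed decimal digits written as UTF-8 (ASCII). */
    static short valueOf(short[] l, byte[] dst, short dstOffset) {
        byte[] digits = Long.toString(fromShorts(l)).getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(digits, 0, dst, dstOffset, digits.length);
        return (short) digits.length;
    }
}
```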
parseLongInteger
public static short parseLongInteger(byte[] aString, short offset, short length, short[] integer, short ioffset)Parses the provided UTF-8 encoded character sequence into a (up-to) 64 bits long signed integer. Accepts decimal and hexadecimal numbers given by the following grammar:
DecimalNumeral and HexDigits are defined in §3.10.1 of the Java Language Specification.- DecodableString:
- Signopt DecimalNumeral
- Signopt
0xHexDigits- Signopt
0XHexDigits- Signopt
#HexDigits
- Signopt
- Sign:
-
The sequence of characters following an (optional) negative sign and/or radix specifier ("
0x", "0X", "#", or leading zero) is parsed as a (long) integer in the specified radix (10 or 16). This sequence of characters must represent a positive value or aStringExceptionwill be thrown with reasonStringException.ILLEGAL_NUMBER_FORMAT. The result is negated if first character of the specified character string is the minus sign. No whitespace characters are permitted in the character string.- Parameters:
aString - the byte array containing the UTF-8 encoded character sequence to be parsed.
offset - the starting offset of the character sequence in aString.
length - the length (in bytes) of the character sequence to be parsed.
integer - the array of short integers to contain the value represented by the designated character sequence; the most significant short integer is at index 0.
ioffset - the starting offset in integer for copying the resulting short sequence.
- Returns:
- the number of short integers written into the array, ignoring leading zero short values.
- Throws:
StringException - if the designated character sequence does not contain a parsable (long) integer.
NullPointerException - if aString or integer is null.
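The grammar above can be illustrated with a plain Java SE sketch (not Java Card code, since it uses a long accumulator). Several details are assumptions not fixed by this section: the result always fills four shorts starting at ioffset, most significant first; the return value is 4 minus the number of leading zero shorts; a leading zero without "x"/"X" is treated as ordinary decimal; and IllegalArgumentException stands in for StringException with reason ILLEGAL_NUMBER_FORMAT. The class name ParseLongSketch is hypothetical.

```java
// Plain Java SE sketch of the DecodableString grammar (not Java Card code).
// Assumptions: 4 shorts are always written, most significant first; the
// return value is 4 minus the count of leading zero shorts; and
// IllegalArgumentException stands in for StringException.
final class ParseLongSketch {

    // Value of an ASCII digit in the given radix, or -1 if it is not valid.
    private static int digit(byte b, int radix) {
        int d;
        if (b >= '0' && b <= '9') {
            d = b - '0';
        } else if (b >= 'a' && b <= 'f') {
            d = b - 'a' + 10;
        } else if (b >= 'A' && b <= 'F') {
            d = b - 'A' + 10;
        } else {
            return -1;
        }
        return d < radix ? d : -1;
    }

    static short parseLongInteger(byte[] aString, short offset, short length,
                                  short[] integer, short ioffset) {
        int pos = offset;
        int end = offset + length;
        boolean negative = pos < end && aString[pos] == '-'; // optional Sign
        if (negative) {
            pos++;
        }
        int radix = 10;
        if (pos + 1 < end && aString[pos] == '0'
                && (aString[pos + 1] == 'x' || aString[pos + 1] == 'X')) {
            radix = 16; // "0x" or "0X" prefix
            pos += 2;
        } else if (pos < end && aString[pos] == '#') {
            radix = 16; // "#" prefix
            pos++;
        }
        if (pos >= end) {
            throw new IllegalArgumentException("no digits");
        }
        long magnitude = 0;
        try {
            for (; pos < end; pos++) {
                int d = digit(aString[pos], radix);
                if (d < 0) {
                    throw new IllegalArgumentException("illegal number format");
                }
                // Exact arithmetic throws ArithmeticException on overflow.
                magnitude = Math.addExact(Math.multiplyExact(magnitude, (long) radix), d);
            }
        } catch (ArithmeticException e) {
            throw new IllegalArgumentException("value does not fit in 64 bits", e);
        }
        long value = negative ? -magnitude : magnitude;
        // Split into four 16-bit chunks, most significant first.
        for (int i = 0; i < 4; i++) {
            integer[ioffset + i] = (short) (value >>> (48 - 16 * i));
        }
        int leadingZeros = 0;
        while (leadingZeros < 4 && integer[ioffset + leadingZeros] == 0) {
            leadingZeros++;
        }
        return (short) (4 - leadingZeros);
    }
}
```

With these assumptions, "-0x10" fills {-1, -1, -1, -16} and returns 4, while "0x10000" fills {0, 0, 1, 0} and returns 2.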
-
convertTo
public static short convertTo(byte[] srcString, short srcOffset, short srcLength, byte[] dstString, short dstOffset, byte encoding)
Converts all of the characters of the provided UTF-8 encoded source string to the specified character encoding and copies them to the provided destination array, starting at the provided dstOffset.
- Parameters:
srcString - the byte array containing the source UTF-8 encoded character sequence.
srcOffset - the starting offset of the source character sequence in srcString.
srcLength - the length (in bytes) of the source character sequence.
dstString - the byte array for copying the resulting character sequence.
dstOffset - the starting offset in dstString for copying the resulting character sequence.
encoding - the character encoding to be used.
- Returns:
- the number of bytes copied.
- Throws:
StringException - with reason StringException.UNSUPPORTED_ENCODING if the requested character encoding is not supported.
StringException - with reason StringException.INVALID_BYTE_SEQUENCE if an invalid byte sequence is encountered.
NullPointerException - if srcString or dstString is null.
- See Also:
convertFrom(byte[], short, short, byte[], short, byte)
-
convertFrom
public static short convertFrom(byte[] srcString, short srcOffset, short srcLength, byte[] dstString, short dstOffset, byte encoding)
Converts all of the characters of the provided source string from the specified character encoding to UTF-8 and copies them to the provided destination array, starting at the provided dstOffset.
- Parameters:
srcString - the byte array containing the source character sequence encoded in the character encoding designated by encoding.
srcOffset - the starting offset of the source character sequence in srcString.
srcLength - the length (in bytes) of the source character sequence.
dstString - the byte array for copying the UTF-8 encoded resulting character sequence.
dstOffset - the starting offset in dstString for copying the resulting character sequence.
encoding - the character encoding of the source character string.
- Returns:
- the number of bytes copied.
- Throws:
StringException - with reason StringException.UNSUPPORTED_ENCODING if the requested character encoding is not supported.
StringException - with reason StringException.INVALID_BYTE_SEQUENCE if an invalid byte sequence is encountered.
NullPointerException - if srcString or dstString is null.
- See Also:
convertTo(byte[], short, short, byte[], short, byte)
-
check
public static boolean check(byte[] aString, short offset, short length)
Checks if the provided byte array contains a valid UTF-8 encoded character or character sequence. As per UTF-8, a byte with a leading '0' bit is a single-byte code; a byte with two or more leading '1' bits is the first byte of a multi-byte sequence whose length equals the number of leading '1' bits; finally, a byte with a leading '10' bit sequence is a continuation byte of a multi-byte sequence.
- Parameters:
aString - the byte array containing the UTF-8 encoded character sequence to be checked.
offset - the starting offset of the character sequence in aString.
length - the length (in bytes) of the character sequence to be checked.
- Returns:
- true if the byte sequence corresponds to a valid UTF-8 encoded character or character sequence, false otherwise.
- Throws:
NullPointerException - if aString is null.
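The lead-byte rule described for check can be sketched in plain Java SE as follows. The sketch only applies that structural rule (lead byte determines a length of one to four bytes, and each following byte must be a continuation byte); it does not reject overlong encodings, surrogate code points or values above U+10FFFF, which RFC 3629 also forbids, and whether the real check does so is not stated here. The class name Utf8CheckSketch is hypothetical.

```java
// Plain Java SE sketch of the structural UTF-8 rule described above.
// It does not reject overlong forms, surrogates or values above U+10FFFF.
final class Utf8CheckSketch {

    static boolean check(byte[] aString, short offset, short length) {
        int pos = offset;
        int end = offset + length;
        while (pos < end) {
            int b = aString[pos] & 0xFF;
            int seqLen;
            if ((b & 0x80) == 0) {
                seqLen = 1;                 // 0xxxxxxx
            } else if ((b & 0xE0) == 0xC0) {
                seqLen = 2;                 // 110xxxxx
            } else if ((b & 0xF0) == 0xE0) {
                seqLen = 3;                 // 1110xxxx
            } else if ((b & 0xF8) == 0xF0) {
                seqLen = 4;                 // 11110xxx
            } else {
                return false;               // stray 10xxxxxx, or 11111xxx
            }
            if (pos + seqLen > end) {
                return false;               // truncated multi-byte sequence
            }
            for (int i = 1; i < seqLen; i++) {
                if ((aString[pos + i] & 0xC0) != 0x80) {
                    return false;           // continuation must be 10xxxxxx
                }
            }
            pos += seqLen;
        }
        return true;
    }
}
```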
-
startsWith
public static boolean startsWith(byte[] aString, short offset, short length, byte[] prefix, short poffset, short plength, short codePointCount)
Tests if the UTF-8 encoded character sequence designated by aString, offset and length starts with the first codePointCount characters of the character sequence designated by prefix, poffset and plength.
If codePointCount is negative, the whole prefix character sequence is considered, in which case calling this method is equivalent to calling Util.arrayCompare as follows:

    return length >= plength && Util.arrayCompare(aString, offset, prefix, poffset, plength) == 0;

Otherwise, if codePointCount is positive, calling this method is equivalent to calling Util.arrayCompare as follows:

    short endOffset = StringUtil.offsetByCodePoints(prefix, poffset, plength, (short) 0, codePointCount);
    return length >= endOffset && Util.arrayCompare(aString, offset, prefix, poffset, endOffset) == 0;
- Parameters:
aString - the byte array containing the reference UTF-8 encoded character sequence.
offset - the starting offset of the reference character sequence in aString.
length - the length (in bytes) of the reference character sequence.
prefix - the byte array containing the prefix UTF-8 encoded character sequence.
poffset - the starting offset in prefix of the prefix character sequence.
plength - the length (in bytes) of the prefix character sequence.
codePointCount - the number of code points to be used for testing.
- Returns:
true if the first codePointCount characters (or, if codePointCount is negative, all characters) of the character sequence designated by prefix, poffset and plength are a prefix of the character sequence designated by aString, offset and length; false otherwise.
- Throws:
NullPointerException - if aString or prefix is null.
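The equivalence stated for startsWith can be exercised off-card with a plain Java SE sketch. Both helpers are hypothetical stand-ins: offsetByCodePoints only approximates StringUtil.offsetByCodePoints by consuming one byte plus any following continuation bytes per code point, and throwing IndexOutOfBoundsException when it runs past the end is an assumption; regionEquals replaces Util.arrayCompare with a plain equality test.

```java
// Plain Java SE sketch of the startsWith equivalence (helpers are hypothetical).
final class StartsWithSketch {

    // Approximates StringUtil.offsetByCodePoints: advances codePointCount code
    // points from byte index 'index' (relative to offset). Each step consumes
    // one byte plus any continuation bytes (10xxxxxx) that follow it.
    static short offsetByCodePoints(byte[] s, short offset, short length,
                                    short index, short codePointCount) {
        int pos = index;
        for (int n = 0; n < codePointCount; n++) {
            if (pos >= length) {
                throw new IndexOutOfBoundsException("past end of character sequence");
            }
            pos++;
            while (pos < length && (s[offset + pos] & 0xC0) == 0x80) {
                pos++;
            }
        }
        return (short) pos;
    }

    // Equality-only stand-in for Util.arrayCompare(...) == 0.
    static boolean regionEquals(byte[] a, int aOff, byte[] b, int bOff, int len) {
        for (int i = 0; i < len; i++) {
            if (a[aOff + i] != b[bOff + i]) {
                return false;
            }
        }
        return true;
    }

    static boolean startsWith(byte[] aString, short offset, short length,
                              byte[] prefix, short poffset, short plength,
                              short codePointCount) {
        int n = codePointCount < 0
                ? plength // whole prefix
                : offsetByCodePoints(prefix, poffset, plength, (short) 0, codePointCount);
        return length >= n && regionEquals(aString, offset, prefix, poffset, n);
    }
}
```

With "héllo" as the reference and "hé!" as the prefix, a codePointCount of 2 compares only the 3 bytes of "hé" and returns true, while a negative count compares all 4 bytes and returns false.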
-
endsWith
public static boolean endsWith(byte[] aString, short offset, short length, byte[] suffix, short soffset, short slength, short codePointCount)
Tests if the UTF-8 encoded character sequence designated by aString, offset and length ends with the first codePointCount characters of the character sequence designated by suffix, soffset and slength.
If codePointCount is negative, the whole suffix character sequence is considered, in which case calling this method is equivalent to calling Util.arrayCompare as follows:

    return length >= slength && Util.arrayCompare(aString, (short) (offset + length - slength), suffix, soffset, slength) == 0;

Otherwise, if codePointCount is positive, calling this method is equivalent to calling Util.arrayCompare as follows:

    short endOffset = StringUtil.offsetByCodePoints(suffix, soffset, slength, (short) 0, codePointCount);
    return length >= endOffset && Util.arrayCompare(aString, (short) (offset + length - endOffset), suffix, soffset, endOffset) == 0;

- Parameters:
aString - the byte array containing the reference UTF-8 encoded character sequence.
offset - the starting offset of the reference character sequence in aString.
length - the length (in bytes) of the reference character sequence.
suffix - the byte array containing the suffix UTF-8 encoded character sequence.
soffset - the starting offset in suffix of the suffix character sequence.
slength - the length (in bytes) of the suffix character sequence.
codePointCount - the number of code points to be used for testing.
- Returns:
true if the first codePointCount characters (or, if codePointCount is negative, all characters) of the character sequence designated by suffix, soffset and slength are a suffix of the character sequence designated by aString, offset and length; false otherwise.
- Throws:
NullPointerException - if aString or suffix is null.
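The endsWith equivalence differs from startsWith only in where the comparison starts: the first n bytes of the suffix are compared with the last n bytes of the reference string. The plain Java SE sketch below reuses the same hypothetical helpers (an approximation of StringUtil.offsetByCodePoints and an equality-only stand-in for Util.arrayCompare), repeated so the block stands alone.

```java
// Plain Java SE sketch of the endsWith equivalence (helpers are hypothetical).
final class EndsWithSketch {

    // Approximates StringUtil.offsetByCodePoints: each step consumes one byte
    // plus any continuation bytes (10xxxxxx) that follow it.
    static short offsetByCodePoints(byte[] s, short offset, short length,
                                    short index, short codePointCount) {
        int pos = index;
        for (int n = 0; n < codePointCount; n++) {
            if (pos >= length) {
                throw new IndexOutOfBoundsException("past end of character sequence");
            }
            pos++;
            while (pos < length && (s[offset + pos] & 0xC0) == 0x80) {
                pos++;
            }
        }
        return (short) pos;
    }

    // Equality-only stand-in for Util.arrayCompare(...) == 0.
    static boolean regionEquals(byte[] a, int aOff, byte[] b, int bOff, int len) {
        for (int i = 0; i < len; i++) {
            if (a[aOff + i] != b[bOff + i]) {
                return false;
            }
        }
        return true;
    }

    static boolean endsWith(byte[] aString, short offset, short length,
                            byte[] suffix, short soffset, short slength,
                            short codePointCount) {
        int n = codePointCount < 0
                ? slength // whole suffix
                : offsetByCodePoints(suffix, soffset, slength, (short) 0, codePointCount);
        // Compare against the last n bytes of the reference sequence.
        return length >= n && regionEquals(aString, offset + length - n, suffix, soffset, n);
    }
}
```

With "café" as the reference and "é!" as the suffix, a codePointCount of 1 compares only the 2 bytes of "é" and returns true; a negative count compares all 3 bytes and returns false.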
-
substring
public static short substring(byte[] srcString, short srcOffset, short srcLength, short codePointBeginIndex, short codePointEndIndex, byte[] dstString, short dstOffset)
Copies the specified substring of the designated source string to the destination byte array. The substring begins at the specified codePointBeginIndex and extends to the character at index codePointEndIndex - 1. Thus the length of the substring in code points (that is, its code point count) is codePointEndIndex - codePointBeginIndex.
Ill-formed or incomplete byte sequences within the text range count as one code point each.
If codePointEndIndex is negative, the whole remaining character sequence of the source string is considered.
- Parameters:
srcString - the byte array containing the source UTF-8 encoded character sequence.
srcOffset - the starting offset of the source character sequence in srcString.
srcLength - the length (in bytes) of the source character sequence.
codePointBeginIndex - the beginning index (relative to srcOffset), inclusive.
codePointEndIndex - the ending index (relative to srcOffset), exclusive.
dstString - the byte array for copying the resulting character sequence.
dstOffset - the starting offset in dstString for copying the resulting character sequence.
- Returns:
- the number of bytes copied.
- Throws:
NullPointerException - if srcString or dstString is null.
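The code-point-to-byte translation behind substring can be sketched in plain Java SE: both code point indices are converted to byte offsets, and the byte range between them is copied. The offsetByCodePoints helper is a hypothetical approximation of StringUtil.offsetByCodePoints (one byte plus any following continuation bytes per code point), and the IndexOutOfBoundsException checks for a negative begin index, an end before the begin, or indices past the end are assumptions based on the class-level rule for out-of-bounds indices.

```java
// Plain Java SE sketch of substring (helper names are hypothetical).
final class SubstringSketch {

    // Approximates StringUtil.offsetByCodePoints: each step consumes one byte
    // plus any continuation bytes (10xxxxxx) that follow it.
    static short offsetByCodePoints(byte[] s, short offset, short length,
                                    short index, short codePointCount) {
        int pos = index;
        for (int n = 0; n < codePointCount; n++) {
            if (pos >= length) {
                throw new IndexOutOfBoundsException("past end of character sequence");
            }
            pos++;
            while (pos < length && (s[offset + pos] & 0xC0) == 0x80) {
                pos++;
            }
        }
        return (short) pos;
    }

    static short substring(byte[] srcString, short srcOffset, short srcLength,
                           short codePointBeginIndex, short codePointEndIndex,
                           byte[] dstString, short dstOffset) {
        if (codePointBeginIndex < 0) {
            throw new IndexOutOfBoundsException("negative begin index");
        }
        // Translate code point indices into byte offsets relative to srcOffset.
        int begin = offsetByCodePoints(srcString, srcOffset, srcLength, (short) 0, codePointBeginIndex);
        int end = codePointEndIndex < 0
                ? srcLength // negative end index: take the rest of the string
                : offsetByCodePoints(srcString, srcOffset, srcLength, (short) 0, codePointEndIndex);
        if (end < begin) {
            throw new IndexOutOfBoundsException("end index before begin index");
        }
        System.arraycopy(srcString, srcOffset + begin, dstString, dstOffset, end - begin);
        return (short) (end - begin);
    }
}
```

For "aébc", code points 1 to 3 map to bytes 1 to 4, so the call copies the 3 bytes of "éb"; with a negative end index and a begin index of 2 it copies "bc".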
-
-