Class StringUtil


  • public final class StringUtil
    extends Object
    This class provides methods for handling UTF-8 encoded character sequences (strings). This class also provides methods to convert UTF-8 encoded character strings to and from other character encodings, such as UCS-2, the GSM 7-bit character set and UTF-16. Support for character encodings other than UTF-8 is optional. Proprietary extensions to the supported character encodings must be identified starting from PROP_ENCODING_EXT. An implementation must throw a StringException with reason StringException.UNSUPPORTED_ENCODING if a requested character encoding is not supported.

    UTF-8 encodes each of the code points in the Unicode character set using one to four 8-bit bytes. Unicode code points are the numerical values that make up the Unicode code space. The Unicode Standard, version 4.0 is available from the Unicode Consortium at http://www.unicode.org. The UTF-8 transformation format is specified by RFC 3629.
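
    As an illustration of the one-to-four byte encoding, the following is a minimal sketch in standard Java (not Java Card code, and not part of this class) that derives the expected length of a UTF-8 byte sequence from its first byte, following the ranges given in RFC 3629. The class and method names are hypothetical.

```java
// Illustrative sketch only: Utf8LeadByteSketch and sequenceLength are hypothetical names.
final class Utf8LeadByteSketch {
    /**
     * Returns the expected length in bytes (1 to 4) of the UTF-8 sequence
     * starting with the given byte, or 0 if the byte cannot start a sequence.
     */
    static int sequenceLength(byte lead) {
        int b = lead & 0xFF;                    // treat the byte as unsigned
        if (b <= 0x7F) return 1;                // U+0000 .. U+007F
        if (b >= 0xC2 && b <= 0xDF) return 2;   // U+0080 .. U+07FF
        if (b >= 0xE0 && b <= 0xEF) return 3;   // U+0800 .. U+FFFF
        if (b >= 0xF0 && b <= 0xF4) return 4;   // U+10000 .. U+10FFFF
        return 0;                               // continuation byte or invalid lead byte
    }
}
```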

    The encoded character sequences handled by this class are stored in byte arrays. Each string or character sequence is designated by a byte array, an offset in the byte array indicating the start of the character sequence, and a length indicating the length in bytes of the character sequence. If a designated character sequence is outside the bounds of its containing array, an ArrayIndexOutOfBoundsException is thrown (note: for readability reasons, these exceptions are assumed and not systematically documented in the methods of this class).
    An index of a character or substring within a string is always relative to the beginning of the string; that is, relative to the offset of the string within the containing byte array. If a provided index of a character or substring within a designated character sequence is outside the bounds of the designated character sequence, an IndexOutOfBoundsException is thrown.

    This class provides two categories of methods:

    Methods for dealing with plain byte sequences
    These methods do not check the (UTF-8) well-formedness of the byte sequences passed as parameters; in particular, they do not check whether the byte at a provided offset is the first byte of a valid UTF-8 byte sequence or whether a provided length is equal to or greater than the length of the UTF-8 byte sequence starting at the provided offset (as can be determined by its first byte). They only treat UTF-8 byte sequences as plain byte sequences for purposes such as comparison, copying and truncating. As an example of such methods, see the indexOf method.
    Methods for dealing with Unicode code points (i.e., Unicode characters)
    These methods check the well-formedness of UTF-8 byte sequences. Some of them throw a StringException with reason StringException.INVALID_BYTE_SEQUENCE when encountering an ill-formed UTF-8 byte sequence; as an example of such methods, see the convertTo method.
    The others consider any byte sequence that is not well-formed (e.g., incomplete) within a designated text range as one code point for the purpose of counting, comparing or returning; as an example of such methods, see the codePointCount method.
    To ensure that a character or character sequence is a valid UTF-8 character or character sequence, the check method should be used.
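
    To make the notion of well-formedness concrete, the following is a minimal sketch in standard Java of a validity test based on the UTF-8 syntax of RFC 3629 (rejecting overlong forms, surrogates and truncated sequences). It is not the implementation of the check method; the names are hypothetical, int parameters are used for brevity where this class uses short, and treating an empty sequence as valid is an assumption.

```java
// Illustrative sketch only: Utf8CheckSketch and isWellFormed are hypothetical names.
final class Utf8CheckSketch {
    /** Returns true if a[offset .. offset+length-1] is well-formed UTF-8 per RFC 3629. */
    static boolean isWellFormed(byte[] a, int offset, int length) {
        int i = offset;
        int end = offset + length;
        while (i < end) {
            int b0 = a[i] & 0xFF;
            int n;          // number of continuation bytes expected after b0
            int lo = 0x80;  // allowed range of the FIRST continuation byte
            int hi = 0xBF;
            if (b0 <= 0x7F) {
                n = 0;
            } else if (b0 >= 0xC2 && b0 <= 0xDF) {
                n = 1;
            } else if (b0 >= 0xE0 && b0 <= 0xEF) {
                n = 2;
                if (b0 == 0xE0) lo = 0xA0;        // excludes overlong 3-byte forms
                else if (b0 == 0xED) hi = 0x9F;   // excludes surrogates U+D800..U+DFFF
            } else if (b0 >= 0xF0 && b0 <= 0xF4) {
                n = 3;
                if (b0 == 0xF0) lo = 0x90;        // excludes overlong 4-byte forms
                else if (b0 == 0xF4) hi = 0x8F;   // excludes values above U+10FFFF
            } else {
                return false;                      // stray continuation or invalid lead byte
            }
            if (i + n >= end) return false;        // sequence truncated by the text range
            for (int k = 1; k <= n; k++) {
                int c = a[i + k] & 0xFF;
                if (c < lo || c > hi) return false;
                lo = 0x80;                         // later continuation bytes use the full range
                hi = 0xBF;
            }
            i += n + 1;
        }
        return true;                               // assumption: an empty range is valid
    }
}
```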

    Because Unicode case conversion may require locale-sensitive mappings, context-sensitive mappings, and 1:M character mappings, and in order to limit footprint, the case conversion supported by the methods toLowerCase, toUpperCase and compare is only available by default for the Basic Latin Unicode block (US-ASCII character set: U+0000 - U+007F). Other character blocks may be supported.
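
    A case conversion restricted to the Basic Latin block can be sketched as follows in standard Java. This is an illustration under assumptions, not the implementation of toLowerCase: the names are hypothetical, and every byte outside 'A'..'Z' (including all bytes of multi-byte sequences) is simply copied unchanged.

```java
// Illustrative sketch only: AsciiCaseSketch and toLowerCaseAscii are hypothetical names.
final class AsciiCaseSketch {
    /** Copies src to dst, lower-casing only the US-ASCII letters 'A'..'Z'. Returns the bytes copied. */
    static int toLowerCaseAscii(byte[] src, int srcOffset, int srcLength, byte[] dst, int dstOffset) {
        for (int i = 0; i < srcLength; i++) {
            byte b = src[srcOffset + i];
            // Bytes >= 0x80 are negative as Java bytes, so they never match this range.
            if (b >= 'A' && b <= 'Z') {
                b = (byte) (b + ('a' - 'A'));
            }
            dst[dstOffset + i] = b;
        }
        return srcLength;
    }
}
```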

    Since:
    Java Card 3.0.4
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static byte GSM_7
      The GSM Septet character encoding.
      static byte ISO_8859_1
      The ISO 8859-1 (Latin-1) character encoding.
      static byte PROP_ENCODING_EXT
      Start of proprietary character encoding numbering.
      static byte UCS_2
      The UCS-2 character encoding.
      static byte UTF_16
      The UTF-16 character encoding.
      static byte UTF_16_BE
      The UTF-16BE (Big Endian) character encoding.
      static byte UTF_16_LE
      The UTF-16LE (Little Endian) character encoding.
      static byte UTF_8
      The UTF-8 character encoding.
    • Method Summary

      Modifier and Type Method Description
      static boolean check​(byte[] aString, short offset, short length)
      Checks if the provided byte array contains a valid UTF-8 encoded character or character sequence.
      static short codePointAt​(byte[] aString, short offset, short length, short index, byte[] dstBuffer, short dstOffset)
      Copies to the destination buffer the character (Unicode code point) at the specified index in the UTF-8 encoded character sequence designated by aString, offset and length.
      static short codePointBefore​(byte[] aString, short offset, short length, short index, byte[] dstBuffer, short dstOffset)
      Copies to the destination buffer the character (Unicode code point) before the specified index in the UTF-8 encoded character sequence designated by aString, offset and length.
      static short codePointCount​(byte[] aString, short offset, short length)
      Returns the number of characters (Unicode code points) in the UTF-8 encoded character sequence designated by aString, offset and length.
      static short compare​(boolean ignoreCase, byte[] aString, short offset, short length, byte[] anotherString, short ooffset, short olength)
      Compares two strings lexicographically, optionally ignoring case considerations.
      static short convertFrom​(byte[] srcString, short srcOffset, short srcLength, byte[] dstString, short dstOffset, byte encoding)
      Converts from the specified character encoding to the UTF-8 character encoding all of the characters from the provided source character string and copies them to the provided destination array, starting at the provided dstOffset.
      static short convertTo​(byte[] srcString, short srcOffset, short srcLength, byte[] dstString, short dstOffset, byte encoding)
      Converts to the specified character encoding all of the characters from the provided source UTF-8 encoded character string and copies them to the provided destination array, starting at the provided dstOffset.
      static boolean endsWith​(byte[] aString, short offset, short length, byte[] suffix, short soffset, short slength, short codePointCount)
      Tests if the UTF-8 encoded character sequence designated by aString, offset and length ends with the first codePointCount characters of the character sequence designated by suffix, soffset and slength.
      static short indexOf​(byte[] aString, short offset, short length, byte[] subString, short soffset, short slength)
      Returns the index within the provided UTF-8 encoded character string of the first occurrence of the specified substring.
      static short offsetByCodePoints​(byte[] aString, short offset, short length, short index, short codePointOffset)
      Returns the byte index within the UTF-8 encoded character sequence designated by aString, offset and length that is offset from the given index by codePointOffset code points.
      static boolean parseBoolean​(byte[] aString, short offset, short length)
      Parses the string argument as a boolean.
      static short parseLongInteger​(byte[] aString, short offset, short length, short[] integer, short ioffset)
      Parses the provided UTF-8 encoded character sequence into an (up-to) 64 bits long signed integer.
      static short parseShortInteger​(byte[] aString, short offset, short length)
      Parses the provided UTF-8 encoded character sequence into the (up-to) 16 bits long signed (short) integer.
      static short replace​(byte[] srcString, short srcOffset, short srcLength, byte[] oldSubstring, short oOffset, short oLength, byte[] newSubstring, short nOffset, short nLength, byte[] dstString, short dstOffset)
      Copies to the destination byte array the string resulting from replacing all occurrences of the old substring in the provided source string with the new substring.
      static boolean startsWith​(byte[] aString, short offset, short length, byte[] prefix, short poffset, short plength, short codePointCount)
      Tests if the UTF-8 encoded character sequence designated by aString, offset and length starts with the first codePointCount characters of the character sequence designated by prefix, poffset and plength.
      static short substring​(byte[] srcString, short srcOffset, short srcLength, short codePointBeginIndex, short codePointEndIndex, byte[] dstString, short dstOffset)
      Copies to the destination byte array the specified substring of the designated source string.
      static short toLowerCase​(byte[] srcString, short srcOffset, short srcLength, byte[] dstString, short dstOffset)
      Converts to lower case and copies all of the characters from the provided source UTF-8 encoded character string to the provided destination array, starting at the provided dstOffset.
      static short toUpperCase​(byte[] srcString, short srcOffset, short srcLength, byte[] dstString, short dstOffset)
      Converts to upper case and copies all of the characters from the provided source UTF-8 encoded character string to the provided destination array, starting at the provided dstOffset.
      static short trim​(byte[] srcString, short srcOffset, short srcLength, byte[] dstString, short dstOffset)
      Removes white space from both ends of the provided UTF-8 encoded character string and copies the resulting character sequence to the provided destination array, starting at the provided dstOffset.
      static short valueOf​(boolean b, byte[] dstString, short dstOffset)
      Copies the UTF-8 encoded character string representation of the boolean argument into the provided destination array, starting at the provided offset.
      static short valueOf​(short[] l, byte[] dstString, short dstOffset)
      Copies the UTF-8 encoded, signed decimal string representation of the (up-to) 64 bits long signed integer argument provided as an array of short integers, into the provided destination array, starting at the provided offset.
      static short valueOf​(short i, byte[] dstString, short dstOffset)
      Copies the UTF-8 encoded, signed decimal string representation of the (up-to) 16 bits long signed (short) argument into the provided destination array, starting at the provided offset.
    • Field Detail

      • UTF_8

        public static final byte UTF_8
        The UTF-8 character encoding. This character encoding is used for internal representation and handling of character strings.
        See Also:
        Constant Field Values
      • UTF_16

        public static final byte UTF_16
        The UTF-16 character encoding. This character encoding may be optionally supported for conversion to and from UTF-8 when doing input or output.
        See Also:
        Constant Field Values
      • UTF_16_LE

        public static final byte UTF_16_LE
        The UTF-16LE (Little Endian) character encoding. This character encoding may be optionally supported for conversion to and from UTF-8 when doing input or output.
        See Also:
        Constant Field Values
      • UTF_16_BE

        public static final byte UTF_16_BE
        The UTF-16BE (Big Endian) character encoding. This character encoding may be optionally supported for conversion to and from UTF-8 when doing input or output.
        See Also:
        Constant Field Values
      • UCS_2

        public static final byte UCS_2
        The UCS-2 character encoding. This character encoding may be optionally supported for conversion to and from UTF-8 when doing input or output.
        See Also:
        Constant Field Values
      • GSM_7

        public static final byte GSM_7
        The GSM Septet character encoding. This character encoding may be optionally supported for conversion to and from UTF-8 when doing input or output.
        See Also:
        Constant Field Values
      • ISO_8859_1

        public static final byte ISO_8859_1
        The ISO 8859-1 (Latin-1) character encoding. This character encoding may be optionally supported for conversion to and from UTF-8 when doing input or output.
        See Also:
        Constant Field Values
      • PROP_ENCODING_EXT

        public static final byte PROP_ENCODING_EXT
        Start of proprietary character encoding numbering.
        See Also:
        Constant Field Values
    • Method Detail

      • codePointCount

        public static short codePointCount​(byte[] aString,
                                           short offset,
                                           short length)
        Returns the number of characters (Unicode code points) in the UTF-8 encoded character sequence designated by aString, offset and length. Ill-formed or incomplete byte sequences within the text range count as one code point each.
        Parameters:
        aString - the byte array containing the UTF-8 encoded character sequence.
        offset - the starting offset of the character sequence in the byte array.
        length - the length (in bytes) of the contained character sequence.
        Returns:
        the number of characters (Unicode code points) in the designated UTF-8 encoded character sequence.
        Throws:
        NullPointerException - if aString is null.
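
        The counting rule can be sketched as follows in standard Java. This is not the implementation of codePointCount: the names are hypothetical, only the lead byte and the continuation-byte pattern are examined (overlong forms and surrogates are not rejected), and it is an assumption that a byte which does not begin a complete sequence is counted on its own as one code point.

```java
// Illustrative sketch only: CodePointCountSketch and its methods are hypothetical names.
final class CodePointCountSketch {
    /** Number of continuation bytes announced by a lead byte, or -1 if it is not a lead byte. */
    private static int continuationCount(int b0) {
        if (b0 <= 0x7F) return 0;
        if (b0 >= 0xC2 && b0 <= 0xDF) return 1;
        if (b0 >= 0xE0 && b0 <= 0xEF) return 2;
        if (b0 >= 0xF0 && b0 <= 0xF4) return 3;
        return -1;
    }

    /** Counts code points in a[offset .. offset+length-1]; unrecognized bytes count as one each. */
    static int count(byte[] a, int offset, int length) {
        int end = offset + length;
        int count = 0;
        int i = offset;
        while (i < end) {
            int n = continuationCount(a[i] & 0xFF);
            boolean complete = n >= 0 && i + n < end;
            for (int k = 1; complete && k <= n; k++) {
                complete = (a[i + k] & 0xC0) == 0x80;   // must be 10xxxxxx
            }
            i += complete ? n + 1 : 1;                   // assumption: skip one byte when incomplete
            count++;
        }
        return count;
    }
}
```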
      • codePointAt

        public static short codePointAt​(byte[] aString,
                                        short offset,
                                        short length,
                                        short index,
                                        byte[] dstBuffer,
                                        short dstOffset)
        Copies to the destination buffer the character (Unicode code point) at the specified index in the UTF-8 encoded character sequence designated by aString, offset and length. index is an index in the byte array relative to the offset of the designated character sequence within the byte array (that is, relative to offset). The resulting code point copied to the destination array is UTF-8 encoded and is therefore from one to four bytes long. Ill-formed or incomplete byte sequences within the text range count as one code point each and are returned as-is.
        Parameters:
        aString - the byte array containing the UTF-8 encoded reference character sequence.
        offset - the starting offset of the reference character sequence in aString.
        length - the length (in bytes) of the reference character sequence.
        index - the byte index (relative to offset) of the character to be returned.
        dstBuffer - the byte array for copying the resulting character.
        dstOffset - the starting offset in dstBuffer for copying the UTF-8 byte sequence of the character (Unicode code point) at the specified index.
        Returns:
        the number of bytes copied.
        Throws:
        IndexOutOfBoundsException - if index is negative or not less than the length of aString.
        NullPointerException - if aString or dstBuffer is null.
      • codePointBefore

        public static short codePointBefore​(byte[] aString,
                                            short offset,
                                            short length,
                                            short index,
                                            byte[] dstBuffer,
                                            short dstOffset)
        Copies to the destination buffer the character (Unicode code point) before the specified index in the UTF-8 encoded character sequence designated by aString, offset and length. index is an index in the byte array relative to the offset of the designated character sequence within the byte array (that is, relative to offset). The resulting code point copied to the destination array is UTF-8 encoded and is therefore from one to four bytes long. Ill-formed or incomplete byte sequences within the text range count as one code point each and are returned as-is.
        Parameters:
        aString - the byte array containing the UTF-8 encoded character sequence.
        offset - the starting offset of the reference character sequence in aString.
        length - the length (in bytes) of the reference character sequence.
        index - the byte index (relative to offset) following the character to be returned.
        dstBuffer - the byte array for copying the resulting character.
        dstOffset - the starting offset in dstBuffer for copying the UTF-8 byte sequence of the character (Unicode code point) before the specified index.
        Returns:
        the number of bytes copied.
        Throws:
        IndexOutOfBoundsException - if index is less than 1 or greater than the length of aString.
        NullPointerException - if aString or dstBuffer is null.
      • offsetByCodePoints

        public static short offsetByCodePoints​(byte[] aString,
                                               short offset,
                                               short length,
                                               short index,
                                               short codePointOffset)
        Returns the byte index within the UTF-8 encoded character sequence designated by aString, offset and length that is offset from the given index by codePointOffset code points. Ill-formed or incomplete UTF-8 byte sequences within the text range given by index and codePointOffset count as one code point each.

        This method can be used to extract a substring from a string. For example, to copy to buffer the substring of the string aString that begins at codePointBeginIndex and extends to the character at index codePointEndIndex - 1, one can call:

         short beginOffset = StringUtil.offsetByCodePoints(aString, offset, length, (short) 0, codePointBeginIndex);
         short endOffset = StringUtil.offsetByCodePoints(aString, offset, length, (short) 0, codePointEndIndex);
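         // The returned indices are relative to offset, so offset is added back for the copy.
         short l = Util.arrayCopy(aString, (short) (offset + beginOffset), buffer, (short) 0, (short) (endOffset - beginOffset));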
         
        The copied substring thus has a length in code points (that is, a code point count) equal to codePointEndIndex - codePointBeginIndex.

        Parameters:
        aString - the byte array containing the UTF-8 encoded character sequence.
        offset - the starting offset of the reference character sequence in aString.
        length - the length (in bytes) of the reference character sequence.
        index - the byte index to be offset (relative to offset).
        codePointOffset - the offset in code points.
        Returns:
        the index within aString relative to the beginning of the string contained in aString, that is, relative to offset.
        Throws:
        IndexOutOfBoundsException - if index is negative or larger than the length of aString, or if codePointOffset is positive and the substring starting with index has fewer than codePointOffset code points, or if codePointOffset is negative and the substring before index has fewer than the absolute value of codePointOffset code points.
        NullPointerException - if aString is null.
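
        The stepping behaviour, including the conditions under which IndexOutOfBoundsException is thrown, can be sketched as follows in standard Java. This is not the implementation of offsetByCodePoints: the names are hypothetical, int parameters replace short, and the input is assumed to be well-formed UTF-8 so that code point boundaries can be found by skipping continuation bytes.

```java
// Illustrative sketch only: OffsetByCodePointsSketch is a hypothetical name.
final class OffsetByCodePointsSketch {
    private static boolean isContinuation(byte b) {
        return (b & 0xC0) == 0x80;   // 10xxxxxx
    }

    /** Returns the byte index (relative to offset) reached from index after codePointOffset code points. */
    static int offsetByCodePoints(byte[] a, int offset, int length, int index, int codePointOffset) {
        if (index < 0 || index > length) throw new IndexOutOfBoundsException();
        int i = index;
        for (int n = codePointOffset; n > 0; n--) {          // move forward
            if (i >= length) throw new IndexOutOfBoundsException();
            i++;
            while (i < length && isContinuation(a[offset + i])) i++;
        }
        for (int n = codePointOffset; n < 0; n++) {          // move backward
            if (i <= 0) throw new IndexOutOfBoundsException();
            i--;
            while (i > 0 && isContinuation(a[offset + i])) i--;
        }
        return i;
    }
}
```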
      • compare

        public static short compare​(boolean ignoreCase,
                                    byte[] aString,
                                    short offset,
                                    short length,
                                    byte[] anotherString,
                                    short ooffset,
                                    short olength)
        Compares two strings lexicographically, optionally ignoring case considerations. The comparison is based on the UTF-8-encoded Unicode value of each character in the strings. The character sequence designated by aString, offset and length is compared lexicographically to the character sequence designated by anotherString, ooffset and olength. The result is a negative number if the character sequence contained in aString lexicographically precedes the character sequence contained in anotherString. The result is a positive number if the character sequence contained in aString lexicographically follows the character sequence contained in anotherString. The result is zero if the two character sequences are equal.

        This is the definition of lexicographic ordering. If two strings are different, then either they have different characters at some index that is a valid index for both strings, or their lengths are different, or both. If they have different characters at one or more index positions, let k be the smallest such index; then the string whose character at position k has the smaller value, as determined by using the < operator, lexicographically precedes the other string. In this case, compare returns the difference of the first mismatching bytes of the UTF-8 encoded representations of the two characters at position k in the two strings. If there is no index position at which they differ, then the shorter string lexicographically precedes the longer string. In this case, compare returns the difference of the lengths of the strings.

        When ignoring case considerations, this method behaves as if comparing (using the same algorithm as described above) normalized versions of the strings where case differences have been eliminated by calling toLowerCase(toUpperCase(string)) on both argument strings.

        Parameters:
        ignoreCase - whether case must be ignored.
        aString - the byte array containing the reference UTF-8 encoded character sequence.
        offset - the starting offset of the reference character sequence in aString.
        length - the length (in bytes) of the reference character sequence.
        anotherString - the byte array containing the UTF-8 encoded character sequence to be compared.
        ooffset - the starting offset in anotherString of the character sequence to be compared.
        olength - the length (in bytes) of the character sequence to be compared.
        Returns:
        the value 0 if the character sequence contained in anotherString is equal to the character sequence contained in aString; a value less than 0 if the character sequence contained in aString is lexicographically less than the character sequence contained in anotherString; and a value greater than 0 if the character sequence contained in aString is lexicographically greater than the character sequence contained in anotherString, optionally ignoring case considerations.
        Throws:
        NullPointerException - if aString or anotherString is null.
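
        The case-sensitive ordering described above can be sketched as follows in standard Java. This is not the implementation of compare: the names are hypothetical, the ignoreCase option is omitted, and comparing bytes as unsigned values is an assumption (it makes the byte order agree with the code point order of well-formed UTF-8).

```java
// Illustrative sketch only: CompareSketch is a hypothetical name.
final class CompareSketch {
    /** Byte-wise lexicographic comparison of two designated sequences. */
    static int compare(byte[] a, int offset, int length, byte[] b, int ooffset, int olength) {
        int n = Math.min(length, olength);
        for (int k = 0; k < n; k++) {
            int x = a[offset + k] & 0xFF;    // assumption: bytes are compared unsigned
            int y = b[ooffset + k] & 0xFF;
            if (x != y) {
                return x - y;                // difference of the first mismatching bytes
            }
        }
        return length - olength;             // no mismatch: difference of the lengths
    }
}
```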
      • indexOf

        public static short indexOf​(byte[] aString,
                                    short offset,
                                    short length,
                                    byte[] subString,
                                    short soffset,
                                    short slength)
        Returns the index within the provided UTF-8 encoded character string of the first occurrence of the specified substring. The number returned is the smallest value k for which:
          compare(false, aString, offset + k, slength, subString, soffset, slength) == 0
         
        If no such value of k exists, then -1 is returned.
        Parameters:
        aString - the byte array containing the reference UTF-8 encoded character sequence.
        offset - the starting offset of the reference character sequence in aString.
        length - the length (in bytes) of the reference character sequence.
        subString - the byte array containing the UTF-8 encoded character sequence of the substring.
        soffset - the starting offset in subString of the substring's character sequence.
        slength - the length (in bytes) of the substring's character sequence.
        Returns:
        if the substring designated by subString, soffset and slength occurs as a substring within the string designated by aString, offset and length, then the index (relative to offset) of the first byte of the first such substring is returned; if it does not occur as a substring, -1 is returned.
        Throws:
        NullPointerException - if aString or subString is null.
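
        Because indexOf treats its arguments as plain byte sequences, the search can be sketched as a straightforward byte comparison in standard Java. This is not the implementation of indexOf: the names are hypothetical, and returning 0 for an empty substring is an assumption.

```java
// Illustrative sketch only: IndexOfSketch is a hypothetical name.
final class IndexOfSketch {
    /** Returns the smallest k (relative to offset) at which sub occurs in a, or -1. */
    static int indexOf(byte[] a, int offset, int length, byte[] sub, int soffset, int slength) {
        for (int k = 0; k + slength <= length; k++) {
            int j = 0;
            while (j < slength && a[offset + k + j] == sub[soffset + j]) {
                j++;
            }
            if (j == slength) {
                return k;                     // all slength bytes matched at position k
            }
        }
        return -1;
    }
}
```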
      • replace

        public static short replace​(byte[] srcString,
                                    short srcOffset,
                                    short srcLength,
                                    byte[] oldSubstring,
                                    short oOffset,
                                    short oLength,
                                    byte[] newSubstring,
                                    short nOffset,
                                    short nLength,
                                    byte[] dstString,
                                    short dstOffset)
        Copies to the destination byte array the string resulting from replacing all occurrences of the old substring in the provided source string with the new substring.

        If the character sequence (substring) designated by oldSubstring, oOffset and oLength does not occur in the source character sequence designated by srcString, srcOffset and srcLength, then the source character sequence is copied as is to dstString, starting at dstOffset. Otherwise, a character sequence identical to the character sequence designated by srcString, srcOffset and srcLength is copied to dstString, starting at dstOffset, except that every occurrence of the substring designated by oldSubstring, oOffset and oLength is replaced by an occurrence of the substring designated by newSubstring, nOffset and nLength. The replacement proceeds from the beginning of the source string to the end.

        Parameters:
        srcString - the byte array containing the source UTF-8 encoded character sequence.
        srcOffset - the starting offset of the source character sequence in srcString.
        srcLength - the length (in bytes) of the source character sequence.
        oldSubstring - the byte array containing the UTF-8 encoded character sequence to be replaced.
        oOffset - the starting offset of the replaced character sequence in oldSubstring.
        oLength - the length (in bytes) of the replaced character sequence.
        newSubstring - the byte array containing the replacement UTF-8 encoded character sequence.
        nOffset - the starting offset of the replacement character sequence in newSubstring.
        nLength - the length (in bytes) of the replacement character sequence.
        dstString - the byte array for copying the resulting character sequence.
        dstOffset - the starting offset in dstString for copying the resulting character sequence.
        Returns:
        the number of bytes copied.
        Throws:
        NullPointerException - if srcString, oldSubstring, newSubstring or dstString is null.
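
        The left-to-right replacement can be sketched as follows in standard Java. This is not the implementation of replace: the names are hypothetical, the source and destination ranges are assumed not to overlap, the destination is assumed to be large enough, and copying the source unchanged when the old substring is empty is an assumption.

```java
// Illustrative sketch only: ReplaceSketch is a hypothetical name.
final class ReplaceSketch {
    private static boolean matchesAt(byte[] a, int pos, int remaining, byte[] sub, int soffset, int slength) {
        if (slength > remaining) return false;
        for (int j = 0; j < slength; j++) {
            if (a[pos + j] != sub[soffset + j]) return false;
        }
        return true;
    }

    /** Copies src to dst with every occurrence of oldSub replaced by newSub. Returns the bytes copied. */
    static int replace(byte[] src, int srcOffset, int srcLength,
                       byte[] oldSub, int oOffset, int oLength,
                       byte[] newSub, int nOffset, int nLength,
                       byte[] dst, int dstOffset) {
        int out = dstOffset;
        int i = 0;
        while (i < srcLength) {
            if (oLength > 0 && matchesAt(src, srcOffset + i, srcLength - i, oldSub, oOffset, oLength)) {
                System.arraycopy(newSub, nOffset, dst, out, nLength);
                out += nLength;
                i += oLength;                 // continue after the replaced occurrence
            } else {
                dst[out++] = src[srcOffset + i++];
            }
        }
        return out - dstOffset;
    }
}
```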
      • toLowerCase

        public static short toLowerCase​(byte[] srcString,
                                        short srcOffset,
                                        short srcLength,
                                        byte[] dstString,
                                        short dstOffset)
        Converts to lower case and copies all of the characters from the provided source UTF-8 encoded character string to the provided destination array, starting at the provided dstOffset. This method skips/ignores any unrecognized (ill-formed or incomplete) byte sequence.
        Parameters:
        srcString - the byte array containing the source UTF-8 encoded character sequence.
        srcOffset - the starting offset of the source character sequence in srcString.
        srcLength - the length (in bytes) of the source character sequence.
        dstString - the byte array for copying the resulting character sequence.
        dstOffset - the starting offset in dstString for copying the resulting character sequence.
        Returns:
        the number of bytes copied.
        Throws:
        NullPointerException - if srcString or dstString is null.
        See Also:
        toUpperCase(byte[], short, short, byte[], short)
      • toUpperCase

        public static short toUpperCase​(byte[] srcString,
                                        short srcOffset,
                                        short srcLength,
                                        byte[] dstString,
                                        short dstOffset)
        Converts to upper case and copies all of the characters from the provided source UTF-8 encoded character string to the provided destination array, starting at the provided dstOffset. This method skips/ignores any unrecognized (ill-formed or incomplete) byte sequence.
        Parameters:
        srcString - the byte array containing the source UTF-8 encoded character sequence.
        srcOffset - the starting offset of the source character sequence in srcString.
        srcLength - the length (in bytes) of the source character sequence.
        dstString - the byte array for copying the resulting character sequence.
        dstOffset - the starting offset in dstString for copying the resulting character sequence.
        Returns:
        the number of bytes copied.
        Throws:
        NullPointerException - if srcString or dstString is null.
        See Also:
        toLowerCase(byte[], short, short, byte[], short)
      • trim

        public static short trim​(byte[] srcString,
                                 short srcOffset,
                                 short srcLength,
                                 byte[] dstString,
                                 short dstOffset)
        Removes white space from both ends of the provided UTF-8 encoded character string and copies the resulting character sequence to the provided destination array, starting at the provided dstOffset.

        If the source string designated by srcString, srcOffset and srcLength represents an empty character sequence, or the first and last characters of the character sequence of the source string both have codes greater than '\u0020' (the space character), then the source string is copied as is to dstString, starting at dstOffset.

        Otherwise, if there is no character with a code greater than '\u0020' in the source string, then no character is copied and 0 is returned.

        Otherwise, let k be the index of the first character in the source string whose code is greater than '\u0020', and let m be the index of the last character in the source string whose code is greater than '\u0020'. The substring of the source string that begins with the character at index k and ends with the character at index m is copied to dstString, starting at dstOffset.

        This method may be used to trim whitespace from the beginning and end of a string; in fact, it trims all ASCII control characters as well.

        Illegal byte sequences are considered as non-white spaces.

        Parameters:
        srcString - the byte array containing the source UTF-8 encoded character sequence.
        srcOffset - the starting offset of the source character sequence in srcString.
        srcLength - the length (in bytes) of the source character sequence.
        dstString - the byte array for copying the resulting character sequence.
        dstOffset - the starting offset in dstString for copying the resulting character sequence.
        Returns:
        the number of bytes copied.
        Throws:
        NullPointerException - if srcString or dstString is null.
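
        The rule based on the indices k and m can be sketched as follows in standard Java. This is not the implementation of trim: the names are hypothetical, and working directly on bytes is a simplification that relies on every byte of a multi-byte or ill-formed sequence having an unsigned value greater than 0x20.

```java
// Illustrative sketch only: TrimSketch is a hypothetical name.
final class TrimSketch {
    /** Copies src without leading and trailing bytes <= 0x20. Returns the bytes copied. */
    static int trim(byte[] src, int srcOffset, int srcLength, byte[] dst, int dstOffset) {
        int k = 0;                    // index of the first byte greater than 0x20
        int m = srcLength - 1;        // index of the last byte greater than 0x20
        while (k <= m && (src[srcOffset + k] & 0xFF) <= 0x20) k++;
        while (m >= k && (src[srcOffset + m] & 0xFF) <= 0x20) m--;
        int n = m - k + 1;            // 0 when the source is empty or all white space
        if (n > 0) {
            System.arraycopy(src, srcOffset + k, dst, dstOffset, n);
        }
        return n;
    }
}
```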
      • valueOf

        public static short valueOf​(boolean b,
                                    byte[] dstString,
                                    short dstOffset)
        Copies the UTF-8 encoded character string representation of the boolean argument into the provided destination array, starting at the provided offset. If the argument is true, a string equal to "true" is copied; otherwise, a string equal to "false" is copied.
        Parameters:
        b - a boolean.
        dstString - the destination UTF-8 encoded character string, as a byte array
        dstOffset - the starting offset in the destination array
        Returns:
        the number of bytes copied.
        Throws:
        NullPointerException - if dstString is null.
      • parseBoolean

        public static boolean parseBoolean​(byte[] aString,
                                           short offset,
                                           short length)
        Parses the string argument as a boolean. The boolean returned represents the value true if the string argument is not null and is equal, ignoring case, to the string "true".
        Parameters:
        aString - the byte array containing the UTF-8 encoded character sequence to be parsed.
        offset - the starting offset of the character sequence in aString.
        length - the length (in bytes) of the character sequence to be parsed.
        Returns:
        the boolean value represented by the designated character sequence.
        Throws:
        NullPointerException - if aString is null.
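
        The boolean conversions above form a simple round trip. The following plain-Java sketch shows one way they could behave; the class name BooleanSketch is hypothetical, and the case-insensitive comparison is limited to ASCII letters, which is sufficient for matching "true".

```java
// Hypothetical stand-in for valueOf(boolean, ...) and parseBoolean(...).
final class BooleanSketch {
    private static final byte[] TRUE = {'t', 'r', 'u', 'e'};
    private static final byte[] FALSE = {'f', 'a', 'l', 's', 'e'};

    /** Writes "true" or "false" at dstOffset and returns the number of bytes written. */
    static short valueOf(boolean b, byte[] dst, short dstOffset) {
        byte[] text = b ? TRUE : FALSE;
        System.arraycopy(text, 0, dst, dstOffset, text.length);
        return (short) text.length;
    }

    /** Returns true only if the designated bytes spell "true", ignoring ASCII case. */
    static boolean parseBoolean(byte[] s, short offset, short length) {
        if (s == null) {
            throw new NullPointerException("s");
        }
        if (length != TRUE.length) {
            return false;
        }
        for (int i = 0; i < TRUE.length; i++) {
            int c = s[offset + i];
            if (c >= 'A' && c <= 'Z') {
                c += 'a' - 'A'; // fold ASCII upper case to lower case
            }
            if (c != TRUE[i]) {
                return false;
            }
        }
        return true;
    }
}
```

        Any input other than a case variant of "true", including "false", "yes" or an empty sequence, parses as false in this sketch.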
      • valueOf

        public static short valueOf​(short i,
                                    byte[] dstString,
                                    short dstOffset)
        Copies the UTF-8 encoded, signed decimal string representation of the (up to) 16-bit signed (short) argument into the provided destination array, starting at the provided offset.
        Parameters:
        i - a short.
        dstString - the destination UTF-8 encoded character string, as a byte array
        dstOffset - the starting offset in the destination array
        Returns:
        the number of bytes copied.
        Throws:
        NullPointerException - if dstString is null.
      • parseShortInteger

        public static short parseShortInteger​(byte[] aString,
                                              short offset,
                                              short length)
        Parses the provided UTF-8 encoded character sequence into an (up to) 16-bit signed (short) integer. Accepts decimal and hexadecimal numbers given by the following grammar:
        DecodableString:
            Sign(opt) DecimalNumeral
            Sign(opt) 0x HexDigits
            Sign(opt) 0X HexDigits
            Sign(opt) # HexDigits

        Sign:
            -
        DecimalNumeral and HexDigits are defined in §3.10.1 of the Java Language Specification.

        The sequence of characters following an (optional) negative sign and/or radix specifier ("0x", "0X", "#", or leading zero) is parsed as a (short) integer in the specified radix (10 or 16). This sequence of characters must represent a positive value or a StringException will be thrown with reason StringException.ILLEGAL_NUMBER_FORMAT. The result is negated if the first character of the specified character string is the minus sign. No whitespace characters are permitted in the character string.

        Parameters:
        aString - the byte array containing the UTF-8 encoded character sequence to be parsed.
        offset - the starting offset of the character sequence in aString.
        length - the length (in bytes) of the character sequence to be parsed.
        Returns:
        the short integer value represented by the designated character sequence.
        Throws:
        StringException - if the designated character sequence does not contain a parsable (short) integer.
        NullPointerException - if aString is null.
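
        The grammar above can be illustrated with a small plain-Java parser. This is a sketch under stated assumptions, not the library implementation: the class name ParseShortSketch is hypothetical, NumberFormatException stands in for StringException with reason ILLEGAL_NUMBER_FORMAT, digits after a leading zero are read as decimal, and "-32768" is accepted as the smallest short value.

```java
// Hypothetical stand-in for parseShortInteger; NumberFormatException replaces StringException.
final class ParseShortSketch {
    static short parseShortInteger(byte[] s, short offset, short length) {
        int i = offset;
        int end = offset + length;
        boolean negative = false;
        if (i < end && s[i] == '-') {          // optional Sign
            negative = true;
            i++;
        }
        int radix = 10;
        if (end - i >= 2 && s[i] == '0' && (s[i + 1] == 'x' || s[i + 1] == 'X')) {
            radix = 16;                        // "0x" / "0X" prefix
            i += 2;
        } else if (i < end && s[i] == '#') {
            radix = 16;                        // "#" prefix
            i++;
        }
        if (i == end) {
            throw new NumberFormatException("no digits");
        }
        int value = 0;
        for (; i < end; i++) {
            int d = digit(s[i], radix);
            if (d < 0) {
                throw new NumberFormatException("illegal character at index " + i);
            }
            value = value * radix + d;
            if (value > 32768) {               // beyond even the magnitude of Short.MIN_VALUE
                throw new NumberFormatException("out of range");
            }
        }
        if (negative) {
            value = -value;
        }
        if (value > Short.MAX_VALUE) {
            throw new NumberFormatException("out of range");
        }
        return (short) value;
    }

    /** Returns the numeric value of an ASCII digit in the given radix, or -1 if it is not one. */
    private static int digit(byte b, int radix) {
        int d;
        if (b >= '0' && b <= '9') {
            d = b - '0';
        } else if (b >= 'a' && b <= 'f') {
            d = b - 'a' + 10;
        } else if (b >= 'A' && b <= 'F') {
            d = b - 'A' + 10;
        } else {
            return -1;
        }
        return d < radix ? d : -1;
    }
}
```

        Whitespace, a "+" sign, or a hexadecimal letter in a decimal numeral all fall through to the illegal-character branch.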
      • valueOf

        public static short valueOf​(short[] l,
                                    byte[] dstString,
                                    short dstOffset)
        Copies the UTF-8 encoded, signed decimal string representation of the (up to) 64-bit signed integer argument, provided as an array of short integers, into the provided destination array, starting at the provided offset.
        Parameters:
        l - an array of short integers representing an (up to) 64-bit signed long integer; the most significant short integer is at index 0.
        dstString - the destination UTF-8 encoded character string, as a byte array
        dstOffset - the starting offset in the destination array
        Returns:
        the number of bytes copied.
        Throws:
        NullPointerException - if l or dstString is null.
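
        The short-array representation of a long integer can be illustrated in plain Java. The sketch below is an assumption-laden illustration: the class name LongValueSketch is hypothetical, and it assumes that an array of n shorts (1 to 4) is a two's-complement number of 16*n bits whose sign comes from the top bit of l[0]. The text above does not say how arrays shorter than four shorts are sign-extended.

```java
// Hypothetical stand-in for valueOf(short[], ...); the sign-extension rule is an assumption.
final class LongValueSketch {
    /** Combines big-endian shorts (most significant first) into a long. */
    static long toLong(short[] l) {
        if (l.length == 0 || l.length > 4) {
            throw new IllegalArgumentException("expected 1 to 4 shorts");
        }
        long v = l[0];                          // sign-extends the top short
        for (int i = 1; i < l.length; i++) {
            v = (v << 16) | (l[i] & 0xFFFFL);   // append the next 16 bits, unsigned
        }
        return v;
    }

    /** Writes the signed decimal text of the value at dstOffset; returns the bytes written. */
    static short valueOf(short[] l, byte[] dst, short dstOffset) {
        byte[] text = Long.toString(toLong(l))
                .getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        System.arraycopy(text, 0, dst, dstOffset, text.length);
        return (short) text.length;
    }
}
```

        Decimal digits and the minus sign are ASCII, so the ASCII bytes written here are also valid UTF-8.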
      • parseLongInteger

        public static short parseLongInteger​(byte[] aString,
                                             short offset,
                                             short length,
                                             short[] integer,
                                             short ioffset)
        Parses the provided UTF-8 encoded character sequence into an (up to) 64-bit signed integer. Accepts decimal and hexadecimal numbers given by the following grammar:
        DecodableString:
            Sign(opt) DecimalNumeral
            Sign(opt) 0x HexDigits
            Sign(opt) 0X HexDigits
            Sign(opt) # HexDigits

        Sign:
            -
        DecimalNumeral and HexDigits are defined in §3.10.1 of the Java Language Specification.

        The sequence of characters following an (optional) negative sign and/or radix specifier ("0x", "0X", "#", or leading zero) is parsed as a (long) integer in the specified radix (10 or 16). This sequence of characters must represent a positive value or a StringException will be thrown with reason StringException.ILLEGAL_NUMBER_FORMAT. The result is negated if the first character of the specified character string is the minus sign. No whitespace characters are permitted in the character string.

        Parameters:
        aString - the byte array containing the UTF-8 encoded character sequence to be parsed.
        offset - the starting offset of the character sequence in aString.
        length - the length (in bytes) of the character sequence to be parsed.
        integer - the array of short integers to contain the value represented by the designated character sequence; the most significant short integer is at index 0.
        ioffset - the starting offset in integer for copying the resulting short sequence.
        Returns:
        the number of short integers written into the array, ignoring leading zero short values.
        Throws:
        StringException - if the designated character sequence does not contain a parsable (long) integer.
        NullPointerException - if aString or integer is null.
      • convertTo

        public static short convertTo​(byte[] srcString,
                                      short srcOffset,
                                      short srcLength,
                                      byte[] dstString,
                                      short dstOffset,
                                      byte encoding)
        Converts to the specified character encoding all of the characters from the provided source UTF-8 encoded character string and copies them to the provided destination array, starting at the provided dstOffset.
        Parameters:
        srcString - the byte array containing the source UTF-8 encoded character sequence.
        srcOffset - the starting offset of the source character sequence in srcString.
        srcLength - the length (in bytes) of the source character sequence.
        dstString - the byte array for copying the resulting character sequence.
        dstOffset - the starting offset in dstString for copying the resulting character sequence.
        encoding - the character encoding to be used.
        Returns:
        the number of bytes copied.
        Throws:
        StringException - with reason StringException.UNSUPPORTED_ENCODING if the requested character encoding is not supported.
        StringException - with reason StringException.INVALID_BYTE_SEQUENCE if an invalid byte sequence is encountered.
        NullPointerException - if srcString or dstString is null.
        See Also:
        convertFrom(byte[], short, short, byte[], short, byte)
      • convertFrom

        public static short convertFrom​(byte[] srcString,
                                        short srcOffset,
                                        short srcLength,
                                        byte[] dstString,
                                        short dstOffset,
                                        byte encoding)
        Converts from the specified character encoding to the UTF-8 character encoding all of the characters from the provided source character string and copies them to the provided destination array, starting at the provided dstOffset.
        Parameters:
        srcString - the byte array containing the source character sequence encoded in the character encoding designated by encoding.
        srcOffset - the starting offset of the source character sequence in srcString.
        srcLength - the length (in bytes) of the source character sequence.
        dstString - the byte array for copying the UTF-8 encoded resulting character sequence.
        dstOffset - the starting offset in dstString for copying the resulting character sequence.
        encoding - the character encoding of the source character string.
        Returns:
        the number of bytes copied.
        Throws:
        StringException - with reason StringException.UNSUPPORTED_ENCODING if the requested character encoding is not supported.
        StringException - with reason StringException.INVALID_BYTE_SEQUENCE if an invalid byte sequence is encountered.
        NullPointerException - if srcString or dstString is null.
        See Also:
        convertTo(byte[], short, short, byte[], short, byte)
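
        The convertTo/convertFrom pair can be sketched with the standard java.nio.charset classes. This is only an illustration of the contract described above: the class name ConvertSketch and the constant SKETCH_UTF16_BE are hypothetical (the real encoding identifiers are not listed in this section), UnsupportedOperationException stands in for StringException with reason UNSUPPORTED_ENCODING, and IllegalArgumentException stands in for reason INVALID_BYTE_SEQUENCE.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for convertTo/convertFrom, supporting a single encoding.
final class ConvertSketch {
    /** Hypothetical encoding identifier used only by this sketch. */
    static final byte SKETCH_UTF16_BE = 1;

    static short convertTo(byte[] src, short srcOffset, short srcLength,
                           byte[] dst, short dstOffset, byte encoding) {
        return transcode(src, srcOffset, srcLength, dst, dstOffset,
                StandardCharsets.UTF_8, charsetFor(encoding));
    }

    static short convertFrom(byte[] src, short srcOffset, short srcLength,
                             byte[] dst, short dstOffset, byte encoding) {
        return transcode(src, srcOffset, srcLength, dst, dstOffset,
                charsetFor(encoding), StandardCharsets.UTF_8);
    }

    private static Charset charsetFor(byte encoding) {
        if (encoding == SKETCH_UTF16_BE) {
            return StandardCharsets.UTF_16BE;
        }
        throw new UnsupportedOperationException("unsupported encoding: " + encoding);
    }

    private static short transcode(byte[] src, int off, int len,
                                   byte[] dst, int dstOff, Charset from, Charset to) {
        String text;
        try {
            // A fresh decoder reports malformed input instead of replacing it.
            text = from.newDecoder().decode(ByteBuffer.wrap(src, off, len)).toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("invalid byte sequence", e);
        }
        byte[] out = text.getBytes(to);
        System.arraycopy(out, 0, dst, dstOff, out.length);
        return (short) out.length;
    }
}
```

        Because the converted length generally differs from the source length, a caller has to size the destination array for the worst case of the target encoding.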
      • check

        public static boolean check​(byte[] aString,
                                    short offset,
                                    short length)
        Checks if the provided byte array contains a valid UTF-8 encoded character or character sequence. As per UTF-8, a byte with a leading '0' bit is a single-byte code; a byte with two or more leading '1' bits is the first byte of a multi-byte sequence whose length is equal to the number of leading '1' bits; finally, a byte with a leading '10' bit sequence is a continuation byte of a multi-byte sequence.
        Parameters:
        aString - the byte array containing the UTF-8 encoded character sequence to be checked.
        offset - the starting offset of the character sequence in aString.
        length - the length (in bytes) of the character sequence to be checked.
        Returns:
        true, if the byte sequence corresponds to a valid UTF-8 encoded character or character sequence, false otherwise.
        Throws:
        NullPointerException - if aString is null.
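
        The leading-bit rules above translate into a short validation loop. The following plain-Java sketch is one possible reading, with a hypothetical class name Utf8CheckSketch. It follows RFC 3629 in also rejecting overlong forms, surrogate code points and values above U+10FFFF, and it accepts an empty sequence; whether the real method is equally strict on these points is not stated above.

```java
// Hypothetical stand-in for check(...), validating per RFC 3629.
final class Utf8CheckSketch {
    static boolean check(byte[] s, short offset, short length) {
        int i = offset;
        int end = offset + length;
        while (i < end) {
            int b = s[i] & 0xFF;
            if (b < 0x80) {                 // leading 0 bit: single-byte code
                i++;
                continue;
            }
            int n;   // number of continuation bytes expected
            int min; // smallest code point that needs this many bytes
            int cp;  // code point being assembled
            if ((b & 0xE0) == 0xC0) {        // 110xxxxx
                n = 1; min = 0x80; cp = b & 0x1F;
            } else if ((b & 0xF0) == 0xE0) { // 1110xxxx
                n = 2; min = 0x800; cp = b & 0x0F;
            } else if ((b & 0xF8) == 0xF0) { // 11110xxx
                n = 3; min = 0x10000; cp = b & 0x07;
            } else {
                return false;               // stray continuation byte or invalid lead byte
            }
            if (i + n >= end) {
                return false;               // sequence is cut off by the end of the range
            }
            for (int k = 1; k <= n; k++) {
                int c = s[i + k] & 0xFF;
                if ((c & 0xC0) != 0x80) {
                    return false;           // not a 10xxxxxx continuation byte
                }
                cp = (cp << 6) | (c & 0x3F);
            }
            if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
                return false;               // overlong, out of range, or surrogate
            }
            i += n + 1;
        }
        return true;
    }
}
```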
      • startsWith

        public static boolean startsWith​(byte[] aString,
                                         short offset,
                                         short length,
                                         byte[] prefix,
                                         short poffset,
                                         short plength,
                                         short codePointCount)
        Tests if the UTF-8 encoded character sequence designated by aString, offset and length starts with the first codePointCount characters of the character sequence designated by prefix, poffset and plength.

        If codePointCount is negative, the whole prefix character sequence is considered; in which case calling this method is equivalent to calling arrayCompare as follows:

         return length >= plength && arrayCompare(aString, offset, prefix, poffset, plength) == 0;
         
        Otherwise if codePointCount is positive, calling this method is equivalent to calling arrayCompare as follows:
         short endOffset = StringUtil.offsetByCodePoints(prefix, poffset, plength, 0, codePointCount);
         return length >= endOffset && arrayCompare(aString, offset, prefix, poffset, endOffset) == 0;
         

        Parameters:
        aString - the byte array containing the reference UTF-8 encoded character sequence.
        offset - the starting offset of the reference character sequence in aString.
        length - the length (in bytes) of the reference character sequence.
        prefix - the byte array containing the prefixing UTF-8 encoded character sequence.
        poffset - the starting offset in prefix of the prefixing character sequence.
        plength - the length (in bytes) of the prefixing character sequence.
        codePointCount - the number of code points to be used for testing.
        Returns:
        true if the character sequence designated by prefix, poffset and plength is a prefix of the character sequence designated by aString, offset and length; false otherwise.
        Throws:
        NullPointerException - if aString or prefix is null.
      • endsWith

        public static boolean endsWith​(byte[] aString,
                                       short offset,
                                       short length,
                                       byte[] suffix,
                                       short soffset,
                                       short slength,
                                       short codePointCount)
        Tests if the UTF-8 encoded character sequence designated by aString, offset and length ends with the first codePointCount characters of the character sequence designated by suffix, soffset and slength.

        If codePointCount is negative, the whole suffix character sequence is considered; in which case calling this method is equivalent to calling arrayCompare as follows:

         return length >= slength && arrayCompare(aString,
                 (short) (offset + length - slength), suffix, soffset, slength) == 0;
         
        Otherwise if codePointCount is positive, calling this method is equivalent to calling arrayCompare as follows:
         short endOffset = StringUtil.offsetByCodePoints(suffix, soffset, slength, 0, codePointCount);
         return length >= endOffset && arrayCompare(aString,
                 (short) (offset + length - endOffset), suffix, soffset, endOffset) == 0;
         

        Parameters:
        aString - the byte array containing the reference UTF-8 encoded character sequence.
        offset - the starting offset of the reference character sequence in aString.
        length - the length (in bytes) of the reference character sequence.
        suffix - the byte array containing the suffixing UTF-8 encoded character sequence.
        soffset - the starting offset in suffix of the suffixing character sequence.
        slength - the length (in bytes) of the suffixing character sequence.
        codePointCount - the number of code points to be used for testing.
        Returns:
        true if the character sequence designated by suffix, soffset and slength is a suffix of the character sequence designated by aString, offset and length; false otherwise.
        Throws:
        NullPointerException - if aString or suffix is null.
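
        The prefix and suffix tests described for startsWith and endsWith can be sketched in plain Java without the library helpers. The class name AffixSketch and its helper methods are hypothetical; the sketch assumes well-formed UTF-8 in the prefix or suffix and determines each character's byte length from its lead byte alone.

```java
// Hypothetical stand-in for startsWith/endsWith with a code point limit.
final class AffixSketch {
    /** Byte length of the first count code points of s[off, off+len); the whole range if count < 0. */
    static int leadingBytes(byte[] s, int off, int len, int count) {
        if (count < 0) {
            return len;
        }
        int i = off;
        int end = off + len;
        while (count > 0 && i < end) {
            int b = s[i] & 0xFF;
            i += b < 0x80 ? 1 : b < 0xE0 ? 2 : b < 0xF0 ? 3 : 4; // length from lead byte
            count--;
        }
        if (count > 0 || i > end) {
            throw new IndexOutOfBoundsException("fewer code points than requested");
        }
        return i - off;
    }

    private static boolean regionEquals(byte[] a, int aOff, byte[] b, int bOff, int n) {
        for (int i = 0; i < n; i++) {
            if (a[aOff + i] != b[bOff + i]) {
                return false;
            }
        }
        return true;
    }

    static boolean startsWith(byte[] s, short offset, short length,
                              byte[] prefix, short poffset, short plength,
                              short codePointCount) {
        int n = leadingBytes(prefix, poffset, plength, codePointCount);
        return length >= n && regionEquals(s, offset, prefix, poffset, n);
    }

    static boolean endsWith(byte[] s, short offset, short length,
                            byte[] suffix, short soffset, short slength,
                            short codePointCount) {
        int n = leadingBytes(suffix, soffset, slength, codePointCount);
        return length >= n && regionEquals(s, offset + length - n, suffix, soffset, n);
    }
}
```

        Comparing raw bytes is sufficient here because two UTF-8 sequences encode the same code points exactly when their bytes are identical.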
      • substring

        public static short substring​(byte[] srcString,
                                      short srcOffset,
                                      short srcLength,
                                      short codePointBeginIndex,
                                      short codePointEndIndex,
                                      byte[] dstString,
                                      short dstOffset)
        Copies to the destination byte array the specified substring of the designated source string. The substring begins at the specified codePointBeginIndex and extends to the character at index codePointEndIndex - 1. Thus the length of the substring in code points (that is, its code point count) is codePointEndIndex - codePointBeginIndex.

        Ill-formed or incomplete byte sequences within the text range count as one code point each.
        If codePointEndIndex is negative, then the whole remaining character sequence from the source string is considered.

        Parameters:
        srcString - the byte array containing the source UTF-8 encoded character sequence.
        srcOffset - the starting offset of the source character sequence in srcString.
        srcLength - the length (in bytes) of the source character sequence.
        codePointBeginIndex - the beginning index (relative to srcOffset), inclusive.
        codePointEndIndex - the ending index (relative to srcOffset), exclusive.
        dstString - the byte array for copying the resulting character sequence.
        dstOffset - the starting offset in dstString for copying the resulting character sequence.
        Returns:
        the number of bytes copied.
        Throws:
        NullPointerException - if srcString or dstString is null.
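
        Since substring takes code point indices while the data is addressed in bytes, the core of the operation is translating indices to byte offsets. The plain-Java sketch below shows one way to do that; the class name SubstringSketch and the helper next are hypothetical. The helper only approximates the rule that an ill-formed or incomplete sequence counts as one code point: it stops a character at the first byte that is not a continuation byte or at the end of the range.

```java
// Hypothetical stand-in for substring(...) using code point indices.
final class SubstringSketch {
    /** Returns the byte index just past the code point starting at i (i < end). */
    static int next(byte[] s, int i, int end) {
        int b = s[i] & 0xFF;
        // Continuation bytes expected after this lead byte (0 for ASCII or a stray continuation).
        int want = b < 0xC0 ? 0 : b < 0xE0 ? 1 : b < 0xF0 ? 2 : 3;
        i++;
        while (want > 0 && i < end && (s[i] & 0xC0) == 0x80) {
            i++;
            want--;
        }
        return i;
    }

    static short substring(byte[] src, short srcOffset, short srcLength,
                           short beginIndex, short endIndex,
                           byte[] dst, short dstOffset) {
        if (beginIndex < 0 || (endIndex >= 0 && endIndex < beginIndex)) {
            throw new IndexOutOfBoundsException("bad code point range");
        }
        int end = srcOffset + srcLength;
        int i = srcOffset;
        for (int cp = 0; cp < beginIndex; cp++) {    // skip to the first wanted code point
            if (i >= end) {
                throw new IndexOutOfBoundsException("begin index past end");
            }
            i = next(src, i, end);
        }
        int from = i;
        if (endIndex < 0) {
            i = end;                                 // negative end: take the whole remainder
        } else {
            for (int cp = beginIndex; cp < endIndex; cp++) {
                if (i >= end) {
                    throw new IndexOutOfBoundsException("end index past end");
                }
                i = next(src, i, end);
            }
        }
        int n = i - from;
        System.arraycopy(src, from, dst, dstOffset, n);
        return (short) n;
    }
}
```

        The return value is a byte count, so it can exceed codePointEndIndex - codePointBeginIndex whenever the substring contains multi-byte characters.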