Class OraNormalizer


  • public class OraNormalizer
    extends Object
    The OraNormalizer class normalizes strings according to the Unicode Standard. The same text can be encoded by different sequences of Unicode characters. For example, "é" can be the single code point U+00E9 or "e" followed by U+0301 COMBINING ACUTE ACCENT. Before you can compare such strings accurately, call the methods in this class to bring them into the same normalization form. For more information on Unicode normalization, see Unicode Technical Report #15 (now Unicode Standard Annex #15, "Unicode Normalization Forms") at www.unicode.org.
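    Why normalization must come before comparison can be sketched without Oracle's libraries by using the JDK's java.text.Normalizer as a stand-in for OraNormalizer. This is an illustration under that substitution, not OraNormalizer's own code. The class name CanonicalCompare and the helper canonicallyEqual are hypothetical. The helper compares two strings only after converting both to NFD.

```java
import java.text.Normalizer;

// Sketch: compare strings only after bringing both into the same
// normalization form. java.text.Normalizer stands in for OraNormalizer;
// CanonicalCompare and canonicallyEqual are hypothetical names.
final class CanonicalCompare {
    static boolean canonicallyEqual(String a, String b) {
        // NFD = canonical decomposition followed by canonical ordering.
        return Normalizer.normalize(a, Normalizer.Form.NFD)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFD));
    }

    public static void main(String[] args) {
        String precomposed = "\u00e9";  // é as a single code point
        String decomposed = "e\u0301";  // e + COMBINING ACUTE ACCENT
        System.out.println(precomposed.equals(decomposed));            // false
        System.out.println(canonicallyEqual(precomposed, decomposed)); // true
    }
}
```

    A plain String.equals treats the two encodings of "é" as different strings. They compare equal only once both are in the same form.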
    • Field Detail

      • NO_DECOMP

        public static final int NO_DECOMP
        Canonically sorts the string without decomposing it.
        See Also:
        Constant Field Values
      • CANONICAL_DECOMP

        public static final int CANONICAL_DECOMP
        Canonically decomposes the string and then canonically sorts it.
        See Also:
        Constant Field Values
      • COMPATIBILITY_DECOMP

        public static final int COMPATIBILITY_DECOMP
        Applies compatibility decomposition to the string and then canonically sorts it.
        See Also:
        Constant Field Values
      • NFC

        public static final int NFC
        Canonical decomposition followed by canonical composition (Normalization Form C).
        See Also:
        Constant Field Values
      • NFKC

        public static final int NFKC
        Compatibility decomposition followed by canonical composition (Normalization Form KC).
        See Also:
        Constant Field Values
    • Method Detail

      • getInstance

        public static OraNormalizer getInstance()
        Returns a shared OraNormalizer instance.
        Returns:
        an OraNormalizer instance
      • canonicalSort

        public void canonicalSort(char[] sequence)
        Performs a canonical sort on the given character array. The method returns nothing, so the array itself is reordered. The Unicode Standard considers the character sequence before and after this operation to be canonically equivalent.
        Parameters:
        sequence - the character array to sort
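        The canonical sort described above corresponds to the Canonical Ordering Algorithm of the Unicode Standard. Under that algorithm, two adjacent combining marks are swapped whenever the first has a higher, nonzero canonical combining class than the second. The sketch below shows that algorithm on a char[]. It is not Oracle's implementation. The class name CanonicalOrder and its four-entry combining-class table are hypothetical, and the sketch assumes input limited to the Basic Multilingual Plane, with no surrogate pairs.

```java
import java.util.Map;

// Sketch of the Unicode Canonical Ordering Algorithm on a char[].
// Assumptions: BMP characters only (no surrogate pairs) and a tiny
// hypothetical combining-class table; a real implementation reads the
// Canonical_Combining_Class property from the Unicode Character Database.
final class CanonicalOrder {
    private static final Map<Character, Integer> CCC = Map.of(
            '\u0301', 230,  // COMBINING ACUTE ACCENT
            '\u0308', 230,  // COMBINING DIAERESIS
            '\u0323', 220,  // COMBINING DOT BELOW
            '\u0327', 202); // COMBINING CEDILLA

    static int combiningClass(char c) {
        return CCC.getOrDefault(c, 0); // class 0 = starter
    }

    // Reorders the array in place by swapping adjacent characters whenever
    // ccc(first) > ccc(second) > 0. Starters (class 0) never move, and marks
    // with equal classes keep their relative order.
    static void canonicalSort(char[] seq) {
        boolean swapped = true;
        while (swapped) {
            swapped = false;
            for (int i = 1; i < seq.length; i++) {
                int prev = combiningClass(seq[i - 1]);
                int cur = combiningClass(seq[i]);
                if (cur != 0 && prev > cur) {
                    char tmp = seq[i - 1];
                    seq[i - 1] = seq[i];
                    seq[i] = tmp;
                    swapped = true;
                }
            }
        }
    }
}
```

        For input such as {'a', '\u0301', '\u0323'}, the dot below (class 220) moves ahead of the acute accent (class 230). Two marks that both have class 230 keep their original order, because swapping them would change the meaning of the text.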
      • compose

        public String compose(String sequence)
        Normalizes the string by composing its Unicode characters and then performing a canonical sort on the result.
        Parameters:
        sequence - a string to compose
        Returns:
        the composed string
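        The effect of composition can be illustrated with the JDK's NFC form, which performs canonical decomposition, canonical ordering, and then canonical composition. The sketch uses java.text.Normalizer in place of OraNormalizer.compose, so OraNormalizer's results may differ in detail. The class ComposeDemo and the helper composeNfc are hypothetical names.

```java
import java.text.Normalizer;

// Sketch: approximates compose(String) with the JDK's NFC form.
// ComposeDemo and composeNfc are hypothetical, not OraNormalizer APIs.
final class ComposeDemo {
    static String composeNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // e + COMBINING ACUTE ACCENT composes to U+00E9.
        System.out.println(composeNfc("e\u0301").equals("\u00e9")); // true
        // ANGSTROM SIGN U+212B maps canonically to U+00C5.
        System.out.println(composeNfc("\u212b").equals("\u00c5"));  // true
        // A + COMBINING RING ABOVE: two chars compose into one.
        System.out.println(composeNfc("A\u030a").length());         // 1
    }
}
```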
      • decompose

        public String decompose(String sequence,
                                int mode)
        Normalizes the string by decomposing its Unicode characters as specified by mode and then performing a canonical sort on the result.
        Parameters:
        sequence - a string to decompose
        mode - the decomposition mode; one of NO_DECOMP, CANONICAL_DECOMP, or COMPATIBILITY_DECOMP
        Returns:
        the decomposed string
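        The decomposition modes can be roughly mapped onto the JDK's java.text.Normalizer as follows: CANONICAL_DECOMP corresponds to NFD and COMPATIBILITY_DECOMP to NFKD. NO_DECOMP, a canonical sort with no decomposition, has no direct JDK equivalent, so this sketch rejects it. The int values are placeholders, not OraNormalizer's real constants, which are listed under Constant Field Values. DecomposeDemo is a hypothetical class.

```java
import java.text.Normalizer;

// Sketch mapping decomposition modes onto java.text.Normalizer.
// The int values are hypothetical placeholders, NOT OraNormalizer's
// real constant values.
final class DecomposeDemo {
    static final int NO_DECOMP = 0;            // hypothetical value
    static final int CANONICAL_DECOMP = 1;     // hypothetical value
    static final int COMPATIBILITY_DECOMP = 2; // hypothetical value

    static String decompose(String s, int mode) {
        switch (mode) {
            case CANONICAL_DECOMP:
                return Normalizer.normalize(s, Normalizer.Form.NFD);
            case COMPATIBILITY_DECOMP:
                return Normalizer.normalize(s, Normalizer.Form.NFKD);
            case NO_DECOMP:
                // Sorting without decomposing has no java.text.Normalizer form.
                throw new UnsupportedOperationException("NO_DECOMP is not supported in this sketch");
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }
}
```

        The "ﬁ" ligature (U+FB01) shows the difference between the two modes. It has only a compatibility decomposition, so canonical decomposition leaves it unchanged, while compatibility decomposition turns it into the two letters "fi".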
      • normalize

        public String normalize(String sequence,
                                int mode)
        Normalizes the string into the Unicode normalization form specified by mode. The characters are decomposed and canonically sorted, and for the composed forms NFC and NFKC they are then recomposed.
        Parameters:
        sequence - a string to normalize
        mode - the normalization form; one of NFD, NFC, NFKD, or NFKC
        Returns:
        the normalized string
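        The four mode names NFD, NFC, NFKD, and NFKC match the constants of the JDK enum java.text.Normalizer.Form. The sketch below therefore resolves a mode by name with Form.valueOf and applies it. It illustrates the four normalization forms using the JDK and does not call OraNormalizer.normalize. NormalizeDemo and normalizeAs are hypothetical names.

```java
import java.text.Normalizer;

// Sketch: resolves a normalization-form name with Normalizer.Form.valueOf
// and applies it. NormalizeDemo and normalizeAs are hypothetical names.
final class NormalizeDemo {
    static String normalizeAs(String s, String formName) {
        // Throws IllegalArgumentException for names other than NFD, NFC, NFKD, NFKC.
        Normalizer.Form form = Normalizer.Form.valueOf(formName);
        return Normalizer.normalize(s, form);
    }

    public static void main(String[] args) {
        String input = "\ufb01e\u0301"; // ﬁ ligature, then e + combining acute
        for (String name : new String[] {"NFD", "NFC", "NFKD", "NFKC"}) {
            System.out.println(name + " length: " + normalizeAs(input, name).length());
        }
    }
}
```

        Only the compatibility forms (NFKD, NFKC) split the ligature, and only the composed forms (NFC, NFKC) merge "e" with its accent.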