Class CharEntityReference


  • public class CharEntityReference
    extends Object
    The CharEntityReference class is a utility class that escapes a string into character reference or entity reference form, and unescapes such a string back into its original characters.

    A character reference refers to a specific character in the ISO/IEC 10646 character set, using the following representation:

      CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
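
The grammar above can be checked with a regular expression. The following is a minimal sketch, not part of CharEntityReference: the class name CharRefGrammarSketch and the method codePointOf are hypothetical names introduced only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; it is NOT part of the CharEntityReference API.
final class CharRefGrammarSketch {
    // Mirrors: CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
    private static final Pattern CHAR_REF =
            Pattern.compile("&#([0-9]+);|&#x([0-9a-fA-F]+);");

    /**
     * Returns the code point named by a single character reference,
     * or -1 if the text does not match the grammar or the number overflows.
     */
    static int codePointOf(String ref) {
        Matcher m = CHAR_REF.matcher(ref);
        if (!m.matches()) {
            return -1;
        }
        try {
            return m.group(1) != null
                    ? Integer.parseInt(m.group(1), 10)   // decimal alternative
                    : Integer.parseInt(m.group(2), 16);  // hexadecimal alternative
        } catch (NumberFormatException e) {
            return -1; // more digits than an int can hold
        }
    }

    public static void main(String[] args) {
        System.out.println(codePointOf("&#233;"));   // decimal reference
        System.out.println(codePointOf("&#xE9;"));   // hexadecimal reference
        System.out.println(codePointOf("&eacute;")); // entity reference, not a CharRef
    }
}
```

Both numeric spellings name the same code point, while a named entity reference such as &eacute; falls outside the CharRef production and is rejected by this sketch.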
      
    An entity reference refers to the content of a named entity and takes the form '&XXX;', where XXX is the name of the entity. These names are defined in the XML and HTML standards.

    In escape operations, the entities 'amp', 'lt', 'gt', 'apos', and 'quot' are used as NAMED entity references. DECIMAL or HEXADECIMAL character references are used for other characters.

    In unescape operations, 'amp', 'lt', 'gt', 'apos', 'quot', and all entity references defined in HTML 4.01 are recognized in order to convert back to the original characters.

    See Also:
    HTML 4.01 Specification, Extensible Markup Language (XML) 1.0 (Second Edition)
    • Method Detail

      • escape

        public static String escape(String srcstr)
        Escapes a string into the character entity reference in the NAMED_DECIMAL_NUMBER form assuming the WE8ISO8859P1 character set.

        The default character set is WE8ISO8859P1, and the default form is NAMED_DECIMAL_NUMBER. Some characters are escaped as named entities, for example, &lt; for '<'. Other characters are escaped in the decimal form if they are not supported by the character set.

        Parameters:
        srcstr - a string to be escaped
        Returns:
        an escaped string
        Throws:
        IllegalStateException - if WE8ISO8859P1 is not supported
        See Also:
        escape(String, String, CharEntityReference.Form)
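
The behaviour described for this overload can be approximated in plain Java. The sketch below is not the library's implementation. It assumes that Java's ISO-8859-1 charset is an adequate stand-in for the Oracle character set WE8ISO8859P1, and that unsupported characters are written as decimal character references. The class name NamedDecimalEscapeSketch is hypothetical.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for illustration; NOT the CharEntityReference implementation.
final class NamedDecimalEscapeSketch {

    static String escape(String src) {
        // Assumption: ISO-8859-1 approximates the Oracle character set WE8ISO8859P1.
        CharsetEncoder enc = StandardCharsets.ISO_8859_1.newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < src.length(); ) {
            int cp = src.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '\'': out.append("&apos;"); break;
                case '"':  out.append("&quot;"); break;
                default:
                    if (Character.isBmpCodePoint(cp) && enc.canEncode((char) cp)) {
                        out.appendCodePoint(cp);                 // representable: copy as-is
                    } else {
                        out.append("&#").append(cp).append(';'); // decimal character reference
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+00E9 is representable in ISO-8859-1; U+20AC (euro sign) is not.
        System.out.println(escape("a<b & \u00E9\u20AC"));
    }
}
```

Iterating by code point rather than by char keeps a supplementary character as a single reference instead of two surrogate references.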
      • escape

        public static String escape(String srcstr,
                                    String dstCharset)
                             throws UnsupportedEncodingException
        Escapes a string into the character entity reference in the NAMED_DECIMAL_NUMBER form.

        The default form is NAMED_DECIMAL_NUMBER. Some characters are escaped as named entities, for example, &lt; for '<'. Other characters are escaped in the decimal form if they are not supported by the given character set.

        Parameters:
        srcstr - a string to be escaped
        dstCharset - an Oracle character set name
        Returns:
        an escaped string
        Throws:
        UnsupportedEncodingException - if the dstCharset parameter is an invalid character set name
        See Also:
        escape(String, String, CharEntityReference.Form)
      • escape

        public static String escape(String srcstr,
                                    String dstCharset,
                                    CharEntityReference.Form form)
                             throws UnsupportedEncodingException
        Escapes a string into the character entity reference form. Characters that are not supported by the given character set are escaped in the DECIMAL_NUMBER or HEXADECIMAL_NUMBER form. In addition, the following characters are always escaped, in the NAMED, DECIMAL_NUMBER, or HEXADECIMAL_NUMBER form, depending on the form parameter:
        • < - Less than sign (U+003C)
        • > - Greater than sign (U+003E)
        • & - Ampersand (U+0026)
        • ' - Apostrophe (U+0027)
        • " - Quotation mark (U+0022)
        Parameters:
        srcstr - a string to be escaped
        dstCharset - an Oracle character set name
        form - a form of character entity reference
        Returns:
        an escaped string
        Throws:
        UnsupportedEncodingException - if the dstCharset parameter is an invalid character set name
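
To make the three spellings of the five special characters concrete, here is a small sketch. Its nested Form enum is a local stand-in and is not CharEntityReference.Form; the class name FormRenderingSketch and the method render are hypothetical. The lowercase hexadecimal digits are an assumption, since this page does not state which case the library emits.

```java
// Hypothetical illustration; the enum below is NOT CharEntityReference.Form.
final class FormRenderingSketch {

    enum Form { NAMED, DECIMAL_NUMBER, HEXADECIMAL_NUMBER }

    /** Renders one of the five special characters in the requested form. */
    static String render(char c, Form form) {
        switch (form) {
            case DECIMAL_NUMBER:
                return "&#" + (int) c + ";";
            case HEXADECIMAL_NUMBER:
                // Assumption: lowercase hex digits; the real library may differ.
                return "&#x" + Integer.toHexString(c) + ";";
            default:
                return "&" + nameOf(c) + ";";
        }
    }

    private static String nameOf(char c) {
        switch (c) {
            case '<':  return "lt";
            case '>':  return "gt";
            case '&':  return "amp";
            case '\'': return "apos";
            case '"':  return "quot";
            default:
                throw new IllegalArgumentException("No named entity in this sketch for: " + c);
        }
    }

    public static void main(String[] args) {
        for (Form f : Form.values()) {
            System.out.println(f + ": " + render('<', f));
        }
    }
}
```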
      • unescape

        public static String unescape(String srcstr)
        Converts an escaped string into a Unicode string.
        Parameters:
        srcstr - a string containing escaped characters
        Returns:
        a string representing the original data
        Throws:
        IllegalArgumentException - if an unregistered character entity reference is used
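
The reverse conversion can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the class name UnescapeSketch is hypothetical, and the sketch recognizes only 'amp', 'lt', 'gt', 'apos', and 'quot', whereas the documented method also recognizes the HTML 4.01 entities.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; NOT the CharEntityReference implementation.
final class UnescapeSketch {
    // Matches decimal references, hexadecimal references, and named references.
    private static final Pattern REF =
            Pattern.compile("&(#[0-9]+|#x[0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);");

    static String unescape(String src) {
        Matcher m = REF.matcher(src);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(src, last, m.start());         // literal text before the reference
            out.appendCodePoint(resolve(m.group(1))); // the referenced character
            last = m.end();
        }
        out.append(src, last, src.length());          // trailing literal text
        return out.toString();
    }

    private static int resolve(String body) {
        if (body.startsWith("#x")) {
            return Integer.parseInt(body.substring(2), 16);
        }
        if (body.startsWith("#")) {
            return Integer.parseInt(body.substring(1), 10);
        }
        switch (body) {
            case "amp":  return '&';
            case "lt":   return '<';
            case "gt":   return '>';
            case "apos": return '\'';
            case "quot": return '"';
            default:
                // Mirrors the documented exception for unregistered references.
                throw new IllegalArgumentException("Unregistered entity reference: &" + body + ";");
        }
    }

    public static void main(String[] args) {
        System.out.println(unescape("a&lt;b &amp; &#233;&#x20AC;"));
    }
}
```

Each reference is resolved exactly once, so an input such as &amp;lt; yields the text &lt; rather than '<'.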