Class LCSDetectionHTMLInputStream

  • All Implemented Interfaces:
    Closeable, AutoCloseable

    public class LCSDetectionHTMLInputStream
    extends LCSDetectionInputStream
    The LCSDetectionHTMLInputStream class extends the LCSDetectionInputStream class to support language and encoding detection for input in HTML format.

    The detection sampling length is the number of bytes of plain text that the detection feature examines. The default sampling length is 1K. LCSD generally handles language/encoding detection without adjustment, so you do not need to set this value. Change it only if you want to control how much text is sampled.
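    The idea of a sampling length counted in plain-text bytes can be illustrated with a small sketch. It is not the library's implementation. It assumes that HTML markup is excluded from the sample and that the input uses an ASCII-compatible encoding, and the class and method names (PlainTextSampleSketch, samplePlainText) are hypothetical.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration only; NOT part of the LCSD API.
final class PlainTextSampleSketch {

    // Copies at most maxBytes bytes that lie outside <...> markup.
    // Naive by design: it ignores comments, scripts, and entities.
    static byte[] samplePlainText(byte[] html, int maxBytes) {
        ByteArrayOutputStream sample = new ByteArrayOutputStream();
        boolean insideTag = false;
        for (byte b : html) {
            if (sample.size() >= maxBytes) {
                break;                      // sampling length reached
            }
            if (b == '<') {
                insideTag = true;           // markup starts: stop copying
            } else if (b == '>' && insideTag) {
                insideTag = false;          // markup ends: resume copying
            } else if (!insideTag) {
                sample.write(b);            // plain-text byte counts toward the sample
            }
        }
        return sample.toByteArray();
    }
}
```

    With a sampling length of 8, the input `<p>Hello</p><b>World</b>` yields the sample `HelloWor`: the tags do not count toward the limit.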

    You can get the detection results from the LCSDResultSet class if needed.

    Any read method throws UTFDataFormatException if the source is UTF-8 data and an invalid UTF-8 sequence is found.
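    Because java.io.UTFDataFormatException is a subclass of java.io.IOException, a caller that wants to treat invalid UTF-8 differently from other I/O failures has to catch it first. The sketch below shows that ordering against a plain java.io.InputStream, so it runs without the LCSD library. The class and method names (ReadOutcomeSketch, drain) and the returned labels are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UTFDataFormatException;

// Hypothetical helper; works with any InputStream, including a detection stream.
final class ReadOutcomeSketch {

    // Reads the stream to the end and reports how the read finished.
    static String drain(InputStream in) {
        byte[] buffer = new byte[256];
        try {
            while (in.read(buffer) != -1) {
                // Data is discarded here; a real caller would process it.
            }
            return "ok";
        } catch (UTFDataFormatException e) {
            // Must come before IOException: it is the more specific type.
            return "invalid-utf8";
        } catch (IOException e) {
            return "io-error";
        }
    }
}
```

    Reversing the two catch clauses does not compile, because the IOException handler would already cover UTFDataFormatException.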

    Since:
    10.2
    • Constructor Detail

      • LCSDetectionHTMLInputStream

        public LCSDetectionHTMLInputStream(InputStream in)
                                    throws IOException,
                                           UTFDataFormatException
        Creates an LCSDetectionHTMLInputStream object that uses the default sampling length and the default profile for detection.
        Parameters:
        in - the input stream whose language and encoding you want to detect
        Throws:
        IOException - if any I/O error occurs
        UTFDataFormatException - if an invalid UTF-8 sequence is detected. This occurs only if the source is UTF-8 data.
      • LCSDetectionHTMLInputStream

        public LCSDetectionHTMLInputStream(String name,
                                           InputStream in)
                                    throws IOException,
                                           UTFDataFormatException
        Creates an LCSDetectionHTMLInputStream object that uses the specified profile for detection and the default sampling length.
        Parameters:
        name - the profile name
        in - the input stream whose language and encoding you want to detect
        Throws:
        IOException - if any I/O error occurs
        UTFDataFormatException - if an invalid UTF-8 sequence is detected. This occurs only if the source is UTF-8 data.
      • LCSDetectionHTMLInputStream

        public LCSDetectionHTMLInputStream(InputStream in,
                                           int len)
                                    throws IOException,
                                           UTFDataFormatException
        Creates an LCSDetectionHTMLInputStream object that uses the specified sampling length and the default profile for detection.
        Parameters:
        in - the input stream whose language and encoding you want to detect
        len - the sampling length, in bytes
        Throws:
        IOException - if any I/O error occurs
        UTFDataFormatException - if an invalid UTF-8 sequence is detected. This occurs only if the source is UTF-8 data.
      • LCSDetectionHTMLInputStream

        public LCSDetectionHTMLInputStream(String name,
                                           InputStream in,
                                           int len)
                                    throws IOException,
                                           UTFDataFormatException
        Creates an LCSDetectionHTMLInputStream object that uses the specified sampling length and the specified profile for detection.
        Parameters:
        name - the profile name
        in - the input stream whose language and encoding you want to detect
        len - the sampling length, in bytes
        Throws:
        IOException - if any I/O error occurs
        UTFDataFormatException - if an invalid UTF-8 sequence is detected. This occurs only if the source is UTF-8 data.
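
    Taken together, the four constructors differ only in which of the profile name and sampling length are supplied and which fall back to defaults. The stand-in class below sketches that relationship as telescoping constructors. It is not the real class: DetectionStreamSketch, its fields, the placeholder profile name, and the reading of "1K" as 1024 bytes are all assumptions made for illustration.

```java
import java.io.InputStream;

// Hypothetical stand-in; NOT the real LCSDetectionHTMLInputStream.
final class DetectionStreamSketch {
    static final int DEFAULT_SAMPLING_LENGTH = 1024;   // assumption: "1K" taken as 1024 bytes
    static final String DEFAULT_PROFILE = "default";   // assumption: placeholder profile name

    final String profileName;
    final InputStream source;
    final int samplingLength;

    // (in): default profile, default sampling length
    DetectionStreamSketch(InputStream in) {
        this(DEFAULT_PROFILE, in, DEFAULT_SAMPLING_LENGTH);
    }

    // (name, in): specified profile, default sampling length
    DetectionStreamSketch(String name, InputStream in) {
        this(name, in, DEFAULT_SAMPLING_LENGTH);
    }

    // (in, len): default profile, specified sampling length
    DetectionStreamSketch(InputStream in, int len) {
        this(DEFAULT_PROFILE, in, len);
    }

    // (name, in, len): everything specified; the other three delegate here
    DetectionStreamSketch(String name, InputStream in, int len) {
        this.profileName = name;
        this.source = in;
        this.samplingLength = len;
    }
}
```

    In this sketch, `new DetectionStreamSketch(in, 4096)` keeps the default profile and raises only the sampling length, mirroring the (InputStream in, int len) constructor documented above.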