Class LCSDetectionInputStream

  • All Implemented Interfaces:
    Closeable, AutoCloseable
    Direct Known Subclasses:
    LCSDetectionHTMLInputStream

    public class LCSDetectionInputStream
    extends FilterInputStream
    The LCSDetectionInputStream class is a stream class that transparently detects the language and character set of the data read through it.

    The read methods return the contents of the input stream unchanged. If character set conversion using the detected character set is needed, use the LCSDetectionReader class. If the input stream contains HTML content, use the LCSDetectionHTMLInputStream class or the LCSDetectionHTMLReader class. The LCSDetectionInputStream class itself only determines the language and character set of the content.
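
    The pass-through behavior described above can be illustrated with a minimal sketch. SamplingPassThroughStream and its sampledBytes method are hypothetical names invented for this example; they are not part of the LCSD API and the sketch performs no detection. It only shows, under that assumption, how a FilterInputStream subclass can record a bounded sample while returning the original bytes unchanged.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in, NOT the real LCSDetectionInputStream: it records the
// first sampleLimit bytes that pass through and never alters what read returns.
class SamplingPassThroughStream extends FilterInputStream {
    private final byte[] sample;
    private int sampled = 0;

    SamplingPassThroughStream(InputStream in, int sampleLimit) {
        super(in);
        this.sample = new byte[sampleLimit];
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0 && sampled < sample.length) {
            sample[sampled++] = (byte) b;
        }
        return b;
    }

    // FilterInputStream.read(byte[]) delegates here, so one override covers both.
    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int n = super.read(b, off, len);
        if (n > 0) {
            int toCopy = Math.min(n, sample.length - sampled);
            System.arraycopy(b, off, sample, sampled, toCopy);
            sampled += toCopy;
        }
        return n;
    }

    // Returns a copy of the bytes sampled so far.
    byte[] sampledBytes() {
        return Arrays.copyOf(sample, sampled);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "Bonjour le monde".getBytes(StandardCharsets.UTF_8);
        SamplingPassThroughStream s =
                new SamplingPassThroughStream(new ByteArrayInputStream(data), 7);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4];
        int n;
        while ((n = s.read(buf)) != -1) {
            out.write(buf, 0, n);
        }

        System.out.println("unchanged: " + Arrays.equals(data, out.toByteArray()));
        System.out.println("sample: " + new String(s.sampledBytes(), StandardCharsets.UTF_8));
    }
}
```

    The sketch does not intercept skip, mark, or reset, so bytes skipped by the caller are never sampled; whether the real class behaves the same way is not stated here.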

    Since:
    10.2
    • Field Detail

      • DEFAULT_SAMPLING_SIZE

        protected static final int DEFAULT_SAMPLING_SIZE
        Default buffer size for the dual stream objects.
        See Also:
        Constant Field Values
    • Constructor Detail

      • LCSDetectionInputStream

        public LCSDetectionInputStream​(InputStream in)
                                throws IOException,
                                       UTFDataFormatException
        Constructs the default LCS Detector input stream object.
        Parameters:
        in - the input stream to be sampled by the detector
        Throws:
        IOException - if any I/O error occurs
        UTFDataFormatException - if an invalid UTF-8 data sequence is detected. Note this occurs only if the source is UTF-8 data.
      • LCSDetectionInputStream

        public LCSDetectionInputStream​(InputStream in,
                                       int size)
                                throws IOException,
                                       UTFDataFormatException
        Constructs the default LCS Detector input stream object with the specified sampling length.
        Parameters:
        in - the input stream to be sampled by the detector
        size - the sampling length
        Throws:
        IOException - if any I/O error occurs
        UTFDataFormatException - if an invalid UTF-8 data sequence is detected. Note this occurs only if the source is UTF-8 data.
      • LCSDetectionInputStream

        public LCSDetectionInputStream​(String profile,
                                       InputStream in)
                                throws IOException,
                                       UTFDataFormatException
        Constructs the LCS Detector input stream object using the specified LCSD profile.
        Parameters:
        profile - the LCSD profile name. Specify null for the default profile.
        in - the input stream to be sampled by the detector
        Throws:
        IOException - if any I/O error occurs
        UTFDataFormatException - if an invalid UTF-8 data sequence is detected. Note this occurs only if the source is UTF-8 data.
      • LCSDetectionInputStream

        public LCSDetectionInputStream​(String profile,
                                       InputStream in,
                                       int size)
                                throws IOException,
                                       UTFDataFormatException
        Constructs the LCS Detector input stream object using the specified LCSD profile, input stream object, and sampling length.

        Sampling is performed over the specified length. The detection result does not change until read operations have consumed more than the specified length. After that, further data is sampled one minimum data length at a time.

        Parameters:
        profile - the LCSD profile name. Specify null for the default profile.
        in - the input stream to be sampled by the detector
        size - the sampling length
        Throws:
        IOException - if any I/O error occurs
        UTFDataFormatException - if an invalid UTF-8 data sequence is detected. Note this occurs only if the source is UTF-8 data.
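
        Every constructor above declares both IOException and UTFDataFormatException. Because java.io.UTFDataFormatException is a subclass of IOException, a caller that wants to distinguish malformed UTF-8 from other I/O failures has to catch it first. The sketch below shows that ordering; sampleAsUtf8 is a hypothetical stand-in for constructing the detector stream (it only runs a strict UTF-8 decode) and is not part of the LCSD API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UTFDataFormatException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class CatchOrderSketch {

    // Hypothetical stand-in for the detector constructor: reads the stream and
    // throws UTFDataFormatException if the bytes are not well-formed UTF-8.
    static void sampleAsUtf8(InputStream in) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        int n;
        while ((n = in.read(buf)) != -1) {
            collected.write(buf, 0, n);
        }
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(collected.toByteArray()));
        } catch (CharacterCodingException e) {
            throw new UTFDataFormatException("invalid UTF-8 sequence");
        }
    }

    // The more specific exception must come first; reversing the two catch
    // clauses is a compile-time error because the second would be unreachable.
    static String classify(InputStream in) {
        try {
            sampleAsUtf8(in);
            return "ok";
        } catch (UTFDataFormatException e) {
            return "malformed UTF-8";
        } catch (IOException e) {
            return "I/O error";
        }
    }

    public static void main(String[] args) {
        byte[] good = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        byte[] bad = {(byte) 0xC3, (byte) 0x28}; // lead byte followed by a non-continuation byte
        System.out.println(classify(new ByteArrayInputStream(good)));
        System.out.println(classify(new ByteArrayInputStream(bad)));
    }
}
```

        The same ordering applies to the read methods, which declare the same two exceptions.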
    • Method Detail

      • read

        public int read​(byte[] b)
                 throws IOException,
                        UTFDataFormatException
        Reads some number of bytes from the input stream and stores them into the buffer array b.
        Overrides:
        read in class FilterInputStream
        Parameters:
        b - the buffer into which the data is read
        Returns:
        The total number of bytes read into the buffer, or -1 if there is no more data because the end of the stream has been reached.
        Throws:
        IOException - if an I/O error occurs
        UTFDataFormatException - if an invalid UTF-8 data sequence is detected. Note this occurs only if the source is UTF-8 data.
      • read

        public int read​(byte[] b,
                        int off,
                        int len)
                 throws IOException,
                        UTFDataFormatException
        Reads up to len bytes of data from the input stream into an array of bytes.
        Overrides:
        read in class FilterInputStream
        Parameters:
        b - the buffer into which the data is read
        off - the start offset in array b at which the data is written
        len - the maximum number of bytes to read
        Returns:
        The total number of bytes read into the buffer, or -1 if there is no more data because the end of the stream has been reached.
        Throws:
        IOException - if an I/O error occurs
        UTFDataFormatException - if an invalid UTF-8 data sequence is detected. Note this occurs only if the source is UTF-8 data.
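
        Since read(byte[], int, int) reads up to len bytes, a single call may return fewer bytes than requested. The following sketch shows one common way to fill a buffer by advancing off until the buffer is full or -1 is returned. FillBufferSketch and TrickleStream are hypothetical names for this example; TrickleStream only simulates short reads, and any InputStream, including an LCSDetectionInputStream, could be passed to fill instead.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper stream that returns at most maxPerRead bytes per call,
// to simulate the short reads that the read contract allows.
class TrickleStream extends FilterInputStream {
    private final int maxPerRead;

    TrickleStream(InputStream in, int maxPerRead) {
        super(in);
        this.maxPerRead = maxPerRead;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return super.read(b, off, Math.min(len, maxPerRead));
    }
}

class FillBufferSketch {

    // Reads until b is full or the end of the stream is reached.
    // Returns the number of bytes actually stored in b.
    static int fill(InputStream in, byte[] b) throws IOException {
        int off = 0;
        while (off < b.length) {
            int n = in.read(b, off, b.length - off);
            if (n == -1) {
                break; // end of stream before the buffer was full
            }
            off += n;
        }
        return off;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "0123456789".getBytes(StandardCharsets.US_ASCII);
        byte[] buffer = new byte[8];
        int stored = fill(new TrickleStream(new ByteArrayInputStream(data), 3), buffer);
        System.out.println(stored + " bytes: " + new String(buffer, 0, stored, StandardCharsets.US_ASCII));
    }
}
```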