public class LCSDetectionReader extends Reader
LCSDetectionReader is the language and character set detection (LCSD) reader class. It transparently detects the character set of the input and converts the data to Unicode.
The most common usage is to read text data through the Reader interface, as follows:
InputStream in = file.getInputStream();
Reader rdr = new LCSDetectionReader(in);
char[] cbuf = new char[1024];
for (int len = -1; (len = rdr.read(cbuf)) != -1;)
{
    // do something with the first len characters of cbuf
    ...
}
The detection occurs only once, by sampling the first chunk of data.

| Modifier and Type | Field and Description |
|---|---|
| protected static int | DEFAULT_SAMPLING_SIZE: Default sampling byte length for language and character set detection. |
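
To make the idea of one-time sampling concrete, here is a minimal sketch of how a reader might inspect the first bytes of a stream, rewind, and then decode everything with the chosen character set. It is not the LCSD implementation: the class `SamplingSketch`, the constant `SAMPLE_SIZE`, and the byte-order-mark heuristic are all assumptions made for illustration, and the actual value of DEFAULT_SAMPLING_SIZE is not stated here.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for illustration; not the LCSD detection logic. */
class SamplingSketch {
    // Hypothetical sample length in bytes, chosen only for this sketch.
    static final int SAMPLE_SIZE = 4;

    /** Guesses a charset from a byte-order mark in the sample; falls back to UTF-8. */
    static Charset guessCharset(byte[] sample, int len) {
        if (len >= 3 && (sample[0] & 0xFF) == 0xEF
                && (sample[1] & 0xFF) == 0xBB && (sample[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (len >= 2 && (sample[0] & 0xFF) == 0xFE && (sample[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        if (len >= 2 && (sample[0] & 0xFF) == 0xFF && (sample[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        return StandardCharsets.UTF_8;
    }

    /** Samples the head of the stream once, rewinds, and returns a decoding Reader. */
    static Reader open(InputStream in) throws IOException {
        BufferedInputStream buffered = new BufferedInputStream(in);
        buffered.mark(SAMPLE_SIZE);
        byte[] sample = new byte[SAMPLE_SIZE];
        int len = 0;
        while (len < SAMPLE_SIZE) {
            int n = buffered.read(sample, len, SAMPLE_SIZE - len);
            if (n < 0) {
                break; // stream is shorter than the sample size
            }
            len += n;
        }
        buffered.reset(); // rewind so the sampled bytes are decoded too
        // Note: a byte-order mark, if present, is not stripped in this sketch.
        return new InputStreamReader(buffered, guessCharset(sample, len));
    }
}
```

Because the sample is taken once and the stream is rewound, later read() calls pay no further detection cost. A real detector would also look at the byte statistics of the sample, not only at a byte-order mark.
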
| Constructor and Description |
|---|
| LCSDetectionReader(InputStream in): Constructs the LCSD Reader instance with the character set determined by sampling initial data. |
| LCSDetectionReader(InputStream in, int size): Constructs the LCSD Reader instance with the character set determined by sampling initial data. |
| LCSDetectionReader(Reader reader): Constructs the LCSD Reader instance over the input stream reader. |
| LCSDetectionReader(String profile, InputStream in): Constructs the LCSD Reader instance with the character set determined by sampling initial data. |
| LCSDetectionReader(String profile, InputStream in, int size): Constructs the LCSD Reader instance with the character set determined by sampling initial data. |
| LCSDetectionReader(String profile, Reader reader): Constructs the LCSD Reader instance over the reader. |
| Modifier and Type | Method and Description |
|---|---|
| void | close(): Closes the stream. |
| LCSDResultSet | getResult(): Returns the result set of LCSD. |
| void | mark(int readAheadLimit): Marks the present position in the stream. |
| boolean | markSupported(): Tells whether this stream supports the mark() operation. |
| int | read(): Reads a single character. |
| int | read(char[] cbuf): Reads characters into an array. |
| int | read(char[] cbuf, int offset, int length): Reads characters into a portion of an array. |
| boolean | ready(): Tells whether this stream is ready to be read. |
| void | reset(): Resets the stream. |
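
The mark(), reset(), and markSupported() methods follow the usual java.io.Reader contract, so they can be combined to look ahead without consuming input. The sketch below shows that pattern against a plain StringReader. The class `MarkResetSketch` and its `peek` helper are hypothetical names, and whether LCSDetectionReader reports mark support is not stated here, which is why the helper checks markSupported() first.

```java
import java.io.IOException;
import java.io.Reader;

/** Hypothetical helper for illustration; works with any java.io.Reader. */
class MarkResetSketch {
    /** Returns up to n characters from the reader without consuming them. */
    static String peek(Reader reader, int n) throws IOException {
        if (!reader.markSupported()) {
            throw new IOException("mark() is not supported by this reader");
        }
        reader.mark(n); // remember the current position
        char[] buf = new char[n];
        int total = 0;
        while (total < n) {
            int len = reader.read(buf, total, n - total);
            if (len == -1) {
                break; // end of stream before n characters were available
            }
            total += len;
        }
        reader.reset(); // return to the marked position
        return new String(buf, 0, total);
    }
}
```

Reading no more than readAheadLimit characters between mark() and reset() keeps the mark valid, which is why the helper passes the same n to both mark() and the read loop.
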
protected static final int DEFAULT_SAMPLING_SIZE
public LCSDetectionReader(InputStream in) throws IOException, UTFDataFormatException
Parameters:
in - the InputStream object including the text data
Throws:
IOException - if any I/O error occurs
UTFDataFormatException - if an invalid UTF-8 data sequence is detected. Note that this occurs only if the source is in UTF-8 encoding.

public LCSDetectionReader(InputStream in, int size) throws IOException, UTFDataFormatException
Parameters:
in - the InputStream object including the text data
size - the sampling size
Throws:
IOException - if any I/O error occurs
UTFDataFormatException - if an invalid UTF-8 data sequence is detected. Note that this occurs only if the source is in UTF-8 encoding.

public LCSDetectionReader(String profile, InputStream in) throws IOException, UTFDataFormatException
Parameters:
profile - the LCSD profile name. null is the default.
in - the InputStream object including the text data
Throws:
IOException - if any I/O error occurs
UTFDataFormatException - if an invalid UTF-8 data sequence is detected. Note that this occurs only if the source is in UTF-8 encoding.

public LCSDetectionReader(String profile, InputStream in, int size) throws IOException, UTFDataFormatException
Parameters:
profile - the LCSD profile name. null is the default.
in - the InputStream object including the text data
size - the sampling size
Throws:
IOException - if any I/O error occurs
UTFDataFormatException - if an invalid UTF-8 data sequence is detected. Note that this occurs only if the source is in UTF-8 encoding.

public LCSDetectionReader(Reader reader) throws IOException
This constructor is used to detect the language from the reader object. The character set is always UTF-16.
Parameters:
reader - the InputStreamReader object
Throws:
IOException - if any I/O error occurs

public LCSDetectionReader(String profile, Reader reader) throws IOException
This constructor is used to detect the language from the reader object. The character set is always UTF-16.
Parameters:
profile - the LCSD profile name. null is the default.
reader - the reader including the text data
Throws:
IOException - if any I/O error occurs

public LCSDResultSet getResult() throws IOException, UTFDataFormatException
Call this method if your application needs the detected language. The detected character set is applied implicitly during conversion; call this method if you also need its name.
Throws:
IOException - if any I/O error occurs
UTFDataFormatException - if an invalid UTF-8 data sequence is detected. Note that this occurs only if the source is in UTF-8 encoding.

public int read() throws IOException, UTFDataFormatException
Overrides:
read in class Reader
Returns:
-1 if the end of the stream has been reached
Throws:
IOException - if an I/O error occurs
UTFDataFormatException - if an invalid UTF-8 data sequence is detected. Note that this occurs only if the source is in UTF-8 encoding.

public int read(char[] cbuf, int offset, int length) throws IOException, UTFDataFormatException
Specified by:
read in class Reader
Parameters:
cbuf - destination buffer
offset - offset at which to start storing characters
length - maximum number of characters to read
Returns:
-1 if the end of the stream has been reached
Throws:
IOException - if an I/O error occurs
UTFDataFormatException - if an invalid UTF-8 data sequence is detected. Note that this occurs only if the source is in UTF-8 encoding.

public void close() throws IOException
Specified by:
close in interface Closeable
close in interface AutoCloseable
close in class Reader
Throws:
IOException - if an I/O error occurs

public boolean ready() throws IOException
Overrides:
ready in class Reader
Returns:
true if the next read() call is guaranteed not to block for input; otherwise false
Throws:
IOException - if an I/O error occurs

public boolean markSupported()
Tells whether this stream supports the mark() operation.
Overrides:
markSupported in class Reader
Returns:
true if this stream supports the mark() operation

public void mark(int readAheadLimit) throws IOException
Overrides:
mark in class Reader
Parameters:
readAheadLimit - limit on the number of characters that may be read while still preserving the mark. After this many characters have been read, an attempt to reset the stream may fail.
Throws:
IOException - if the stream does not support the mark() operation, or if some other I/O error occurs

public void reset() throws IOException
Overrides:
reset in class Reader
Throws:
IOException - if the stream has not been marked, or if the mark has been invalidated, or if the stream does not support the reset() operation, or if some other I/O error occurs

public int read(char[] cbuf) throws IOException, UTFDataFormatException
Overrides:
read in class Reader
Parameters:
cbuf - destination buffer
Returns:
-1 if the end of the stream has been reached
Throws:
IOException - if an I/O error occurs
UTFDataFormatException - if an invalid UTF-8 data sequence is detected. Note that this occurs only if the source is in UTF-8 encoding.
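
Since the read methods can throw both IOException and its subclass UTFDataFormatException, a caller that wants to report malformed UTF-8 separately has to catch the subclass first. The following sketch shows one way to do that around the read(char[]) loop, using try-with-resources so that close() is always called. The class `ReadAllSketch` and its `readAll` helper are hypothetical; the helper accepts any java.io.Reader, so an LCSDetectionReader could presumably be passed in the same way.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UTFDataFormatException;

/** Hypothetical helper for illustration; works with any java.io.Reader. */
class ReadAllSketch {
    /** Reads the reader to the end, closes it, and returns the text. */
    static String readAll(Reader source) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] cbuf = new char[1024];
        try (Reader rdr = source) {
            // read(char[]) returns the number of characters stored, or -1 at end of stream
            for (int len; (len = rdr.read(cbuf)) != -1;) {
                text.append(cbuf, 0, len); // only the first len characters are valid
            }
        } catch (UTFDataFormatException e) {
            // Caught before IOException because it is a subclass of IOException.
            throw new IOException("Input is not valid UTF-8: " + e.getMessage(), e);
        }
        return text.toString();
    }
}
```

Appending only the first `len` characters matters because read(char[]) may fill less than the whole buffer, leaving stale data from an earlier iteration in the remainder.
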