Package oracle.i18n.lcsd
Class LCSDetectionReader
- java.lang.Object
  - java.io.Reader
    - oracle.i18n.lcsd.LCSDetectionReader
- All Implemented Interfaces:
Closeable, AutoCloseable, Readable
- Direct Known Subclasses:
LCSDetectionHTMLReader
public class LCSDetectionReader extends Reader
The LCSDetectionReader class is the language and character set detection (LCSD) reader class. It transparently detects the character set of the input and converts the data to Unicode. The most common usage is to read text data through the Reader interface, as follows:

    InputStream in = file.getInputStream();
    Reader rdr = new LCSDetectionReader(in);
    char[] cbuf = new char[1024];
    for (int len = -1; (len = rdr.read(cbuf)) != -1;) {
        // do something with cbuf ...
    }

Detection occurs only once, by sampling the first chunk of data.
- Since:
- 10.2
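The "sample once, then decode" behavior described above can be illustrated with JDK classes alone. The sketch below is a hypothetical stand-in, not the LCSD implementation: the class name `SampleOnceSketch`, its methods, and the UTF-8 fallback are assumptions made for illustration, and it recognizes only byte-order marks rather than performing real statistical detection.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in, NOT the LCSD implementation: it detects byte-order marks only.
class SampleOnceSketch {

    /** Samples up to sampleSize bytes once, picks a charset, then decodes the whole stream. */
    static Reader open(InputStream raw, int sampleSize) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(sampleSize);                      // remember the start of the stream
        byte[] s = new byte[sampleSize];
        int n = 0;
        while (n < sampleSize) {                  // read() may return fewer bytes than requested
            int len = in.read(s, n, sampleSize - n);
            if (len == -1) break;
            n += len;
        }
        in.reset();                               // rewind so the sampled bytes are decoded too

        Charset cs = StandardCharsets.UTF_8;      // assumption: fallback when no BOM is present
        int bom = 0;
        if (n >= 3 && (s[0] & 0xFF) == 0xEF && (s[1] & 0xFF) == 0xBB && (s[2] & 0xFF) == 0xBF) {
            bom = 3;                              // UTF-8 BOM
        } else if (n >= 2 && (s[0] & 0xFF) == 0xFE && (s[1] & 0xFF) == 0xFF) {
            cs = StandardCharsets.UTF_16BE;
            bom = 2;
        } else if (n >= 2 && (s[0] & 0xFF) == 0xFF && (s[1] & 0xFF) == 0xFE) {
            cs = StandardCharsets.UTF_16LE;
            bom = 2;
        }
        for (int i = 0; i < bom; i++) {
            in.read();                            // drop the BOM bytes before decoding
        }
        return new InputStreamReader(in, cs);
    }

    /** Convenience wrapper: decodes a byte array to a String with the sketch above. */
    static String readAll(byte[] data, int sampleSize) {
        try (Reader r = open(new ByteArrayInputStream(data), sampleSize)) {
            StringBuilder sb = new StringBuilder();
            char[] cbuf = new char[1024];
            for (int len; (len = r.read(cbuf)) != -1;) {
                sb.append(cbuf, 0, len);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the decision is made from the sample alone, anything past the sampled bytes cannot change the chosen character set, which mirrors the "detection occurs only once" note above.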
-
-
Field Summary
Fields
Modifier and Type | Field | Description
protected static int | DEFAULT_SAMPLING_SIZE | Default sampling byte length for language and character set detection.
-
Constructor Summary
Constructors
Constructor | Description
LCSDetectionReader(InputStream in) | Constructs the LCSD Reader instance with the character set determined by sampling initial data.
LCSDetectionReader(InputStream in, int size) | Constructs the LCSD Reader instance with the character set determined by sampling initial data.
LCSDetectionReader(Reader reader) | Constructs the LCSD Reader instance over the input stream reader.
LCSDetectionReader(String profile, InputStream in) | Constructs the LCSD Reader instance with the character set determined by sampling initial data.
LCSDetectionReader(String profile, InputStream in, int size) | Constructs the LCSD Reader instance with the character set determined by sampling initial data.
LCSDetectionReader(String profile, Reader reader) | Constructs the LCSD Reader instance over the reader.
-
Method Summary
Methods
Modifier and Type | Method | Description
void | close() | Closes the stream.
LCSDResultSet | getResult() | Returns the result set of LCSD.
void | mark(int readAheadLimit) | Marks the present position in the stream.
boolean | markSupported() | Tells whether this stream supports the mark() operation.
int | read() | Reads a single character.
int | read(char[] cbuf) | Reads characters into an array.
int | read(char[] cbuf, int offset, int length) | Reads characters into a portion of an array.
boolean | ready() | Tells whether this stream is ready to be read.
void | reset() | Resets the stream.
-
Methods inherited from class java.io.Reader
nullReader, read, skip, transferTo
-
-
-
-
Field Detail
-
DEFAULT_SAMPLING_SIZE
protected static final int DEFAULT_SAMPLING_SIZE
Default sampling byte length for language and character set detection.
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
LCSDetectionReader
public LCSDetectionReader(InputStream in) throws IOException, UTFDataFormatException
Constructs the LCSD Reader instance with the character set determined by sampling initial data.
- Parameters:
in - the InputStream object containing the text data
- Throws:
IOException - if any I/O error occurs
UTFDataFormatException - if an invalid UTF-8 data sequence is detected. Note that this occurs only if the source is UTF-8 data.
-
LCSDetectionReader
public LCSDetectionReader(InputStream in, int size) throws IOException, UTFDataFormatException
Constructs the LCSD Reader instance with the character set determined by sampling initial data.
- Parameters:
in - the InputStream object containing the text data
size - the sampling size
- Throws:
IOException - if any I/O error occurs
UTFDataFormatException - if an invalid UTF-8 data sequence is detected. Note that this occurs only if the source is in UTF-8 encoding.
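Why the sampling size matters can be shown with a small JDK-only check. This is a sketch under stated assumptions, not the LCSD algorithm: `SampleSizeSketch` and `sampleLooksLikeUtf8` are hypothetical names, and strict UTF-8 validation of a prefix stands in for real detection. It shows that a sample which ends before the first distinguishing byte can reach a different conclusion than a larger one.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: strict UTF-8 validation of a prefix stands in for detection.
class SampleSizeSketch {

    /** Returns true if the first sampleSize bytes contain no malformed UTF-8 sequence. */
    static boolean sampleLooksLikeUtf8(byte[] data, int sampleSize) {
        int n = Math.min(sampleSize, data.length);
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data, 0, n);
        // UTF-8 never decodes to more chars than it has bytes, so n + 1 is enough room.
        CharBuffer out = CharBuffer.allocate(n + 1);
        // endOfInput = false: a sequence cut off by the sample boundary is not an error.
        CoderResult result = decoder.decode(in, out, false);
        return !result.isError();
    }
}
```

For ISO-8859-1 bytes of `"header: café au lait"`, an 8-byte sample covers only ASCII and passes the check, while sampling the whole array reaches the `é` byte and fails it.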
-
LCSDetectionReader
public LCSDetectionReader(String profile, InputStream in) throws IOException, UTFDataFormatException
Constructs the LCSD Reader instance with the character set determined by sampling initial data.
- Parameters:
profile - the LCSD profile name. null is the default.
in - the InputStream object containing the text data
- Throws:
IOException - if any I/O error occurs
UTFDataFormatException - if an invalid UTF-8 data sequence is detected. Note that this occurs only if the source is in UTF-8 encoding.
-
LCSDetectionReader
public LCSDetectionReader(String profile, InputStream in, int size) throws IOException, UTFDataFormatException
Constructs the LCSD Reader instance with the character set determined by sampling initial data.
- Parameters:
profile - the LCSD profile name. null is the default.
in - the InputStream object containing the text data
size - the sampling size
- Throws:
IOException - if any I/O error occurs
UTFDataFormatException - if an invalid UTF-8 data sequence is detected. Note that this occurs only if the source is in UTF-8 encoding.
-
LCSDetectionReader
public LCSDetectionReader(Reader reader) throws IOException
Constructs the LCSD Reader instance over the input stream reader. This constructor is used to detect the language from the reader object. The character set is always UTF-16.
- Parameters:
reader - the InputStreamReader object
- Throws:
IOException - if any I/O error occurs
-
LCSDetectionReader
public LCSDetectionReader(String profile, Reader reader) throws IOException
Constructs the LCSD Reader instance over the reader. This constructor is used to detect the language from the reader object. The character set is always UTF-16.
- Parameters:
profile - the LCSD profile name. null is the default.
reader - the reader containing the text data
- Throws:
IOException - if any I/O error occurs
-
-
Method Detail
-
getResult
public LCSDResultSet getResult() throws IOException, UTFDataFormatException
Returns the result set of LCSD. Call this method if your application needs the detected language. The detected character set is applied implicitly during conversion; call this method if you also need its name.
- Returns:
- the result set of LCSD
- Throws:
IOException - if any I/O error occurs
UTFDataFormatException - if an invalid UTF-8 data sequence is detected. Note that this occurs only if the source is in UTF-8 encoding.
-
read
public int read() throws IOException, UTFDataFormatException
Reads a single character.
- Overrides:
read in class Reader
- Returns:
- the character read, or -1 if the end of the stream has been reached
- Throws:
IOException - if an I/O error occurs
UTFDataFormatException - if an invalid UTF-8 data sequence is detected. Note that this occurs only if the source is UTF-8 data.
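Since `java.io.UTFDataFormatException` is a subclass of `IOException`, a caller that wants to treat malformed UTF-8 differently from other I/O failures has to catch it first. The following is a minimal sketch of that ordering against any `Reader`; the class name `Utf8ErrorSketch`, the method `describe`, and the returned labels are hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UTFDataFormatException;

// Hypothetical helper: shows catch ordering for the two declared exception types.
class Utf8ErrorSketch {

    /** Drains the reader and reports how the read ended. */
    static String describe(Reader r) {
        char[] cbuf = new char[1024];
        int total = 0;
        try {
            for (int len; (len = r.read(cbuf)) != -1;) {
                total += len;
            }
            return "ok:" + total;
        } catch (UTFDataFormatException e) {
            // Must come before IOException, because it is the more specific type.
            return "bad-utf8";
        } catch (IOException e) {
            return "io-error";
        }
    }
}
```

Reversing the two `catch` clauses would not compile, because the second clause would be unreachable.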
-
read
public int read(char[] cbuf, int offset, int length) throws IOException, UTFDataFormatException
Reads characters into a portion of an array.
- Specified by:
read in class Reader
- Parameters:
cbuf - destination buffer
offset - offset at which to start storing characters
length - maximum number of characters to read
- Returns:
- the number of characters read, or -1 if the end of the stream has been reached
- Throws:
IOException - if an I/O error occurs
UTFDataFormatException - if an invalid UTF-8 data sequence is detected. Note that this occurs only if the source is UTF-8 data.
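Because `length` is only a maximum, a single call may store fewer characters than requested. A caller that needs a slice filled completely can loop, as in this sketch. It works against any `Reader`; the names `ReadFullySketch`, `readFully`, and `demo` are hypothetical, and unlike `read` the helper returns 0 rather than -1 when the stream is already at its end.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical helper: fills a slice of a buffer even when reads come back short.
class ReadFullySketch {

    /** Reads until length chars are stored or the stream ends; returns the count stored. */
    static int readFully(Reader r, char[] cbuf, int offset, int length) throws IOException {
        int total = 0;
        while (total < length) {
            int n = r.read(cbuf, offset + total, length - total);
            if (n == -1) break;          // end of stream before the slice was full
            total += n;
        }
        return total;
    }

    /** Demo with a reader that returns one character per call. */
    static String demo() {
        Reader trickle = new Reader() {
            private final String text = "abcdef";
            private int pos = 0;

            @Override
            public int read(char[] c, int off, int len) {
                if (pos >= text.length()) return -1;
                if (len == 0) return 0;
                c[off] = text.charAt(pos++);
                return 1;                // always a short read
            }

            @Override
            public void close() { }
        };
        char[] buf = new char[8];
        Arrays.fill(buf, '.');
        try {
            int n = readFully(trickle, buf, 2, 4);   // fill buf[2..5]
            return n + ":" + new String(buf);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```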
-
close
public void close() throws IOException
Closes the stream.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Specified by:
close in class Reader
- Throws:
IOException - if an I/O error occurs
-
ready
public boolean ready() throws IOException
Tells whether this stream is ready to be read.
- Overrides:
ready in class Reader
- Returns:
true if the next read() call is guaranteed not to block for input; otherwise, false
- Throws:
IOException - if an I/O error occurs
-
markSupported
public boolean markSupported()
Tells whether this stream supports the mark() operation.
- Overrides:
markSupported in class Reader
- Returns:
true if this stream supports the mark() operation
-
mark
public void mark(int readAheadLimit) throws IOException
Marks the present position in the stream.
- Overrides:
mark in class Reader
- Parameters:
readAheadLimit - limit on the number of characters that may be read while still preserving the mark. After reading this many characters, attempting to reset the stream may fail.
- Throws:
IOException - if the stream does not support the mark() operation, or if some other I/O error occurs
-
reset
public void reset() throws IOException
Resets the stream.
- Overrides:
reset in class Reader
- Throws:
IOException - if the stream has not been marked, or if the mark has been invalidated, or if the stream does not support the reset() operation, or if some other I/O error occurs
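The mark(), reset(), and markSupported() methods follow the general `java.io.Reader` contract, so a caller can peek at upcoming characters without consuming them. The sketch below assumes nothing about whether a particular reader supports marking: it checks markSupported() and falls back to wrapping in a `BufferedReader`. The names `MarkSketch`, `ensureMarkable`, `peekChar`, and `demo` are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: peeks at the next character using mark() and reset().
class MarkSketch {

    /** Returns r itself if it supports mark(), otherwise a BufferedReader around it. */
    static Reader ensureMarkable(Reader r) {
        return r.markSupported() ? r : new BufferedReader(r);
    }

    /** Returns the next character (or -1 at end of stream) without consuming it. */
    static int peekChar(Reader r) throws IOException {
        r.mark(1);               // one character of read-ahead is enough
        int c = r.read();
        r.reset();               // go back to the marked position
        return c;
    }

    /** Demo: InputStreamReader does not support mark(), so it gets wrapped. */
    static String demo() {
        byte[] bytes = "abc".getBytes(StandardCharsets.US_ASCII);
        Reader plain = new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.US_ASCII);
        try (Reader r = ensureMarkable(plain)) {
            int peeked = peekChar(r);
            int read = r.read(); // same character again, because peekChar() reset the stream
            return "" + (char) peeked + (char) read;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```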
-
read
public int read(char[] cbuf) throws IOException, UTFDataFormatException
Reads characters into an array.
- Overrides:
read in class Reader
- Parameters:
cbuf - destination buffer
- Returns:
- the number of characters read, or -1 if the end of the stream has been reached
- Throws:
IOException - if an I/O error occurs
UTFDataFormatException - if an invalid UTF-8 data sequence is detected. Note that this occurs only if the source is UTF-8 data.
-
-