Package oracle.i18n.lcsd
Class LCSDetectionReader
- java.lang.Object
  - java.io.Reader
    - oracle.i18n.lcsd.LCSDetectionReader
- All Implemented Interfaces:
Closeable, AutoCloseable, Readable
- Direct Known Subclasses:
LCSDetectionHTMLReader
public class LCSDetectionReader extends Reader
The LCSDetectionReader class is the language and character set detection (LCSD) reader. It transparently detects the character set of the input and converts the data to Unicode.
The most common usage is through the Reader interface, reading text data as follows:
    InputStream in = file.getInputStream();
    Reader rdr = new LCSDetectionReader(in);
    char[] cbuf = new char[1024];
    for (int len = -1; (len = rdr.read(cbuf)) != -1;) {
        // do something with the first len chars of cbuf ...
    }
Detection occurs only once, by sampling the first chunk of data.
- Since:
- 10.2
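The read loop from the class description can be wrapped in a small helper and combined with try-with-resources, since the class implements AutoCloseable. The sketch below is illustrative only: readAll and ReaderLoopSketch are hypothetical names, and a StringReader stands in for the LCSD reader on the assumption that the Oracle library may not be on the classpath. In real use the reader would presumably be created with new LCSDetectionReader(in).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class ReaderLoopSketch {

    /** Drains any Reader into a String using the loop shape from the class description. */
    static String readAll(Reader rdr) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] cbuf = new char[1024];
        // read(char[]) returns the number of chars stored, or -1 at end of stream.
        for (int len; (len = rdr.read(cbuf)) != -1; ) {
            sb.append(cbuf, 0, len); // only the first len chars of cbuf are valid
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in source (assumption): replace with an LCSDetectionReader over a
        // real InputStream when the library is available.
        try (Reader rdr = new StringReader("sample text")) {
            System.out.println(readAll(rdr));
        } // the reader is closed here, even if readAll throws
    }
}
```

Appending only the first len characters matters because the final read usually fills just part of the buffer.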
-
-
Field Summary
Fields
protected static int DEFAULT_SAMPLING_SIZE
    Default sampling byte length for language and character set detection.
-
Constructor Summary
Constructors
LCSDetectionReader(InputStream in)
    Constructs the LCSD Reader instance with the character set determined by sampling initial data.
LCSDetectionReader(InputStream in, int size)
    Constructs the LCSD Reader instance with the character set determined by sampling initial data.
LCSDetectionReader(Reader reader)
    Constructs the LCSD Reader instance over the input stream reader.
LCSDetectionReader(String profile, InputStream in)
    Constructs the LCSD Reader instance with the character set determined by sampling initial data.
LCSDetectionReader(String profile, InputStream in, int size)
    Constructs the LCSD Reader instance with the character set determined by sampling initial data.
LCSDetectionReader(String profile, Reader reader)
    Constructs the LCSD Reader instance over the reader.
-
Method Summary
All Methods / Instance Methods / Concrete Methods
void close()
    Closes the stream.
LCSDResultSet getResult()
    Returns the result set of LCSD.
void mark(int readAheadLimit)
    Marks the present position in the stream.
boolean markSupported()
    Tells whether this stream supports the mark() operation.
int read()
    Reads a single character.
int read(char[] cbuf)
    Reads characters into an array.
int read(char[] cbuf, int offset, int length)
    Reads characters into a portion of an array.
boolean ready()
    Tells whether this stream is ready to be read.
void reset()
    Resets the stream.
-
Methods inherited from class java.io.Reader
nullReader, read, skip, transferTo
-
Field Detail
-
DEFAULT_SAMPLING_SIZE
protected static final int DEFAULT_SAMPLING_SIZE
Default sampling byte length for language and character set detection.
- See Also:
Constant Field Values
-
-
Constructor Detail
-
LCSDetectionReader
public LCSDetectionReader(InputStream in) throws IOException, UTFDataFormatException
Constructs the LCSD Reader instance with the character set determined by sampling initial data.
- Parameters:
in - the InputStream object containing the text data
- Throws:
IOException - if any I/O error occurs
UTFDataFormatException - if an invalid UTF-8 data sequence is detected. Note that this occurs only if the source is in UTF-8 encoding.
-
LCSDetectionReader
public LCSDetectionReader(InputStream in, int size) throws IOException, UTFDataFormatException
Constructs the LCSD Reader instance with the character set determined by sampling initial data.
- Parameters:
in - the InputStream object containing the text data
size - the sampling size
- Throws:
IOException - if any I/O error occurs
UTFDataFormatException - if an invalid UTF-8 data sequence is detected. Note that this occurs only if the source is in UTF-8 encoding.
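To make the idea of sampling the initial data more concrete, the sketch below shows one way a caller could copy the first size bytes of a stream without consuming them, using BufferedInputStream mark() and reset(). This is an illustration of the general technique only. It is not taken from the LCSD implementation, whose internals this page does not describe, and peekSample and SamplingSketch are hypothetical names.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class SamplingSketch {

    /**
     * Copies up to size leading bytes from in without consuming them.
     * Hypothetical helper, not part of the oracle.i18n.lcsd API.
     */
    static byte[] peekSample(BufferedInputStream in, int size) throws IOException {
        byte[] sample = new byte[size];
        in.mark(size);                          // remember the current position
        int n = in.readNBytes(sample, 0, size); // may be short if the stream ends early
        in.reset();                             // rewind so later reads see every byte
        return Arrays.copyOf(sample, n);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "first chunk, then the rest".getBytes(StandardCharsets.UTF_8);
        BufferedInputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        byte[] sample = peekSample(in, 11);
        System.out.println(new String(sample, StandardCharsets.UTF_8));
        // The full content is still available after sampling.
        System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
    }
}
```

A larger sample gives a detector more evidence at the cost of buffering more data up front, which is presumably the trade-off the size parameter exposes.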
-
LCSDetectionReader
public LCSDetectionReader(String profile, InputStream in) throws IOException, UTFDataFormatException
Constructs the LCSD Reader instance with the character set determined by sampling initial data.
- Parameters:
profile - the LCSD profile name. null selects the default profile.
in - the InputStream object containing the text data
- Throws:
IOException - if any I/O error occurs
UTFDataFormatException - if an invalid UTF-8 data sequence is detected. Note that this occurs only if the source is in UTF-8 encoding.
-
LCSDetectionReader
public LCSDetectionReader(String profile, InputStream in, int size) throws IOException, UTFDataFormatException
Constructs the LCSD Reader instance with the character set determined by sampling initial data.
- Parameters:
profile - the LCSD profile name. null selects the default profile.
in - the InputStream object containing the text data
size - the sampling size
- Throws:
IOException - if any I/O error occurs
UTFDataFormatException - if an invalid UTF-8 data sequence is detected. Note that this occurs only if the source is in UTF-8 encoding.
-
LCSDetectionReader
public LCSDetectionReader(Reader reader) throws IOException
Constructs the LCSD Reader instance over the input stream reader.
This constructor is used to detect the language from the reader object. The character set is always UTF-16.
- Parameters:
reader - the InputStreamReader object
- Throws:
IOException - if any I/O error occurs
-
LCSDetectionReader
public LCSDetectionReader(String profile, Reader reader) throws IOException
Constructs the LCSD Reader instance over the reader.
This constructor is used to detect the language from the reader object. The character set is always UTF-16.
- Parameters:
profile - the LCSD profile name. null selects the default profile.
reader - the reader containing the text data
- Throws:
IOException - if any I/O error occurs
-
-
Method Detail
-
getResult
public LCSDResultSet getResult() throws IOException, UTFDataFormatException
Returns the result set of LCSD.
Call this method if your application needs the detected language. The detected character set is applied implicitly during conversion; call this method if you also need its name.
- Returns:
the result set of LCSD
- Throws:
IOException - if any I/O error occurs
UTFDataFormatException - if an invalid UTF-8 data sequence is detected. Note that this occurs only if the source is in UTF-8 encoding.
-
read
public int read() throws IOException, UTFDataFormatException
Reads a single character.
- Overrides:
read in class Reader
- Returns:
the character read, or -1 if the end of the stream has been reached
- Throws:
IOException - if an I/O error occurs
UTFDataFormatException - if an invalid UTF-8 data sequence is detected. Note that this occurs only if the source is in UTF-8 encoding.
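Because java.io.UTFDataFormatException is a subclass of IOException, a caller that wants to treat malformed UTF-8 differently from other I/O failures has to catch it first. The sketch below shows that catch order. The failing() helper is a hypothetical stand-in reader that always throws, used here only so the example does not depend on the Oracle library; readOne and ReadErrorSketch are likewise illustrative names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UTFDataFormatException;

final class ReadErrorSketch {

    /** Reads one char and reports which outcome occurred. */
    static String readOne(Reader rdr) {
        try {
            int c = rdr.read();
            return c == -1 ? "end of stream" : "char: " + (char) c;
        } catch (UTFDataFormatException e) {
            // Must precede the IOException clause: it is the more specific type.
            return "invalid UTF-8";
        } catch (IOException e) {
            return "I/O error";
        }
    }

    /** Hypothetical stand-in Reader whose reads always fail with the given exception. */
    static Reader failing(IOException toThrow) {
        return new Reader() {
            @Override
            public int read(char[] cbuf, int off, int len) throws IOException {
                throw toThrow;
            }

            @Override
            public void close() {
                // nothing to release in this stand-in
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(readOne(new StringReader("A")));
        System.out.println(readOne(failing(new UTFDataFormatException("bad sequence"))));
        System.out.println(readOne(failing(new IOException("disk error"))));
    }
}
```

Reversing the two catch clauses would not compile, since the second clause would be unreachable.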
-
read
public int read(char[] cbuf, int offset, int length) throws IOException, UTFDataFormatException
Reads characters into a portion of an array.
- Specified by:
read in class Reader
- Parameters:
cbuf - destination buffer
offset - offset at which to start storing characters
length - maximum number of characters to read
- Returns:
the number of characters read, or -1 if the end of the stream has been reached
- Throws:
IOException - if an I/O error occurs
UTFDataFormatException - if an invalid UTF-8 data sequence is detected. Note that this occurs only if the source is in UTF-8 encoding.
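Since length is only a maximum, a single call may store fewer characters than requested, as the general java.io.Reader contract allows. When a caller needs a region of the buffer filled completely, a loop like the following sketch can be used. readFully and ReadFullySketch are hypothetical names, and a StringReader stands in for the LCSD reader.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class ReadFullySketch {

    /**
     * Calls read(cbuf, offset, length) repeatedly until length chars are stored
     * or the stream ends. Returns the number of chars actually stored, which is
     * 0 (not -1) if the stream was already at its end.
     */
    static int readFully(Reader rdr, char[] cbuf, int offset, int length) throws IOException {
        int total = 0;
        while (total < length) {
            int n = rdr.read(cbuf, offset + total, length - total);
            if (n == -1) {
                break; // end of stream before the request was satisfied
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        char[] cbuf = new char[16];
        try (Reader rdr = new StringReader("abcdef")) {
            int stored = readFully(rdr, cbuf, 2, 4); // fills cbuf[2..5]
            System.out.println(new String(cbuf, 2, stored));
        }
    }
}
```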
-
close
public void close() throws IOException
Closes the stream.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Specified by:
close in class Reader
- Throws:
IOException - if an I/O error occurs
-
ready
public boolean ready() throws IOException
Tells whether this stream is ready to be read.
- Overrides:
ready in class Reader
- Returns:
true if the next read() call is guaranteed not to block for input; otherwise false
- Throws:
IOException - if an I/O error occurs
-
markSupported
public boolean markSupported()
Tells whether this stream supports the mark() operation.
- Overrides:
markSupported in class Reader
- Returns:
true if this stream supports the mark() operation
-
mark
public void mark(int readAheadLimit) throws IOException
Marks the present position in the stream.
- Overrides:
mark in class Reader
- Parameters:
readAheadLimit - limit on the number of characters that may be read while still preserving the mark. After reading the limited number of characters, attempting to reset the stream may fail.
- Throws:
IOException - if the stream does not support the mark() operation, or if some other I/O error occurs
-
reset
public void reset() throws IOException
Resets the stream.
- Overrides:
reset in class Reader
- Throws:
IOException - if the stream has not been marked, or if the mark has been invalidated, or if the stream does not support the reset() operation, or if some other I/O error occurs
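The markSupported(), mark(), and reset() methods are typically used together to look ahead without consuming input. The sketch below guards the call with markSupported() and keeps the look-ahead within readAheadLimit. Whether a given LCSDetectionReader instance supports marking is not stated on this page, so the example uses a StringReader as a stand-in, and peek and MarkResetSketch are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class MarkResetSketch {

    /**
     * Returns up to n leading chars without consuming them, or null when the
     * reader does not support mark(). Hypothetical helper for illustration.
     */
    static String peek(Reader rdr, int n) throws IOException {
        if (!rdr.markSupported()) {
            return null; // avoid the IOException that mark() may throw here
        }
        char[] cbuf = new char[n];
        rdr.mark(n); // the mark must survive reading up to n chars
        int total = 0;
        while (total < n) {
            int len = rdr.read(cbuf, total, n - total);
            if (len == -1) {
                break; // stream is shorter than n chars
            }
            total += len;
        }
        rdr.reset(); // rewind to the marked position
        return new String(cbuf, 0, total);
    }

    public static void main(String[] args) throws IOException {
        try (Reader rdr = new StringReader("hello world")) {
            System.out.println(peek(rdr, 5));
            // Reading continues from the start because peek() reset the stream.
            System.out.println((char) rdr.read());
        }
    }
}
```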
-
read
public int read(char[] cbuf) throws IOException, UTFDataFormatException
Reads characters into an array.
- Overrides:
read in class Reader
- Parameters:
cbuf - destination buffer
- Returns:
the number of characters read, or -1 if the end of the stream has been reached
- Throws:
IOException - if an I/O error occurs
UTFDataFormatException - if an invalid UTF-8 data sequence is detected. Note that this occurs only if the source is in UTF-8 encoding.
-
-