Class LCSDetector
java.lang.Object
    oracle.i18n.lcsd.LCSDetector
public class LCSDetector extends Object

The LCSDetector class contains methods to automatically detect and recognize the language, the encoding, or both, of text input.

To use the LCSDetector class, create an instance with the LCSDetector() constructor, which uses the standard profile, or pass a profile name to the LCSDetector(String name) constructor, choosing the profile according to the content of the text you plan to sample. Certain profiles may yield more accurate results. For example, if you are sampling medical journals, you may want to use a profile that is built mainly from medical journals. If you are sampling computer-related white papers, a profile built from similar documents improves the accuracy of the detection. Currently, only one standard profile is provided, which is for general-purpose detection.

The detection process begins by calling the detect(byte[]) method. Statistics are accumulated each time one of the detect() methods is called. When you are ready for the result, call the getResult() method to retrieve an LCSDResultSet instance. To begin a new detection using the same LCSDetector instance, call the reset() method to remove the accumulated statistics.

Since:
    10.1.0.2
See Also:
    LCSDResultSet
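
The call order described above (detect() one or more times, then getResult(), then reset() before unrelated input) can be tried out without the Oracle library on the classpath by using a toy stand-in. ToyDetector below is hypothetical and is not Oracle's detection algorithm: its only "statistic" is whether any byte with the high bit set has been seen, which is just enough to show that a result reflects every detect() call since the last reset().

```java
/**
 * Hypothetical stand-in that mimics LCSDetector's detect/getResult/reset
 * call pattern. It is NOT Oracle's algorithm: its only statistic is whether
 * a byte with the high bit set has been seen since the last reset().
 */
final class ToyDetector {
    private long total;   // bytes seen since construction or the last reset()
    private long highBit; // of those, bytes in the range 0x80-0xFF

    /** Adds to the running statistics; data from earlier calls is kept. */
    void detect(byte[] input) {
        for (byte b : input) {
            total++;
            if ((b & 0x80) != 0) {
                highBit++;
            }
        }
    }

    /** Reports on every byte passed to detect() since the last reset(). */
    String getResult() {
        if (total == 0) {
            return "no data";
        }
        return highBit == 0 ? "7-bit only" : "contains 8-bit bytes";
    }

    /** Clears the running statistics so a new detection can start. */
    void reset() {
        total = 0;
        highBit = 0;
    }
}
```

With the real class the sequence is the same: construct an LCSDetector, call detect() as often as needed, read the LCSDResultSet returned by getResult(), and call reset() before starting on unrelated text.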

Constructor Summary
Constructors
  LCSDetector()
      Constructor which uses the standard default profile.
  LCSDetector(String name)
      Constructor which takes a profile name and allows you to choose a profile other than the default.

Method Summary
Modifier and Type   Method and Description
void                detect(byte[] input)
                    Statistical data is accumulated in an internal structure when the detect() methods are called.
int                 detect(byte[] input, int offset, int length)
                    Statistical data is accumulated in an internal structure when the detect() methods are called.
void                detect(char[] input)
                    Statistical data is accumulated in an internal structure when the detect() methods are called.
int                 detect(char[] input, int offset, int length)
                    Statistical data is accumulated in an internal structure when the detect() methods are called.
void                detect(InputStream input)
                    Statistical data is accumulated in an internal structure when the detect() methods are called.
int                 detect(InputStream input, int length)
                    Statistical data is accumulated in an internal structure when the detect() methods are called.
void                detect(String input)
                    Statistical data is accumulated in an internal structure when the detect() methods are called.
int                 detect(String input, int length)
                    Statistical data is accumulated in an internal structure when the detect() methods are called.
LCSDResultSet       getResult()
                    Determines the top-ranking language/character set pairs from the accumulated statistical data.
static boolean      isCharsetSupported(int charsettype, String charset)
                    Checks whether the given Oracle, IANA, or Java character set is supported by the detection feature.
void                reset()
                    Resets the statistical data for all pairs to 0.
void                setCharacterSetFilter(String charset)
                    Sets the character set filter if you know the character set of the input data.
void                setLanguageFilter(String language)
                    Sets the language filter if you know the language of the input data.

Constructor Detail

LCSDetector
public LCSDetector()
Constructor which uses the standard default profile.

LCSDetector
public LCSDetector(String name)
Constructor which takes a profile name and allows you to choose a profile other than the default.
Parameters:
    name - name of the profile to use
Throws:
    IllegalArgumentException - if an invalid profile name is specified
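
Because LCSDetector(String name) throws IllegalArgumentException for an invalid profile name, a caller that prefers a custom profile but can accept the standard one might fall back explicitly. The withFallback helper below is a hypothetical sketch, written against java.util.function.Supplier so that it compiles without the Oracle library.

```java
import java.util.function.Supplier;

/** Hypothetical helper; not part of the oracle.i18n.lcsd package. */
final class ProfileFallback {
    private ProfileFallback() {}

    /**
     * Returns preferred.get(), or fallback.get() if the preferred factory
     * throws IllegalArgumentException, which is what LCSDetector(String name)
     * is documented to do for an invalid profile name.
     */
    static <T> T withFallback(Supplier<? extends T> preferred, Supplier<? extends T> fallback) {
        try {
            return preferred.get();
        } catch (IllegalArgumentException e) {
            return fallback.get();
        }
    }
}
```

With the real class this could read `LCSDetector d = ProfileFallback.withFallback(() -> new LCSDetector("medical"), LCSDetector::new);`, where "medical" is a hypothetical profile name; as the class description notes, only the standard profile is currently provided.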

Method Detail

setCharacterSetFilter
public void setCharacterSetFilter(String charset)
Sets the character set filter if you know the character set of the input data. The default value is none. If both the language filter and character set filter are set, they are ignored. If an invalid IANA character set name is passed in, it is ignored.
Parameters:
    charset - IANA character set name
Throws:
    IllegalArgumentException - if an invalid character set is specified
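
Since setCharacterSetFilter expects an IANA character set name, one option is to normalize user-supplied names first. The sketch below is a hypothetical helper that uses java.nio.charset.Charset to map an alias such as latin1 to the JDK's canonical name. JDK canonical names usually, but not always, match IANA registrations, and a name the JDK knows is not necessarily one the detector supports, so checking it with isCharsetSupported is still advisable.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.Optional;

/** Hypothetical pre-check for names passed to setCharacterSetFilter(String). */
final class CharsetNames {
    private CharsetNames() {}

    /**
     * Returns the JDK's canonical name for a charset name or alias, or an
     * empty Optional if the name is malformed or unknown to this JVM.
     * JDK support does not imply that the detector supports the charset.
     */
    static Optional<String> canonical(String name) {
        if (name == null || name.isEmpty()) {
            return Optional.empty();
        }
        try {
            return Charset.isSupported(name)
                    ? Optional.of(Charset.forName(name).name())
                    : Optional.empty();
        } catch (IllegalCharsetNameException e) {
            return Optional.empty(); // e.g. the name contains spaces
        }
    }
}
```

A caller might then write `CharsetNames.canonical(userInput).ifPresent(detector::setCharacterSetFilter);`, where userInput and detector are hypothetical variables.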

setLanguageFilter
public void setLanguageFilter(String language)
Sets the language filter if you know the language of the input data. The default value is none. If both the language filter and character set filter are set, they are ignored. If an invalid language name is passed in, it is ignored.
Parameters:
    language - ISO language name
Throws:
    IllegalArgumentException - if an invalid language is specified

detect
public void detect(byte[] input)
Statistical data is accumulated in an internal structure when the detect() methods are called. Use the reset() method to clear the accumulated statistics.
Parameters:
    input - the bytes to be sampled by the detect method

detect
public int detect(byte[] input, int offset, int length)
Statistical data is accumulated in an internal structure when the detect() methods are called. Use the reset() method to clear the accumulated statistics. Only the specified number of bytes is sampled.
Parameters:
    input - the bytes to be sampled by the detect method
    offset - the index of the first byte to sample
    length - the number of bytes to sample
Returns:
    the number of bytes sampled, or -1 if the end of the array is reached
Throws:
    IllegalArgumentException - call the reset method
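
The return value of the offset/length overload can drive a loop that walks a large array in bounded slices. ChunkedSampling and its nested ByteSampler interface are hypothetical: ByteSampler only mirrors the documented detect(byte[], int, int) signature so the sketch compiles on its own.

```java
/** Hypothetical helper for feeding a large array through the offset/length overload. */
final class ChunkedSampling {
    private ChunkedSampling() {}

    /** Same shape as LCSDetector.detect(byte[], int, int); declared here so the sketch compiles on its own. */
    @FunctionalInterface
    interface ByteSampler {
        int detect(byte[] input, int offset, int length);
    }

    /**
     * Passes data to the sampler in slices of at most chunkSize bytes and
     * returns the total number of bytes the sampler reported as sampled.
     */
    static int sampleInChunks(ByteSampler sampler, byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int offset = 0;
        while (offset < data.length) {
            int len = Math.min(chunkSize, data.length - offset);
            int n = sampler.detect(data, offset, len);
            if (n <= 0) {
                break; // -1 means the end of the array was reached; 0 would never advance
            }
            offset += n;
        }
        return offset;
    }
}
```

With a real detector the method reference resolves to the matching overload, for example `ChunkedSampling.sampleInChunks(detector::detect, data, 4096)`; the chunk size here is arbitrary.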

detect
public void detect(InputStream input) throws IOException
Statistical data is accumulated in an internal structure when the detect() methods are called. Use the reset() method to clear the accumulated statistics. The entire stream is sampled by the detect() method.
Parameters:
    input - the InputStream to be sampled by the detect method
Throws:
    IOException - if an error occurs while reading from the stream
    IllegalArgumentException - call the reset method

detect
public int detect(InputStream input, int length) throws IOException
Statistical data is accumulated in an internal structure when the detect() methods are called. Use the reset() method to clear the accumulated statistics. Only the specified number of bytes is sampled.
Parameters:
    input - the InputStream to be sampled by the detect() method
    length - the number of bytes to sample
Returns:
    the number of bytes sampled, or -1 if the end of the stream is reached
Throws:
    IOException - if an error occurs while reading from the stream
    IllegalArgumentException - call the reset method
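
Reading from an InputStream generally advances it, so after sampling a stream directly the sampled bytes may no longer be available to the caller. If the content is needed afterwards, one hypothetical approach is to copy a bounded prefix into memory, sample the copy through the detect(byte[], int, int) overload, and hand back a stream that replays the prefix before the remainder. StreamSniffer and its nested ByteSampler are hypothetical names; ByteSampler only mirrors that overload's signature, and InputStream.readNBytes requires Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;

/** Hypothetical helper: sample a bounded prefix without losing it. */
final class StreamSniffer {
    private StreamSniffer() {}

    /** Same shape as LCSDetector.detect(byte[], int, int); declared here so the sketch compiles on its own. */
    @FunctionalInterface
    interface ByteSampler {
        int detect(byte[] input, int offset, int length);
    }

    /**
     * Reads up to maxBytes from in, passes them to the sampler, and returns a
     * stream that replays those bytes followed by the rest of in.
     */
    static InputStream sniff(InputStream in, int maxBytes, ByteSampler sampler) throws IOException {
        if (maxBytes < 0) {
            throw new IllegalArgumentException("maxBytes must not be negative");
        }
        byte[] prefix = new byte[maxBytes];
        int n = in.readNBytes(prefix, 0, maxBytes); // stops early only at end of stream
        if (n > 0) {
            sampler.detect(prefix, 0, n);
        }
        return new SequenceInputStream(new ByteArrayInputStream(prefix, 0, n), in);
    }
}
```

The trade-off is memory: the prefix is held in a byte array, so maxBytes (for example 8192, an arbitrary choice) bounds both the sample size and the buffer.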

detect
public void detect(String input)
Statistical data is accumulated in an internal structure when the detect() methods are called. Use the reset() method to clear the accumulated statistics. The entire string is sampled by the detect() method.
Parameters:
    input - the string to be sampled by the detect method

detect
public int detect(String input, int length)
Statistical data is accumulated in an internal structure when the detect() methods are called. Use the reset() method to clear the accumulated statistics. Only the specified number of characters is sampled.
Parameters:
    input - a string to be sampled by the detect() method
    length - the number of characters to sample
Returns:
    the number of characters sampled, or -1 if the end of the string is reached
Throws:
    IllegalArgumentException - call the reset method

detect
public void detect(char[] input)
Statistical data is accumulated in an internal structure when the detect() methods are called. Use the reset() method to clear the accumulated statistics. The entire array is sampled by the detect() method.
Parameters:
    input - the characters to be sampled by the detect method

detect
public int detect(char[] input, int offset, int length)
Statistical data is accumulated in an internal structure when the detect() methods are called. Use the reset() method to clear the accumulated statistics. Only the specified number of characters is sampled.
Parameters:
    input - the char array to be sampled by the detect() method
    offset - the index of the first character to sample
    length - the number of characters to sample
Returns:
    the number of characters sampled, or -1 if the end of the array is reached
Throws:
    IllegalArgumentException - call the reset method
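
For text that is already decoded, such as text read through a java.io.Reader, the char[] overload can be fed from a reusable buffer with an upper bound on how much is sampled. ReaderSampling and its nested CharSampler are hypothetical: CharSampler only mirrors the documented detect(char[], int, int) signature so the sketch compiles without the Oracle library.

```java
import java.io.IOException;
import java.io.Reader;

/** Hypothetical helper for text that is already decoded, such as a Reader. */
final class ReaderSampling {
    private ReaderSampling() {}

    /** Same shape as LCSDetector.detect(char[], int, int); declared here so the sketch compiles on its own. */
    @FunctionalInterface
    interface CharSampler {
        int detect(char[] input, int offset, int length);
    }

    /**
     * Feeds characters from reader to the sampler until maxChars have been
     * passed, the reader is exhausted, or the sampler returns -1.
     * Returns the number of characters passed to the sampler.
     */
    static long sampleReader(Reader reader, CharSampler sampler, int bufferSize, long maxChars)
            throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        char[] buf = new char[bufferSize];
        long total = 0;
        while (total < maxChars) {
            int want = (int) Math.min(buf.length, maxChars - total);
            int n = reader.read(buf, 0, want); // blocks until at least one char or end of input
            if (n == -1) {
                break; // end of input
            }
            total += n;
            if (sampler.detect(buf, 0, n) == -1) {
                break; // documented end-of-array signal; stop defensively
            }
        }
        return total;
    }
}
```

Usage might look like `ReaderSampling.sampleReader(reader, detector::detect, 4096, 64_000)`, where the buffer size and character limit are arbitrary choices.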

getResult
public LCSDResultSet getResult()
Determines the top-ranking language/character set pairs from the accumulated statistical data.
Returns:
    an LCSDResultSet object which contains the result

isCharsetSupported
public static boolean isCharsetSupported(int charsettype, String charset)
Checks whether the given character set, named as an Oracle, IANA, or Java character set, is supported by the detection feature. See LocaleMapper for the ORACLE, IANA, and JAVA constants.
Parameters:
    charsettype - one of ORACLE, IANA, or JAVA
    charset - the given character set
Returns:
    true if the given character set is supported by the detection feature, or false if not
Throws:
    IllegalArgumentException - if an invalid profile is specified

reset
public void reset()
Resets the statistical data for all pairs to 0.