Class LCSDetector
java.lang.Object
    oracle.i18n.lcsd.LCSDetector
public class LCSDetector extends Object
The LCSDetector class contains methods to automatically detect and recognize language, encoding, or both based on text input.

To use the LCSDetector class, create an instance with one of its constructors. Use LCSDetector() to get the standard profile, or LCSDetector(String name) to choose another profile, depending on the content of the text you plan to sample. Certain profiles may yield more accurate results. For example, if you are sampling medical journals, you may want to use a profile built mainly from medical journals. If you are sampling computer-related white papers, a profile built from similar documents improves detection accuracy. Currently, only one standard profile, intended for general-purpose detection, is provided.

The detection process begins by calling the detect(byte[]) method. Statistics are accumulated every time a detect() method is called. When you are ready for the result, call the getResult() method to retrieve an LCSDResultSet instance. To begin a new detection using the same LCSDetector instance, call the reset() method to remove the accumulated statistics.

Since:
    10.1.0.2
See Also:
    LCSDResultSet
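The lifecycle described above can be sketched as follows. So that the sketch compiles and can be tested without the Oracle GDK library on the classpath, it codes against LcsdSession, a hypothetical interface that mirrors the documented detect(byte[]), getResult(), and reset() methods. LcsdSession and PerDocumentDetection are illustrative names, not part of the library, and the type parameter R stands in for LCSDResultSet.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface mirroring three documented LCSDetector methods:
// detect(byte[]), getResult() and reset(). R stands in for LCSDResultSet.
interface LcsdSession<R> {
    void detect(byte[] input);
    R getResult();
    void reset();
}

final class PerDocumentDetection {
    private PerDocumentDetection() {}

    // Runs one independent detection per document on a single detector.
    // reset() is called before each document so that statistics from the
    // previous document do not leak into the next result.
    static <R> List<R> detectEach(LcsdSession<R> session, List<byte[]> documents) {
        List<R> results = new ArrayList<>();
        for (byte[] doc : documents) {
            session.reset();                    // clear accumulated statistics
            session.detect(doc);                // accumulate statistics for this document
            results.add(session.getResult());   // read the ranking for this document only
        }
        return results;
    }
}
```

To use it with the real class, wrap a new LCSDetector() in an anonymous LcsdSession<LCSDResultSet> whose three methods delegate to the detector. Calling reset() before each document, rather than after, also guards against statistics left over from earlier use of the same instance.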
Constructor Summary
LCSDetector()
    Constructor which uses the standard default profile.
LCSDetector(String name)
    Constructor which takes a profile name and allows you to choose a profile other than the default.
Method Summary
void detect(byte[] input)
    Statistical data is accumulated in an internal structure when the detect() methods are called.
int detect(byte[] input, int offset, int length)
    Statistical data is accumulated in an internal structure when the detect() methods are called.
void detect(char[] input)
    Statistical data is accumulated in an internal structure when the detect() methods are called.
int detect(char[] input, int offset, int length)
    Statistical data is accumulated in an internal structure when the detect() methods are called.
void detect(InputStream input)
    Statistical data is accumulated in an internal structure when the detect() methods are called.
int detect(InputStream input, int length)
    Statistical data is accumulated in an internal structure when the detect() methods are called.
void detect(String input)
    Statistical data is accumulated in an internal structure when the detect() methods are called.
int detect(String input, int length)
    Statistical data is accumulated in an internal structure when the detect() methods are called.
LCSDResultSet getResult()
    Determines the top-ranking language/character set pairs from the accumulated statistical data.
static boolean isCharsetSupported(int charsettype, String charset)
    Checks whether the given character set, expressed as an Oracle, IANA, or Java character set name, is supported by the detection feature.
void reset()
    Resets the statistical data for all pairs to 0.
void setCharacterSetFilter(String charset)
    Sets the character set filter if you know the character set of the input data.
void setLanguageFilter(String language)
    Sets the language filter if you know the language of the input data.
Constructor Detail
LCSDetector
public LCSDetector()
Constructor which uses the standard default profile.
LCSDetector
public LCSDetector(String name)
Constructor which takes a profile name and allows you to choose a profile other than the default.
Parameters:
    name - name of the profile to use
Throws:
    IllegalArgumentException - if an invalid profile name is specified
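Because LCSDetector(String name) throws IllegalArgumentException for an invalid profile name, a caller that reads profile names from configuration may want to fall back to the standard profile. The following is a minimal sketch of that pattern. ProfileSelection and withProfileOrDefault are hypothetical helpers, and the detector is created through constructor references so the helper can be tested without the library.

```java
import java.util.function.Function;
import java.util.function.Supplier;

final class ProfileSelection {
    private ProfileSelection() {}

    // Tries to build a detector with the named profile and falls back to the
    // default profile if the constructor rejects the name with
    // IllegalArgumentException, as documented for LCSDetector(String name).
    static <T> T withProfileOrDefault(Function<String, T> withProfile,
                                      Supplier<T> withDefault,
                                      String profileName) {
        if (profileName == null || profileName.isEmpty()) {
            return withDefault.get();           // no profile requested
        }
        try {
            return withProfile.apply(profileName);
        } catch (IllegalArgumentException invalidProfile) {
            return withDefault.get();           // unknown profile: use the standard one
        }
    }
}
```

With the library on the classpath, a call such as ProfileSelection.withProfileOrDefault(LCSDetector::new, LCSDetector::new, profileName) should bind the first reference to LCSDetector(String) and the second to LCSDetector(). Falling back silently can hide a misspelled profile name, so logging the caught exception may be worthwhile.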
Method Detail
setCharacterSetFilter
public void setCharacterSetFilter(String charset)
Sets the character set filter if you know the character set of the input data. The default value is none. If both the language filter and the character set filter are set, both are ignored. If an invalid IANA character set name is passed in, it is ignored.
Parameters:
    charset - IANA character set name
Throws:
    IllegalArgumentException - if an invalid character set is specified
setLanguageFilter
public void setLanguageFilter(String language)
Sets the language filter if you know the language of the input data. The default value is none. If both the language filter and the character set filter are set, both are ignored. If an invalid language name is passed in, it is ignored.
Parameters:
    language - ISO language name
Throws:
    IllegalArgumentException - if an invalid language is specified
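Since setting both the language filter and the character set filter causes both to be ignored, code that may know either hint should apply at most one of them. The sketch below does this through setter method references. FilterHints and applyOneFilter are hypothetical names, and preferring the character set hint when both are known is an assumption of this sketch, not documented behavior.

```java
import java.util.function.Consumer;

final class FilterHints {
    private FilterHints() {}

    enum Applied { CHARACTER_SET, LANGUAGE, NONE }

    // Applies at most one filter, because setting both the language and the
    // character set filters causes both to be ignored. Preferring the
    // character set hint is an assumption, not documented behavior.
    static Applied applyOneFilter(Consumer<String> setCharacterSetFilter,
                                  Consumer<String> setLanguageFilter,
                                  String charsetHint,
                                  String languageHint) {
        if (charsetHint != null && !charsetHint.isEmpty()) {
            setCharacterSetFilter.accept(charsetHint);
            return Applied.CHARACTER_SET;
        }
        if (languageHint != null && !languageHint.isEmpty()) {
            setLanguageFilter.accept(languageHint);
            return Applied.LANGUAGE;
        }
        return Applied.NONE;                    // no hint: leave both filters unset
    }
}
```

For example, FilterHints.applyOneFilter(detector::setCharacterSetFilter, detector::setLanguageFilter, charsetHint, languageHint). Either setter may still throw IllegalArgumentException for an invalid value, as documented above.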
detect
public void detect(byte[] input)
Statistical data is accumulated in an internal structure when the detect() methods are called. Use the reset() method to clear the accumulated statistics.
Parameters:
    input - the bytes to be sampled by the detect method
detect
public int detect(byte[] input, int offset, int length)
Statistical data is accumulated in an internal structure when the detect() methods are called. Use the reset() method to clear the accumulated statistics. Only the specified number of bytes is sampled.
Parameters:
    input - the bytes to be sampled by the detect method
    offset - the index of the first byte to sample
    length - the number of bytes to sample
Returns:
    the number of bytes sampled, or -1 if the end of the array is reached
Throws:
    IllegalArgumentException - call the reset method
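The int-returning overload makes it possible to feed a large array in pieces. The following sketch assumes ByteRangeSampler, a hypothetical functional interface whose single method has the same parameters and return type as detect(byte[] input, int offset, int length), so the loop can be exercised without the library. Advancing by the returned count, rather than by the requested length, is a defensive choice in case fewer bytes are sampled than requested.

```java
// Hypothetical functional interface with the same shape as
// LCSDetector.detect(byte[] input, int offset, int length).
@FunctionalInterface
interface ByteRangeSampler {
    int detect(byte[] input, int offset, int length);
}

final class ChunkedSampling {
    private ChunkedSampling() {}

    // Feeds data to the sampler in chunks of at most chunkSize bytes and
    // returns the total number of bytes the sampler reported as sampled.
    static long sampleInChunks(ByteRangeSampler sampler, byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        long total = 0;
        int offset = 0;
        while (offset < data.length) {
            int length = Math.min(chunkSize, data.length - offset);
            int sampled = sampler.detect(data, offset, length);
            if (sampled <= 0) {
                break;                  // -1 signals end of array; 0 would never advance
            }
            total += sampled;
            offset += sampled;          // advance by what was actually sampled
        }
        return total;
    }
}
```

With a detector, ChunkedSampling.sampleInChunks(detector::detect, data, 4096) should resolve to the three-argument byte[] overload; the chunk size of 4096 is arbitrary.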
detect
public void detect(InputStream input) throws IOException
Statistical data is accumulated in an internal structure when the detect() methods are called. Use the reset() method to clear the accumulated statistics. The entire stream is sampled by the detect() method.
Parameters:
    input - InputStream to be sampled by the detect method
Throws:
    IOException - if an error occurs while operating on the stream
    IllegalArgumentException - call the reset method
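Because detect(InputStream) samples the entire stream, a common pattern is to open the stream, sample it, and close it in one place. This sketch assumes a hypothetical StreamSampler interface matching detect(InputStream) and a hypothetical FileSampling helper. Since this page does not say whether detect() closes the stream, the helper closes it itself with try-with-resources.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical functional interface with the same shape as
// LCSDetector.detect(InputStream input) throws IOException.
@FunctionalInterface
interface StreamSampler {
    void detect(InputStream input) throws IOException;
}

final class FileSampling {
    private FileSampling() {}

    // Opens the file, lets the sampler consume the whole stream, and closes
    // the stream afterwards even if sampling throws.
    static void sampleFile(StreamSampler sampler, Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            sampler.detect(in);
        }
    }
}
```

With the library, FileSampling.sampleFile(detector::detect, path) passes the detector's InputStream overload as the sampler.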
detect
public int detect(InputStream input, int length) throws IOException
Statistical data is accumulated in an internal structure when the detect() methods are called. Use the reset() method to clear the accumulated statistics. Only the specified number of bytes is sampled.
Parameters:
    input - InputStream to be sampled by the detect() method
    length - the number of bytes to sample
Returns:
    the number of bytes sampled, or -1 if the end of the stream is reached
Throws:
    IOException - if an error occurs while operating on the stream
    IllegalArgumentException - call the reset method
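For large inputs, detect(InputStream input, int length) can limit sampling to a prefix of the stream. The loop below is a sketch under the assumption that one call may sample fewer bytes than requested. BoundedStreamSampler and PrefixSampling are hypothetical names, and the loop stops on the documented -1 return value.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical functional interface with the same shape as
// LCSDetector.detect(InputStream input, int length) throws IOException.
@FunctionalInterface
interface BoundedStreamSampler {
    int detect(InputStream input, int length) throws IOException;
}

final class PrefixSampling {
    private PrefixSampling() {}

    // Samples at most maxBytes from the start of the stream. Keeps calling the
    // sampler until the budget is used up or it reports -1 (end of stream) or 0.
    static long samplePrefix(BoundedStreamSampler sampler, InputStream in, int maxBytes)
            throws IOException {
        long total = 0;
        while (total < maxBytes) {
            int sampled = sampler.detect(in, (int) (maxBytes - total));
            if (sampled <= 0) {
                break;
            }
            total += sampled;
        }
        return total;
    }
}
```

PrefixSampling.samplePrefix(detector::detect, in, 64 * 1024) would sample at most the first 64 KB; that limit is an illustrative choice, not a value recommended by this documentation.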
detect
public void detect(String input)
Statistical data is accumulated in an internal structure when the detect() methods are called. Use the reset() method to clear the accumulated statistics. The entire string is sampled by the detect() method.
Parameters:
    input - the string to be sampled by the detect method
detect
public int detect(String input, int length)
Statistical data is accumulated in an internal structure when the detect() methods are called. Use the reset() method to clear the accumulated statistics. Only the specified number of characters is sampled.
Parameters:
    input - a string to be sampled by the detect() method
    length - the number of characters to sample
Returns:
    the number of characters sampled, or -1 if the end of the string is reached
Throws:
    IllegalArgumentException - call the reset method
detect
public void detect(char[] input)
Statistical data is accumulated in an internal structure when the detect() methods are called. Use the reset() method to clear the accumulated statistics. The entire array is sampled by the detect() method.
Parameters:
    input - the characters to be sampled by the detect method
detect
public int detect(char[] input, int offset, int length)
Statistical data is accumulated in an internal structure when the detect() methods are called. Use the reset() method to clear the accumulated statistics. Only the specified number of characters is sampled.
Parameters:
    input - the char array to be sampled by the detect() method
    offset - the index of the first character to sample
    length - the number of characters to sample
Returns:
    the number of characters sampled, or -1 if the end of the array is reached
Throws:
    IllegalArgumentException - call the reset method
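When the text is already decoded, for example when it arrives through a java.io.Reader, the char[] overload can be fed from a reusable buffer. This is a minimal sketch assuming CharRangeSampler, a hypothetical interface with the same shape as detect(char[] input, int offset, int length); ReaderSampling is likewise an illustrative name.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical functional interface with the same shape as
// LCSDetector.detect(char[] input, int offset, int length).
@FunctionalInterface
interface CharRangeSampler {
    int detect(char[] input, int offset, int length);
}

final class ReaderSampling {
    private ReaderSampling() {}

    // Reads text into a reusable buffer and passes each filled range to the
    // sampler. Returns the total number of characters reported as sampled.
    static long sampleReader(CharRangeSampler sampler, Reader reader, int bufferSize)
            throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        char[] buffer = new char[bufferSize];
        long total = 0;
        int read;
        while ((read = reader.read(buffer, 0, buffer.length)) != -1) {
            int offset = 0;
            while (offset < read) {     // the sampler may take less than offered
                int sampled = sampler.detect(buffer, offset, read - offset);
                if (sampled <= 0) {
                    return total;       // -1 (end of data) or no progress
                }
                offset += sampled;
                total += sampled;
            }
        }
        return total;
    }
}
```

With the library, ReaderSampling.sampleReader(detector::detect, reader, 8192) selects the char[] overload because of the interface's parameter types.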
getResult
public LCSDResultSet getResult()
Determines the top-ranking language/character set pairs from the accumulated statistical data.
Returns:
    an LCSDResultSet object that contains the result
isCharsetSupported
public static boolean isCharsetSupported(int charsettype, String charset)
Checks whether the given character set, expressed as an Oracle, IANA, or Java character set name, is supported by the detection feature. See LocaleMapper for the charsettype values ORACLE, IANA, and JAVA.
Parameters:
    charsettype - can be ORACLE, IANA, or JAVA
    charset - the given character set
Returns:
    true if the given character set is supported by the detection feature, or false if not
Throws:
    IllegalArgumentException - if an invalid profile is specified
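isCharsetSupported can serve as a guard before setCharacterSetFilter, which expects an IANA character set name. The sketch below assumes a hypothetical CharsetSupportCheck interface shaped like the static isCharsetSupported(int, String) method, so the guard can be tested without the library; SupportedCharsetFilter is also a hypothetical name.

```java
import java.util.function.Consumer;

// Hypothetical functional interface with the same shape as the static method
// LCSDetector.isCharsetSupported(int charsettype, String charset).
@FunctionalInterface
interface CharsetSupportCheck {
    boolean isSupported(int charsetType, String charset);
}

final class SupportedCharsetFilter {
    private SupportedCharsetFilter() {}

    // Sets the character set filter only when the detection feature reports
    // that the character set is supported; returns whether the filter was set.
    static boolean setFilterIfSupported(CharsetSupportCheck check,
                                        int charsetType,
                                        String charset,
                                        Consumer<String> setCharacterSetFilter) {
        if (charset == null || !check.isSupported(charsetType, charset)) {
            return false;
        }
        setCharacterSetFilter.accept(charset);
        return true;
    }
}
```

A real call might look like SupportedCharsetFilter.setFilterIfSupported(LCSDetector::isCharsetSupported, LocaleMapper.IANA, charset, detector::setCharacterSetFilter), assuming the IANA constant is exposed as LocaleMapper.IANA, as the reference to LocaleMapper above suggests.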
reset
public void reset()
Resets the statistical data for all pairs to 0.