Oracle® Globalization Development Kit Java API Reference
10g Release 1 (10.1)

B10971-01

oracle.i18n.lcsd
Class LCSDetector

java.lang.Object
  |
  +--oracle.i18n.lcsd.LCSDetector

public class LCSDetector
extends Object

The LCSDetector class contains methods to automatically detect and recognize language and/or encoding based on text input.

To use LCSDetector, create an instance with one of its constructors. Call LCSDetector(String name) to specify a profile, or LCSDetector() to use the standard profile. Depending on the content of the text you plan to sample, certain profiles may yield more accurate results. For example, if you are sampling medical journals, you may want to use a profile built mainly from medical journals. If you are sampling computer-related white papers, a profile built from similar documents will help improve the accuracy of the detection. Currently, only one standard profile is provided, which is for general-purpose detection.

The detection process begins with a call to detect(byte[]) or one of the other detect() methods. Statistics are cumulated every time a detect() method is called. When you are ready for the result, call getResult() to retrieve an LCSDResultSet instance. To begin a new detection using the same LCSDetector instance, call reset() to clear the cumulated statistics.
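The detect, getResult, reset cycle described above can be sketched as follows. This is a minimal illustration, not GDK code: SamplingDetector is a hypothetical stand-in interface that mirrors only the three documented method shapes so that the example compiles without the GDK library, and ByteCountingDetector is a toy implementation that counts bytes instead of detecting anything. The result type of the real getResult() is LCSDResultSet, which is not modeled here.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the three LCSDetector methods used in the cycle.
// With the real class the calls would be detector.detect(bytes),
// detector.getResult() and detector.reset().
interface SamplingDetector<R> {
    void detect(byte[] input); // cumulate statistics for this input
    R getResult();             // read the result from the cumulated statistics
    void reset();              // clear the cumulated statistics
}

// Toy implementation for illustration only: it counts the bytes it has been
// given since the last reset() and reports that count as its "result".
final class ByteCountingDetector implements SamplingDetector<Integer> {
    private int sampled;

    @Override public void detect(byte[] input) { sampled += input.length; }
    @Override public Integer getResult() { return sampled; }
    @Override public void reset() { sampled = 0; }
}

final class DetectionWorkflow {
    // Produces one result per document. Each document is a list of chunks that
    // are fed to the same detector; reset() separates one document from the next.
    static <R> List<R> detectEach(SamplingDetector<R> detector,
                                  List<List<byte[]>> documents) {
        List<R> results = new ArrayList<>();
        for (List<byte[]> chunks : documents) {
            detector.reset();
            for (byte[] chunk : chunks) {
                detector.detect(chunk);
            }
            results.add(detector.getResult());
        }
        return results;
    }

    public static void main(String[] args) {
        byte[] a = "Hello, ".getBytes(StandardCharsets.UTF_8);
        byte[] b = "world".getBytes(StandardCharsets.UTF_8);
        byte[] c = "Bonjour".getBytes(StandardCharsets.UTF_8);
        List<List<byte[]>> docs =
                Arrays.asList(Arrays.asList(a, b), Arrays.asList(c));
        System.out.println(detectEach(new ByteCountingDetector(), docs));
    }
}
```

Without the reset() call between documents, the second result would also include the statistics cumulated for the first document.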

Since:
10.1.0.2
See Also:
LCSDResultSet

Constructor Summary
LCSDetector()
          Constructor.
LCSDetector(String name)
          Constructor that takes a profile name, allowing the user to choose a profile other than the default.

 

Method Summary
 void detect(byte[] input)
          Statistical data is cumulated in an internal structure when the detect() methods are called.
 int detect(byte[] input, int offset, int length)
          Statistical data is cumulated in an internal structure when the detect() methods are called.
 void detect(char[] input)
          Statistical data is cumulated in an internal structure when the detect() methods are called.
 int detect(char[] input, int offset, int length)
          Statistical data is cumulated in an internal structure when the detect() methods are called.
 void detect(InputStream input)
          Statistical data is cumulated in an internal structure when the detect() methods are called.
 int detect(InputStream input, int length)
          Statistical data is cumulated in an internal structure when the detect() methods are called.
 void detect(String input)
          Statistical data is cumulated in an internal structure when the detect() methods are called.
 int detect(String input, int length)
          Statistical data is cumulated in an internal structure when the detect() methods are called.
 oracle.i18n.lcsd.LCSDResultSet getResult()
          Determines the high hit language/character set pairs from the cumulated statistical data
 void reset()
          Resets statistical data for all pairs to 0.
 int setCharacterSetFilter(String charset)
          Sets the character set filter if the user knows the character set of the input data.
 int setLanguageFilter(String language)
          Sets the language filter if the user knows the language of the input data.

 

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

 

Constructor Detail

LCSDetector

public LCSDetector()
Constructor. Uses the standard default profile.

LCSDetector

public LCSDetector(String name)
Constructor that takes a profile name, allowing the user to choose a profile other than the default.
Parameters:
name - name of profile to use

Method Detail

setCharacterSetFilter

public int setCharacterSetFilter(String charset)
Sets the character set filter if the user knows the character set of the input data. The default value is none. If both the language filter and the character set filter are set, they will be ignored. If an invalid ISO character set name is passed in, it will be ignored.
Parameters:
charset - ISO character set name.

setLanguageFilter

public int setLanguageFilter(String language)
Sets the language filter if the user knows the language of the input data. The default value is none. If both the language filter and the character set filter are set, they will be ignored. If an invalid language name is passed in, it will be ignored.

detect

public void detect(byte[] input)
Statistical data is cumulated in an internal structure when the detect() methods are called. Use reset() to clear the cumulated statistics.
Parameters:
input - the bytes to be sampled by detect

detect

public int detect(byte[] input,
                  int offset,
                  int length)
Statistical data is cumulated in an internal structure when the detect() methods are called. Use reset() to clear the cumulated statistics. Only the specified length of bytes will be sampled.
Parameters:
input - the bytes to be sampled by detect
offset - the index of the first byte to sample
length - the number of bytes to sample
Returns:
the number of bytes sampled, or -1 if the end of the array is reached
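As an illustration of the offset and length parameters, the following sketch feeds a large buffer to a detector in fixed-size windows and sums the reported sample counts. RangeSampler is a hypothetical stand-in interface with the same shape as detect(byte[], int, int), used so that the example compiles without the GDK library; the window size and the decision to stop on a negative return value are assumptions of this sketch, not documented behavior.

```java
// Hypothetical stand-in for LCSDetector.detect(byte[] input, int offset, int length).
interface RangeSampler {
    int detect(byte[] input, int offset, int length);
}

final class WindowedSampling {
    // Feeds `input` to the sampler in windows of at most `window` bytes and
    // returns the total number of bytes the sampler reported as sampled.
    static int sampleInWindows(RangeSampler sampler, byte[] input, int window) {
        if (window <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        int total = 0;
        for (int offset = 0; offset < input.length; offset += window) {
            // The last window may be shorter than the others.
            int length = Math.min(window, input.length - offset);
            int n = sampler.detect(input, offset, length);
            if (n < 0) {
                break; // assumption: a negative value means nothing more was sampled
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        // Stub sampler that reports every requested byte as sampled.
        RangeSampler stub = (in, off, len) -> len;
        // A 10-byte buffer is fed as windows of 4, 4 and 2 bytes.
        System.out.println(sampleInWindows(stub, new byte[10], 4));
    }
}
```

Because statistics are cumulated across calls, feeding a buffer in windows and calling getResult() once at the end is expected to be equivalent in intent to a single call over the whole range.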

detect

public void detect(InputStream input)
            throws IOException
Statistical data is cumulated in an internal structure when the detect() methods are called. Use reset() to clear the cumulated statistics. The entire stream will be sampled by detect.
Parameters:
input - inputStream to be sampled by detect
Throws:
IOException - if error occurs while doing operation on stream

detect

public int detect(InputStream input,
                  int length)
           throws IOException
Statistical data is cumulated in an internal structure when the detect() methods are called. Use reset() to clear the cumulated statistics. Only the specified length of bytes will be sampled.
Parameters:
input - inputStream to be sampled by detect
length - the number of bytes to sample
Returns:
the number of bytes sampled, or -1 if the end of the stream is reached
Throws:
IOException - if error occurs while doing operation on stream

detect

public void detect(String input)
Statistical data is cumulated in an internal structure when the detect() methods are called. Use reset() to clear the cumulated statistics. The entire String will be sampled by detect.

detect

public int detect(String input,
                  int length)
Statistical data is cumulated in an internal structure when the detect() methods are called. Use reset() to clear the cumulated statistics. Only the specified length of chars will be sampled.
Parameters:
input - a string to be sampled by detect
length - the number of chars to sample
Returns:
the number of chars sampled, or -1 if the end of the String is reached

detect

public void detect(char[] input)
Statistical data is cumulated in an internal structure when the detect() methods are called. Use reset() to clear the cumulated statistics. The entire array will be sampled by detect.
Parameters:
input - the chars to be sampled by detect

detect

public int detect(char[] input,
                  int offset,
                  int length)
Statistical data is cumulated in an internal structure when the detect() methods are called. Use reset() to clear the cumulated statistics. Only the specified length of chars will be sampled.
Parameters:
input - the char array to be sampled by detect
offset - the index of the first char to sample
length - the number of chars to sample
Returns:
the number of chars sampled, or -1 if the end of the array is reached

getResult

public oracle.i18n.lcsd.LCSDResultSet getResult()
Determines the high hit language/character set pairs from the cumulated statistical data.
Returns:
an LCSDResultSet object that contains the result

reset

public void reset()
Resets statistical data for all pairs to 0.

Copyright © 2003 Oracle Corporation. All Rights Reserved.