Jive Forums API (5.5.20.2-oracle) Developer Javadocs

com.jivesoftware.util.search
Class CJKAnalyzer.CJKTokenizer

java.lang.Object
  extended by org.apache.lucene.analysis.TokenStream
      extended by org.apache.lucene.analysis.Tokenizer
          extended by com.jivesoftware.util.search.CJKAnalyzer.CJKTokenizer
Enclosing class:
CJKAnalyzer

public final class CJKAnalyzer.CJKTokenizer
extends org.apache.lucene.analysis.Tokenizer

CJKTokenizer is adapted from StopTokenizer, which does a decent job for most European languages. For double-byte characters it uses a different method: it returns a token for every two adjacent characters, so consecutive tokens overlap by one character.
Example: "java C1C2C3C4" is segmented into "java", "C1C2", "C2C3", "C3C4". Zero-length tokens ("") must also be filtered out.
Digits: a digit, '+', and '#' are tokenized as letters.
For more information on text segmentation for Asian languages (Chinese, Japanese, Korean), please search Google.
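
The overlapping two-character segmentation described above can be illustrated with a small standalone sketch. This is not the CJKTokenizer source and does not use Lucene. The class name CjkBigramSketch and its helper methods are hypothetical. It assumes that any letter outside the Basic Latin block counts as a double-byte character, and that a lone double-byte character is returned as a one-character token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of overlapping bigram segmentation; not the Jive or Lucene code.
class CjkBigramSketch {

    // Basic Latin letters, digits, '+' and '#' are grouped into one word token.
    static boolean isWordChar(char c) {
        return Character.UnicodeBlock.of(c) == Character.UnicodeBlock.BASIC_LATIN
                && (Character.isLetterOrDigit(c) || c == '+' || c == '#');
    }

    // Assumption: every letter outside Basic Latin is treated as double-byte.
    static boolean isDoubleByte(char c) {
        return Character.UnicodeBlock.of(c) != Character.UnicodeBlock.BASIC_LATIN
                && Character.isLetter(c);
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (isWordChar(c)) {
                // Consume the whole single-byte run as one token.
                int start = i;
                while (i < text.length() && isWordChar(text.charAt(i))) {
                    i++;
                }
                tokens.add(text.substring(start, i));
            } else if (isDoubleByte(c)) {
                // Consume the double-byte run, then emit overlapping pairs.
                int start = i;
                while (i < text.length() && isDoubleByte(text.charAt(i))) {
                    i++;
                }
                if (i - start == 1) {
                    // Assumption: a lone character becomes a one-character token.
                    tokens.add(text.substring(start, i));
                } else {
                    for (int j = start; j + 2 <= i; j++) {
                        tokens.add(text.substring(j, j + 2));
                    }
                }
            } else {
                // Separators are skipped, so no zero-length token is ever added.
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "java" followed by four CJK characters, written as Unicode escapes.
        System.out.println(tokenize("java \u4e2d\u6587\u5206\u8bcd"));
    }
}
```

For the input in main, the sketch yields "java" followed by three overlapping two-character tokens, matching the "C1C2", "C2C3", "C3C4" pattern in the example above.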


Field Summary
 
Fields inherited from class org.apache.lucene.analysis.Tokenizer
input
 
Constructor Summary
CJKAnalyzer.CJKTokenizer(java.io.Reader in)
          Construct a token stream processing the given input.
 
Method Summary
 org.apache.lucene.analysis.Token next()
          Returns the next token in the stream, or null at EOS.
 
Methods inherited from class org.apache.lucene.analysis.Tokenizer
close
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CJKAnalyzer.CJKTokenizer

public CJKAnalyzer.CJKTokenizer(java.io.Reader in)
Construct a token stream processing the given input.

Parameters:
in - the reader that supplies the input text
Method Detail

next

public final org.apache.lucene.analysis.Token next()
                                            throws java.io.IOException
Returns the next token in the stream, or null at EOS.

Specified by:
next in class org.apache.lucene.analysis.TokenStream
Returns:
the next Token, or null at the end of the stream
Throws:
java.io.IOException - if a read error occurs in the input stream
See Also:
Character.UnicodeBlock Javadocs

Jive Forums Project Page

Copyright © 1999-2006 Jive Software.