9.4 Rule-Based Classification
Rule-based classification is the basic solution for creating an Oracle Text classification application.
The basic steps for rule-based classification are as follows. Specific steps are explored in greater detail in the example.
- Create a table for the documents to be classified, and then populate it.
- Create a rule table (also known as a category table). The rule table consists of categories that you name, such as "medicine" or "finance," and the rules that sort documents into those categories.
  These rules are actually queries. For example, you define the "medicine" category as documents that include the words "hospital," "doctor," or "disease." Therefore, you would set up a rule in the form of "hospital OR doctor OR disease."
- Create a CTXRULE index on the rule table.
- Classify the documents.
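The steps above can be sketched in SQL as follows. This is a minimal illustration under stated assumptions, not a complete application: the table name doc_categories, its columns, the index name, and the sample text are all hypothetical. Only the CTXRULE index type and the MATCHES operator come from Oracle Text. For brevity, the document text is passed as a literal rather than read from a document table.

```sql
-- Rule (category) table: each row pairs a category name with a query.
-- Table and column names are hypothetical.
CREATE TABLE doc_categories (
  rule_id  NUMBER PRIMARY KEY,
  category VARCHAR2(100),
  query    VARCHAR2(2000)
);

INSERT INTO doc_categories VALUES (1, 'medicine', 'hospital OR doctor OR disease');
INSERT INTO doc_categories VALUES (2, 'finance',  'bank OR stock OR investment');
COMMIT;

-- CTXRULE index on the column that holds the rules.
CREATE INDEX doc_categories_idx ON doc_categories (query)
  INDEXTYPE IS CTXSYS.CTXRULE;

-- Classify one document: MATCHES returns a positive score for each rule
-- that the supplied text satisfies.
SELECT category
FROM   doc_categories
WHERE  MATCHES(query, 'The hospital hired a new doctor last week.') > 0;
```

Note that the index is built on the rules, not on the documents. Each document is then presented to the index as the second argument of MATCHES.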
See Also:
"CTXRULE Parameters and Limitations" for information on which operators are allowed for queries
9.4.1 Rule-Based Classification Example
In this example, you gather news articles about different subjects and then classify them. After you create the rules, you can index them and then use the MATCHES operator to classify documents.
To classify documents:
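One way to carry out this example is sketched below. It is a hedged illustration, not the definitive procedure: the table names (news_table, news_categories, news_id_cat), their columns, the sample article, and the sample rules are assumptions introduced here. The sketch stores each article, defines categories as queries, indexes the queries with CTXRULE, and then loops over the articles, recording every category whose rule matches.

```sql
-- Articles to be classified (hypothetical schema and sample row).
CREATE TABLE news_table (
  tk    NUMBER PRIMARY KEY,
  title VARCHAR2(1000),
  text  CLOB
);

-- Categories and the rules (queries) that define them.
CREATE TABLE news_categories (
  queryid  NUMBER PRIMARY KEY,
  category VARCHAR2(100),
  query    VARCHAR2(2000)
);

-- Classification results: one row per (article, matching category).
CREATE TABLE news_id_cat (
  tk     NUMBER,
  cat_id NUMBER
);

INSERT INTO news_table VALUES
  (1, 'Cup final', 'The football final drew a record crowd.');
INSERT INTO news_categories VALUES (1, 'Sports',  'football OR basketball OR tennis');
INSERT INTO news_categories VALUES (2, 'Economy', 'inflation OR unemployment');
COMMIT;

-- Index the rules.
CREATE INDEX news_cat_idx ON news_categories (query)
  INDEXTYPE IS CTXSYS.CTXRULE;

-- Classify every article and record the matching categories.
DECLARE
  v_text CLOB;
BEGIN
  FOR doc IN (SELECT tk, text FROM news_table) LOOP
    v_text := doc.text;
    FOR c IN (SELECT queryid
              FROM   news_categories
              WHERE  MATCHES(query, v_text) > 0) LOOP
      INSERT INTO news_id_cat (tk, cat_id) VALUES (doc.tk, c.queryid);
    END LOOP;
  END LOOP;
  COMMIT;
END;
/

-- Review the results.
SELECT n.title, c.category
FROM   news_id_cat r
JOIN   news_table n      ON n.tk = r.tk
JOIN   news_categories c ON c.queryid = r.cat_id;
```

Keeping the results in a separate table means an article can belong to several categories, and the classification can be rerun after the rules change without touching the articles themselves.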
9.4.2 CTXRULE Parameters and Limitations
The following considerations apply to indexing a CTXRULE index:
- If you use the SVM_CLASSIFIER classifier, then you may use the BASIC_LEXER, CHINESE_LEXER, JAPANESE_LEXER, or KOREAN_MORPH_LEXER lexers. If you do not use SVM_CLASSIFIER, then you can use only the BASIC_LEXER lexer type to index your query set.
- Filter, memory, datastore, and [no]populate parameters are not applicable to the CTXRULE index type.
- The CREATE INDEX storage clause is supported for creating the index on the queries.
- Wordlists are supported for stemming operations on your query set.
- Queries for CTXRULE are similar to the CONTAINS queries. Basic phrasing ("dog house") is supported, as are the following CONTAINS operators: ABOUT, AND, NEAR, NOT, OR, STEM, WITHIN, and THESAURUS. Section groups are supported for using the MATCHES operator to classify documents. Field sections are also supported; however, CTXRULE does not directly support field queries, so you must use a query rewrite on a CONTEXT query.
- You must drop the CTXRULE index before exporting or downgrading the database.
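Several of the considerations above can be illustrated with a short sketch. The preference names (rule_lexer, rule_wordlist, rule_storage), the table and index names, the sample rules, and the USERS tablespace are assumptions made for illustration. The sketch uses the lexer, wordlist, and storage keywords of the CTXRULE PARAMETERS string together with the CTX_DDL preference procedures.

```sql
-- Hypothetical rule table with rules that use the STEM ($) and NEAR operators.
CREATE TABLE rules (
  rule_id  NUMBER PRIMARY KEY,
  category VARCHAR2(100),
  query    VARCHAR2(2000)
);

INSERT INTO rules VALUES (1, 'medicine', 'NEAR((doctor, hospital), 5) OR $disease');
INSERT INTO rules VALUES (2, 'pets',     'dog house');  -- basic phrase
COMMIT;

BEGIN
  -- Lexer: without SVM_CLASSIFIER, BASIC_LEXER is the only lexer type allowed.
  CTX_DDL.CREATE_PREFERENCE('rule_lexer', 'BASIC_LEXER');

  -- Wordlist: selects the stemmer used for STEM ($) terms in the rules.
  CTX_DDL.CREATE_PREFERENCE('rule_wordlist', 'BASIC_WORDLIST');
  CTX_DDL.SET_ATTRIBUTE('rule_wordlist', 'STEMMER', 'ENGLISH');

  -- Storage: the tablespace name USERS is an assumption; substitute your own.
  CTX_DDL.CREATE_PREFERENCE('rule_storage', 'BASIC_STORAGE');
  CTX_DDL.SET_ATTRIBUTE('rule_storage', 'I_TABLE_CLAUSE', 'tablespace users');
END;
/

-- Filter, memory, datastore, and [no]populate are deliberately absent:
-- they do not apply to the CTXRULE index type.
CREATE INDEX rules_idx ON rules (query)
  INDEXTYPE IS CTXSYS.CTXRULE
  PARAMETERS ('lexer rule_lexer wordlist rule_wordlist storage rule_storage');

-- Before exporting or downgrading the database, drop the index:
DROP INDEX rules_idx;
```

After an export or downgrade, the index can be re-created from the rule table with the same CREATE INDEX statement, because the rules themselves live in an ordinary table.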
See Also:
Oracle Text Reference for more information on lexer and classifier preferences