Go to main content

Supervised Classification

With supervised classification, you employ the CTX_CLS.TRAIN procedure to automate the rule writing step. CTX_CLS.TRAIN uses a training set of sample documents to deduce classification rules. This is the major advantage over rule-based classification, in which you must write the classification rules.

However, before you can run the CTX_CLS.TRAIN procedure, you must manually create categories and assign each document in the sample training set to a category.

See Also:

Oracle Text Reference for more information on CTX_CLS.TRAIN

When the rules are generated, you index them to create a CTXRULE index. You can then use the MATCHES operator to classify an incoming stream of new documents.

You may choose between two different classification algorithms for supervised classification:

Decision Tree Supervised Classification

The advantage of Decision Tree classification is that the generated rules are easily observed (and modified). See "Decision Tree Supervised Classification Example".
SVM-Based Supervised Classification

This method uses the Support Vector Machine (SVM) algorithm for creating rules. The advantage of SVM-based classification is that it is often more accurate than Decision Tree classification. The disadvantage is that it generates binary rules, so the rules themselves are opaque. See "SVM-Based Supervised Classification Example".