With supervised classification, you use the CTX_CLS.TRAIN procedure to automate the rule-writing step. CTX_CLS.TRAIN uses a training set of sample documents to deduce classification rules. This is the major advantage over rule-based classification, in which you must write the classification rules yourself.
However, before you can run the CTX_CLS.TRAIN procedure, you must manually create categories and assign each document in the sample training set to a category.
Oracle Text Reference for more information on
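The preparation and training steps can be sketched as follows. This is a minimal illustration, not a definitive setup: the table, column, index, and preference names (docs, doc_cats, rules, docs_idx, my_classifier) are hypothetical, and the sketch assumes the Decision Tree (RULE_CLASSIFIER) form of CTX_CLS.TRAIN with positional arguments. Check the argument order against Oracle Text Reference for your release.

```sql
-- Hypothetical training set: all object names here are illustrative.
CREATE TABLE docs (
  doc_id   NUMBER PRIMARY KEY,
  doc_text CLOB
);

-- Manual step: one row per (document, category) assignment.
CREATE TABLE doc_cats (
  dc_doc_id   NUMBER,
  dc_category NUMBER,
  PRIMARY KEY (dc_doc_id, dc_category)
);

-- Result table that CTX_CLS.TRAIN fills with the generated rules.
CREATE TABLE rules (
  rule_cat_id     NUMBER,
  rule_text       VARCHAR2(4000),
  rule_confidence NUMBER
);

-- CONTEXT index on the training documents. It is created without
-- being populated because training only needs its preferences.
CREATE INDEX docs_idx ON docs (doc_text)
  INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS ('nopopulate');

BEGIN
  CTX_DDL.CREATE_PREFERENCE('my_classifier', 'RULE_CLASSIFIER');

  CTX_CLS.TRAIN(
    'docs_idx',         -- index on the training documents
    'doc_id',           -- document ID column in docs
    'doc_cats',         -- table holding the category assignments
    'dc_doc_id',        -- document ID column in doc_cats
    'dc_category',      -- category ID column in doc_cats
    'rules',            -- result table for the generated rules
    'rule_cat_id',      -- result column: category ID
    'rule_text',        -- result column: generated rule (query text)
    'rule_confidence',  -- result column: confidence score
    'my_classifier');   -- classifier preference
END;
/
```

The doc_cats table is where the manual work lives: the quality of the generated rules depends on how carefully each sample document has been assigned to its category.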
After the rules are generated, you index them to create a CTXRULE index. You can then use the MATCHES operator to classify an incoming stream of new documents.
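As a rough sketch of this step, assume a hypothetical result table named rules, with the generated rule text in a rule_text column and the category ID in a rule_cat_id column, as Decision Tree training would produce. The index name rules_idx and the bind variable :new_doc_text are likewise illustrative.

```sql
-- Index the generated rules so that MATCHES can evaluate them.
CREATE INDEX rules_idx ON rules (rule_text)
  INDEXTYPE IS CTXSYS.CTXRULE;

-- Classify one incoming document: each row returned is a category
-- whose rule matches the document text.
SELECT rule_cat_id
FROM   rules
WHERE  MATCHES(rule_text, :new_doc_text) > 0;
```

Running the query once for each incoming document returns the set of categories that the document belongs to. A document can match several rules, or none.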
You may choose between two different classification algorithms for supervised classification:
Decision Tree supervised classification: The advantage of Decision Tree classification is that the generated rules are easy to inspect and modify. See "Decision Tree Supervised Classification Example".
SVM-based supervised classification: This method uses the Support Vector Machine (SVM) algorithm to create rules. The advantage of SVM-based classification is that it is often more accurate than Decision Tree classification. The disadvantage is that it generates binary rules, so the rules themselves are opaque. See "SVM-Based Supervised Classification Example".
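In practice, the algorithm is selected through the classifier preference passed to CTX_CLS.TRAIN. The sketch below contrasts the two preference types and shows the shape of an SVM training call. It assumes a training table, category table, and CONTEXT index already exist under the hypothetical names docs, doc_cats, and docs_idx. The preference names and the svm_rules table name are also illustrative.

```sql
BEGIN
  -- Decision Tree: rules are generated as readable query text.
  CTX_DDL.CREATE_PREFERENCE('dt_classifier', 'RULE_CLASSIFIER');

  -- SVM: rules are generated in binary form.
  CTX_DDL.CREATE_PREFERENCE('svm_classifier', 'SVM_CLASSIFIER');
END;
/

-- SVM result table: the rule column is a BLOB, which is why the
-- rules cannot be read or edited by hand.
CREATE TABLE svm_rules (
  cat_id NUMBER,
  type   NUMBER(3) NOT NULL,
  rule   BLOB
);

BEGIN
  CTX_CLS.TRAIN(
    'docs_idx',         -- index on the training documents
    'doc_id',           -- document ID column in docs
    'doc_cats',         -- table holding the category assignments
    'dc_doc_id',        -- document ID column in doc_cats
    'dc_category',      -- category ID column in doc_cats
    'svm_rules',        -- result table for the binary rules
    'svm_classifier');  -- SVM classifier preference
END;
/
```

The difference between the two result tables reflects the trade-off described above: Decision Tree training writes rule text that you can query and adjust, whereas SVM training writes a binary rule that is usable only through the CTXRULE index.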