You can add your custom thesaurus to a branch in the existing knowledge base. The knowledge base is a hierarchical tree of concepts used for theme indexing,
ABOUT queries, and deriving themes for document services.
When you augment the existing knowledge base with your new thesaurus, you query with the ABOUT operator, which implicitly expands to synonyms and narrower terms. You do not query with the thesaurus operators.
To augment the existing knowledge base with your custom thesaurus, follow these steps:
1. Load your custom thesaurus by using one of the following methods:
   - Using the ctxload utility. See "Loading a Thesaurus with ctxload".
   - Using the PL/SQL procedure CTX_THES.IMPORT_THESAURUS. See "Loading a Thesaurus with PL/SQL procedure CTX_THES.IMPORT_THESAURUS".
2. Compile the loaded thesaurus with the ctxkbtc compiler. Refer to "Compiling a Loaded Thesaurus".
3. Use the ABOUT operator to query. For example, an ABOUT(politics) query finds all documents that are related to the term politics, including any synonyms or narrower terms defined in the knowledge base.
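As a minimal sketch of such a query, the statement below assumes a hypothetical table named news with a title column and a text column that carries an Oracle Text index with a theme component. The table and column names are illustrative assumptions and are not defined anywhere in this section.

```sql
-- Hypothetical schema: news(title, text), with a CONTEXT index on news.text.
-- ABOUT(politics) matches documents by theme rather than by literal word,
-- so terms linked under politics in the knowledge base are included.
SELECT SCORE(1) AS relevance, title
  FROM news
 WHERE CONTAINS(text, 'ABOUT(politics)', 1) > 0
 ORDER BY SCORE(1) DESC;
```

The label 1 ties SCORE(1) to the CONTAINS call, which is why the same number appears in both places.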
Compiling your custom thesaurus with the existing knowledge base before indexing enables faster and simpler queries with the
ABOUT operator. Document services can also take full advantage of the customized information for creating theme summaries and Gists.
Use of the
ABOUT operator requires a theme component in the index, which requires slightly more disk space. You must also define the thesaurus before indexing your documents. If you make any change to the thesaurus, you must recompile your thesaurus and re-index your documents.
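One way to make the theme component explicit is to enable it on the lexer preference used by the index. The sketch below assumes a hypothetical news table with a text column and invented preference and index names (my_theme_lexer, news_idx). Theme indexing may already be enabled by default for English, so treat this as illustrative rather than required.

```sql
-- Assumed names (illustrative only): my_theme_lexer, news_idx, news(text).
BEGIN
  -- BASIC_LEXER attributes that control theme indexing.
  ctx_ddl.create_preference('my_theme_lexer', 'BASIC_LEXER');
  ctx_ddl.set_attribute('my_theme_lexer', 'index_themes', 'YES');
  ctx_ddl.set_attribute('my_theme_lexer', 'theme_language', 'ENGLISH');
END;
/

-- Create the index only after the thesaurus has been compiled into the
-- knowledge base, so that the themes reflect the new terms.
CREATE INDEX news_idx ON news(text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('LEXER my_theme_lexer');
```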
When adding terms to the knowledge base, Oracle recommends that new terms be linked to one of the categories in the knowledge base for best results in theme proving.
If new terms are kept completely separate from existing categories, fewer themes from new terms will be proven. The result is poor precision and recall for ABOUT queries, as well as poor-quality Gists and theme highlighting.
See Also: Oracle Text Reference for more information about the supplied English knowledge base
You link new terms to existing terms by making an existing term the broader term for the new terms.
Suppose you purchase a medical thesaurus, medthes, containing a hierarchy of medical terms. The four top terms in the thesaurus are as follows:
Anesthesia and Analgesia
Anti-Allergic and Respiratory System Agents
Anti-Inflammatory Agents, Antirheumatic Agents, and Inflammation Mediators
Antineoplastic and Immunosuppressive Agents
To link these terms to the existing knowledge base, add the following entries to the medical thesaurus to map the new terms to the existing health and medicine branch:
health and medicine
 NT Anesthesia and Analgesia
 NT Anti-Allergic and Respiratory System Agents
 NT Anti-Inflammatory Agents, Antirheumatic Agents, and Inflammation Mediators
 NT Antineoplastic and Immunosuppressive Agents
Assuming the medical thesaurus is in a file called med.thes, you load the thesaurus as medthes with ctxload as follows:
ctxload -thes -thescase y -name medthes -file med.thes -user ctxsys
When you enter the ctxload command line, you are prompted for the user password. As a security best practice, never enter the password at the command line. Alternatively, you can omit the -user option and let ctxload prompt you for the username and then the password.
The following example creates a case-sensitive thesaurus named
mythesaurus and imports the thesaurus content present in
myclob into the Oracle Text thesaurus tables:
declare
  myclob clob;
begin
  myclob := to_clob('peking SYN beijing BT capital country NT beijing tokyo');
  ctx_thes.import_thesaurus('mythesaurus', myclob, 'Y');
end;
The format of the thesaurus to be imported (
myclob in this example) should be the same as used by the
ctxload utility. If the format of the thesaurus to be imported is not correct, then
IMPORT_THESAURUS raises an exception.
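Because a malformed thesaurus raises an exception, it can help to wrap the import in a handler that reports the error. The block below is a sketch under assumptions: the thesaurus name medthes_demo is hypothetical, the CLOB is built with one term per line and indented relation lines in the style of a ctxload import file, and the generic WHEN OTHERS handler is used only because the specific exception is not named here.

```sql
-- Sketch only: 'medthes_demo' is a hypothetical thesaurus name.
SET SERVEROUTPUT ON
DECLARE
  myclob CLOB;
BEGIN
  -- One term per line; relation lines (NT) are indented under their term.
  myclob := TO_CLOB('health and medicine' || CHR(10) ||
                    ' NT Anesthesia and Analgesia' || CHR(10) ||
                    ' NT Antineoplastic and Immunosuppressive Agents');

  -- 'Y' requests a case-sensitive thesaurus, as in the example above.
  ctx_thes.import_thesaurus('medthes_demo', myclob, 'Y');
  DBMS_OUTPUT.PUT_LINE('Thesaurus imported.');
EXCEPTION
  WHEN OTHERS THEN
    -- Report the error raised for the bad input, then re-raise it.
    DBMS_OUTPUT.PUT_LINE('Import failed: ' || SQLERRM);
    RAISE;
END;
/
```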
To link the loaded thesaurus
medthes to the knowledge base, use
ctxkbtc as follows:
ctxkbtc -user ctxsys -name medthes
When you enter the
ctxkbtc command line, you are prompted for the user password. As with
ctxload, for best security practices, do not enter the password at the command line.