Oracle Text Application Developer's Guide Release 9.0.1 Part Number A90122-01
Working With a Thesaurus, 4 of 5
Defining a custom thesaurus allows you to process queries more intelligently. Since users of your application might not know which words represent a topic, you can define synonyms or narrower terms for likely query terms. You can use the thesaurus operators to expand your query into your thesaurus terms.
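There are two ways to enhance your query application with a custom thesaurus: load the custom thesaurus and query with the thesaurus expansion operators, or augment the existing knowledge base with the custom thesaurus and query with the ABOUT operator.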
Each approach has its advantages and disadvantages.
To build a custom thesaurus, follow these steps:
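Load the thesaurus with ctxload. The following example imports a thesaurus named tech_doc from an import file named tech_thesaurus.txt:
ctxload -user jsmith/123abc -thes -name tech_doc -file tech_thesaurus.txt
At query time, use the thesaurus operators to expand query terms. For example, the following query expands XML to its synonyms as defined in tech_doc:
'SYN(XML, tech_doc)'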
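The load step above can be sketched as a short shell script. The synonym entry written to tech_thesaurus.txt is a hypothetical example chosen for illustration, not content from any shipped thesaurus, and the script calls ctxload only when the utility is found on the PATH.

```shell
#!/bin/sh
# Write a small thesaurus import file. The entry below is a hypothetical
# example: it declares "extensible markup language" as a synonym of XML.
cat > tech_thesaurus.txt <<'EOF'
XML
 SYN extensible markup language
EOF

# Load the file as thesaurus tech_doc, using the command from the text.
# The load is skipped when ctxload is not installed on this machine.
if command -v ctxload >/dev/null 2>&1; then
  ctxload -user jsmith/123abc -thes -name tech_doc -file tech_thesaurus.txt
else
  echo "ctxload not found; tech_thesaurus.txt written but not loaded"
fi
```

Because the thesaurus is consulted at query time, editing the import file and reloading it should not require re-indexing, which is the advantage this method offers.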
The advantage of using this method is that you can modify the thesaurus after indexing.
This method requires you to use thesaurus expansion operators in your query. Long queries can cause extra overhead in the thesaurus expansion and slow your query down.
You can add your custom thesaurus to a branch in the existing knowledge base. The knowledge base is a hierarchical tree of concepts used for theme indexing, ABOUT queries, and deriving themes for document services.
When you augment the existing knowledge base with your new thesaurus, you query with the ABOUT operator, which implicitly expands to synonyms and narrower terms. You do not query with the thesaurus operators.
To augment the existing knowledge base with your custom thesaurus, follow these steps:
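Load the thesaurus with ctxload. See "Loading a Thesaurus with ctxload".
Compile the loaded thesaurus with the ctxkbtc compiler. See "Compiling a Loaded Thesaurus" later in this section.
Index your documents.
Query with the ABOUT operator. For example, the following query finds documents related to politics, including any synonyms or narrower terms defined in the knowledge base:
'about(politics)'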
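To show where these query strings go, the following sketch writes both query styles into a SQL script using the Oracle Text CONTAINS operator. The table name docs and the column names id and text are assumptions made for illustration; substitute the names of your own indexed table.

```shell
#!/bin/sh
# Write both query styles into a SQL script. The table "docs" and its
# columns "id" and "text" are hypothetical names for an indexed table.
cat > thesaurus_queries.sql <<'EOF'
-- Thesaurus operator: expands XML to its synonyms in tech_doc
SELECT id FROM docs WHERE CONTAINS(text, 'SYN(XML, tech_doc)') > 0;

-- ABOUT operator: expansion comes from the compiled knowledge base
SELECT id FROM docs WHERE CONTAINS(text, 'about(politics)') > 0;
EOF

# Print the script so it can be reviewed or passed to a SQL client.
cat thesaurus_queries.sql
```

The first query names the thesaurus explicitly, while the ABOUT query carries no thesaurus reference because the expansion is implicit.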
Compiling your custom thesaurus with the existing knowledge base before indexing allows for faster and simpler queries with the ABOUT operator. Document services can also take full advantage of the customized information for creating theme summaries and Gists.
Use of the ABOUT operator requires a theme component in the index, which requires slightly more disk space. You must also define the thesaurus before indexing your documents. If you make any change to the thesaurus, you must recompile it and re-index your documents.
When adding terms to the knowledge base, Oracle recommends that new terms be linked to one of the categories in the knowledge base for best results in theme proving.
If new terms are kept completely separate from existing categories, fewer themes from new terms will be proven. The result of this is poor precision and recall with ABOUT queries as well as poor quality of gists and theme highlighting.
You link new terms to existing terms by making an existing term the broader term for the new terms.
You purchase a medical thesaurus medthes containing a hierarchy of medical terms. The four top terms in the thesaurus are the following:
Anesthesia and Analgesia
Anti-Allergic and Respiratory System Agents
Anti-Inflammatory Agents, Antirheumatic Agents, and Inflammation Mediators
Antineoplastic and Immunosuppressive Agents
To link these terms to the existing knowledge base, add the following entries to the medical thesaurus to map the new terms to the existing health and medicine branch:
health and medicine
 NT Anesthesia and Analgesia
 NT Anti-Allergic and Respiratory System Agents
 NT Anti-Inflammatory Agents, Antirheumatic Agents, and Inflammation Mediators
 NT Antineoplastic and Immunosuppressive Agents
Assuming the medical thesaurus is in a file called med.thes, you load the thesaurus as medthes with ctxload as follows:
ctxload -thes -thescase y -name medthes -file med.thes -user ctxsys/ctxsys
To link the loaded thesaurus medthes
to the knowledge base, use ctxkbtc
as follows:
ctxkbtc -user ctxsys/ctxsys -name medthes
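The whole medical example can be sketched as one shell script. It assumes med.thes already holds the purchased thesaurus and appends the link entries to it; the file is created if it does not exist. The two Oracle Text commands are the ones shown above and run only when both utilities are found on the PATH.

```shell
#!/bin/sh
# Append link entries to med.thes: the existing knowledge base category
# "health and medicine" becomes the broader term of each top term.
{
  echo "health and medicine"
  while IFS= read -r term; do
    printf ' NT %s\n' "$term"
  done <<'EOF'
Anesthesia and Analgesia
Anti-Allergic and Respiratory System Agents
Anti-Inflammatory Agents, Antirheumatic Agents, and Inflammation Mediators
Antineoplastic and Immunosuppressive Agents
EOF
} >> med.thes

# Load the thesaurus as medthes, then compile it into the knowledge base.
# Both steps are skipped when the Oracle Text utilities are not installed.
if command -v ctxload >/dev/null 2>&1 && command -v ctxkbtc >/dev/null 2>&1; then
  ctxload -thes -thescase y -name medthes -file med.thes -user ctxsys/ctxsys
  ctxkbtc -user ctxsys/ctxsys -name medthes
else
  echo "Oracle Text tools not found; link entries appended to med.thes only"
fi
```

The term names in the link entries need to match the top terms of the thesaurus exactly as they are spelled there, since the link is made by name.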
Copyright © 1996-2001, Oracle Corporation. All Rights Reserved.