Oracle Text Reference Release 9.0.1 Part Number A90121-01
Executables, 3 of 3
The knowledge base is the information source Oracle Text uses to perform theme analysis, such as theme indexing, processing ABOUT queries, and document theme extraction with the CTX_DOC package. A knowledge base is supplied for English and French.
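With the ctxkbtc compiler, you can extend the supplied knowledge base with terms from your own thesauri, or compile a user-defined knowledge base for another language.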
See Also:
For more information about the knowledge base packaged with Oracle Text, see Appendix I, "English Knowledge Base Category Hierarchy".
For more information about the ABOUT operator, see ABOUT operator in Chapter 3.
For more information about document services, see Chapter 7, "CTX_DOC Package".
Knowledge bases can be in any single-byte character set. Supplied knowledge bases are in WE8ISO8859P1. You can store an extended knowledge base in another character set such as US7ASCII.
ctxkbtc -user uname/passwd [-name thesname1 [thesname2 ... thesname16]] [-revert] [-stoplist stoplistname] [-verbose] [-log filename]
-user
Specify the username and password for the administrator creating an extended knowledge base. This user must have write permission to the ORACLE_HOME directory.
-name thesname1 [thesname2 ... thesname16]
Specify the names of up to 16 thesauri to be compiled with the knowledge base to create the extended knowledge base. The thesauri you specify must already be loaded with ctxload using the -thescase Y option.
-revert
Reverts the extended knowledge base to the default knowledge base provided by Oracle Text.
-stoplist stoplistname
Specify the name of the stoplist. Stopwords in the stoplist are added to the knowledge base as useless words that are prevented from becoming themes or contributing to themes. You can still add stopthemes after running this command using CTX_DDL.ADD_STOPTHEME.
-verbose
Displays all warnings and messages, including non-NLS messages, to the standard output.
-log filename
Specify the log file for storing all messages. When you specify a log file, no messages are reported to standard output.
Before running ctxkbtc, you must set the NLS_LANG environment variable to match the database character set.
The user running ctxkbtc must have write permission to the ORACLE_HOME directory, since the program writes files to this directory.
Before being compiled, each thesaurus must be loaded with the -thescase Y option of ctxload.
Running ctxkbtc twice removes the previous extension.
The ctxkbtc program has the following limitations:
Terms are case sensitive. If a thesaurus has a term in uppercase, for example, the same term present in lowercase form in a document will not be recognized.
The maximum length of a term is 80 characters.
Disambiguated homographs are not supported.
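The 80-character limit can be checked before a thesaurus is loaded. The following shell sketch is not part of Oracle Text. The function name long_terms and the assumed file layout (one term per line, optionally indented and prefixed by a relation keyword such as NT or BT) are illustrative assumptions, so adjust the pattern to match your own thesaurus file.

```shell
# Hypothetical helper: list thesaurus terms longer than 80 characters.
# Assumption: one term per line, optionally indented and prefixed by a
# relation keyword (BT, NT, RT, SYN, USE, UF, PT and the G/P/I variants).
long_terms() {
    awk '
    {
        term = $0
        sub(/^[ \t]+/, "", term)    # drop indentation
        sub(/^((BT|NT)[GPI]?|RT|SYN|USE|UF|PT)[ \t]+/, "", term)    # drop a leading relation keyword
        sub(/[ \t]+$/, "", term)    # drop trailing whitespace
        if (length(term) > 80) print NR ": " term
    }' "$1"
}
```

Running long_terms med.thes would print the line number and text of each term that exceeds the limit, and nothing if every term fits.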
The following constraints apply to thesaurus relations:
You can extend the supplied knowledge base by compiling one or more thesauri with the Oracle Text knowledge base. The extended information can be application-specific terms and relationships. During theme analysis, the extended portion of the knowledge base overrides any terms and relationships in the knowledge base where there is overlap.
When extending the knowledge base, Oracle recommends that, where appropriate, you link new terms to one of the categories in the knowledge base for best results in theme proving.
See Also:
For more information about the knowledge base, see Appendix I, "English Knowledge Base Category Hierarchy".
If new terms are kept completely disjoint from existing categories, fewer themes from new terms will be proven. The result is poorer precision and recall with ABOUT queries, as well as poorer-quality gists and theme highlighting.
You link new terms to existing terms by making an existing term the broader term for the new terms.
You purchase a medical thesaurus medthes containing a hierarchy of medical terms. The four top terms in the thesaurus are the following:
Anesthesia and Analgesia
Anti-Allergic and Respiratory System Agents
Anti-Inflammatory Agents, Antirheumatic Agents, and Inflammation Mediators
Antineoplastic and Immunosuppressive Agents
To link these terms to the existing knowledge base, add the following entries to the medical thesaurus to map the new terms to the existing health and medicine branch:
health and medicine
 NT Anesthesia and Analgesia
 NT Anti-Allergic and Respiratory System Agents
 NT Anti-Inflammatory Agents, Antirheumatic Agents, and Inflammation Mediators
 NT Antineoplastic and Immunosuppressive Agents
Set your NLS language environment variable to match the database character set. For example, if your database character set is WE8ISO8859P1 and you are using American English, set your NLS_LANG as follows:
setenv NLS_LANG AMERICAN_AMERICA.WE8ISO8859P1
Assuming the medical thesaurus is in a file called med.thes, you load the thesaurus as medthes with ctxload as follows:
ctxload -thes -thescase y -name medthes -file med.thes -user ctxsys/ctxsys
To link the loaded thesaurus medthes to the knowledge base, use ctxkbtc as follows:
ctxkbtc -user ctxsys/ctxsys -name medthes
You can extend theme functionality to languages other than English or French by loading your own knowledge base for any single-byte, whitespace-delimited language, including Spanish.
Theme functionality includes theme indexing, ABOUT queries, theme highlighting, and the generation of themes, gists, and theme summaries with the CTX_DOC PL/SQL package.
You extend theme functionality by adding a user-defined knowledge base. For example, you can create a Spanish knowledge base from a Spanish thesaurus.
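To load your language-specific knowledge base, follow these steps:
1. Load your custom thesaurus using ctxload.
2. Set NLS_LANG so that its language portion is the target language. The character set portion must be a single-byte character set.
3. Compile the loaded thesaurus using ctxkbtc: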
ctxkbtc -user ctxsys/ctxsys -name my_lang_thes
This command compiles your language-specific knowledge base from the loaded thesaurus. To use this knowledge base for theme analysis during indexing and ABOUT queries, specify the NLS_LANG language as the THEME_LANGUAGE attribute value for the BASIC_LEXER preference.
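As a rough illustration, the language portion of NLS_LANG is the text before the first underscore. The sketch below extracts it and prints a PL/SQL block that sets THEME_LANGUAGE on a BASIC_LEXER preference through CTX_DDL. The function names, the preference name my_lexer, and the sample NLS_LANG value are hypothetical. The script only prints text and does not connect to a database.

```shell
# Hypothetical helper: print the language portion of an NLS_LANG value,
# which has the form LANGUAGE_TERRITORY.CHARSET.
theme_language() {
    printf '%s\n' "${1%%_*}"
}

# Hypothetical helper: print PL/SQL that creates a BASIC_LEXER preference
# and sets its THEME_LANGUAGE attribute. Nothing is executed here.
# Usage: print_lexer_sql preference_name nls_lang_value
print_lexer_sql() {
    pref=$1
    lang=$(theme_language "$2")
    cat <<EOF
begin
  ctx_ddl.create_preference('$pref', 'BASIC_LEXER');
  ctx_ddl.set_attribute('$pref', 'THEME_LANGUAGE', '$lang');
end;
/
EOF
}
```

For example, print_lexer_sql my_lexer SPANISH_SPAIN.WE8ISO8859P1 prints a block that sets THEME_LANGUAGE to SPANISH. Review the output before running it in SQL*Plus.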
The following limitations hold for adding knowledge bases:
When multiple thesauri are to be compiled, precedence is determined by the order in which thesauri are listed in the arguments to the compiler (most preferred first). A user thesaurus always has precedence over the built-in knowledge base.
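Because the order of the -name arguments determines precedence, it can help to assemble the command line in one place. The sketch below is a hypothetical helper (build_ctxkbtc_cmd is not an Oracle utility) that prints a ctxkbtc command with the thesauri listed most preferred first, and rejects more than the 16 names that the syntax allows. It prints the command rather than running it.

```shell
# Hypothetical helper: print a ctxkbtc command line for the given thesauri.
# Usage: build_ctxkbtc_cmd user/password thes1 [thes2 ... thes16]
build_ctxkbtc_cmd() {
    user=$1
    shift
    # The ctxkbtc syntax accepts between 1 and 16 thesaurus names.
    if [ "$#" -lt 1 ] || [ "$#" -gt 16 ]; then
        echo "expected between 1 and 16 thesaurus names, got $#" >&2
        return 1
    fi
    # Names are passed through in the order given: most preferred first.
    echo "ctxkbtc -user $user -name $*"
}
```

For example, build_ctxkbtc_cmd ctxsys/ctxsys medthes chemthes prints a command in which medthes takes precedence over chemthes wherever the two overlap. The name chemthes is invented for illustration.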
The following table lists the size limits associated with creating and compiling an extended knowledge base:
Copyright © 1996-2001, Oracle Corporation. All Rights Reserved.