Oracle Text Reference
Release 9.2

Part Number A96518-01

14
Executables

This chapter discusses the executables shipped with Oracle Text: the thesaurus loader (ctxload) and the knowledge base extension compiler (ctxkbtc).

Thesaurus Loader (ctxload)

Use ctxload to import a thesaurus from an import file, or to export a loaded thesaurus to an operating-system file.

An import file is an ASCII flat file that contains entries for synonyms, broader terms, narrower terms, or related terms which can be used to expand queries.

See Also:

For examples of import files for thesaurus importing, see "Structure of ctxload Thesaurus Import File" in Appendix C, "Loading Examples".
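As a rough illustration of what an import file can look like, the sketch below writes a very small one with a here-document. The layout (a term on its own line, followed by indented relation lines such as NT) mirrors the entries shown later in this chapter; the file name and the terms themselves are made up for the example, and Appendix C remains the authoritative description of the format.

```shell
# Write a small, hypothetical thesaurus import file.
# The top term starts in column 1; its narrower terms follow on
# indented lines introduced by NT, as in this chapter's example.
cat > sample_thesaurus.txt <<'EOF'
database
 NT relational database
 NT object database
EOF

# Show what was written.
cat sample_thesaurus.txt
```

A file like this could then be passed to ctxload with -file, together with -thes and -name.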

Text Loading

The ctxload program no longer supports the loading of text columns. To load files to a text column in batch, Oracle recommends that you use SQL*Loader.

See Also:

"SQL*Loader Example" in Appendix C, "Loading Examples"

ctxload Syntax

ctxload -user username[/password][@sqlnet_address]
        -name object_name
        -file file_name
       [-thes]
       [-thescase y|n]
       [-thesdump]
       [-log file_name]
       [-trace]
       [-pk]
       [-export]
       [-update]

Mandatory Arguments

-user

Specify the username and password of the user running ctxload.

The username and password can be followed immediately by @sqlnet_address to permit logon to remote databases. The value for sqlnet_address is a database connect string. If the TWO_TASK environment variable is set to a remote database, you do not have to specify a value for sqlnet_address to connect to the database.
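To make the optional parts of the -user value concrete, here is a minimal sketch of a helper that assembles username[/password][@sqlnet_address]. The function name user_arg and the connect string orcl_remote are hypothetical; only the shape of the value comes from the syntax above.

```shell
# Compose the value for -user from its parts:
#   username[/password][@sqlnet_address]
# $1 = username, $2 = password (optional), $3 = connect string (optional)
user_arg() {
    arg=$1
    if [ -n "${2:-}" ]; then arg="${arg}/$2"; fi
    if [ -n "${3:-}" ]; then arg="${arg}@$3"; fi
    printf '%s\n' "$arg"
}

user_arg jsmith 123abc orcl_remote   # explicit remote database
user_arg jsmith 123abc               # no connect string given
```

The second form is the one to use for a local database, or for a remote database when TWO_TASK is already set.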

-name object_name

When you use ctxload to export/import a thesaurus, use object_name to specify the name of the thesaurus to be exported/imported.

You use object_name to identify the thesaurus in queries that use thesaurus operators.


Note:

Thesaurus name must be unique. If the name specified for the thesaurus is identical to an existing thesaurus, ctxload returns an error and does not overwrite the existing thesaurus.


When you use ctxload to update/export a text field, use object_name to specify the index associated with the text column.

-file file_name

When ctxload is used to import a thesaurus, use file_name to specify the name of the import file which contains the thesaurus entries.

When ctxload is used to export a thesaurus, use file_name to specify the name of the export file created by ctxload.


Note:

If the name specified for the thesaurus dump file is identical to an existing file, ctxload overwrites the existing file.


Optional Arguments

-thes

Import a thesaurus. Specify the source file with the -file argument. You specify the name of the thesaurus to be imported with -name.

-thescase y | n

Specify y to create a case-sensitive thesaurus with the name specified by -name and populate the thesaurus with entries from the thesaurus import file specified by -file. If -thescase is y (the thesaurus is case-sensitive), ctxload enters the terms in the thesaurus exactly as they appear in the import file.

The default for -thescase is n (case-insensitive thesaurus).


Note:

-thescase is valid for use with only the -thes argument.


-thesdump

Export a thesaurus. Specify the name of the thesaurus to be exported with the -name argument. Specify the destination file with the -file argument.

-log

Specify the name of the log file to which ctxload writes any national-language supported (Globalization Support) messages generated during processing. If you do not specify a log file name, the messages appear on the standard output.

-trace

Enables SQL statement tracing using ALTER SESSION SET SQL_TRACE TRUE. This command captures all processed SQL statements in a trace file, which can be used for debugging. The location of the trace file is operating-system dependent and can be modified using the USER_DUMP_DEST initialization parameter.

See Also:

For more information about SQL trace and the USER_DUMP_DEST initialization parameter, see Oracle9i Database Administrator's Guide

-pk

Specify the primary key value of the row to be updated or exported.

When the primary key is compound, you must enclose the values within double quotes and separate the keys with a comma.

-export

Exports the contents of a CLOB or BLOB column in a database table into the operating system file specified by -file. ctxload exports the CLOB or BLOB column in the row specified by -pk.

When you use -export, you must specify a primary key with -pk.

-update

Updates the contents of a CLOB or BLOB column in a database table with the contents of the operating system file specified by -file. ctxload updates the CLOB or BLOB column in the row specified by -pk.

When you use -update, you must specify a primary key with -pk.

ctxload Examples

This section provides examples for some of the operations that ctxload can perform.

See Also:

For more document loading examples, see Appendix C, "Loading Examples".

Thesaurus Import Example

The following example imports a thesaurus named tech_doc from an import file named tech_thesaurus.txt:

ctxload -user jsmith/123abc -thes -name tech_doc -file tech_thesaurus.txt 

Thesaurus Export Example

The following example dumps the contents of a thesaurus named tech_doc into a file named tech_thesaurus.out:

ctxload -user jsmith/123abc -thesdump -name tech_doc -file tech_thesaurus.out 

Knowledge Base Extension Compiler (ctxkbtc)

The knowledge base is the information source Oracle Text uses to perform theme analysis, such as theme indexing, processing ABOUT queries, and document theme extraction with the CTX_DOC package. A knowledge base is supplied for English and French.

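With the ctxkbtc compiler, you can extend the supplied knowledge base by compiling one or more thesauri with it, or create a user-defined knowledge base for another language from one or more thesauri.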

Knowledge Base Character Set

Knowledge bases can be in any single-byte character set. Supplied knowledge bases are in WE8ISO8859P1. You can store an extended knowledge base in another character set such as US7ASCII.

ctxkbtc Syntax

ctxkbtc -user uname/passwd
       [-name thesname1 [thesname2 ... thesname16]]
       [-revert]
       [-stoplist stoplistname]
       [-verbose]
       [-log filename]

-user

Specify the username and password for the administrator creating an extended knowledge base. This user must have write permission to the ORACLE_HOME directory.

-name thesname1 [thesname2 ... thesname16]

Specify the name(s) of the thesauri (up to 16) to be compiled with the knowledge base to create the extended knowledge base. The thesauri you specify must already be loaded with ctxload using the -thescase y option.

-revert

Reverts the extended knowledge base to the default knowledge base provided by Oracle Text.

-stoplist stoplistname

Specify the name of the stoplist. Stopwords in the stoplist are added to the knowledge base as useless words that are prevented from becoming themes or contributing to themes. You can still add stopthemes after running this command using CTX_DDL.ADD_STOPTHEME.

-verbose

Displays all warnings and messages, including non-Globalization Support messages, to the standard output.

-log

Specify the log file for storing all messages. When you specify a log file, no messages are reported to the standard output.

ctxkbtc Usage Notes

ctxkbtc Limitations

The ctxkbtc program has the following limitations:

ctxkbtc Constraints on Thesaurus Terms

Terms are case sensitive. If a thesaurus has a term in uppercase, for example, the same term present in lowercase form in a document will not be recognized.

The maximum length of a term is 80 characters.

Disambiguated homographs are not supported.
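The 80-character limit can be checked before loading. The following is a minimal sketch, assuming an import file with one term per line in which relation lines are indented and start with a keyword such as NT; the function name check_term_length is hypothetical, and the keyword stripping is a simplification of the real file format.

```shell
# Report thesaurus terms longer than 80 characters.
# Assumes a single-byte character set, so one byte is one character.
check_term_length() {
    awk '
    {
        term = $0
        # Drop a leading indented relation keyword such as " NT ".
        sub(/^[ \t]+[A-Za-z]+[ \t]+/, "", term)
        if (length(term) > 80) {
            printf "line %d: %d characters\n", NR, length(term)
            bad = 1
        }
    }
    END { exit bad }   # non-zero exit status if any term is too long
    ' "$1"
}
```

Calling check_term_length med.thes prints one line per offending term and returns a non-zero status if any term exceeds the limit.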

ctxkbtc Constraints on Thesaurus Relations

The following constraints apply to thesaurus relations:

Extending the Knowledge Base

You can extend the supplied knowledge base by compiling one or more thesauri with the Oracle Text knowledge base. The extended information can be application-specific terms and relationships. During theme analysis, the extended portion of the knowledge base overrides any terms and relationships in the knowledge base where there is overlap.

When extending the knowledge base, Oracle recommends that, where appropriate, you link new terms to one of the categories in the knowledge base. Doing so gives the best results in theme proving.

See Also:

For more information about the knowledge base, see Appendix I, "English Knowledge Base Category Hierarchy"

If new terms are kept completely disjoint from existing categories, fewer themes from new terms will be proven. The result is poorer precision and recall with ABOUT queries, as well as poor quality of gists and theme highlighting.

You link new terms to existing terms by making an existing term the broader term for the new terms.

Example for Extending the Knowledge Base

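You purchase a medical thesaurus medthes containing a hierarchy of medical terms. The four top terms in the thesaurus are the following:

Anesthesia and Analgesia
Anti-Allergic and Respiratory System Agents
Anti-Inflammatory Agents, Antirheumatic Agents, and Inflammation Mediators
Antineoplastic and Immunosuppressive Agents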

To link these terms to the existing knowledge base, add the following entries to the medical thesaurus to map the new terms to the existing health and medicine branch:

health and medicine
 NT Anesthesia and Analgesia
 NT Anti-Allergic and Respiratory System Agents
 NT Anti-Inflammatory Agents, Antirheumatic Agents, and Inflammation Mediators
 NT Antineoplastic and Immunosuppressive Agents

Set your Globalization Support language environment variable to match the database character set. For example, if your database character set is WE8ISO8859P1 and you are using American English, set your NLS_LANG as follows:

setenv NLS_LANG AMERICAN_AMERICA.WE8ISO8859P1
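The setenv command above is C shell syntax. In a Bourne-compatible shell (sh, bash, ksh), a minimal equivalent, assuming the same language and character set, would be:

```shell
# Bourne-shell equivalent of: setenv NLS_LANG AMERICAN_AMERICA.WE8ISO8859P1
NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1
export NLS_LANG

# Confirm the value that child processes such as ctxload will see.
echo "$NLS_LANG"
```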

Assuming the medical thesaurus is in a file called med.thes, you load the thesaurus as medthes with ctxload as follows:

ctxload -thes -thescase y -name medthes -file med.thes -user ctxsys/ctxsys

To link the loaded thesaurus medthes to the knowledge base, use ctxkbtc as follows:

ctxkbtc -user ctxsys/ctxsys -name medthes 

Adding a Language-Specific Knowledge Base

You can extend theme functionality to languages other than English or French by loading your own knowledge base for any single-byte, whitespace-delimited language, including Spanish.

Theme functionality includes theme indexing, ABOUT queries, theme highlighting, and the generation of themes, gists, and theme summaries with the CTX_DOC PL/SQL package.

You extend theme functionality by adding a user-defined knowledge base. For example, you can create a Spanish knowledge base from a Spanish thesaurus.

To load your language-specific knowledge base, follow these steps:

  1. Load your custom thesaurus using ctxload.
  2. Set NLS_LANG so that the language portion is the target language. The charset portion must be a single-byte character set.
  3. Compile the loaded thesaurus using ctxkbtc:

ctxkbtc -user ctxsys/ctxsys -name my_lang_thes

This command compiles your language-specific knowledge base from the loaded thesaurus. To use this knowledge base for theme analysis during indexing and ABOUT queries, specify the NLS_LANG language as the THEME_LANGUAGE attribute value for the BASIC_LEXER preference.
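The three steps can be strung together in a small script. This is only a sketch under stated assumptions: the run helper and the DRY_RUN variable are hypothetical conveniences that print each command and execute it only when DRY_RUN is 0, the import file name my_lang.thes is made up, and SPANISH_SPAIN.WE8ISO8859P1 is an assumed NLS_LANG value for a Spanish knowledge base. NLS_LANG is set before loading, as in the medical thesaurus example earlier in this chapter.

```shell
# Print each command; execute it only when DRY_RUN is set to 0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Language portion = target language; character set must be single-byte.
NLS_LANG=SPANISH_SPAIN.WE8ISO8859P1
export NLS_LANG

# Load the custom thesaurus case-sensitively, as ctxkbtc requires.
run ctxload -user ctxsys/ctxsys -thes -thescase y -name my_lang_thes -file my_lang.thes

# Compile the loaded thesaurus into a language-specific knowledge base.
run ctxkbtc -user ctxsys/ctxsys -name my_lang_thes
```

With DRY_RUN unset, the script only prints the two commands, which makes it easy to review them before running with DRY_RUN=0.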

Limitations for Adding a Knowledge Base

The following limitations hold for adding knowledge bases:

Order of Precedence for Multiple Thesauri

When multiple thesauri are to be compiled, precedence is determined by the order in which thesauri are listed in the arguments to the compiler (most preferred first). A user thesaurus always has precedence over the built-in knowledge base.
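Because precedence follows argument order, it can help to build the command line in one place. The sketch below assembles (but does not run) a ctxkbtc command and rejects more than 16 thesaurus names; the function name kbtc_command and the thesaurus name legalthes are hypothetical.

```shell
# Build a ctxkbtc command line for 1 to 16 thesauri.
# Names are passed most preferred first, matching ctxkbtc precedence.
# $1 = user/password, remaining arguments = thesaurus names
kbtc_command() {
    user=$1
    shift
    if [ "$#" -lt 1 ] || [ "$#" -gt 16 ]; then
        echo "expected between 1 and 16 thesaurus names, got $#" >&2
        return 1
    fi
    echo "ctxkbtc -user $user -name $*"
}

# medthes takes precedence over legalthes where their terms overlap.
kbtc_command ctxsys/ctxsys medthes legalthes
```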

Size Limits for Extended Knowledge Base

The following table lists the size limits associated with creating and compiling an extended knowledge base:

Description of Parameter                                                                Limit
Number of RTs (from + to) per term                                                      32
Number of terms per a single hierarchy (i.e., all narrower terms for a given top term)  64000
Number of new terms in an extended knowledge base                                       1 million
Number of separate thesauri that can be compiled into a user extension to the KB        16


Copyright © 1998, 2002 Oracle Corporation.

All Rights Reserved.