Oracle Text Application Developer's Guide Release 9.0.1 Part Number A90122-01

Working With a Thesaurus
Users of your query application looking for information on a given topic might not know which words have been used in documents that refer to that topic.
Oracle Text enables you to create case-sensitive or case-insensitive thesauri which define synonym and hierarchical relationships between words and phrases. You can then retrieve documents that contain relevant text by expanding queries to include similar or related terms as defined in the thesaurus.
You can create a thesaurus and load it into the system.
Thesauri and thesaurus entries can be created, modified, and deleted by all Oracle Text users with the CTXAPP role.
To maintain and browse your thesaurus programmatically, you can use the PL/SQL package CTX_THES. With this package, you can browse terms and hierarchical relationships, add and delete terms, and add and remove thesaurus relations.
You can also use the thesaurus operators in the CONTAINS clause to expand query terms according to your loaded thesaurus. For example, you can use the SYN operator to expand a term such as dog to its synonyms as follows:
'syn(dog)'
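To make the idea of expansion concrete, the following Python sketch models what a SYN lookup does conceptually: it collects the query term plus its synonyms from a thesaurus and ORs them together with `|`. The `SYNONYMS` dictionary, the `syn` function, and the exact output string are illustrative assumptions. They are not Oracle Text's internal representation or the literal text it generates; in Oracle Text the expansion happens inside the CONTAINS query itself.

```python
# Hypothetical in-memory thesaurus: each term maps to a set of synonyms.
# This is a conceptual model only, not how Oracle Text stores thesauri.
SYNONYMS: dict[str, set[str]] = {
    "dog": {"hound", "canine"},
    "hound": {"dog", "canine"},
    "canine": {"dog", "hound"},
}


def syn(term: str, thesaurus: dict[str, set[str]]) -> str:
    """Expand a term to itself plus its synonyms, joined with the OR operator '|'."""
    expanded = {term} | thesaurus.get(term, set())
    # Sort so the expansion is deterministic and easy to read.
    return "|".join(sorted(expanded))


if __name__ == "__main__":
    print(syn("dog", SYNONYMS))  # canine|dog|hound
    print(syn("cat", SYNONYMS))  # cat (no entry, so nothing is added)
```

A term with no thesaurus entry simply expands to itself, so the query still runs; it just matches nothing extra.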
The ctxload utility can be used for loading (creating) thesauri from a plain-text file into the thesaurus tables, as well as dumping thesauri from the tables into output (dump) files.
The thesaurus dump files created by ctxload can be printed or used as input for other applications. They can also be loaded back into the thesaurus tables, which is useful when you want to use an existing thesaurus as the basis for a new one.
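The load/dump round trip can be illustrated with a small Python sketch. It uses a simplified, hypothetical text format (a term in column 1, followed by indented `RELATION related-term` lines); this is an assumption for illustration and is not the exact ctxload load-file specification. The functions `load_thesaurus` and `dump_thesaurus` are likewise hypothetical names, not ctxload options.

```python
from typing import Optional

# Hypothetical, simplified load-file format used only for illustration:
#   - a line starting in column 1 is a term
#   - an indented line under it is "<RELATION> <related term>"
# This is NOT the exact ctxload file specification.

Thesaurus = dict[str, list[tuple[str, str]]]


def load_thesaurus(text: str) -> Thesaurus:
    """Parse the simplified format into {term: [(relation, related_term), ...]}."""
    thesaurus: Thesaurus = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        if raw[0].isspace():
            if current is None:
                raise ValueError(f"relation without a term: {raw!r}")
            relation, related = raw.strip().split(maxsplit=1)
            thesaurus[current].append((relation.upper(), related))
        else:
            current = raw.strip()
            thesaurus.setdefault(current, [])
    return thesaurus


def dump_thesaurus(thesaurus: Thesaurus) -> str:
    """Write the structure back out in the same simplified format."""
    lines = []
    for term, relations in thesaurus.items():
        lines.append(term)
        lines.extend(f"  {rel} {related}" for rel, related in relations)
    return "\n".join(lines) + "\n"


SAMPLE = """\
dog
  SYN hound
  BT animal
cat
  BT animal
"""

if __name__ == "__main__":
    loaded = load_thesaurus(SAMPLE)
    dumped = dump_thesaurus(loaded)
    # The dump can be reloaded, e.g. as the starting point for a new thesaurus.
    assert load_thesaurus(dumped) == loaded
    print(dumped, end="")
```

Because the dump uses the same format the loader reads, editing a copy of a dump is one way to derive a new thesaurus from an existing one.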
In a case-sensitive thesaurus, terms (words and phrases) are stored exactly as entered. For example, if a term is entered in mixed-case (using either the CTX_THES package or a thesaurus load file), the thesaurus stores the entry in mixed-case.
When loading a thesaurus, you can specify that the thesaurus be loaded case-sensitive using the -thescase parameter.
When creating a thesaurus with CTX_THES.CREATE_THESAURUS, you can specify that the thesaurus created be case-sensitive.
In addition, when a case-sensitive thesaurus is specified in a query, the thesaurus lookup uses the query terms exactly as entered in the query. Queries that use case-sensitive thesauri therefore allow greater precision in query expansion, but this precision improves results only when the index is also case-sensitive.
For example, suppose a case-sensitive thesaurus is created with different entries for the distinct meanings of Turkey (the country) and turkey (the bird). With this thesaurus, a query for Turkey expands to include only the entries associated with Turkey.
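The Python sketch below models only the lookup semantics of a case-sensitive thesaurus: entries are stored under the exact spelling entered, and the query term is used exactly as typed. The dictionary contents and the `expand_case_sensitive` function are illustrative assumptions; in Oracle Text you would create such a thesaurus with ctxload (using -thescase) or CTX_THES.CREATE_THESAURUS.

```python
# Conceptual model of a case-sensitive thesaurus: terms are stored exactly
# as entered, and lookups use the query term exactly as typed.
# The entries below are illustrative assumptions, not supplied data.

CASE_SENSITIVE_THES: dict[str, set[str]] = {
    "Turkey": {"Republic of Turkey"},   # the country
    "turkey": {"fowl", "poultry"},      # the bird
}


def expand_case_sensitive(term: str, thes: dict[str, set[str]]) -> set[str]:
    """Return the term plus the entries stored under exactly that spelling."""
    return {term} | thes.get(term, set())


if __name__ == "__main__":
    print(sorted(expand_case_sensitive("Turkey", CASE_SENSITIVE_THES)))
    # ['Republic of Turkey', 'Turkey']
    print(sorted(expand_case_sensitive("turkey", CASE_SENSITIVE_THES)))
    # ['fowl', 'poultry', 'turkey']
```

The two spellings never share entries, which is what keeps the country and the bird apart during expansion.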
In a case-insensitive thesaurus, terms are stored in all-uppercase, regardless of the case in which they were entered.
The ctxload program loads a thesaurus case-insensitive by default.
When creating a thesaurus with CTX_THES.CREATE_THESAURUS, the thesaurus is created case-insensitive by default.
In addition, when a case-insensitive thesaurus is specified in a query, the query terms are converted to all-uppercase for thesaurus lookup. As a result, Oracle Text cannot distinguish between terms that differ only in case but have different meanings.
For example, a case-insensitive thesaurus is created with different entries for the two distinct meanings of the term TURKEY (the country or the type of bird). Using the thesaurus, a query for either Turkey or turkey is converted to TURKEY for thesaurus lookup and then expanded to include all the entries associated with both meanings.
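For contrast, this Python sketch models a case-insensitive thesaurus: every term is stored in uppercase, and the query term is uppercased before lookup, so the two Turkey entries collapse into one. The entries and the helper names `build_case_insensitive` and `expand_case_insensitive` are illustrative assumptions, not part of Oracle Text.

```python
from collections import defaultdict

# Conceptual model of a case-insensitive thesaurus: every term is stored in
# uppercase, and query terms are uppercased before lookup.
# The entries below are illustrative assumptions, not supplied data.


def build_case_insensitive(entries: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Store each (term, related) pair under the uppercased term."""
    thes: dict[str, set[str]] = defaultdict(set)
    for term, related in entries:
        thes[term.upper()].add(related.upper())
    return dict(thes)


def expand_case_insensitive(term: str, thes: dict[str, set[str]]) -> set[str]:
    """Uppercase the query term, then return it plus everything stored under it."""
    key = term.upper()
    return {key} | thes.get(key, set())


ENTRIES = [
    ("Turkey", "Republic of Turkey"),  # intended for the country
    ("turkey", "poultry"),             # intended for the bird
]

if __name__ == "__main__":
    thes = build_case_insensitive(ENTRIES)
    # Both spellings collapse to TURKEY, so both meanings come back together.
    print(sorted(expand_case_insensitive("Turkey", thes)))
    # ['POULTRY', 'REPUBLIC OF TURKEY', 'TURKEY']
    print(sorted(expand_case_insensitive("turkey", thes)))
    # ['POULTRY', 'REPUBLIC OF TURKEY', 'TURKEY']
```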
If you do not specify a thesaurus by name in a query, by default, the thesaurus operators use a thesaurus named DEFAULT. However, Oracle Text does not provide a DEFAULT thesaurus.
As a result, if you want the thesaurus operators to use a default thesaurus, you must create a thesaurus named DEFAULT. You can create it with any of the thesaurus creation methods supported by Oracle Text, such as the ctxload utility or the CTX_THES.CREATE_THESAURUS procedure.
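The fallback rule can be sketched in Python: when no thesaurus name is given, the name DEFAULT is used, and the lookup fails until a thesaurus with that name has been created. The `registry` dictionary, the `resolve_thesaurus` function, and the `ThesaurusNotFound` exception are hypothetical stand-ins; they do not reproduce Oracle Text's actual error behavior or messages.

```python
from typing import Optional

# Conceptual model of how thesaurus operators pick a thesaurus: if no name is
# given, the name DEFAULT is used, and nothing named DEFAULT exists until you
# create it. The registry and error type are illustrative assumptions.


class ThesaurusNotFound(LookupError):
    """Raised when the requested (or default) thesaurus has not been created."""


def resolve_thesaurus(registry: dict[str, dict[str, set[str]]],
                      name: Optional[str] = None) -> dict[str, set[str]]:
    """Return the named thesaurus, falling back to the one named DEFAULT."""
    key = "DEFAULT" if name is None else name
    try:
        return registry[key]
    except KeyError:
        raise ThesaurusNotFound(f"thesaurus {key!r} does not exist") from None


if __name__ == "__main__":
    registry: dict[str, dict[str, set[str]]] = {}
    try:
        resolve_thesaurus(registry)  # no DEFAULT thesaurus yet
    except ThesaurusNotFound as exc:
        print(exc)  # thesaurus 'DEFAULT' does not exist
    # Creating a thesaurus named DEFAULT makes unnamed lookups work.
    registry["DEFAULT"] = {"dog": {"hound"}}
    print(resolve_thesaurus(registry)["dog"])  # {'hound'}
```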
Although Oracle Text does not provide a default thesaurus, Oracle Text does supply a thesaurus, in the form of a ctxload load file, that can be used to create a general-purpose, English-language thesaurus.
The thesaurus load file can be used to create a default thesaurus for Oracle Text or it can be used as the basis for creating thesauri tailored to a specific subject or range of subjects.
The supplied thesaurus is similar to a traditional thesaurus, such as Roget's Thesaurus, in that it provides a list of synonymous and semantically related terms.
The supplied thesaurus provides additional value by organizing the terms into a hierarchy that defines real-world, practical relationships between narrower terms and their broader terms.
Additionally, cross-references are established between terms in different areas of the hierarchy.
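The Python sketch below models the kind of structure just described: each narrower term records its broader term, narrower terms at any depth can be derived from those links, and separate related-term links cross-reference different branches. The BT/NT/RT labels follow common thesaurus terminology; the data and function names are illustrative assumptions, not taken from the supplied thesaurus file or the CTX_THES API.

```python
# Conceptual model of a hierarchical thesaurus: each narrower term records its
# broader term (BT), narrower terms (NT) are derived from that, and related-term
# (RT) links cross-reference different branches. All entries are illustrative
# assumptions, not taken from the supplied thesaurus. The hierarchy is assumed
# to be acyclic.

BROADER: dict[str, str] = {
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "leash": "pet supplies",
}

RELATED: dict[str, set[str]] = {
    "dog": {"leash"},  # cross-reference into another branch
    "leash": {"dog"},
}


def narrower(term: str) -> set[str]:
    """Immediate narrower terms: everything whose broader term is `term`."""
    return {child for child, parent in BROADER.items() if parent == term}


def all_narrower(term: str) -> set[str]:
    """All narrower terms at any depth below `term`."""
    result: set[str] = set()
    frontier = [term]
    while frontier:
        for child in narrower(frontier.pop()):
            if child not in result:
                result.add(child)
                frontier.append(child)
    return result


def broader_chain(term: str) -> list[str]:
    """Broader terms from the immediate parent up to the top of the hierarchy."""
    chain = []
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain


if __name__ == "__main__":
    print(sorted(all_narrower("animal")))  # ['cat', 'dog', 'mammal']
    print(broader_chain("dog"))            # ['mammal', 'animal']
    print(sorted(RELATED["dog"]))          # ['leash']
```

Storing only the broader-term link and deriving narrower terms keeps the two directions of the hierarchy consistent by construction.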
The exact name and location of the thesaurus load file are operating system dependent; however, the file is generally named dr0thsus (with an appropriate extension for text files) and is generally located in the following directory structure:

<Oracle_home_directory>
  <interMedia_Text_directory>
    sample
      thes

Copyright © 1996-2001, Oracle Corporation. All Rights Reserved.