Oracle8i interMedia Text Reference
Release 2 (8.1.6)

Part Number A77063-01


Indexing, 6 of 11


Wordlist Object

Use the wordlist preference to enable advanced query options, such as stemming and fuzzy matching, for your language. To create a wordlist preference, you must use BASIC_WORDLIST, which is the only wordlist object available.

BASIC_WORDLIST

Use the BASIC_WORDLIST object to enable stemming and fuzzy matching for Text indexes.

See Also:

For more information about the stem and fuzzy operators, see Chapter 4, "Query Operators".

BASIC_WORDLIST has the following attributes:

Table 3-1

Attribute           Attribute Values

stemmer             Specify which language stemmer to use. You can specify one of:
                      NULL (no stemming)
                      ENGLISH (English inflectional)
                      DERIVATIONAL (English derivational)
                      DUTCH
                      FRENCH
                      GERMAN
                      ITALIAN
                      SPANISH
                      AUTO (automatic language detection for stemming)

fuzzy_match         Specify which fuzzy matching cluster to use. You can specify one of the following:
                      GENERIC
                      JAPANESE_VGRAM
                      KOREAN
                      CHINESE_VGRAM
                      ENGLISH
                      DUTCH
                      FRENCH
                      GERMAN
                      ITALIAN
                      SPANISH
                      OCR
                      AUTO (automatic language detection for fuzzy matching)

fuzzy_score         Specify a default lower limit of fuzzy score. Specify a number between 0 and 80. Expansions that score below this number are not produced. Default is 60.

fuzzy_numresults    Specify the maximum number of fuzzy expansions. Use a number between 0 and 5000. Default is 100.

substring_index     Specify TRUE for Oracle to create a substring index. A substring index improves left-truncated and double-truncated wildcard queries such as %ing or %benz%. Default is FALSE.

stemmer

Specify the stemmer used for word stemming in Text queries. When you do not specify a value for stemmer, the default is ENGLISH.

Specify AUTO for the system to automatically set the stemming language according to the language setting of the session. When there is no stemmer for a language, the default is NULL. With the NULL stemmer, the $ operator is ignored in queries.
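
As a rough illustration of how the stemmer setting surfaces in queries, the following sketch issues a stem query with the $ operator. It assumes a Text index already exists on the docs column of mytable (the names used in the BASIC_WORDLIST example in this section) and that the index was built with a wordlist whose stemmer attribute is ENGLISH. The query term is arbitrary.

```sql
-- Assumption: mytable(docs) has a Text index whose wordlist
-- preference sets STEMMER to ENGLISH.
-- The $ operator asks for words sharing the stem of "run";
-- with the NULL stemmer the operator would be ignored.
select docs
  from mytable
 where contains(docs, '$run') > 0;
```

With the English inflectional stemmer, a query like this would be expected to also find inflected forms of the term, although which forms match depends on the stemmer in use.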

fuzzy_match

Specify which fuzzy matching routines are used for the column. Fuzzy matching is currently supported for English, Japanese, and, to a lesser extent, the Western European languages.

The default for fuzzy_match is GENERIC.

Specify AUTO for the system to automatically set the fuzzy matching language according to the language setting of the session.


Note:

The fuzzy_match attribute values for Chinese and Korean are dummy attribute values that prevent the English and Japanese fuzzy matching routines from being used on Chinese and Korean text. 


fuzzy_score

Specify a default lower limit of fuzzy score. Specify a number between 0 and 80. Expansions that score below this number are not produced. The default is 60.

Fuzzy score is a measure of how close the expanded word is to the query word; the higher the score, the better the match. Use this parameter to limit fuzzy expansions to the best matches.

fuzzy_numresults

Specify the maximum number of fuzzy expansions. Use a number between 0 and 5000. Default is 100.

Setting fuzzy_numresults limits the expansion to the specified number of best-matching words.
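
The two fuzzy limits can be combined to keep expansions tight. The sketch below is one possible configuration, not a recommended one: the preference name STRICT_FUZZY_PREF and the values 70 and 20 are hypothetical, chosen only to fall inside the documented ranges. The query at the end assumes a Text index on mytable(docs) that was created with this preference, and uses the fuzzy (?) operator.

```sql
begin
  -- STRICT_FUZZY_PREF is a hypothetical preference name.
  ctx_ddl.create_preference('STRICT_FUZZY_PREF', 'BASIC_WORDLIST');
  ctx_ddl.set_attribute('STRICT_FUZZY_PREF', 'FUZZY_MATCH', 'ENGLISH');
  -- Do not produce expansions that score below 70 (range is 0 to 80).
  ctx_ddl.set_attribute('STRICT_FUZZY_PREF', 'FUZZY_SCORE', '70');
  -- Keep at most the 20 best-matching expansions (range is 0 to 5000).
  ctx_ddl.set_attribute('STRICT_FUZZY_PREF', 'FUZZY_NUMRESULTS', '20');
end;
/

-- Assumption: mytable(docs) is indexed with STRICT_FUZZY_PREF.
-- The ? operator expands the term to similarly spelled words.
select docs
  from mytable
 where contains(docs, '?government') > 0;
```

Raising fuzzy_score or lowering fuzzy_numresults narrows the expansion; the values that work well depend on the data and are best found by experiment.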

substring_index

Specify TRUE for Oracle to create a substring index. A substring index improves performance for left-truncated or double-truncated wildcard queries such as %ing or %benz%. The default is FALSE.

Substring indexing has the following impact on indexing and disk resources:

BASIC_WORDLIST Example

The following example enables stemming and fuzzy matching for English. The preference STEM_FUZZY_PREF sets the fuzzy score limit to 0 and the number of expansions to the maximum allowed. This preference also instructs the system to create a substring index to improve the performance of double-truncated searches.

begin 
  ctx_ddl.create_preference('STEM_FUZZY_PREF', 'BASIC_WORDLIST'); 
  ctx_ddl.set_attribute('STEM_FUZZY_PREF','FUZZY_MATCH','ENGLISH');
  ctx_ddl.set_attribute('STEM_FUZZY_PREF','FUZZY_SCORE','0');
  ctx_ddl.set_attribute('STEM_FUZZY_PREF','FUZZY_NUMRESULTS','5000');
  ctx_ddl.set_attribute('STEM_FUZZY_PREF','SUBSTRING_INDEX','TRUE');
  ctx_ddl.set_attribute('STEM_FUZZY_PREF','STEMMER','ENGLISH');
end; 

To create the index in SQL, issue the following statement:

create index fuzzy_stem_subst_idx on mytable ( docs ) 
  indextype is ctxsys.context 
  parameters ('Wordlist STEM_FUZZY_PREF'); 
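
Once the index exists, the wildcard patterns mentioned for substring_index can be tried against it. The queries below are a minimal sketch that reuses the mytable and docs names from the statement above and the %ing and %benz% patterns from the attribute description; the select list is an assumption about what the caller wants returned.

```sql
-- Left-truncated wildcard: words ending in "ing".
select docs
  from mytable
 where contains(docs, '%ing') > 0;

-- Double-truncated wildcard: words containing "benz".
select docs
  from mytable
 where contains(docs, '%benz%') > 0;
```

Both query forms also run without a substring index; the index is intended to make them faster, at the cost of extra indexing work and disk space.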

Copyright © 1996-2000, Oracle Corporation.

All Rights Reserved.
