Oracle Text Reference
Release 9.0.1

Part Number A90121-01
Go To Documentation Library
Home
Go To Product List
Book List
Go To Table Of Contents
Contents
Go To Index
Index

Master Index

Feedback

Go to previous page Go to beginning of chapter Go to next page

Indexing, 6 of 11


Wordlist Type

Use the wordlist preference to enable the query options such as stemming, fuzzy matching for your language. You can also use the wordlist preference to enable substring and prefix indexing which improves performance for wildcard queries with CONTAINS and CATSEARCH.

To create a wordlist preference, you must use BASIC_WORDLIST, which is the only type available.

BASIC_WORDLIST

Use BASIC_WORDLIST type to enable stemming and fuzzy matching for Text indexes.

See Also:

For more information about the stem and fuzzy operators, see Chapter 3, "CONTAINS Query Operators"

BASIC_WORDLIST has the following attributes:

Table 2-1
Attribute  Attribute Values 

stemmer 

Specify which language stemmer to use. You can specify one of the following:

NULL (no stemming)

ENGLISH (English inflectional)

DERIVATIONAL (English derivational)

DUTCH

FRENCH

GERMAN

ITALIAN

SPANISH

AUTO (automatic language-detection for stemming) 

fuzzy_match 

Specify which fuzzy matching cluster to use. You can specify one of the following:

GENERIC

JAPANESE_VGRAM

KOREAN

CHINESE_VGRAM

ENGLISH

DUTCH

FRENCH

GERMAN

ITALIAN

SPANISH

OCR

AUTO (automatic language detection for stemming) 

fuzzy_score 

Specify a default lower limit of fuzzy score. Specify a number between 0 and 80. Text with scores below this number is not returned. Default is 60. 

fuzzy_numresults 

Specify the maximum number of fuzzy expansions. Use a number between 0 and 5,000. Default is 100. 

substring_index 

Specify TRUE for Oracle to create a substring index. A substring index improves left-truncated and double-truncated wildcard queries such as %ing or %benz%. Default is FALSE. 

prefix_index 

Specify YES to enable prefix indexing. Prefix indexing improves performance for right truncated wildcard searches such as TO%. Defaults to NO. 

prefix_length_min 

Specify the minimum length of indexed prefixes. Defaults to 1. 

prefix_length_max 

Specify the maximum length of indexed prefixes. Defaults to 64. 

wlidcard_maxterms 

Specify the maximum number of terms in a wildcard expansion. Use a number between 1 and 15,000. Default is 5,000. 

stemmer

Specify the stemmer used for word stemming in Text queries. When you do not specify a value for stemmer, the default is ENGLISH.

Specify AUTO for the system to automatically set the stemming language according to the language setting of the session. When there is no stemmer for a language, the default is NULL. With the NULL stemmer, the stem operator is ignored in queries.

fuzzy_match

Specify which fuzzy matching routines are used for the column. Fuzzy matching is currently supported for English, Japanese, and, to a lesser extent, the Western European languages.


Note:

The fuzzy_match attribute values for Chinese and Korean are dummy attribute values that prevent the English and Japanese fuzzy matching routines from being used on Chinese and Korean text. 


The default for fuzzy_match is GENERIC.

Specify AUTO for the system to automatically set the fuzzy matching language according to language setting of the session.

fuzzy_score

Specify a default lower limit of fuzzy score. Specify a number between 0 and 80. Text with scores below this below this number are not returned. The default is 60.

Fuzzy score is a measure of how close the expanded word is to the query word. The higher the score the better the match. Use this parameter to limit fuzzy expansions to the best matches.

fuzzy_numresults

Specify the maximum number of fuzzy expansions. Use a number between 0 and 5000. The default is 100.

Setting a fuzzy expansion limits the expansion to a specified number of the best matching words.

substring_index

Specify TRUE for Oracle to create a substring index. A substring index improves performance for left-truncated or double-truncated wildcard queries such as %ing or %benz%. The default is false.

Substring indexing has the following impact on indexing and disk resources:

prefix_index

Specify yes to enable prefix indexing. Prefix indexing improves performance for right truncated wildcard searches such as TO%. Defaults to NO.


Note:

Enabling prefix indexing increases index size. 


Prefix indexing chops up tokens into multiple prefixes to store in the $I table.For example, words TOKEN and TOY are normally indexed like this in the $I table:

Token  Type  Information 

TOKEN 

DOCID 1 POS 1 

TOY 

DOCID 1 POS 3 

With prefix indexing, Oracle indexes the prefix substrings of these tokens as follows with a new token type of 6:

Token  Type  Information 

TOKEN 

DOCID 1 POS 1 

TOY 

DOCID 1 POS 3 

DOCID 1 POS 1 POS 3 

TO 

DOCID 1 POS 1 POS 3 

TOK 

DOCID 1 POS 1 

TOKE 

DOCID 1 POS 1 

TOKEN 

DOCID 1 POS 1 

TOY 

DOCID 1 POS 3 

Wildcard searches such as TO% are now faster because Oracle does no expansion of terms and merging of result sets. To obtain the result, Oracle need only examine the (TO,6) row.

prefix_length_min

Specify the minimum length of indexed prefixes. Defaults to 1.

For example, setting prefix_length_min to 3 and prefix_length_max to 5 indexes all prefixes between 3 and 5 characters long.


Note:

A wildcard search whose pattern is below the minimum length or above the maximum length is searched using the slower method of equivalence expansion and merging. 


prefix_length_max

Specify the maximum length of indexed prefixes. Defaults to 64.

For example, setting prefix_length_min to 3 and prefix_length_max to 5 indexes all prefixes between 3 and 5 characters long.


Note:

A wildcard search whose pattern is below the minimum length or above the maximum length is searched using the slower method of equivalence expansion and merging. 


wildcard_maxterms

Specify the maximum number of terms in a wildcard (%) expansion. Use this parameter to keep wildcard query performance within an acceptable limit. Oracle returns an error when the wildcard query expansion exceeds this number.

BASIC_WORDLIST Example

Enabling Fuzzy Matching and Stemming

The following example enables stemming and fuzzy matching for English. The preference STEM_FUZZY_PREF sets the number of expansions to the maximum allowed. This preference also instructs the system to create a substring index to improve the performance of double-truncated searches.

begin 
  ctx_ddl.create_preference('STEM_FUZZY_PREF', 'BASIC_WORDLIST'); 
  ctx_ddl.set_attribute('STEM_FUZZY_PREF','FUZZY_MATCH','ENGLISH');
  ctx_ddl.set_attribute('STEM_FUZZY_PREF','FUZZY_SCORE','0');
  ctx_ddl.set_attribute('STEM_FUZZY_PREF','FUZZY_NUMRESULTS','5000');
  ctx_ddl.set_attribute('STEM_FUZZY_PREF','SUBSTRING_INDEX','TRUE');
  ctx_ddl.set_attribute('STEM_FUZZY_PREF','STEMMER','ENGLISH');
end; 

To create the index in SQL, issue the following statement:

create index fuzzy_stem_subst_idx on mytable ( docs ) 
  indextype is ctxsys.context parameters ('Wordlist STEM_FUZZY_PREF');

Enabling Sub-string and Prefix Indexing

The following example sets the wordlist preference for prefix and sub-string indexing. For prefix indexing, it specifies that Oracle create token prefixes between 3 and 4 characters long:

begin 

ctx_ddl.create_preference('mywordlist', 'BASIC_WORDLIST'); 
ctx_ddl.set_attribute('mywordlist','PREFIX_INDEX','YES');
ctx_ddl.set_attribute('mywordlist','PREFIX_MIN_LENGTH',3);
ctx_ddl.set_attribute('mywordlist','PREFIX_MAX_LENGTH', 4);
ctx_ddl.set_attribute('mywordlist','SUBSTRING_INDEX', 'YES');
end

Setting Wildcard Expansion Limit

Use the wildcard_maxterms attribute to set the maximum allowed terms in a wildcard expansion.

--- create a sample table
drop table quick ;
create table quick 
  ( 
    quick_id number primary key, 
    text      varchar(80) 
  ); 

--- insert a row with 10 expansions for 'tire%'
insert into quick ( quick_id, text ) 
  values ( 1, 'tire tirea tireb tirec tired tiree tiref tireg tireh tirei 
tirej') ;
commit;

--- create an index using wildcard_maxterms=100
begin 
    Ctx_Ddl.Create_Preference('wildcard_pref', 'BASIC_WORDLIST'); 
    ctx_ddl.set_attribute('wildcard_pref', 'wildcard_maxterms', 100) ;
end; 
/
create index wildcard_idx on quick(text)
    indextype is ctxsys.context 
    parameters ('Wordlist wildcard_pref') ;

--- query on 'tire%' - should work fine
select quick_id from quick
  where contains ( text, 'tire%' ) > 0;

--- now recreate the index with wildcard_maxterms=5

drop index wildcard_idx ;

begin 
    Ctx_Ddl.Drop_Preference('wildcard_pref'); 
    Ctx_Ddl.Create_Preference('wildcard_pref', 'BASIC_WORDLIST'); 
    ctx_ddl.set_attribute('wildcard_pref', 'wildcard_maxterms', 5) ;
end; 
/

create index wildcard_idx on quick(text)
    indextype is ctxsys.context 
    parameters ('Wordlist wildcard_pref') ;

--- query on 'tire%' gives "wildcard query expansion resulted in too many terms"
select quick_id from quick
  where contains ( text, 'tire%' ) > 0;

Go to previous page Go to beginning of chapter Go to next page
Oracle
Copyright © 1996-2001, Oracle Corporation.

All Rights Reserved.
Go To Documentation Library
Home
Go To Product List
Book List
Go To Table Of Contents
Contents
Go To Index
Index

Master Index

Feedback