C.5 MULTI_LEXER Example: Indexing a Multilanguage Table
Use the MULTI_LEXER
preference type to index a column containing documents in different languages. For example, use this preference type when your text column stores documents in English, German, and French.
The first step is to create the multilanguage table with a primary key, a text column, and a language column:
create table globaldoc ( doc_id number primary key, lang varchar2(3), text clob );
Assume that the table holds mostly English documents, with some German and Japanese documents. To handle the three languages, you must create three sub-lexers, one for English, one for German, and one for Japanese:
ctx_ddl.create_preference('english_lexer','basic_lexer'); ctx_ddl.set_attribute('english_lexer','index_themes','yes'); ctx_ddl.set_attribute('english_lexer','theme_language','english'); ctx_ddl.create_preference('german_lexer','basic_lexer'); ctx_ddl.set_attribute('german_lexer','composite','german'); ctx_ddl.set_attribute('german_lexer','mixed_case','yes'); ctx_ddl.set_attribute('german_lexer','alternate_spelling','german'); ctx_ddl.create_preference('japanese_lexer','japanese_vgram_lexer');
Create the multi-lexer preference:
ctx_ddl.create_preference('global_lexer', 'multi_lexer');
Because the stored documents are mostly English, make the English lexer the default by using CTX_DDL.ADD_SUB_LEXER:
ctx_ddl.add_sub_lexer('global_lexer','default','english_lexer');
Add the German and Japanese lexers in their respective languages with the CTX_DDL.ADD_SUB_LEXER
procedure. Also assume that the language column is expressed in the standard ISO 639-2 language codes, and add those codes as alternate values.
ctx_ddl.add_sub_lexer('global_lexer','german','german_lexer','ger'); ctx_ddl.add_sub_lexer('global_lexer','japanese','japanese_lexer','jpn');
Create the globalx
index, specifying the multi-lexer preference and the language column in the parameter clause:
create index globalx on globaldoc(text) indextype is ctxsys.context parameters ('lexer global_lexer language column lang');