SQL Statements and Operators, 5 of 8

CREATE INDEX

This section describes the CREATE INDEX statement as it pertains to creating a Text domain index.

For a complete description of the CREATE INDEX statement, see Oracle9i SQL Reference.

Purpose

Use CREATE INDEX to create an Oracle Text index. An Oracle Text index is an Oracle domain index of type context, ctxcat, or ctxrule.

You must create an appropriate Oracle Text index to issue CONTAINS, CATSEARCH, or MATCHES queries.

You can create the following types of Oracle Text indexes:

CONTEXT index. This is an index on one text column. You query this index with the CONTAINS operator in the WHERE clause of a SELECT statement. See Syntax for CONTEXT Indextype.
CTXCAT index. This is a combined index on a text column and one or more other columns.You query this index with the CATSEARCH operator in the WHERE clause of a SELECT statement. This type of index is optimized for mixed queries. See Syntax for CTXCAT Indextype.
CTXRULE index. This is an index on a column containing a set of queries. You query this index with the MATCHES operator in the WHERE clause of a SELECT statement. See Syntax for CTXRULE Indextype.

Required Privileges

You do not need the CTXAPP role to create an Oracle Text index. If you have Oracle grants to create a b-tree index on the text column, you have sufficient permission to create a text index. The issuing owner, table owner, and index owner can all be different users, which is consistent with Oracle standards for creating regular B-tree indexes.

Syntax for CONTEXT Indextype

CREATE INDEX [schema.]index on [schema.]table(column) INDEXTYPE IS

ctxsys.context
LOCAL [(PARTITION [partition] [PARAMETERS('paramstring')]


[, PARTITION [partition] [PARAMETERS('paramstring')]])]


[PARAMETERS(paramstring)] [PARALLEL n];

[schema.]index

Specify the name of the Text index to create.

[schema.]table(column)

Specify the name of the table and column to index.

Your table can optionally contain a primary key if you prefer to identify your rows as such when you use procedures in CTX_DOC. When your table has no primary key, document services identifies your documents by ROWID.

The column you specify must be one of the following types: CHAR, VARCHAR, VARCHAR2, BLOB, CLOB, BFILE, or XMLType.

DATE, NUMBER, and nested table columns cannot be indexed. Object columns also cannot be indexed, but their attributes can be, provided they are atomic data types.

Indexes on multiple columns are not supported with the context index type. You must specify only one column in the column list.

Note:

With the ctxcat indextype, you can create indexes on text and structured columns. See Syntax for CTXCAT Indextype in this chapter.

LOCAL [(PARTITION [partition] [PARAMETERS('paramstring')]

Specify LOCAL to create a local partitioned context index on a partitioned table. The partitioned table must be partitioned by range. Hash, composite and list partitions are not supported.

You can specify the list of index partition names with partition. If you do not specify a partition name, the system assigns one. The order of the index partition list must correspond to the table partition by order.

The PARAMETERS clause associated with each partition specifies the parameters string specific to that partition. You can only specify memory and storage for each index partition.

You can query the views CTX_INDEX_PARTITIONS or CTX_USER_INDEX_PARTITIONS to find out index partition information, such as index partition name, and index partition status.

Query Performance Limitation with Partitioned Index

For optimal performance when querying a partitioned index with an ORDER BY SCORE clause, query the partition. If you query the entire table and use an ORDER BY SCORE clause, the query might not perform optimally unless you include a range predicate that can limit the query to a single partition.

PARALLEL n

Optionally specify with n the parallel degree for parallel indexing. You must have a partitioned text table to use parallel indexing with Oracle Text. Each parallel indexing process works on its own partition, so n must not exceed the number of partitions.

The initialization parameter JOB_QUEUE_PROCESSES must also be set in the initialization parameter file to use parallel indexing. The parallel degree n you specify must be less than the value of JOB_QUEUE_PROCESSES.

A complete example of how to create an index in parallel is given in the Examples section.

See Also:

Oracle9i Reference and Oracle9i Administrator's Guide for more information about these initialization parameters.

PARAMETERS(paramstring)

Optionally specify indexing parameters in paramstring. You can specify preferences owned by another user using the user.preference notation.

The syntax for paramstring is as follows:

paramstring = 

'[datastore datastore_pref] 
[filter filter_pref] 
[charset column charset_column_name]  
[format column format_column_name]

[lexer lexer_pref]
[language column language_column_name] 

[wordlist wordlist_pref] 
[storage storage_pref] 
[stoplist stoplist] 
[section group section_group]
[memory memsize]
[populate | nopopulate]'

You create datastore, filter, lexer, wordlist, and storage preferences with CTX_DDL.CREATE_PREFERENCE and then specify them in the paramstring.

Note:

When you specify no paramstring, Oracle uses the system defaults.

For more information about these defaults, see "Default Index Parameters" in Chapter 2.

datastore datastore_pref

Specify the name of your datastore preference. Use the datastore preference to specify where your text is stored.See Datastore Types in Chapter 2.

filter filter_pref

Specify the name of your filter preference. Use the filter preference to specify how to filter formatted documents to plain text or HTML. See Filter Types in Chapter 2.

charset column charset_column_name

Specify the name of the character set column. This column must be in the same table as the text column, and it must be of type CHAR, VARCHAR, or VARCHAR2. Use this column to specify the document character set for conversion to the database character set. The value is case insensitive. You must specify an NLS character set string such as JA16EUC.

When the document is plain text or HTML, the INSO_FILTER object and CHARSET filter object use this column to convert the document character set to the database character set for indexing.

You use this column when you have plain text or HTML documents with different character sets or in a character set different from the database character set.

Note:

Documents are not marked for re-indexing when only the charset column changes. The indexed column must be updated to flag the re-index.

format column format_column_name

Specify the name of the format column. The format column must be in the same table as the text column and it must be CHAR, VARCHAR, or VARCHAR2 type.

The INSO_FILTER uses the format column when filtering documents. Use this column with heterogeneous document sets to optionally bypass INSO filtering for plain text or HTML documents.

In the format column, you can specify one of the following

TEXT
BINARY
IGNORE

TEXT indicates that the document is either plain text or HTML. When TEXT is specified the document is not filtered, but might be character set converted.

BINARY indicates that the document is a format supported by the INSO_FILTER object other than plain text or HTML, such as PDF. BINARY is the default if the format column entry cannot be mapped.

IGNORE indicates that the row is to be ignored during indexing. Use this value when you need to bypass rows that contain data incompatible with text indexing such as image data.

Note:

Documents are not marked for re-indexing when only the format column changes. The indexed column must be updated to flag the re-index.

lexer lexer_pref

Specify the name of your lexer or multi-lexer preference. Use the lexer preference to identify the language of your text and how text is tokenized for indexing. See Lexer Types in Chapter 2.

language column language_column_name

Specify the name of the language column when using a multi-lexer preference. See MULTI_LEXER in Chapter 2.

This column must exist in the base table. It cannot be the same column as the indexed column. Only the first 30 bytes of the language column is examined for language identification.

Note:

Documents are not marked for re-indexing when only the language column changes. The indexed column must be updated to flag the re-index.

wordlist wordlist_pref

Specify the name of your wordlist preference. Use the wordlist preference to enable features such as fuzzy, stemming, and prefix indexing for better wildcard searching. See Wordlist Type in Chapter 2.

storage storage_pref

Specify the name of your storage preference for the Text index. Use the storage preference to specify how the index tables are stored. See Storage Types in Chapter 2.

stoplist stoplist

Specify the name of your stoplist. Use stoplist to identify words that are not to be indexed. See CTX_DDL.CREATE_STOPLIST in Chapter 6.

section group section_group

Specify the name of your section group. Use section groups to create searchable sections in structured documents. See CTX_DDL.CREATE_SECTION_GROUP in Chapter 6.

memory memsize

Specify the amount of run-time memory to use for indexing. The syntax for memsize is as follows:

memsize = number[M|G|K]

where M stands for megabytes, G stands for gigabytes, and K stands for kilobytes.

The value you specify for memsize must be between 1M and the value of MAX_INDEX_MEMORY in the CTX_PARAMETERS view. To specify a memory size larger than the MAX_INDEX_MEMORY, you must reset this parameter with CTX_ADM.SET_PARAMETER to be larger than or equal to memsize.

The default is the value specified for DEFAULT_INDEX_MEMORY in CTX_PARAMETERS.

The memsize parameter specifies the amount of memory Oracle uses for indexing before flushing the index to disk. Specifying a large amount memory improves indexing performance because there are fewer I/O operations and improves query performance and maintenance since there is less fragmentation.

Specifying smaller amounts of memory increases disk I/O and index fragmentation, but might be useful when run-time memory is scarce.

populate | nopopulate

Specify nopopulate to create an empty index. The default is populate.

Note:

This is the only option whose default value cannot be set with CTX_ADM.SET_PARAMETER.

Empty indexes are populated by updates or inserts to the base table. You might create an empty index when you need to create your index incrementally or to selectively index documents in the base table. You might also create an empty index when you require only theme and Gist output from a document set.

CONTEXT Index Examples

The following sections give examples of creating a CONTEXT index.

Creating CONTEXT Index Using Default Preferences

The following example creates a CONTEXT index called myindex on the docs column in mytable. Default preferences are used.

CREATE INDEX myindex ON mytable(docs) INDEXTYPE IS ctxsys.context;

See Also:

For more information about default settings, see "Default Index Parameters" in Chapter 2.

Also refer to "Indexing" in Chapter 1.

Creating CONTEXT Index with Custom Preferences

The following example creates a CONTEXT index called myindex on the docs column in mytable. The index is created with a custom lexer preference called my_lexer and a custom stoplist called my_stop.

This example also assumes that these preferences were previously created with CTX_DDL.CREATE_PREFERENCE for my_lexer, and CTX_DDL.CREATE_STOPLIST for my_stop. Default preferences are used for the unspecified preferences.

CREATE INDEX myindex ON mytable(docs) INDEXTYPE IS ctxsys.context 
  PARAMETERS('LEXER my_lexer STOPLIST my_stop');

Any user can use any preference. To specify preferences that exist in another user's schema, add the user name to the preference name. The following example assumes that the preferences my_lexer and my_stop exist in the schema that belongs to user kenny:

CREATE INDEX myindex ON mytable(docs) INDEXTYPE IS ctxsys.context 
  PARAMETERS('LEXER kenny.my_lexer STOPLIST kenny.my_stop');

Creating CONTEXT Index with Multi-Lexer Preference

The multi-lexer decides which lexer to use for each row based on a language column. This is a character column in the table which stores the language of the document in the text column. For example, you create the table globaldoc to hold documents of different languages:

CREATE TABLE globaldoc (
   doc_id NUMBER PRIMARY KEY,
   lang VARCHAR2(10),
   text CLOB
);

Assume that global_lexer is a multi-lexer preference you created. To index the global_doc table, you specify the multi-lexer preference and the name of the language column as follows:

CREATE INDEX globalx ON globaldoc(text) INDEXTYPE IS ctxsys.context PARAMETERS 
('LEXER global_lexer LANGUAGE COLUMN lang');

See Also:

For more information about creating multi-lexer preferences, see MULTI_LEXER in Chapter 2.

Parallel Indexing

This example shows how to set up parallel indexing. Do the following:

Set the job queue parameter in the Oracle initialization file to accommodate parallel indexing:

job_queue_processes = 8

The JOB_QUEUE_PROCESSES parameter must be greater than or equal to your parallel degree.

See Also:

For more information about this initialization parameter, see Oracle9i Reference and Oracle9i Administrator's Guide

Create a partitioned text table. The following example creates a hash-partitioned table using values in the pk column as the hash key:
```
CREATE TABLE mypart_tab(pk number primary key, text varchar2(80)) 
PARTITION BY HASH (pk) PARTITIONS 8;
```
Create the index with a parallel degree. This example uses a parallel degree of 3.
```
CREATE INDEX myindex ON mypart_tab(pk) INDEXTYPE IS ctxsys.context PARALLEL 3;
```

Creating a Local Partitioned Index

The following example creates a text table partitioned into three, populates it, and then creates a partitioned index.

PROMPT create partitioned table and populate it

CREATE TABLE part_tab (a int, b varchar2(40)) PARTITION BY RANGE(a)

(partition p_tab1 values less than (10),
 partition p_tab2 values less than (20),
 partition p_tab3 values less than (30));


PROMPT create partitioned index
CREATE INDEX part_idx on part_tab(b) INDEXTYPE IS CTXSYS.CONTEXT

LOCAL (partition p_idx1, partition p_idx2, partition p_idx3);

Viewing Index Errors

After a CREATE INDEX or ALTER INDEX operation, you can view index errors with Oracle Text views. To view errors on your indexes, query the CTX_USER_INDEX_ERRORS view. To view errors on all indexes as CTXSYS, query the CTX_INDEX_ERRORS view.

For example, to view the most recent errors on your indexes, you can issue:

SELECT err_timestamp, err_text FROM ctx_user_index_errors ORDER BY err_timestamp 
DESC;

Deleting Index Errors

To clear the index error view, you can issue:

DELETE FROM ctx_user_index_errors;

Syntax for CTXCAT Indextype

CREATE INDEX [schema.]index on [schema.]table(column) INDEXTYPE IS ctxsys.ctxcat 

[PARAMETERS


('[index set index_set]
[lexer lexer_pref]
[storage storage_pref] 
[stoplist stoplist] 
[wordlist wordlist_pref] 
[memory memsize]');

[schema.]table(column)

Specify the name of the table and column to index.

The column you specify when you create a CTXCAT index must be of type CHAR or VARCHAR2. No other types are supported for CTXCAT.

Supported Preferences

index set index_set

Specify the index set preference to create the CTXCAT index. See Creating a CTXCAT Index example in this chapter.

Index Performance and Size Considerations

Although a CTXCAT index offers query performance benefits, creating the index has its costs. The time Oracle takes to create a CTXCAT index depends on its total size, and the total size of a CTXCAT index is directly related to

total text to be indexed
number of component indexes in the index set
number of columns in the base table that make up the component indexes

Having many component indexes in your index set also degrades DML performance since more indexes must be updated.

Because of these added costs in creating a CTXCAT index, carefully consider the query performance benefit each component index gives your application before adding it to your index set.

See Also:

Oracle Text Application Developer's Guide for more information about creating CTXCAT indexes and its benefits.

Other Preferences

When you create an index of type CTXCAT, you can use only the following index preferences in the parameters string:

Table 1-1

Preference Class	Supported Types
Datastore	None.
Filter	None
Lexer	BASIC_LEXER (index_themes attribute not supported) CHINESE_VGRAM_LEXER JAPANESE_VGRAM_LEXER KOREAN_LEXER KOREAN_MORPH_LEXER
Wordlist	BASIC_WORDLIST
Storage	BASIC_STORAGE
Stoplist	Supports single language stoplists only (BASIC_STOPLIST type.)
Section Group	None

Unsupported Preferences and Parameters

When you create a CTXCAT index, you cannot specify datastore, filter and section group preferences. You also cannot specify language, format, and charset columns as with a CONTEXT index.

Creating a CTXCAT Index

This section gives a brief example for creating a CTXCAT index. For a more complete example, see the Oracle Text Application Developer's Guide.

Consider a table called AUCTION with the following schema:

create table auction(

item_id number,
title varchar2(100),
category_id number,
price number,
bid_close date);

Assume that queries on the table involve a mandatory text query clause and optional structured conditions on category_id. Results must be sorted based on bid_close and category_id.

You can create a catalog index to support the different types of structured queries a user might enter. For structured queries, a ctxcat index improves query performance over a context index.

To create the indexes, first create the index set preference then add the required indexes to it:

begin

ctx_ddl.create_index_set('auction_iset');
ctx_ddl.add_index('auction_iset','bid_close');
ctx_ddl.add_index('auction_iset','price, bid_close');

end;

Create the CTXCAT index with CREATE INDEX as follows:

create index auction_titlex on AUCTION(title) indextype is CTXSYS.CTXCAT 
parameters ('index set auction_iset');

Querying a CTXCAT Index

To query the title column for the word pokemon, you can issue regular and mixed queries as follows:

select * from AUCTION where CATSEARCH(title, 'pokemon',NULL)> 0;
select * from AUCTION where CATSEARCH(title, 'pokemon', 'price < 50 order by 
bid_close desc')> 0;

See Also::

Oracle Text Application Developer's Guide for a complete CTXCAT example.

Syntax for CTXRULE Indextype

CREATE INDEX [schema.]index on [schema.]table(column) INDEXTYPE IS 

ctxsys.ctxrule 
[PARAMETERS ('[lexer lexer_pref] [storage storage_pref]


[section group section_pref] [wordlist wordlist_pref]'); 


[PARALLEL n];

[schema.]table(column)

Specify the name of the table and column to index.

The column you specify when you create a CTXRULE index must be VARCHAR2 or CLOB. No other types are supported for CTXRULE.

lexer_pref

Specify the lexer preference to be used for processing the queries and the documents to be classified with the MATCHES function. Currently, only the BASIC_LEXER lexer type is supported.

For processing queries, this lexer supports the following operators: ABOUT, STEM, AND, NEAR, NOT, OR, and WITHIN.

The thesaural operators (BT*, NT*, PT, RT, SYN, TR, TRSYS, TT etc.) are supported. However, these operators are expanded using a snapshot of the thesaurus at index time, not when the MATCHES function is issued. This means that if you change your thesaurus after you index, you must re-index your query set

The following operators are not supported: ACCUM, EQIUV, MINUS, WEIGHT, THRESHOLD, WILDCARD, FUZZY, and SOUNDEX.

storage_pref

Specify the storage preference for the index on the queries.Use the storage preference to specify how the index tables are stored. See Storage Types in Chapter 2.

section group

Specify the section group. This parameter does not affect the queries. It applies to sections in the documents to be classified. The following section groups are supported for the CTXRULE indextype:

BASIC_SECTION_GROUP
HTML_SECTION_GROUP
XML_SECTION_GROUP
AUTO_SECTION_GROUP

See Section Group Types in Chapter 2, "Indexing".

wordlist_pref

Specify the wordlist preferences. This is used to enable stemming operations on query terms.See Wordlist Type in Chapter 2.

Example for Creating a CTXRULE Index

See the Oracle Text Application Developer's Guide for a complete example of using the CTXRULE indextype in a document routing application.

CREATE INDEX

Purpose

Required Privileges

Syntax for CONTEXT Indextype

Query Performance Limitation with Partitioned Index

CONTEXT Index Examples

Creating CONTEXT Index Using Default Preferences

Creating CONTEXT Index with Custom Preferences

Creating CONTEXT Index with Multi-Lexer Preference

Parallel Indexing

Creating a Local Partitioned Index

Viewing Index Errors

Deleting Index Errors

Syntax for CTXCAT Indextype

Supported Preferences

Index Performance and Size Considerations

Table 1-1

Unsupported Preferences and Parameters

Creating a CTXCAT Index

Querying a CTXCAT Index

Syntax for CTXRULE Indextype

Example for Creating a CTXRULE Index

Related Topics