SQL Commands, 5 of 6

CREATE INDEX

Note:
This section describes the CREATE INDEX command as it pertains to creating a Text domain index.
For a complete description of the CREATE INDEX command, see Oracle8i SQL Reference.

Purpose

Use CREATE INDEX to create an interMedia Text index. An interMedia Text index is an Oracle domain index of type context created using the extensible indexing framework.

You must create an iMT index to issue CONTAINS queries.

Syntax

CREATE INDEX [schema.]index on [schema.]table(column) INDEXTYPE IS 
ctxsys.context [PARAMETERS(paramstring)] [PARALLEL N];

[schema.]index

Specify the name of the Text index to create.

[schema.]table(column)

Specify the name of the table and column to index. The table must have a primary key constraint. This is needed mainly for identifying the documents for document services. Composite primary keys are supported, up to 16 columns.

The column you specify must be one of the following types: CHAR, VARCHAR, VARCHAR2, BLOB, CLOB, or BFILE.

Note:
Indexing the deprecated column types LONG and LONG RAW is supported for the process of migrating Oracle7 systems to Oracle8i.

DATE, NUMBER, and nested table columns cannot be indexed. Object columns also cannot be indexed, but their attributes can be, provided they are atomic data types.

Composite indexes are not supported; you must specify only one column in the column list.

PARALLEL N

Optionally specify with N the parallel degree for parallel indexing. You must have a partitioned text table to use parallel indexing with iMT. Each parallel indexer works on its own partition, so N must not exceed the number of partitions.

The initialization parameters JOB_QUEUE_PROCESSES and JOB_QUEUE_INTERVAL must also be set in the init.ora file to use parallel indexing. The parallel degree N you specify must be less than JOB_QUEUE_PROCESSES.

A complete example of how to create an index in parallel is given in the Examples section.

See Also:
For more information about these initialization parameters, see Oracle8i Reference and Oracle8i Administrator's Guide.

PARAMETERS(paramstring)

Optionally specify indexing parameters in paramstring. You can specify preferences owned by another user using the user.preference notation.

The syntax for paramstring is as follows:

paramstring = '[datastore datastore_pref] 
               [filter filter_pref] 
               [charset column charset_column_name]  
               [format column format_column_name]

               [lexer lexer_pref]
               [language column language_column_name] 
              
               [wordlist wordlist_pref] 
               [storage storage_pref] 
               [stoplist stoplist] 
               [section group section_group]
               [memory memsize]
               [populate | nopopulate]'

You create datastore, filter, lexer, wordlist, and storage preferences with CTX_DDL.CREATE_PREFERENCE.

Note:
When you specify no paramstring, Oracle uses the system defaults.
For more information about these defaults, see "Default Index Parameters" in Chapter 3.

datastore datastore_pref

Specify the name of your datastore preference. See Datastore Objects in Chapter 3.

filter filter_pref

Specify the name of your filter preference. See Filter Objects in Chapter 3.

charset column charset_column_name

Specify the name of the character set column. The charset column must be in the same table as the text column and it must be CHAR, VARCHAR, or VARCHAR2 type. Use this column to specify the document character set for conversion to the database character set. The value is case-insensitive, but must be an NLS character set string such as JA16EUC.

When the document is plain-text or HTML, the INSO_FILTER object and CHARSET filter object use this column to convert the document character set to the database character set for indexing.

You use this column when you have plain-text or HTML documents with different character sets or in a character set different from the database character set.

format column format_column_name

Specify the name of the format column. The format column must be in the same table as the text column and it must be CHAR, VARCHAR, or VARCHAR2 type.

The INSO_FILTER uses the format column when filtering documents. Use this column with heterogeneous document sets to optionally bypass INSO filtering for plain-text or HTML documents.

In the format column, you can specify either TEXT or BINARY. TEXT indicates that the document is either plain-text or HTML. When TEXT is specified the document is not filtered, but might be character-set converted.

BINARY indicates that the document is a format supported by the INSO_FILTER object other than plain-text or HTML, such as PDF. BINARY is the default if the format column entry cannot be mapped.

lexer lexer_pref

Specify the name of your lexer or multi-lexer preference. See Lexer Objects in Chapter 3.

language column language_column_name

Specify the name of the language column when using a multi-lexer preference. See MULTI_LEXER in Chapter 3.

This column must exist in the base table. It cannot be the same column as the indexed column.Only the first 30 bytes of the language column is examined for language identification.

Note:
Documents are not marked for re-indexing when only the language column changes. The indexed column must be updated to flag the re-index.

wordlist wordlist_pref

Specify the name of your wordlist preference. See Wordlist Object in Chapter 3.

storage storage_pref

Specify the name of your storage preference for the Text index. See Storage Objects in Chapter 3.

stoplist stoplist

Specify the name of your stoplist. See CTX_DDL.CREATE_STOPLIST in Chapter 7.

section group section_group

Specify the name of your section group. See CTX_DDL.CREATE_SECTION_GROUP in Chapter 7.

memory memsize

Specify the amount of run-time memory to use for indexing. The syntax for memsize is as follows:

memsize = number[M|G|K]

where M stands for megabytes, G stands for gigabytes, and K stands for kilobytes.

The value for memsize must be between 1M and the value specified for max_index_memory in the CTX_PARAMETERS view. The default is the value specified for default_index_memory in CTX_PARAMETERS.

The memsize parameter specifies the amount of memory Oracle uses for indexing before flushing the index to disk. Specifying a large amount memory improves indexing performance since there is less I/O and improves query performance and maintenance since there is less fragmentation.

Specifying smaller amounts of memory increases disk I/O and index fragmentation, but might be useful if you want to track indexing progress or when run-time memory is scarce.

populate | nopopulate

Specify nopopulate to create an empty index. The default is populate.

Note:
This is the only option whose default value cannot be set with CTX_ADM.SET_PARAMETER.

Empty indexes are populated by updates or inserts to the base table. You might create an empty index when you need to create your index incrementally or to selectively index documents in the base table. You might also create an empty index when you require only theme and Gist output from a document set.

Examples

Creating Index Using Default Preferences

The following example creates a Text index called myindex on the docs column in mytable. Default preferences are used.

create index myindex on mytable(docs) indextype is ctxsys.context;

See Also:
For more information about default settings, see "Default Index Parameters" in Chapter 3.
Also refer to "Indexing" in Chapter 1.

Creating Index with Custom Preferences

The following example creates a Text index called myindex on the docs column in mytable. The index is created with a custom lexer preference called my_lexer and a custom stoplist called my_stop.

This example also assumes that these preferences were previously created with CTX_DDLCREATE_PREFERENCE for my_lexer, and CTX_DDL.CREATE_STOPLIST for my_stop. Default preferences are used for the unspecified preferences.

create index myindex on mytable(docs) indextype is ctxsys.context 
  parameters('lexer my_lexer stoplist my_stop');

Any user can use any preference. To specify preferences that exist in another user's schema, add the user name to the preference name. The following example assumes that the preferences my_lexer and my_stop exist in the schema that belongs to user kenny:

create index myindex on mytable(docs) indextype is ctxsys.context 
  parameters('lexer kenny.my_lexer stoplist kenny.my_stop');

Creating Index with Multi-Lexer Preference

The multi-lexer decides which lexer to use for each row based on a language column. This is a character column in the table which stores the language of the document in the text column. For example, you create the table globaldoc to hold documents of different languages:

create table globaldoc (
   doc_id number primary key,
   lang varchar2(10),
   text clob
);

Assume that global_lexer is a multi-lexer preference you created. To index the global_doc table, you specify the multi-lexer preference and the name of the language column as follows:

create index globalx on globaldoc(text) indextype is ctxsys.context parameters 
('lexer global_lexer language column lang');

See Also:
For more information about creating multi-lexer preferences, see MULTI_LEXER in Chapter 3.

Parallel Indexing

This example shows how to set up parallel indexing. Do the following:

Set job queue parameters in the Oracle initialization file to accommodate immediate parallel indexing:
```
job_queue_processes = 8   
job_queue_interval = 4
```

The JOB_QUEUE_PROCESSES parameter must be greater than or equal to your parallel degree.

This example starts the indexing job within four seconds of issuing the CREATE INDEX command. If you are not worried about the immediacy of the indexing operation, you can set a longer JOB_QUEUE_INTERVAL in seconds.

See Also:
For more information about these initialization parameters, see Oracle8i Reference and Oracle8i Administrator's Guide

Create a partitioned text table. The following example creates a hash-partitioned table using values in the pk column as the hash key:
```
create table mypart_tab(pk number primary key, text varchar2(80)) 
PARTITION BY HASH (pk) PARTITIONS 8;
```
Create the index with a parallel degree. This example uses a parallel degree of 3.
```
create index myindex on mypart_tab(pk) indextype is ctxsys.context parallel 3;
```

Notes

The issuing user does not need the CTXAPP role to create an index. If the user has Oracle grants to create a b-tree index on the column, then this user has sufficient permission to create a Text index. The issuing owner, table owner, and index owner can all be different users, which is the standard behavior for regular b-tree indexes.

Index Errors

You can view index errors with iMT views. View errors on your indexes with CTX_USER_INDEX_ERRORS. View errors on all indexes as CTXSYS with the CTX_INDEX_ERRORS.

For example to view the most recent errors on your indexes, you can issue:

select err_timestamp, err_text from CTX_USER_INDEX_ERRORS order by err_timestamp 
desc;

To clear the view of errors, you can issue:

delete from CTX_USER_INDEX_ERRORS;