Oracle Text Reference Release 9.0.1 Part Number A90121-01 |
|
SQL Statements and Operators, 5 of 8
Note: This section describes the CREATE INDEX statement as it pertains to creating a Text domain index. For a complete description of the CREATE INDEX statement, see Oracle9i SQL Reference. |
Use CREATE INDEX to create an Oracle Text index. An Oracle Text index is an Oracle domain index of type context
, ctxcat,
or ctxrule
.
You must create an appropriate Oracle Text index to issue CONTAINS, CATSEARCH, or MATCHES queries.
You can create the following types of Oracle Text indexes:
You do not need the CTXAPP role to create an Oracle Text index. If you have Oracle grants to create a b-tree index on the text column, you have sufficient permission to create a text index. The issuing owner, table owner, and index owner can all be different users, which is consistent with Oracle standards for creating regular B-tree indexes.
CREATE INDEX [schema.]index on [schema.]table(column) INDEXTYPE ISctxsys.context LOCAL [(PARTITION [partition] [PARAMETERS('paramstring')][, PARTITION [partition] [PARAMETERS('paramstring')]])][PARAMETERS(paramstring)] [PARALLEL n];
Specify the name of the Text index to create.
Specify the name of the table and column to index.
Your table can optionally contain a primary key if you prefer to identify your rows as such when you use procedures in CTX_DOC. When your table has no primary key, document services identifies your documents by ROWID.
The column you specify must be one of the following types: CHAR, VARCHAR, VARCHAR2, BLOB, CLOB, BFILE, or XMLType.
DATE, NUMBER, and nested table columns cannot be indexed. Object columns also cannot be indexed, but their attributes can be, provided they are atomic data types.
Indexes on multiple columns are not supported with the context
index type. You must specify only one column in the column list.
Note:
With the |
Specify LOCAL to create a local partitioned context index on a partitioned table. The partitioned table must be partitioned by range. Hash, composite and list partitions are not supported.
You can specify the list of index partition names with partition. If you do not specify a partition name, the system assigns one. The order of the index partition list must correspond to the table partition by order.
The PARAMETERS clause associated with each partition specifies the parameters string specific to that partition. You can only specify memory and storage for each index partition.
You can query the views CTX_INDEX_PARTITIONS or CTX_USER_INDEX_PARTITIONS to find out index partition information, such as index partition name, and index partition status.
For optimal performance when querying a partitioned index with an ORDER BY SCORE clause, query the partition. If you query the entire table and use an ORDER BY SCORE clause, the query might not perform optimally unless you include a range predicate that can limit the query to a single partition.
Optionally specify with n the parallel degree for parallel indexing. You must have a partitioned text table to use parallel indexing with Oracle Text. Each parallel indexing process works on its own partition, so n must not exceed the number of partitions.
The initialization parameter JOB_QUEUE_PROCESSES must also be set in the initialization parameter file to use parallel indexing. The parallel degree n you specify must be less than the value of JOB_QUEUE_PROCESSES.
A complete example of how to create an index in parallel is given in the Examples section.
See Also:
Oracle9i Reference and Oracle9i Administrator's Guide for more information about these initialization parameters. |
Optionally specify indexing parameters in paramstring. You can specify preferences owned by another user using the user.preference notation.
The syntax for paramstring
is as follows:
paramstring ='[datastore datastore_pref] [filter filter_pref] [charset column charset_column_name] [format column format_column_name] [lexer lexer_pref] [language column language_column_name] [wordlist wordlist_pref] [storage storage_pref] [stoplist stoplist] [section group section_group] [memory memsize] [populate | nopopulate]'
You create datastore, filter, lexer, wordlist, and storage preferences with CTX_DDL.CREATE_PREFERENCE and then specify them in the paramstring.
Note: When you specify no paramstring, Oracle uses the system defaults. For more information about these defaults, see "Default Index Parameters" in Chapter 2. |
Specify the name of your datastore preference. Use the datastore preference to specify where your text is stored.See Datastore Types in Chapter 2.
Specify the name of your filter preference. Use the filter preference to specify how to filter formatted documents to plain text or HTML. See Filter Types in Chapter 2.
Specify the name of the character set column. This column must be in the same table as the text column, and it must be of type CHAR, VARCHAR, or VARCHAR2. Use this column to specify the document character set for conversion to the database character set. The value is case insensitive. You must specify an NLS character set string such as JA16EUC.
When the document is plain text or HTML, the INSO_FILTER object and CHARSET filter object use this column to convert the document character set to the database character set for indexing.
You use this column when you have plain text or HTML documents with different character sets or in a character set different from the database character set.
Note: Documents are not marked for re-indexing when only the charset column changes. The indexed column must be updated to flag the re-index. |
Specify the name of the format column. The format column must be in the same table as the text column and it must be CHAR, VARCHAR, or VARCHAR2 type.
The INSO_FILTER uses the format column when filtering documents. Use this column with heterogeneous document sets to optionally bypass INSO filtering for plain text or HTML documents.
In the format column, you can specify one of the following
TEXT indicates that the document is either plain text or HTML. When TEXT is specified the document is not filtered, but might be character set converted.
BINARY indicates that the document is a format supported by the INSO_FILTER object other than plain text or HTML, such as PDF. BINARY is the default if the format column entry cannot be mapped.
IGNORE indicates that the row is to be ignored during indexing. Use this value when you need to bypass rows that contain data incompatible with text indexing such as image data.
Note: Documents are not marked for re-indexing when only the format column changes. The indexed column must be updated to flag the re-index. |
Specify the name of your lexer or multi-lexer preference. Use the lexer preference to identify the language of your text and how text is tokenized for indexing. See Lexer Types in Chapter 2.
Specify the name of the language column when using a multi-lexer preference. See MULTI_LEXER in Chapter 2.
This column must exist in the base table. It cannot be the same column as the indexed column. Only the first 30 bytes of the language column is examined for language identification.
Note: Documents are not marked for re-indexing when only the language column changes. The indexed column must be updated to flag the re-index. |
Specify the name of your wordlist preference. Use the wordlist preference to enable features such as fuzzy, stemming, and prefix indexing for better wildcard searching. See Wordlist Type in Chapter 2.
Specify the name of your storage preference for the Text index. Use the storage preference to specify how the index tables are stored. See Storage Types in Chapter 2.
Specify the name of your stoplist. Use stoplist to identify words that are not to be indexed. See CTX_DDL.CREATE_STOPLIST in Chapter 6.
Specify the name of your section group. Use section groups to create searchable sections in structured documents. See CTX_DDL.CREATE_SECTION_GROUP in Chapter 6.
Specify the amount of run-time memory to use for indexing. The syntax for memsize is as follows:
memsize = number[M|G|K]
where M stands for megabytes, G stands for gigabytes, and K stands for kilobytes.
The value you specify for memsize
must be between 1M and the value of MAX_INDEX_MEMORY in the CTX_PARAMETERS view. To specify a memory size larger than the MAX_INDEX_MEMORY, you must reset this parameter with CTX_ADM.SET_PARAMETER to be larger than or equal to memsize
.
The default is the value specified for DEFAULT_INDEX_MEMORY in CTX_PARAMETERS.
The memsize parameter specifies the amount of memory Oracle uses for indexing before flushing the index to disk. Specifying a large amount memory improves indexing performance because there are fewer I/O operations and improves query performance and maintenance since there is less fragmentation.
Specifying smaller amounts of memory increases disk I/O and index fragmentation, but might be useful when run-time memory is scarce.
Specify nopopulate to create an empty index. The default is populate.
Empty indexes are populated by updates or inserts to the base table. You might create an empty index when you need to create your index incrementally or to selectively index documents in the base table. You might also create an empty index when you require only theme and Gist output from a document set.
The following sections give examples of creating a CONTEXT index.
The following example creates a CONTEXT index called myindex
on the docs
column in mytable
. Default preferences are used.
CREATE INDEX myindex ON mytable(docs) INDEXTYPE IS ctxsys.context;
See Also:
For more information about default settings, see "Default Index Parameters" in Chapter 2. Also refer to "Indexing" in Chapter 1. |
The following example creates a CONTEXT index called myindex
on the docs
column in mytable
. The index is created with a custom lexer preference called my_lexer
and a custom stoplist called my_stop
.
This example also assumes that these preferences were previously created with CTX_DDL.CREATE_PREFERENCE for my_lexer
, and CTX_DDL.CREATE_STOPLIST for my_stop
. Default preferences are used for the unspecified preferences.
CREATE INDEX myindex ON mytable(docs) INDEXTYPE IS ctxsys.context PARAMETERS('LEXER my_lexer STOPLIST my_stop');
Any user can use any preference. To specify preferences that exist in another user's schema, add the user name to the preference name. The following example assumes that the preferences my_lexer
and my_stop
exist in the schema that belongs to user kenny
:
CREATE INDEX myindex ON mytable(docs) INDEXTYPE IS ctxsys.context PARAMETERS('LEXER kenny.my_lexer STOPLIST kenny.my_stop');
The multi-lexer decides which lexer to use for each row based on a language column. This is a character column in the table which stores the language of the document in the text column. For example, you create the table globaldoc
to hold documents of different languages:
CREATE TABLE globaldoc ( doc_id NUMBER PRIMARY KEY, lang VARCHAR2(10), text CLOB );
Assume that global_lexer
is a multi-lexer preference you created. To index the global_doc
table, you specify the multi-lexer preference and the name of the language column as follows:
CREATE INDEX globalx ON globaldoc(text) INDEXTYPE IS ctxsys.context PARAMETERS ('LEXER global_lexer LANGUAGE COLUMN lang');
See Also:
For more information about creating multi-lexer preferences, see MULTI_LEXER in Chapter 2. |
This example shows how to set up parallel indexing. Do the following:
job_queue_processes = 8
The JOB_QUEUE_PROCESSES parameter must be greater than or equal to your parallel degree.
See Also:
For more information about this initialization parameter, see Oracle9i Reference and Oracle9i Administrator's Guide |
pk
column as the hash key:
CREATE TABLE mypart_tab(pk number primary key, text varchar2(80)) PARTITION BY HASH (pk) PARTITIONS 8;
CREATE INDEX myindex ON mypart_tab(pk) INDEXTYPE IS ctxsys.context PARALLEL 3;
The following example creates a text table partitioned into three, populates it, and then creates a partitioned index.
PROMPT create partitioned table and populate it CREATE TABLE part_tab (a int, b varchar2(40)) PARTITION BY RANGE(a)(partition p_tab1 values less than (10), partition p_tab2 values less than (20), partition p_tab3 values less than (30));PROMPT create partitioned index CREATE INDEX part_idx on part_tab(b) INDEXTYPE IS CTXSYS.CONTEXTLOCAL (partition p_idx1, partition p_idx2, partition p_idx3);
After a CREATE INDEX or ALTER INDEX operation, you can view index errors with Oracle Text views. To view errors on your indexes, query the CTX_USER_INDEX_ERRORS view. To view errors on all indexes as CTXSYS, query the CTX_INDEX_ERRORS view.
For example, to view the most recent errors on your indexes, you can issue:
SELECT err_timestamp, err_text FROM ctx_user_index_errors ORDER BY err_timestamp DESC;
To clear the index error view, you can issue:
DELETE FROM ctx_user_index_errors;
CREATE INDEX [schema.]index on [schema.]table(column) INDEXTYPE IS ctxsys.ctxcat[PARAMETERS('[index set index_set] [lexer lexer_pref] [storage storage_pref] [stoplist stoplist] [wordlist wordlist_pref] [memory memsize]');
Specify the name of the table and column to index.
The column you specify when you create a CTXCAT index must be of type CHAR or VARCHAR2. No other types are supported for CTXCAT.
Specify the index set preference to create the CTXCAT index. See Creating a CTXCAT Index example in this chapter.
Although a CTXCAT index offers query performance benefits, creating the index has its costs. The time Oracle takes to create a CTXCAT index depends on its total size, and the total size of a CTXCAT index is directly related to
Having many component indexes in your index set also degrades DML performance since more indexes must be updated.
Because of these added costs in creating a CTXCAT index, carefully consider the query performance benefit each component index gives your application before adding it to your index set.
See Also:
Oracle Text Application Developer's Guide for more information about creating CTXCAT indexes and its benefits. |
When you create an index of type CTXCAT, you can use only the following index preferences in the parameters
string:
Preference Class | Supported Types |
---|---|
Datastore |
None. |
Filter |
None |
Lexer |
BASIC_LEXER (index_themes attribute not supported) |
Wordlist |
|
Storage |
|
Stoplist |
Supports single language stoplists only (BASIC_STOPLIST type.) |
Section Group |
None |
When you create a CTXCAT index, you cannot specify datastore, filter and section group preferences. You also cannot specify language, format, and charset columns as with a CONTEXT index.
This section gives a brief example for creating a CTXCAT index. For a more complete example, see the Oracle Text Application Developer's Guide.
Consider a table called AUCTION with the following schema:
create table auction(item_id number, title varchar2(100), category_id number, price number, bid_close date);
Assume that queries on the table involve a mandatory text query clause and optional structured conditions on category_id. Results must be sorted based on bid_close and category_id.
You can create a catalog index to support the different types of structured queries a user might enter. For structured queries, a ctxcat index improves query performance over a context index.
To create the indexes, first create the index set preference then add the required indexes to it:
beginctx_ddl.create_index_set('auction_iset'); ctx_ddl.add_index('auction_iset','bid_close'); ctx_ddl.add_index('auction_iset','price, bid_close');end;
Create the CTXCAT index with CREATE INDEX as follows:
create index auction_titlex on AUCTION(title) indextype is CTXSYS.CTXCAT parameters ('index set auction_iset');
To query the title column for the word pokemon, you can issue regular and mixed queries as follows:
select * from AUCTION where CATSEARCH(title, 'pokemon',NULL)> 0; select * from AUCTION where CATSEARCH(title, 'pokemon', 'price < 50 order by bid_close desc')> 0;
CREATE INDEX [schema.]index on [schema.]table(column) INDEXTYPE ISctxsys.ctxrule [PARAMETERS ('[lexer lexer_pref] [storage storage_pref][section group section_pref] [wordlist wordlist_pref]');[PARALLEL n];
Specify the name of the table and column to index.
The column you specify when you create a CTXRULE index must be VARCHAR2 or CLOB. No other types are supported for CTXRULE.
Specify the lexer preference to be used for processing the queries and the documents to be classified with the MATCHES function. Currently, only the BASIC_LEXER lexer type is supported.
For processing queries, this lexer supports the following operators: ABOUT, STEM, AND, NEAR, NOT, OR, and WITHIN.
The thesaural operators (BT*, NT*, PT, RT, SYN, TR, TRSYS, TT etc.) are supported. However, these operators are expanded using a snapshot of the thesaurus at index time, not when the MATCHES function is issued. This means that if you change your thesaurus after you index, you must re-index your query set
The following operators are not supported: ACCUM, EQIUV, MINUS, WEIGHT, THRESHOLD, WILDCARD, FUZZY, and SOUNDEX.
Specify the storage preference for the index on the queries.Use the storage preference to specify how the index tables are stored. See Storage Types in Chapter 2.
Specify the section group. This parameter does not affect the queries. It applies to sections in the documents to be classified. The following section groups are supported for the CTXRULE indextype:
See Section Group Types in Chapter 2, "Indexing".
Specify the wordlist preferences. This is used to enable stemming operations on query terms.See Wordlist Type in Chapter 2.
See the Oracle Text Application Developer's Guide for a complete example of using the CTXRULE indextype in a document routing application.
CTX_DDL.CREATE_PREFERENCE in Chapter 6, "CTX_DDL Package".
CTX_DDL.CREATE_STOPLIST in Chapter 6, "CTX_DDL Package" .
CTX_DDL.CREATE_SECTION_GROUP in Chapter 6, "CTX_DDL Package" .
|
Copyright © 1996-2001, Oracle Corporation. All Rights Reserved. |
|