Oracle8i interMedia Text Reference Release 8.1.5 A67843-01 |
|
This chapter describes the SQL commands you use for creating and managing Text indexes and performing Text queries.
The following commands are described in this chapter:
Note: This section describes the ALTER INDEX command as it pertains to managing a Text domain index. For a complete description of the ALTER INDEX command, see Oracle8i SQL Reference. |
Use ALTER INDEX to perform the following maintenance tasks for a Text index:
The following syntax is used to rename an index:
ALTER INDEX [schema.]index_name RENAME to new_index_name ;
Specify the name of the index to be renamed.
Specify the new name for schema.index. The new_index_name parameter can be no more than 25 characters. If you specify a name longer than 25 characters, Oracle returns an error and the renamed index is no longer valid.
The following syntax is used to rebuild the index, resume a failed operation, perform batch DML, add stopwords to index, or optimize the index:
ALTER INDEX [schema.]index REBUILD [online] [parameters (paramstring)];
Optionally specify the online parameter for non-blocking operation, which allows the index to be queried during an ALTER INDEX operation.
Optionally specify a paramstring. If you do not specify paramstring, Oracle rebuilds the index with existing preference settings.
The syntax for paramstring is as follows:
paramstring = 'replace [datastore datastore_pref] [filter filter_pref] [lexer lexer_pref] [wordlist wordlist_pref] [storage storage_pref] [stoplist stoplist] [section group section_group] [memory memsize] | resume [memory memsize] | optimize [fast | full [maxtime (memsize | unlimited)] | sync [memory memsize] | add stopword word'
Rebuilds an index. You can optionally specify preferences, your own or pre-defined.
See Also:
For more information about creating and setting preferences, including information about pre-defined preferences, see Chapter 3, "Indexing". |
Resumes a failed index operation. You can optionally specify the amount of memory to use with memsize.
Optimizes the index. Specify either fast or full optimization.
When you optimize in fast mode, Oracle works on the entire index, compacting fragmented rows. However, in fast mode, old data is not removed.
When you optimize in full mode, you can optimize the whole index or a portion. This method compacts rows and removes old data (garbage collection.)
You use the maxtime parameter to specify in minutes the time Oracle is to spend on the optimization operation. Oracle starts the optimization where it left off and optimizes until complete or until the time limit has been reached, which ever comes first. Specifying a time limit is useful for automating index optimization, where you set Oracle to optimize the index for a specified time on a regular basis.
When you specify maxtime unlimited, the entire index is optimized. This is the default. When you specify 0 for maxtime, Oracle performs minimal optimization.
Synchronizes the index. You can optionally specify the amount of runtime memory to use with memsize.
Dynamically adds a stopword word to the index.
The following command resumes the indexing operation on newsindex with 2 megabytes of memory:
ALTER INDEX newsindex rebuild parameters('resume memory 2M');
The following command rebuilds the index, replacing the stoplist preference with new_stop.
ALTER INDEX newsindex rebuild parameters('replace stoplist new_stop');
The following command optimizes newsindex in fast mode:
ALTER INDEX newsindex rebuild parameters('optimize fast');
To specify an optimization operation to last for three hours (180 minutes), issue the following command:
ALTER INDEX newsindex rebuild parameters('optimize full maxtime 180');
To optimize the entire index without regard to time, issue the following command:
ALTER INDEX newsindex rebuild parameters('optimize full maxtime unlimited');
To optimize the entire index and to allow queries to be issued during the optimization, issue the following command:
ALTER INDEX newsindex rebuild online parameters('optimize full maxtime unlimited');
The following example synchronizes the index with a runtime memory of 2 megabytes:
ALTER INDEX newsindex rebuild PARAMETERS('sync memory 2M');
The memory parameter memsize specifies the amount of memory Oracle uses for the ALTER INDEX operation before flushing the index to disk. Specifying a large amount memory improves indexing performance since there is less I/O and improves query performance and maintenance since there is less fragmentation.
Specifying smaller amounts of memory increases disk I/O and index fragmentation, but might be useful if you want to track indexing progress or when run-time memory is scarce.
CTX_DDL.CREATE_PREFERENCE in Chapter 7.
CTX_DDL.CREATE_STOPLIST in Chapter 7.
CTX_DDL.CREATE_SECTION_GROUP in Chapter 7.
Note: This section describes the DROP INDEX command as it pertains to dropping a Text domain index. For a complete description of the DROP INDEX command, see Oracle8i SQL Reference. |
Use DROP INDEX to drop a specified Text index.
drop index [schema.]index [force];
Optionally force the index to be dropped.
The following example drops an index named doc_index in the current user's database schema.
drop index doc_index;
Use force option when Oracle cannot determine the state of the index, such as when an indexing operation crashes.
Use the CONTAINS operator in the WHERE clause of a SELECT statement to specify the query expression for a Text query.
CONTAINS returns a relevance score for every row selected. You obtain this score with the SCORE operator.
CONTAINS( [schema.]column, text_query VARCHAR2, [label NUMBER]) RETURN NUMBER;
Specify the text column to be searched on. This column must have a Text index associated with it.
Specify the query expression that defines your search in column.
See Also:
For more information about the Text operators you can use in query expressions, see Chapter 4, "Query Operators". |
Optionally specify the label that identifies the score generated by the CONTAINS operator.
For each row selected, CONTAINS returns a number between 0 and 100 that indicates how relevant the document row is to the query. The number 0 means that Oracle found no matches in the row.
The following example searches for all documents in the in the text column that contain the word oracle. The score for each row is selected with the SCORE operator using a label of 1:
SELECT SCORE(1) title from newsindex WHERE CONTAINS(text, 'oracle', 1) > 0;
The CONTAINS operator must always be followed by the > 0 syntax which specifies that the score value calculated by the CONTAINS operator must be greater than zero for the row to be selected.
When the SCORE operator is called (e.g. in a SELECT clause), the operator must reference the label value.
Appendix A, "Working with the Extensible Query Optimizer"
Note: This section describes the CREATE INDEX command as it pertains to creating a Text domain index. For a complete description of the CREATE INDEX command, see Oracle8i SQL Reference. |
Use CREATE INDEX to create an interMedia Text index. An interMedia Text index is an Oracle domain index of type context created using the extensible indexing framework.
CREATE INDEX [schema.]index on [schema.]table(column) INDEXTYPE IS ctxsys.context [PARAMETERS(paramstring)];
Specify the name of the Text index to create.
Specify the name of the table and column to index. The table must have a primary key constraint. This is needed mainly for identifying the documents for document services. Composite primary keys are supported, up to 16 columns.
The column you specify must be one of the following types: CHAR, VARCHAR, VARCHAR2, BLOB, CLOB, or BFILE.
Note: Indexing the deprecated column types LONG and LONG RAW is supported for the process of migrating Oracle7 systems to Oracle8i. |
DATE, NUMBER, and nested table columns cannot be indexed. Object columns also cannot be indexed, but their attributes can be, provided they are atomic data types.
Composite indexes are not supported; you must specify only one column in the column list.
Optionally specify indexing parameters in paramstring. You can specify preferences owned by another user using the user.preference notation.
The syntax for paramstring is as follows:
paramstring = '[datastore datastore_pref] [filter filter_pref] [lexer lexer_pref] [wordlist wordlist_pref] [storage storage_pref] [stoplist stoplist] [section group section_group] [memory memsize] [populate | nopopulate]'
You create datastore, filter, lexer, wordlist, and storage preferences with CTX_DDL.CREATE_PREFERENCE.
Note: When you specify no paramstring, Oracle uses the system defaults. For more information about these defaults, see "Default Index Parameters" in Chapter 3. |
Specify the name of your datastore preference. See Datastore Objects in Chapter 3.
Specify the name of your filter preference. See Filter Objects in Chapter 3.
Specify the name of your lexer preference. See Lexer Objects in Chapter 3.
Specify the name of your wordlist preference. See Wordlist Object in Chapter 3.
Specify the name of your storage preference for the Text index. See Storage Objects in Chapter 3.
Specify the name of your stoplist. See CTX_DDL.CREATE_STOPLIST in Chapter 7.
Specify the name of your section group. See CTX_DDL.CREATE_SECTION_GROUP in Chapter 7.
Specify the amount of run-time memory to use for indexing. The syntax for memsize is as follows:
memsize = number[M|G|K]
where M stands for megabytes, G stands for gigabytes, and K stands for kilobytes.
The value for memsize must be between 1M and the value specified for max_index_memory in the CTX_PARAMETERS view. The default is the value specified for default_index_memory in CTX_PARAMETERS.
The memsize parameter specifies the amount of memory Oracle uses for indexing before flushing the index to disk. Specifying a large amount memory improves indexing performance since there is less I/O and improves query performance and maintenance since there is less fragmentation.
Specifying smaller amounts of memory increases disk I/O and index fragmentation, but might be useful if you want to track indexing progress or when run-time memory is scarce.
Specify nopopulate to create an empty index. The default is populate.
Empty indexes are populated by updates or inserts to the base table. You might create an empty index when you require only theme and Gist output from a document set. In addition, you might create an empty index to subsequently index only a subset of documents in the base table.
The following example creates a Text index called newsindex on the news column in mytable. Default preferences are used.
create index newsindex on mytable(news) indextype is ctxsys.context;
See Also: For more information about default settings, see "Default Index Parameters" in Chapter 3.
Also refer to "Indexing Text" in Chapter 1.
The following example creates a Text index called newsindex on the news column in mytable. The index is created with a custom lexer preference called my_lexer and a custom stoplist called my_stop.
This example also assumes that these preferences were previously created with CTX_DDLCREATE_PREFERENCE for my_lexer, and CTX_DDL.CREATE_STOPLIST for my_stop. Default preferences are used for the unspecified preferences.
create index newsindex on mytable(news) indextype is ctxsys.context parameters('lexer MY_LEXER stoplist MY_STOP');
Any user can use any preference. To specify preferences that exist in another user's schema, add the user name to the preference name. The following example assumes that the preferences my_lexer and my_stop exist in user kenny's schema:
create index newsindex on mytable(news) indextype is ctxsys.context parameters('lexer kenny.MY_LEXER stoplist kenny.MY_STOP');
The issuing user does not need the CTXAPP role to create an index. If the user has Oracle grants to create a b-tree index on the column, then this user has sufficient permission to create a Text index. The issuing owner, table owner, and index owner can all be different users, which is the standard behavior for regular b-tree indexes.
CTX_DDL.CREATE_PREFERENCE in Chapter 7.
CTX_DDL.CREATE_STOPLIST in Chapter 7.
CTX_DDL.CREATE_SECTION_GROUP in Chapter 7.
Use the SCORE operator in a SELECT statement to return the score values produced by CONTAINS in a Text query.
SCORE(label NUMBER)
Specify a number to identify the score produced by the query.
The SCORE operator can be used in a SELECT, ORDER BY, or GROUP BY clause.
The following example returns the names of all employees who have listed the words software developer or java in their resume, sorted by the value of the score for the first CONTAINS (software developer) and the second CONTAINS (java).
SELECT employee_name, SCORE(10), SCORE(20) FROM employee_database WHERE CONTAINS (emp.resume, 'software developer', 10) > 0 OR CONTAINS (emp.resume, 'java', 20) > 0 ORDER BY NVL(SCORE(10),0), NVL(SCORE(20),0);
Appendix G, "Scoring Algorithm"