Oracle Text Application Developer's Guide Release 9.0.1 Part Number A90122-01
DML operations on the base table are inserts, updates, and deletes of documents in that table. This section describes how you can monitor, synchronize, and optimize the Oracle Text CONTEXT index when DML operations occur.
Note: CTXCAT indexes are transactional and are therefore updated immediately when the base table is updated. Manual synchronization as described in this section is not necessary for a CTXCAT index.
When documents in the base table are inserted, updated, or deleted, their ROWIDs are held in a DML queue until you synchronize the index. You can view this queue with the CTX_USER_PENDING view.
For example, to view pending DML on all your indexes, issue the following statement:
SELECT pnd_index_name, pnd_rowid,
       to_char(pnd_timestamp, 'dd-mon-yyyy hh24:mi:ss') timestamp
FROM ctx_user_pending;
This statement gives output in the form:
PND_INDEX_NAME                 PND_ROWID          TIMESTAMP
------------------------------ ------------------ --------------------
MYINDEX                        AAADXnAABAAAS3SAAC 06-oct-1999 15:56:50
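When the queue is long, a summary per index can be easier to read than the row-by-row listing above. The following is a minimal sketch that uses only the CTX_USER_PENDING columns already shown; the column aliases are illustrative:

```sql
-- Count pending DML rows per index and show the oldest queued change.
SELECT pnd_index_name,
       COUNT(*) AS pending_rows,
       to_char(MIN(pnd_timestamp), 'dd-mon-yyyy hh24:mi:ss') AS oldest_pending
FROM   ctx_user_pending
GROUP  BY pnd_index_name;
```

An index that does not appear in the result has no pending DML for the current user.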
Synchronizing the index involves processing all pending updates, inserts, and deletes to the base table. You can do this in PL/SQL with the CTX_DDL.SYNC_INDEX procedure.
The following example synchronizes the index with 2 megabytes of memory:
begin
  ctx_ddl.sync_index('myindex', '2M');
end;
You can set CTX_DDL.SYNC_INDEX to run automatically at regular intervals using the DBMS_JOB.SUBMIT procedure. Oracle Text includes a SQL script you can use to do this. The location of this script is:
$ORACLE_HOME/ctx/sample/script/drjobdml.sql
To use this script, you must be the index owner and you must have execute privileges on the CTX_DDL package. You must also set the job_queue_processes parameter in your Oracle initialization file.
For example, to set the index synchronization to run every 360 minutes on myindex, you can issue the following in SQL*Plus:
SQL> @drjobdml myindex 360
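If you prefer to schedule the job yourself, the same effect can be sketched directly with DBMS_JOB.SUBMIT. This is an illustration only, not the contents of drjobdml.sql; it assumes an index named myindex and the same 360-minute interval, and the bind variable name jobno is arbitrary:

```sql
-- Sketch: schedule ctx_ddl.sync_index on myindex every 360 minutes.
-- Assumes job_queue_processes is set and the caller owns the index.
VARIABLE jobno NUMBER

BEGIN
  DBMS_JOB.SUBMIT(
    job       => :jobno,
    what      => 'ctx_ddl.sync_index(''myindex'');',
    next_date => SYSDATE,                  -- first run: as soon as possible
    interval  => 'SYSDATE + 360/1440');    -- 360 minutes, expressed in days
  COMMIT;                                  -- the job is not queued until commit
END;
/

PRINT jobno
```

The printed job number can be passed to DBMS_JOB.REMOVE later to cancel the schedule.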
Frequent index synchronization can fragment your CONTEXT index. Index fragmentation can adversely affect query response time. You can optimize your CONTEXT index to reduce fragmentation and index size and so improve query performance.
To understand index optimization, you must understand the structure of the index and what happens when it is synchronized.
The CONTEXT index is an inverted index in which each word is stored with the list of documents that contain it. For example, after a single initial indexing operation, the word DOG might have an entry as follows:
DOG DOC1 DOC3 DOC5
When new documents are added to the base table, the index is synchronized by adding new rows. Thus, if you add a new document (DOC7) containing the word dog to the base table and synchronize the index, you now have:
DOG DOC1 DOC3 DOC5
DOG DOC7
Subsequent DML will also create new rows:
DOG DOC1 DOC3 DOC5
DOG DOC7
DOG DOC9
DOG DOC11
Adding new documents and synchronizing the index causes index fragmentation. In particular, background DML that synchronizes the index frequently generally produces more fragmentation than synchronizing in batch mode.
Less frequent batch processing results in longer document lists, reducing the number of rows in the index and hence reducing fragmentation.
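One rough way to see this effect is to count how many rows each token occupies in the index token table. The sketch below assumes an index named myindex whose token table follows the DR$<index_name>$I naming convention and has a TOKEN_TEXT column; this is an internal table, so treat the query as a read-only diagnostic rather than a supported interface:

```sql
-- Sketch: list tokens that are spread over more than one row.
-- Assumes the token table for index MYINDEX is named DR$MYINDEX$I.
SELECT token_text,
       COUNT(*) AS row_count
FROM   dr$myindex$i
GROUP  BY token_text
HAVING COUNT(*) > 1
ORDER  BY row_count DESC;
```

A token with a very long document list may also span several rows for reasons unrelated to synchronization, so a high count is only an indicator of fragmentation, not proof of it.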
You can reduce index fragmentation by optimizing the index in either FULL or FAST mode with CTX_DDL.OPTIMIZE_INDEX.
When documents are removed from the base table, Oracle Text marks the document as removed but does not immediately alter the index.
Because the old information takes up space and can cause extra overhead at query time, you must remove the old information from the index by optimizing it in FULL mode. This is called garbage collection.
Optimizing in FULL mode for garbage collection is necessary when you have frequent updates or deletes to the base table.
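The two modes described above might be invoked as follows. This is a minimal sketch assuming an index named myindex; the 60-minute limit is an arbitrary illustrative value:

```sql
-- FAST mode: compacts fragmented rows but does not remove
-- information for deleted documents.
BEGIN
  ctx_ddl.optimize_index('myindex', 'FAST');
END;
/

-- FULL mode: compacts rows and performs garbage collection.
-- maxtime limits the run, in minutes.
BEGIN
  ctx_ddl.optimize_index('myindex', 'FULL', maxtime => 60);
END;
/
```

Limiting FULL optimization with maxtime lets you spread the work over several runs instead of one long operation.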
In addition to optimizing the entire index, you can optimize single tokens. You can use token mode to optimize index tokens that are frequently searched, without spending time on optimizing tokens that are rarely referenced.
For example, you can specify that only the token DOG be optimized in the index, if you know that this token is updated and queried frequently.
An optimized token can improve query response time for the token.
To optimize an index in token mode, you can use CTX_DDL.OPTIMIZE_INDEX.
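For instance, optimizing only the token DOG might look like the following sketch, which again assumes an index named myindex:

```sql
-- Token mode: optimize only the rows for the token DOG.
BEGIN
  ctx_ddl.optimize_index('myindex', 'TOKEN', token => 'DOG');
END;
/
```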
To optimize an index, Oracle recommends that you use CTX_DDL.OPTIMIZE_INDEX.
Copyright © 1996-2001, Oracle Corporation. All Rights Reserved.