Managing DML Operations for a CONTEXT Index

5.7 Managing DML Operations for a CONTEXT Index

DML operations refer to when documents are inserted, updated, or deleted from the base table.

This section describes how you can view, synchronize, and optimize the Oracle Text CONTEXT index for DML operations. This section contains the following topics:

5.7.1 Viewing Pending DML Operations

When you insert, update, or delete documents in the base table, their ROWIDs are held in a DML queue until you synchronize the index.

You can view the DML queue by querying index tables, as follows:

When the index is created using the default fast_dml option and when the COMPATIBLE database parameter is set to a value lower than 20.0, the CTXSYS.DR$PENDING table keeps track of pending DMLs. You can query pending insert and update operations with the CTX_PENDING and CTX_USER_PENDING views.
When the index is created using the default fast_dml option and when the COMPATIBLE database parameter is set to 20.0 or higher, the DR$INDEX_NAME$C table stores information on ROWIDs that are waiting for synchronization into the index.

The indexes that are set to MAINTENANCE AUTO (automatic maintenance) or SYNC EVERY are automatically synchronized, however you can periodically examine the pending DML tables to determine whether a synchronization call is failing. For example, if your query results appear incorrect or outdated, then you can check if the documents have been synchronized and accordingly run a manual SYNC_INDEX call if required.

For example, to view pending DML operations on your indexes, enter the following command:

SELECT COUNT(*) FROM myschema.dr$myindex$c;

The output appears as follows:

COUNT(*)
----------
1

To retrieve ROWIDs of all unsynchronized changes from the $C table, enter the following command:

SELECT dml_rid FROM myschema.dr$myindex$c;

The output appears as follows:

DML_RID
------------------
AAAVP9AAMAAAAKGAAD

You can run CTX_DDL.SYNC_INDEX to synchronize your index, and then check if the $C table has been cleared:

EXEC CTX_DDL.SYNC_INDEX('myschema.myindex');

SELECT COUNT(*) FROM myschema.dr$myindex$c;

COUNT(*)
----------
0

5.7.2 Synchronizing the Index

When you synchronize the index, you process all pending updates and inserts to the base table. You can do this in PL/SQL with the CTX_DDL.SYNC_INDEX procedure. You can also control the duration and locking behavior for index synchronization with the CTX_DDL.SYNC_INDEX procedure.

Synchronizing the Index with SYNC_INDEX

The following example synchronizes the index with 2 megabytes of memory:

begin

ctx_ddl.sync_index('myindex', '2M');

end;

Maxtime Parameter for SYNC_INDEX

The SYNC_INDEX procedure includes a maxtime parameter that, like OPTIMIZE_INDEX, indicates a suggested time limit in minutes for the operation. The SYNC_INDEX procedure processes as many documents in the queue as possible within the given time limit.

NULL maxtime is equivalent to CTX_DDL.MAXTIME_UNLIMITED.
The time limit is approximate. The actual time may be less than, or greater than, what you specify.
The ALTER INDEX... sync command has no changes because it is deprecated.
The maxtime parameter is ignored when SYNC_INDEX is invoked without an index name.
The maxtime parameter cannot be communicated for automatic synchronizations (for example, sync on commit or sync every).

Locking Parameter for SYNC_INDEX

The locking parameter of SYNC_INDEX enables you to configure how the synchronization works when another synchronization is already running on the index.

The locking parameter is ignored when SYNC_INDEX is invoked without an index name.
The locking parameter cannot be communicated for automatic synchronizations (that is, sync on commit or sync every).
When the locking mode is LOCK_WAIT, the mode waits forever and ignores the maxtime setting if it cannot get a lock.

The options are as follows:

Option	Description
`CTX_DDL.LOCK_WAIT`	If another `SYNC_INDEX` is running, wait until the running synchronization is complete, and then begin the new synchronization.
`CTX_DDL.LOCK_NOWAIT`	If another `SYNC_INDEX` is running, immediately return without error.
`CTX_DDL.LOCK_NOWAIT_ERROR`	If another `SYNC_INDEX` is running, immediately generate an error (DRG-51313: timeout while waiting for inserts, updates, or deletes or optimize lock).

Note:

Starting with Oracle Database 12c Release 2 (12.2.0.1), you automatically merge rows from STAGE_ITAB back to the $I table by using SYNC_INDEX. This merging of rows happens when the number of rows in STAGE_ITAB ($G) exceeds the STAGE_ITAB_MAX_ROWS parameter (10K by default). Therefore, you do not have to run merge optimization explicitly or schedule an auto optimize job.

5.7.3 Optimizing the Index

The CONTEXT index is an inverted index where each word contains the list of documents that contain that word. For example, after a single initial indexing operation, the word DOG might have the following entry:

DOG DOC1 DOC3 DOC5

Frequent index synchronization ultimately causes fragmentation of your CONTEXT index. Index fragmentation can adversely affect query response time. Therefore, to reduce fragmentation and index size and to ensure optimal query performance, allow time to optimize your CONTEXT index.

To schedule an auto optimize job, you must explicitly set STAGE_ITAB_MAX_ROWS to 0 to disable the automatic merging that now happens with SYNC_INDEX.

To optimize an index, Oracle recommends that you use CTX_DDL.OPTIMIZE_INDEX. To understand index optimization, you must understand the structure of the index and what happens when it is synchronized. This section contains the following topics:

5.7.3.1 Index Fragmentation

When you add new documents to the base table, the index is synchronized by adding new rows. For example, if you add the DOC 7 document with the word dog and synchronize the index, you now have:

DOG DOC1 DOC3 DOC5
DOG DOC7

Subsequent inserts, updates, or deletes also create new rows, as follows:

DOG DOC1 DOC3 DOC5
DOG DOC7
DOG DOC9
DOG DOC11

Index fragmentation occurs when you add new documents and synchronize the index. In particular, background inserts, updates, or deletes, which synchronize the index frequently, generally produce more fragmentation than batch mode synchronization.

When you perform batch processing less frequently, you reduce fragmentation because you produce longer document lists with a reduced number of rows in the index.

You can reduce index fragmentation by optimizing the index in either FULL or FAST mode with CTX_DDL.OPTIMIZE_INDEX.

5.7.3.2 Document Invalidation and Garbage Collection

When you remove documents from the base table, Oracle Text marks the document as removed but does not immediately alter the index.

Because the old information takes up space and can cause extra overhead at query time, you must remove the old information from the index by optimizing it in FULL mode. This process is called garbage collection. Optimizing in FULL mode for garbage collection is necessary when you perform frequent updates or deletes to the base table.

5.7.3.3 Single Token Optimization

In addition to optimizing the entire index, you can optimize single tokens. You can use token mode to optimize index tokens that are frequently searched, without spending time on optimizing tokens that are rarely referenced.

For example, you can specify that only the token DOG be optimized in the index, if you know that this token is updated and queried frequently.

An optimized token can improve query response time for the token.

To optimize an index in token mode, use CTX_DDL.OPTIMIZE_INDEX.

5.7.3.4 Viewing Index Fragmentation and Garbage Data

With the CTX_REPORT.INDEX_STATS procedure, you can create a statistical report on your index. The report includes information on optimal row fragmentation, a list of most fragmented tokens, and the amount of garbage data in your index. Although this report might take a long time to run for large indexes, it can help you decide whether to optimize your index.