Skip Headers
Oracle® Text Application Developer's Guide
12c Release 1 (12.1)

E17748-07
Go to Documentation Home
Home
Go to Book List
Book List
Go to Table of Contents
Contents
Go to Index
Index
Go to Master Index
Master Index
Go to Feedback page
Contact Us

Go to previous page
Previous
Go to next page
Next
PDF · Mobi · ePub

7 Tuning Oracle Text

Oracle Text provides ways to improve your query and indexing performance.

This chapter contains the following topics:

7.1 Optimizing Queries with Statistics

Query optimization with statistics uses the collected statistics on the tables and indexes in a query to select an execution plan that can process the query in the most efficient manner. As a general rule, Oracle recommends that you collect statistics on your base table if you are interested in improving your query performance. Optimizing with statistics enables a more accurate estimation of the selectivity and costs of the CONTAINS predicate and thus a better execution plan.

The optimizer attempts to choose the best execution plan based on the following parameters:

  • The selectivity on the CONTAINS predicate

  • The selectivity of other predicates in the query

  • The CPU and I/O costs of processing the CONTAINS predicates

The following topics discuss how to use statistics with the extensible query optimizer.

Note:

Importing and exporting of statistics on domain indexes, including Oracle Text indexes, is not supported with the DBMS_STATS package. For more information on importing and exporting statistics, see the Oracle Database PL/SQL Packages and Types Reference.

See Also:

Oracle Text Reference for information on the CONTAINS query operator

7.1.1 Collecting Statistics

By default, Oracle Text uses the cost-based optimizer (CBO) to determine the best execution plan for a query.

To enable the optimizer to better estimate costs, you can calculate the statistics on the table you query as follows:

ANALYZE TABLE <table_name> COMPUTE STATISTICS;

Alternatively, you can estimate the statistics on a sample of the table as follows:

ANALYZE TABLE <table_name> ESTIMATE STATISTICS 1000 ROWS;

or

ANALYZE TABLE <table_name> ESTIMATE STATISTICS 50 PERCENT;

You can also collect statistics in parallel with the DBMS_STATS.GATHER_TABLE_STATS procedure:

begin
 
DBMS_STATS.GATHER_TABLE_STATS('owner', 'table_name',
                                       estimate_percent=>50,
                                       block_sample=>TRUE,
                                       degree=>4) ;
 
end  ;

These statements collect statistics on all the objects associated with table_name, including the table columns and any indexes (b-tree, bitmap, or Text domain) associated with the table.

To re-collect the statistics on a table, enter the ANALYZE statement as many times as necessary or use the DBMS_STATS package.

By collecting statistics on the Text domain index, the cost-based optimizer in Oracle Database is able to perform the following tasks:

  • Estimate the selectivity of the CONTAINS predicate

  • Estimate the I/O and CPU costs of using the Text index, that is, the cost of processing the CONTAINS predicate using the domain index

  • Estimate the I/O and CPU costs of each invocation of CONTAINS

Knowing the selectivity of a CONTAINS predicate is useful for queries that contain more than one predicate, such as in structured queries. This way the cost-based optimizer can better decide whether to use the domain index to evaluate CONTAINS or to apply the CONTAINS predicate as a post filter.

See Also:

7.1.1.1 Example for Query Optimization with Statistics

The following structured query provides an example for optimizing statistics.

select score(1) from tab where contains(txt, 'freedom', 1)  > 0 and author = 'King' and year > 1960;

Assume the author column is of type VARCHAR2 and the year column is of type NUMBER. Assume that there is a b-tree index on the author column.

Also assume that the structured author predicate is highly selective with respect to the CONTAINS predicate and the year predicate. That is, the structured predicate (author = 'King') returns a much smaller number of rows with respect to the year and CONTAINS predicates individually, say 5 rows returned versus 1000 and 1500 rows respectively.

In this situation, Oracle Text can execute this query more efficiently by first doing a b-tree index range scan on the structured predicate (author = 'King'), followed by a table access by rowid, and then applying the other two predicates to the rows returned from the b-tree table access.

Note:

When statistics are not collected for a Text index, the cost-based optimizer assumes low selectivity and index costs for the CONTAINS predicate.

7.1.2 Re-Collecting Statistics

After synchronizing your index, you can re-collect statistics on a single index to update the cost estimates.

If your base table has been re-analyzed before the synchronization, it is sufficient to analyze the index after the synchronization without re-analyzing the entire table.

To do so, enter any of the following statements:

ANALYZE INDEX <index_name> COMPUTE STATISTICS;

or

ANALYZE INDEX <index_name> ESTIMATE STATISTICS SAMPLE 50 PERCENT;

7.1.3 Deleting Statistics

You can delete the statistics associated with a table by issuing:

ANALYZE TABLE <table_name> DELETE STATISTICS;

You can delete statistics on one index by issuing the following statement:

ANALYZE INDEX <index_name> DELETE STATISTICS;

7.2 Optimizing Queries for Response Time

By default, Oracle Text optimizes queries for throughput. This results in queries returning all rows in shortest time possible.

However, in many cases, especially in a Web application scenario, queries must be optimized for response time, when you are only interested in obtaining the first few hits of a potentially large hitlist in the shortest time possible.

The following sections describe some ways to optimize CONTAINS queries for response time:

7.2.1 Other Factors that Influence Query Response Time

There are other factors that can influence query response time such as:

7.2.2 Improved Response Time with FIRST_ROWS(n) Hint for ORDER BY Queries

When you need the first rows of an ORDER BY query, Oracle recommends that you use the cost-based FIRST_ROWS(n) hint.

Note:

As the FIRST_ROWS(n) hint is cost-based, Oracle recommends that you collect statistics on your tables before you use this hint.

You use the FIRST_ROWS(n) hint in cases where you want the first number (n) of rows in the shortest possible time. For example, consider the following PL/SQL block that uses a cursor to retrieve the first 10 hits of a query and uses the FIRST_ROWS(n) hint to optimize the response time:

declare
cursor c is 

select /* FIRST_ROWS(10) */ article_id from articles_tab
   where contains(article, 'Omophagia')>0 order by pub_date desc;

begin
for i in c
loop
insert into t_s values(i.pk, i.col);
exit when c%rowcount > 11;
end loop;
end;
/

The cursor c is a SELECT statement that returns the rowids that contain the word omophagia in sorted order. The code loops through the cursor to extract the first 10 rows. These rows are stored in the temporary table t_s.

With the FIRST_ROWS(n) hint, the optimizer instructs the Text index to return rowids in score-sorted order when the cost of returning the top-N hits is lower.

Without the hint, Oracle Database sorts the rowids after the Text index has returned all the rows in unsorted order that satisfy the CONTAINS predicate. Retrieving the entire result set this way takes time.

Because only the first 10 hits are needed in this query, using the hint results in better performance.

Note:

Use the FIRST_ROWS(n) hint when you need only the first few hits of a query. When you need the entire result set, do not use this hint as it might result in poor performance.

7.2.2.1 About the DOMAIN_INDEX_SORT Hint

You can also optimize for response time using the related DOMAIN_INDEX_SORT hint. Like FIRST_ROWS(n), when queries are optimized for response time, Oracle Text returns the first rows in the shortest time possible.

For example, you can use this hint as follows

select /*+ DOMAIN_INDEX_SORT */ pk, score(1), col from ctx_tab 
            where contains(txt_col, 'test', 1) > 0 order by score(1) desc;

However, this hint is only rule-based. This means that Oracle Text always chooses the index which satisfies the ORDER BY clause. This might result in sub-optimal performance for queries in which the CONTAINS clause is very selective. In these cases, Oracle recommends that you use the FIRST_ROWS(n) hint, which is fully cost-based.

7.2.3 Improved Response Time using Local Partitioned CONTEXT Index

Partitioning your data and creating local partitioned indexes can improve your query performance. On a partitioned table, each partition has its own set of index tables. Effectively, there are multiple indexes, but the results from each are combined as necessary to produce the final result set.

You create the CONTEXT index using the LOCAL keyword as follows:

CREATE INDEX index_name ON table_name (column_name) 
INDEXTYPE IS ctxsys.context
PARAMETERS ('...')
LOCAL

With partitioned tables and indexes, you can improve performance of the following types of queries:

7.2.3.1 Range Search on Partition Key Column

This is a query that restricts the search to a particular range of values on a column that is also the partition key. For example, consider a query on a date range:

SELECT storyid FROM storytab WHERE CONTAINS(story, 'oliver')>0 and pub_date BETWEEN '1-OCT-93' AND '1-NOV-93';

If the date range is quite restrictive, it is very likely that the query can be satisfied by only looking in a single partition.

7.2.3.2 ORDER BY Partition Key Column

This is a query that requires only the first n hits, and that the ORDER BY clause names the partition key. Consider an ORDER BY query on a price column to fetch the first 20 hits such as:

SELECT * FROM (
SELECT itemid FROM item_tab WHERE CONTAINS(item_desc, 'cd player')
  >0 ORDER BY price)
  WHERE ROWNUM < 20;

In this example, with the table partitioned by price, the query might only need to get hits from the first partition to satisfy the query.

7.2.4 Improved Response Time with Local Partitioned Index for Order by Score

Using the DOMAIN_INDEX_SORT hint on a local partitioned index might result in poor performance, especially when you order by score. This is because all hits to the query across all partitions must be obtained before the results can be sorted.

You can work around this by using an inline view when you use the DOMAIN_INDEX_SORT hint. Specifically, you can use the DOMAIN_INDEX_SORT hint to improve query performance on a local partitioned index under the following conditions:

  • The text query itself including the order by SCORE() clause is expressed as an in-line view.

  • The text query inside the in-line view contains the DOMAIN_INDEX_SORT hint.

  • The query on the in-line view has ROWNUM predicate limiting number of rows to fetch from the view.

For example, if you have the following text query and local text index created on a partitioned table doc_tab:

     select doc_id, score(1) from doc_tab 
        where contains(doc, 'oracle', 1)>0 
        order by score(1) desc;

and you are only interested in fetching top 20 rows, you can rewrite the query to

     select * from 
          (select /*+ DOMAIN_INDEX_SORT */ doc_id, score(1) from doc_tab 
              where contains(doc, 'oracle', 1)>0 order by score(1) desc) 
      where rownum < 21;

See Also:

7.2.5 Improved Response Time with Query Filter Cache

Oracle Text provides a cache layer called the query filter cache that can be used to cache the query results. The query filter cache is sharable across queries. Thus, the cached query results can be reused by multiple queries, improving the query response time.

The ctxfiltercache operator should be used to specify which query results to cache. The following example uses the ctxfiltercache operator to store the results of the common_predicate query in the cache:

select * from docs where contains(txt, 'ctxfiltercache((common_predicate), FALSE)')>0;

In the following example, the cached results of the common_predicate query are reused by the new_query query, thus improving the query response time.

select * from docs where contains(txt, 'new_query & ctxfiltercache((common_predicate), FALSE)')>0; 

Note:

  • You can specify the size of the query filter cache using the basic storage attribute query_filter_cache_size

  • The view ctx_filter_cache_statistics provides various statistics about the query filter cache

See Also:

Oracle Text Reference for more information about:
  • ctxfiltercache operator

  • query_filter_cache_size basic storage attribute

  • ctx_filter_cache_statistics view

7.2.6 Improved Response Time using BIG_IO Option of CONTEXT Index

Oracle Text provides the BIG_IO option for improving the query performance for the CONTEXT indexes that extensively use IO operations. The query performance improvement is mainly for data stored on rotating disks, and not for data stored on solid state disks.

A CONTEXT index with the BIG_IO option enabled creates token type pairs with one LOB (large object datatype) for each unique token text. Thus, tokens with the same text but different token types correspond to different rows in the $I table.

The indexes with the BIG_IO option enabled should have the token LOBs created as SecureFile LOBs. In the SecureFile lobs, the data is stored sequentially in multiple blocks. This improves the response time of the queries, as the queries can now perform longer sequential reads instead of performing many short reads.

Note:

Using SecureFiles requires the COMPATIBLE setting to be 11.0 or higher. In addition, the LOB must be created on an automatic segment space management (ASSM) tablespace. When migrating from the existing text indexes to SecureFiles, an ASSM tablespace should be used. To help in migrating the existing indexes to SecureFiles lobs, ALTER INDEX REBUILD is extended to provide the replacement of storage preferences that only affect the $I table without performing reindex.

To create a CONTEXT index with the BIG_IO index option, first create a basic storage preference by setting the value of its BIG_IO storage attribute to YES, and then specify this storage preference while creating the CONTEXT index.

The following example creates a basic storage preference mystore and sets the value of its BIG_IO storage attribute to YES:

exec ctx_ddl.create_preference('mystore', 'BASIC_STORAGE');
exec ctx_ddl.set_attribute('mystore', 'BIG_IO', 'YES');

To disable the BIG_IO option, update the existing storage preference (mystore) by setting the value of its BIG_IO storage attribute to NO, and then rebuild the index.

exec ctx_ddl.set_attribute('mystore', 'BIG_IO', 'NO');
alter index idx rebuild('replace storage mystore');

Caution:

Do not use replace metadata operation to disable the BIG_IO index option, as it can leave the index in an inconsistent state.

To enable the BIG_IO option for a partitioned index without rebuilding the index, modify the basic storage preference by setting the value of its BIG_IO storage attribute to YES, replace the global index metadata using ctx_ddl.replace_index_metadata, and then call optimize_index in REBUILD mode for each of the partitions of the partitioned index table.

The following example enables the BIG_IO option for the partitioned index idx:

exec ctx_ddl.set_attribute('mystore', 'BIG_IO', 'YES');
exec ctx_ddl.replace_index_metadata('idx', 'replace storage mystore');
exec ctx_ddl.optimize_index('idx', 'rebuild', part_name=>'part1');

Note:

If a procedure modifies the existing index tables with only the BIG_IO option enabled, then it will not result in reindexing of the data.

Note:

Since the BIG_IO index option performs longer sequential reads, the queries that use the BIG_IO index option require a large program global area (PGA) memory.

7.2.7 Improved Response Time using SEPARATE_OFFSETS Option of CONTEXT Index

Oracle Text provides the SEPARATE_OFFSETS option for improving the query performance for the CONTEXT indexes that extensively use IO operations, and whose queries are mainly single-word or Boolean queries.

The SEPARATE_OFFSETS option creates a different postings list structure for the tokens of type TEXT. Instead of interspersing docids, frequencies, info-length (length of the offsets information), and the offsets in the postings list, the SEPARATE_OFFSETS option stores all the docids and the frequencies together at the beginning of the postings list, and all the info-lengths and the offsets at the end of the postings list. The header at the beginning of the posting contains the information about the boundary points between the docids and the offsets. This separation of the docids and the offsets reduces the time for the queries to read the data, thus improving the query response time.

The performance of the SEPARATE_OFFSETS option is best realized when it is used in conjunction with the BIG_IO option, and is used for the tokens with very long posting.

To create a CONTEXT index with the SEPARATE_OFFSETS index option, first create a basic storage preference by setting the value of its SEPARATE_OFFSETS storage attribute to T, and then specify this storage preference while creating the CONTEXT index.

The following example creates a basic storage preference mystore and sets the value of its SEPARATE_OFFSETS storage attribute to T:

exec ctx_ddl.create_preference('mystore', 'BASIC_STORAGE');
exec ctx_ddl.set_attribute('mystore', 'SEPARATE_OFFSETS', 'T');

To disable the SEPARATE_OFFSETS option, update the existing storage preference (mystore) by setting the value of its SEPARATE_OFFSETS storage attribute to F, and then rebuild the index.

exec ctx_ddl.set_attribute('mystore', 'SEPARATE_OFFSETS', 'F');
alter index idx rebuild('replace storage mystore');

Caution:

Do not use replace metadata operation to disable the SEPARATE_OFFSETS index option, as it can leave the index in an inconsistent state.

To enable the SEPARATE_OFFSETS option for a partitioned index without rebuilding the index, modify the basic storage preference by setting the value of its SEPARATE_OFFSETS storage attribute to T, replace the global index metadata using ctx_ddl.replace_index_metadata, and then call optimize_index in REBUILD mode for each of the partitions of the partitioned index table.

The following example enables the SEPARATE_OFFSETS option for the partitioned index idx:

exec ctx_ddl.set_attribute('mystore', 'SEPARATE_OFFSETS', 'T');
exec ctx_ddl.replace_index_metadata('idx', 'replace storage mystore');
exec ctx_ddl.optimize_index('idx', 'rebuild', part_name=>'part1');

Note:

If a procedure modifies the existing index tables with only the SEPARATE_OFFSETS option enabled, then it will not result in reindexing of the data.

7.2.8 Improved Response Time using STAGE_ITAB Option of CONTEXT Index

Oracle Text provides the STAGE_ITAB option for improving the query performance for the CONTEXT indexes that extensively use DML operations for near real-time indexing.

When the STAGE_ITAB index option is not used, whenever a new document is added to the CONTEXT index, SYNC_INDEX is called to make the documents searchable. This creates new rows in the $I table, thus increasing the fragmentation in the $I table. This leads to the deterioration of the query performance.

When the STAGE_ITAB index option is enabled, the information about the new documents is stored in the $G staging table, and not in the $I table. This ensures that the $I table does not get fragmented, and thus not deteriorating the query performance.

When the STAGE_ITAB index option is enabled, the $H b-tree index is also created on the $G table. The $G table and $H b-tree index are equivalent to the $I table and $X b-tree index.

Use the MERGE optimization mode to optimize the rows present in the $G table and move them to the $I table.

Note:

The $G table is stored in the KEEP pool. You should allocate sufficient KEEP pool memory for the STAGE_ITAB query option to provide improved query performance.

To create a CONTEXT index with the STAGE_ITAB index option, first create a basic storage preference by setting the value of its STAGE_ITAB storage attribute to YES, and then specify this storage preference while creating the CONTEXT index.

The following example creates a basic storage preference mystore and sets the value of its STAGE_ITAB storage attribute to YES:

exec ctx_ddl.create_preference('mystore', 'BASIC_STORAGE');
exec ctx_ddl.set_attribute('mystore', 'STAGE_ITAB', 'YES');

You can also enable the STAGE_ITAB index option for an existing non-partitioned CONTEXT index by using the rebuild option of the ALTER INDEX statement.

alter index IDX rebuild parameters('replace storage mystore');

To disable the STAGE_ITAB option for a non-partitioned CONTEXT index, update the existing storage preference (mystore) by setting the value of its STAGE_ITAB storage attribute to NO, and then rebuild the index.

exec ctx_ddl.set_attribute('mystore', 'STAGE_ITAB', 'NO');
alter index idx rebuild('replace storage mystore');

This operation runs the optimization process using the MERGE optimization mode and then drops the $G table.

The rebuild option of the ALTER INDEX statement does not work with the partitioned CONTEXT index for enabling and disabling the STAGE_ITAB option.

The following example enables the STAGE_ITAB option for the partitioned CONTEXT index idx:

alter index idx parameters('add stage_itab');

The following example disables the STAGE_ITAB option for the partitioned CONTEXT index idx:

alter index idx parameters('remove stage_itab');

Note:

You also need to specify the BASIC_STORAGE preferences g_index_clause and g_table_clause for using the STAGE_ITAB index option for a CONTEXT index. See Oracle Text Reference for more information about BASIC_STORAGE.

7.3 Optimizing Queries for Throughput

Optimizing a query for throughput returns all hits in the shortest time possible. This is the default behavior.

The following sections describe how you can explicitly optimize for throughput.

7.3.1 CHOOSE and ALL ROWS Modes

By default, queries are optimized for throughput under the CHOOSE and ALL_ROWS modes. When queries are optimized for throughput, Oracle Text returns all rows in the shortest time possible.

7.3.2 FIRST_ROWS(n) Mode

In FIRST_ROWS(n) mode, the optimizer in Oracle Database optimizes for fast response time by having the Text domain index return score-sorted rows, if possible. This is the default behavior when you use the FIRST_ROWS(n) hint.

If you want to optimize for better throughput under FIRST_ROWS(n), you can use the DOMAIN_INDEX_NO_SORT hint. Better throughput means you are interested in getting all the rows to a query in the shortest time.

The following example achieves better throughput by not using the Text domain index to return score-sorted rows. Instead, Oracle Text sorts the rows after all the rows that satisfy the CONTAINS predicate are retrieved from the index:

select /*+ FIRST_ROWS(10) DOMAIN_INDEX_NO_SORT */ pk, score(1), col from ctx_tab 
            where contains(txt_col, 'test', 1) > 0 order by score(1) desc;

See Also:

Oracle Database SQL Tuning Guide for more information about the query optimizer and using hints such as FIRST_ROWS(n) and CHOOSE

7.4 Composite Domain Index (CDI) in Oracle Text

The Composite Domain Index feature of the Extensibility Framework in Oracle Database enables structured columns to be indexed by Oracle Text. Therefore, both text and one or more structured criteria can be satisfied by one single Oracle Text index row source. Performance for the following types of query are improved:

  • Text query with structured criteria in the SQL WHERE clause.

  • Text query with structured ORDER BY criteria.

  • Combination of both of the previous two query types.

As with concatenated b-tree indexes or bitmap indexes, applications will experience slow-down in DML performance as the number of FILTER BY and ORDER BY columns increases. Where SCORE-sort push-down is optimized for response time, the structured sort or combination of SCORE and structured sort push-down are also optimized for response time, and not for throughput. However, using DOMAIN_INDEX_SORT or FIRST_ROWS(n) hints to force the sort to be pushed into CDI while fetching the entire hitlist may result in poor query response time.

7.4.1 Performance Tuning with CDI

Support for mapping a FILTER BY column to MDATA enables query performance to be optimized for equality searches by restricting supported functionality of RANGE and LIKE. However, mapping a FILTER BY column to MDATA is not recommended if the FILTER BY column contains sequential values, or has very high cardinality. Doing so can result in a very long and narrow $I table and reduced $X performance. One example of such a sequential column might be one that uses DATE stamp. For such sequential columns, mapping to SDATA is recommended.

The following hints can be used to push or not push the SORT and FILTER BY predicates into the CDI:

  • DOMAIN_INDEX_SORT. The query optimizer will try to push the applicable sorting criteria into the specified composite domain index.

  • DOMAIN_INDEX_NO_SORT. The query optimizer will try NOT to push sorting criteria into the specified composite domain index.

  • DOMAIN_INDEX_FILTER(table name index name). The query optimizer will try to push the applicable FILTER BY predicate(s) into the specified composite domain index.

  • DOMAIN_INDEX_NO_FILTER(table name index name). The query optimizer will not try to push the applicable FILTER BY predicate(s) into the specified composite domain index.

Example 7-1 Performance Tuning a Text Query with CDI Hints

The following example performs an optimized query on the table books.

SELECT bookid, pub_date, source FROM
  (SELECT /*+ domain_index_sort domain_index_filter(books books_ctxcdi) */ bookid, pub_date, source
      FROM books
      WHERE CONTAINS(text, 'aaa',1)>0 AND bookid >= 80
      ORDER BY PUB_DATE desc nulls last, SOURCE asc  nulls last, score(1) desc)
 WHERE rownum < 20;

Note:

The domain_index_filter hint does not force the query optimizer to use CDI. Instead, if the cost-based optimizer chooses to use the CDI, then it should also push the filter predicate into the index. To force the query optimizer to choose CDI index, you additionally need to use the INDEX hint.

7.5 Solving Index and Query Bottlenecks Using Tracing

Oracle Text includes a tracing facility that enables you to identify bottlenecks in indexing and querying.

Oracle Text provides a set of predefined traces. Each trace is identified by a unique number. There is also a symbol in CTX_OUTPUT for this number.

Each trace measures a specific numeric quantity—for instance, the number of $I rows selected during text queries.

Traces are cumulative counters, so usage is as follows:

  1. The user enables a trace.

  2. The user performs one or more operations. Oracle Text measures activities and accumulates the results in the trace.

  3. The user retrieves the trace value, which is the total value across all operations done in step 2.

  4. The user resets the trace to 0.

  5. The user starts over at Step 2.

So, for instance, if in step 2 the user runs two queries, and query 1 selects 15 rows from $I, and query 2 selects 17 rows from $I, then in step 3 the value of the trace would be 32 (15 + 17).

Traces are associated with a session—they can measure operations that take place within a single session, and, conversely, cannot make measurements across sessions.

During parallel sync or optimize, the trace profile will be copied to the slave sessions if and only if tracing is currently enabled. Each slave will accumulate its own traces and implicitly write all trace values to the slave logfile before termination.

7.6 Using Parallel Queries

In general, parallel queries are optimal for DSS, OLAP, or analytical systems with large data collection, multiple CPUs with a low number of concurrent users, or parallelized across Oracle Real Application Clusters (Oracle RAC) nodes.

Oracle Text supports parallel queries as follows:

7.6.1 Parallel Queries on a Local Context Index

Parallel query refers to the parallelized processing of a local CONTEXT index. Based on the parallel degree of the index and various system attributes, Oracle determines the number of parallel query slaves to be spawned to process the index. Each parallel query slave processes one or more index partitions. This is the default query behavior for local indexes created in parallel.

However, for heavily loaded systems with high numbers of concurrent users, query throughput will generally be worse with parallel query because top-N hits can usually be satisfied by the first few partitions, if the query is run serially. For example, typical top-N text queries with an ORDER BY partition key column, such as:

select * from (
        select story_id from stories_tab where contains(...)>0 order by 
publication_date desc)
    where rownum <= 10;

will generally perform worse with a parallel query.

You can disable parallel querying after a parallel index operation with an ALTER INDEX statement as follows:

Alter index <text index name> NOPARALLEL;
Alter index <text index name> PARALLEL 1;

You can also enable or increase the parallel degree by specifying:

Alter index <text index name> paralllel < parallel degree >;

7.6.2 Parallelizing Queries Across Oracle RAC Nodes

Oracle Real Application Clusters (Oracle RAC) provides an excellent solution for improving query throughput. If you can get good performance from Oracle Text with a light query load, then you can expect to get excellent scalability from Oracle RAC as the query load increases.

Further improvements in Oracle Text performance in an Oracle RAC environment may be achieved by physically partitioning the text data and text indexes (using local partitioned indexes), and ensuring that partitions are handled by separate Oracle RAC nodes. This way, you avoid duplication of the cache contents across multiple nodes and, therefore, maximize the benefit of Oracle RAC cache fusion.

In Oracle 10g Release 1, each Oracle Text index partition must be forced into a separate database file when the index is created. This enables the use of the "re-mastering" feature in Oracle RAC to force database file affinity, in which each node concentrates on a particular database file and, therefore, a particular Oracle Text index partition.

In Oracle 10g Release 2 and forward, Oracle supports database object-level affinity, which makes it much easier to allocate index objects ($I and $R tables) to particular nodes.

While Oracle RAC offers solutions for improving query throughput and performance, is not a "magic bullet," and it will not necessarily enable you to continue to get the same performance improvements as you scale up the data volumes. You are more likely to see improvements by increasing the amounts of memory available to the SGA cache, or by partitioning your data in such a way that queries will normally not need to hit all of the partitions of a table in order to provide the required set of query results.

7.7 Tuning Queries with Blocking Operations

Issuing a query with more than one predicate can cause a blocking operation in the execution plan. For example, consider the following mixed query:

select docid from mytab where contains(text, 'oracle', 1) > 0 
  AND colA > 5 
  AND colB > 1 
  AND colC > 3; 

Assume that all predicates are unselective and colA, colB, and colC have bitmap indexes. The cost-based optimizer in Oracle Database chooses the following execution plan:

TABLE ACCESS BY ROWIDS
  BITMAP CONVERSION TO ROWIDS
    BITMAP AND
      BITMAP INDEX COLA_BMX
      BITMAP INDEX COLB_BMX
      BITMAP INDEX COLC_BMX
      BITMAP CONVERSION FROM ROWIDS
        SORT ORDER BY
          DOMAIN INDEX MYINDEX

Because the BITMAP AND is a blocking operation, Oracle Text must temporarily save the rowid and score pairs returned from the Oracle Text domain index before running the BITMAP AND operation.

Oracle Text attempts to save these rowid and score pairs in memory. However, when the size of the result set containing these rowid and score pairs exceeds the SORT_AREA_SIZE initialization parameter, Oracle Text spills these results to temporary segments on disk.

Because saving results to disk causes extra overhead, you can improve performance by increasing the SORT_AREA_SIZE parameter using ALTER SESSION as follows:

alter session set SORT_AREA_SIZE = <new memory size in bytes>;

For example, to set the buffer to approximately 8 megabytes, enter:

alter session set SORT_AREA_SIZE = 8300000;

See Also:

Oracle Database Performance Tuning Guide and Oracle Database Reference for more information on SORT_AREA_SIZE

7.8 Frequently Asked Questions About Query Performance

This section answers some of the frequently asked questions about query performance.

7.8.1 What is Query Performance?

Answer: There are generally two measures of query performance:

  • Response time, the time to get an answer to an individual query, and

  • Throughput, the number of queries that can be run in any time period; for example, queries each second.

These two are related, but are not the same. In a heavily loaded system, you normally want maximum throughput, whereas in a relatively lightly loaded system, you probably want minimum response time. Also, some applications require a query to deliver all its hits to the user, whereas others might only require the first 20 hits from an ordered set. It is important to distinguish between these two scenarios.

7.8.2 What is the fastest type of text query?

Answer: The fastest type of query will meet the following conditions:

  • Single CONTAINS clause

  • No other conditions in the WHERE clause

  • No ORDER BY clause at all

  • Only the first page of results is returned (for example, the first 10 or 20 hits).

7.8.3 Should I collect statistics on my tables?

Answer: Yes. Collecting statistics on your tables enables Oracle Text to do cost-based analysis. This helps Oracle Text choose the most efficient execution plan for your queries.

If your queries are always pure text queries (no structured predicate and no joins), you should delete statistics on your Oracle Text index.

7.8.4 How does the size of my data affect queries?

Answer: The speed at which the text index can deliver ROWIDs is not affected by the actual size of the data. Text query speed will be related to the number of rows that must be fetched from the index table, number of hits requested, number of hits produced by the query, and the presence or absence of sorting.

7.8.5 How does the format of my data affect queries?

Answer: The format of the documents (plain ASCII text, HTML or Microsoft Word) should make no difference to query speed. The documents are filtered to plain text at indexing time, not query time.

The cleanliness of the data will make a difference. Spell-checked and sub-edited text for publication tends to have a much smaller total vocabulary (and therefore size of the index table) than informal text such as e-mails, which will contain many spelling errors and abbreviations. For a given index memory setting, the extra text takes up more memory, which can lead to more fragmented rows than in the cleaner text, which can adversely affect query response time.

7.8.6 What is a functional versus an indexed lookup?

Answer: There are two ways the kernel can query the text index. In the first and most common case, the kernel asks the text index for all the rowids that satisfy a particular text search. These rowids are returned in batches. In the second, the kernel passes individual rowids to the text index, and asks whether that particular rowid satisfies a certain text criterion.

The second is known as a functional lookup, and is most commonly done where there is a very selective structured clause, so that only a few rowids must be checked against the text index. An example of a search where a functional lookup may be used:

SELECT ID, SCORE(1), TEXT FROM MYTABLE
WHERE START_DATE = '21 Oct 1992'         <- highly selective
AND CONTAINS (TEXT, 'commonword') > 0    <- unselective

Functional invocation is also used for text query ordered by structured column (for example date, price) and text query is unselective.

7.8.7 What tables are involved in queries?

Answer: All queries look at the index token table. Its name has the form DR$indexname$I. This contains the list of tokens (column TOKEN_TEXT) and the information about the row and word positions where the token occurs (column TOKEN_INFO).

The row information is stored as internal DOCID values. These must be translated into external ROWID values. The table used for this depends on the type of lookup: For functional lookups, the $K table, DR$indexname$K, is used. This is a simple Index Organized Table (IOT) which contains a row for each DOCID/ROWID pair.

For indexed lookups, the $R table, DR$indexname$R, is used. This holds the complete list of ROWIDs in a BLOB column.

Hence we can easily find out whether a functional or indexed lookup is being used by examining a SQL trace, and looking for the $K or $R tables.

Note:

These internal index tables are subject to change from release to release. Oracle recommends that you do not directly access these tables in your application.

7.8.8 Does sorting the results slow a text-only query?

Answer: Yes, it certainly does.

If there is no sorting, then Oracle Text can return results as it finds them, which is quicker in the common case where the application needs to display only a page of results at a time.

7.8.9 How do I make an ORDER BY score query faster?

Answer: Sorting by relevance (SCORE(n)) can be extremely quick if the FIRST_ROWS(n) hint is used. In this case, Oracle Text performs a high speed internal sort when fetching from the text index tables.

An example of such a query:

              SELECT /*+ FIRST_ROWS(10) */ ID, SCORE(1), TEXT FROM mytable
                WHERE CONTAINS (TEXT, 'searchterm', 1) > 0
                ORDER BY SCORE(1) DESC;

Note that for this to work efficiently, there must be no other criteria in the WHERE clause other than a single CONTAINS.

7.8.10 Which memory settings affect querying?

Answer: For querying, you want to strive for a large system global area (SGA). You can set these parameters related to SGA in your Oracle Database initialization file. You can also set these parameters dynamically.

The SORT_AREA_SIZE parameter controls the memory available for sorting for ORDER BY queries. You should increase the size of this parameter if you frequently order by structured columns.

See Also:

7.8.11 Does out-of-line LOB storage of wide base table columns improve performance?

Answer: Yes. Typically, a SELECT statement selects more than one column from your base table. Because Oracle Text fetches columns to memory, it is more efficient to store wide base table columns such as LOBs out of line, especially when these columns are rarely updated but frequently selected.

When LOBs are stored out of line, only the LOB locators need to be fetched to memory during querying. Out of line storage reduces the effective size of the base table making it easier for Oracle Text to cache the entire table to memory. This reduces the cost of selecting columns from the base table, and hence speeds up text queries.

In addition, having smaller base tables cached in memory enables more index table data to be cached during querying, which improves performance.

7.8.12 How can I make a CONTAINS query on more than one column faster?

Answer: The fastest type of query is one where there is only a single CONTAINS clause, and no other conditions in the WHERE clause.

Consider the following multiple CONTAINS query:

              SELECT title, isbn FROM booklist
                WHERE CONTAINS (title, 'horse') > 0
                  AND CONTAINS (abstract, 'racing') > 0

We can obtain the same result with section searching and the WITHIN operator as follows:

              SELECT title, isbn FROM booklist
                WHERE CONTAINS (alltext, 
                  'horse WITHIN title AND racing WITHIN abstract')>0

This query completes more quickly. To use a query like this, we must copy all the data into a single text column for indexing, with section tags around each column's data. This can be done with PL/SQL procedures before indexing, or by making use of the USER_DATASTORE datastore during indexing to synthesize structured columns with the text column into one document.

7.8.13 Is it OK to have many expansions in a query?

Answer: Each distinct word used in a query requires at least one row to be fetched from the index table. It is therefore best to keep the number of expansions down as much as possible.

You should not use expansions such as wild cards, thesaurus, stemming and fuzzy matching unless they are necessary to the task. In general, a few expansions (for example, 10 to 20) does not cause difficulty, but avoid having large numbers of explansions (80 or 100) in a query. The query feedback mechanism can be used to determine the number of expansions for any particular query expression.

In addition for wildcard and stem queries, you can remove the cost of term expansion from query time to index time by creating prefix, substring or stem indexes. Query performance increases at the cost of longer indexing time and added disk space.

Prefix and substring indexes can improve wildcard performance. You enable prefix and substring indexing with the BASIC_WORDLIST preference. The following example sets the wordlist preference for prefix and substring indexing. For prefix indexing, it specifies that Oracle Text create token prefixes between 3 and 4 characters long:

begin 
ctx_ddl.create_preference('mywordlist', 'BASIC_WORDLIST'); 
ctx_ddl.set_attribute('mywordlist','PREFIX_INDEX','TRUE');
ctx_ddl.set_attribute('mywordlist','PREFIX_MIN_LENGTH', '3');
ctx_ddl.set_attribute('mywordlist','PREFIX_MAX_LENGTH', '4');
ctx_ddl.set_attribute('mywordlist','SUBSTRING_INDEX', 'YES');
end

You enable stem indexing with the BASIC_LEXER preference:

begin
ctx_ddl.create_preference('mylex', 'BASIC_LEXER');
ctx_ddl.set_attribute ( 'mylex', 'index_stems', 'ENGLISH');
end;

7.8.14 How can local partition indexes help?

Answer: You can create local partitioned CONTEXT indexes on partitioned tables. This means that on a partitioned table, each partition has its own set of index tables. Effectively, there are multiple indexes, but the results from each are combined as necessary to produce the final result set.

The index is created using the LOCAL keyword:

CREATE INDEX index_name ON table_name (column_name) 
INDEXTYPE IS ctxsys.context
PARAMETERS ('...')
LOCAL

With partitioned tables and local indexes, you can improve performance of the following types of CONTAINS queries:

7.8.15 Should I query in parallel?

Answer: It depends on system load and server capacity. Even though parallel querying is the default behavior for indexes created in parallel, it usually results in degrading overall query throughput on heavily loaded systems.

In general, parallel queries are particularly appropriate for DSS or analytical systems with large data collections, multiple CPUs, and low number of concurrent users.

7.8.16 Should I index themes?

Answer: Indexing theme information with a CONTEXT index takes longer and also increases the size of your index. However, theme indexes enable ABOUT queries to be more precise by using the knowledge base, if available. If your application uses ABOUT queries heavily, it might be worthwhile to create a theme component to the index, despite the extra indexing time and extra storage space required.

7.8.17 When should I use a CTXCAT index?

Answer: CTXCAT indexes work best when text is in small chunks, maybe a few lines maximum, and searches need to restrict or sort the result set according to certain structured criteria, usually numbers or dates.

For example, consider an on-line auction site. Each item for sale has a short description, a current bid price, and dates for the start and end of the auction. A user might want to see all the records with antique cabinet in the description, with a current bid price less than $500. Because he is particularly interested in newly posted items, he wants the results sorted by auction start time.

Such a search is not always efficient with a CONTAINS structured query on a CONTEXT index, where the response time can vary significantly depending on the structured and CONTAINS clauses. This is because the intersection of structured and CONTAINS clauses or the ordering of text query is computed during query time.

By including structured information such as price and date within the CTXCAT index, query response time is always in an optimal range regardless of search criteria. This is because the interaction between text and structured query is pre-computed during indexing. Consequently query response time is optimum.

7.8.18 When is a CTXCAT index NOT suitable?

Answer: There are differences in the time and space needed to create the index. CTXCAT indexes take a bit longer to create and use considerably more disk space than CONTEXT indexes. If you are tight on disk space, you should consider carefully whether CTXCAT indexes are appropriate for you.

With respect to query operators, you can now use the richer CONTEXT grammar in CATSEARCH queries with query templates. The older restriction of a single CATSEARCH query grammar no longer holds.

7.8.19 What optimizer hints are available, and what do they do?

Answer: The optimizer hint INDEX(table column) can be used in the usual way to drive the query with a text or b-tree index.

You can also use the NO_INDEX(table column) hint to disable a specific index.

Additionally, the FIRST_ROWS(n) hint has a special meaning for text queries and should be used when you need the first n hits to a query. Use of the DOMAIN_INDEX_SORT hint in conjunction with ORDER BY SCORE(n) DESC tells the Oracle optimizer to accept a sorted set from the text index, and not to do a further sort.

7.9 Frequently Asked Questions About Indexing Performance

This section answers some of the frequently asked questions about indexing performance.

7.9.1 How long should indexing take?

Answer: Indexing text is a resource-intensive process. The speed of indexing will depend on the power of the hardware involved. Indexing speed depends on CPU and I/O capacity. Given sufficient I/O capacity to read in the original data and write out index entries, then CPU will be the limiting factor.

Tests with Intel x86 (Core 2 architecture, 2.5GHz) CPUs have shown that Oracle Text can index around 100GB of text per CPU core, per day. This would be expected to increase as CPU clock speeds increase and/or CPU architectures become more efficient.

Other factors such as your document format, location of your data, and the calls to user-defined datastores, filters, and lexers can have an impact on your indexing speed.

7.9.2 Which index memory settings should I use?

Answer: You can set your index memory with the system parameters DEFAULT_INDEX_MEMORY and MAX_INDEX_MEMORY. You can also set your index memory at run time with the CREATE INDEX memory parameter in the parameter string.

You should aim to set the DEFAULT_INDEX_MEMORY value as high as possible, without causing paging.

You can also improve Indexing performance by increasing the SORT_AREA_SIZE system parameter.

Oracle recommends that you use a large index memory setting. Large settings, even up to hundreds of megabytes, can improve the speed of indexing, and reduce the fragmentation of the final indexes. However, if you set the index memory setting too high, then memory paging can occur that will reduce indexing speed.

With parallel indexing, each stream requires its own index memory. When dealing with very large tables, you can tune your database system global area (SGA) differently for indexing and retrieval. For querying, you want to get as much information cached in the system global area's (SGA) block buffer cache as possible. So you should allocate a large amount of memory to the block buffer cache. But this will not make any difference to indexing, so you would be better off reducing the size of the SGA to make more room for a large index memory settings during indexing.

You set the size of SGA in your Oracle Database initialization file.

See Also:

7.9.3 How much disk overhead will indexing require?

Answer: The overhead, the amount of space needed for the index tables, varies between about 50% of the original text volume and 200%. Generally, the larger the total amount of text, the smaller the overhead, but many small records will use more overhead than fewer large records. Also, clean data (such as published text) will require less overhead than dirty data such as e-mails or discussion notes, because the dirty data is likely to include many unique words from mis-spellings and abbreviations.

A text-only index is smaller than a combined text and theme index. A prefix and substring index makes the index significantly larger.

7.9.4 How does the format of my data affect indexing?

Answer: You can expect much lower storage overhead for formatted documents such as Microsoft Word files because such documents tend to be very large compared to the actual text held in them. So 1GB of Word documents might only require 50MB of index space, whereas 1GB of plain text might require 500MB, because there is ten times as much plain text in the latter set.

Indexing time is less clear-cut. Although the reduction in the amount of text to be indexed will have an obvious effect, you must balance this out against the cost of filtering the documents with the AUTO_FILTER filter or other user-defined filters.

7.9.5 Can parallel indexing improve performance?

Answer: Parallel indexing can improve index performance when you have a large amount of data, and have multiple CPUs.

You use the PARALLEL keyword when creating the index:

CREATE INDEX index_name ON table_name (column_name) 
INDEXTYPE IS ctxsys.context PARAMETERS ('...') PARALLEL 3;

This will create the index with up to three separate indexing processes depending on your resources.

Parallel indexing can also be used to create local partitioned indexes on partitioned tables. However, indexing performance only improves when you have multiple CPUs.

Note:

Using PARALLEL to create a local partitioned index enables parallel queries. (Creating a non-partitioned index in parallel does not turn on parallel query processing.)

Parallel querying degrades query throughput especially on heavily loaded systems. Because of this, Oracle recommends that you disable parallel querying after parallel indexing. To do so, use ALTER INDEX NOPARALLEL.

7.9.6 How can I improve index performance for creating local partitioned index?

Answer: When you have multiple CPUs, you can improve indexing performance by creating a local index in parallel. There are two ways to index in parallel:

You can create a local partitioned index in parallel in two ways:

  • Use the PARALLEL clause with the LOCAL clause in CREATE INDEX.In this case, the maximum parallel degree is limited to the number of partitions you have.

  • Create an unusable index first, then run the DBMS_PCLXUTIL.BUILD_PART_INDEX utility. This method can result in a higher degree of parallelism, especially if you have more CPUs than partitions.

The following is an example for the second method. In this example, the base table has three partitions. We create a local partitioned unusable index first, the run the DBMS_PCLUTIL.BUILD_PART_INDEX, which builds the 3 partitions in parallel (inter-partition parallelism). Also inside each partition, index creation is done in parallel (intra-partition parallelism) with a parallel degree of 2.

create index tdrbip02bx on tdrbip02b(text) 
indextype is ctxsys.context local (partition tdrbip02bx1, 
                                   partition tdrbip02bx2, 
                                   partition tdrbip02bx3) 
unusable; 

exec dbms_pclxutil.build_part_index(3,2,'TDRBIP02B','TDRBIP02BX',TRUE); 

7.9.7 How can I tell how much indexing has completed?

Answer: You can use the CTX_OUTPUT.START_LOG procedure to log output from the indexing process. Filename will normally be written to $ORACLE_HOME/ctx/log, but you can change the directory using the LOG_DIRECTORY parameter in CTX_ADM.SET_PARAMETER.

See Also:

Oracle Text Reference to learn more about using this procedure

7.10 Frequently Asked Questions About Updating the Index

This section answers some of the frequently asked questions about updating your index and related performance issues.

7.10.1 How often should I index new or updated records?

Answer: The less often you run reindexing with CTX_DDL.SYNC_INDEX, the less fragmented your indexes will be, and the less you will need to optimize them.

However, this means that your data will become progressively more out of date, which may be unacceptable for your users.

Overnight indexing is acceptable for many systems. In this case, data that is less than a day old is not searchable. Other systems use hourly, ten minute, or five minute updates.

See Also:

7.10.2 How can I tell when my indexes are getting fragmented?

Answer: The best way is to time some queries, run index optimization, then time the same queries (restarting the database to clear the SGA each time, of course). If the queries speed up significantly, then optimization was worthwhile. If they don't, you can wait longer next time.

You can also use CTX_REPORT.INDEX_STATS to analyze index fragmentation.

See Also:

7.10.3 Does memory allocation affect index synchronization?

Answer: Yes, the same way as for normal indexing. There are often far fewer records to be indexed during a synchronize operation, so it is not usually necessary to provide hundreds of megabytes of indexing memory.