Parallel Query Indexing

To scale up the size of the indexed data while maintaining a satisfactory query response time, the indexed data can be stored on independent disks so that disk I/O operations are performed in parallel. The major features of this architecture are:

  • The Oracle SES index is partitioned, so that sub-queries are executed in parallel.

  • Disks perform I/O operations independently of one another. As a result, I/O bus contention does not create a significant bottleneck on the collective I/O throughput.

  • Partition rules are used to control the document distribution among the partitions.

Figure 4-1 End User Query Partitioning


Document Partition Model and Storage Areas

Storage areas are used to store the partitions when the partitioning option is enabled. See "Storage Areas" for more information.

There are two kinds of partitioning mechanisms for improving query performance: attribute-based partitioning and hash-based partitioning. Currently, Oracle SES supports only hash-based partitioning.

Hash-based partitioning uses a hash function to distribute a large set of documents across multiple partitions. A partition engine controls the partitioning logic at both crawl time and query time. When a large data set must be searched without pruning conditions, the end user request is broken into multiple parallel sub-queries so that I/O and CPU resources are used in parallel. After the independent query processors return the result sets of the sub-queries, a merged result set is returned to the end user.
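The hash-based distribution described above can be illustrated with a minimal sketch. This is not the Oracle SES partition engine: the function names `partition_for` and `distribute`, the use of a document URL as the hash key, and the choice of SHA-256 are all assumptions made for illustration only.

```python
import hashlib


def partition_for(doc_key: str, num_partitions: int) -> int:
    """Map a document key to a partition index in [0, num_partitions).

    Hypothetical helper: the actual hash function used by Oracle SES
    is not specified in this section.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # Use a stable digest instead of Python's built-in hash(), which is
    # randomized per process for strings and would break repeatability
    # between crawl time and query time.
    digest = hashlib.sha256(doc_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def distribute(doc_keys, num_partitions):
    """Group document keys by the partition each one hashes to."""
    partitions = {i: [] for i in range(num_partitions)}
    for key in doc_keys:
        partitions[partition_for(key, num_partitions)].append(key)
    return partitions
```

Because the mapping depends only on the key and the partition count, the same document always lands in the same partition, which is what allows one engine to apply the same logic at crawl time and at query time.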

Figure 4-2 shows how the mechanism works at crawl time. The documents are partitioned and stored in different storage areas. The storage areas are created on separate physical disks, so that I/O operations can be performed in parallel to improve search turnaround time.

Figure 4-2 Document Partitioning at Crawl Time


At query time, the query partition engine generates sub-queries and submits them to the storage areas, as shown in Figure 4-3.

Figure 4-3 Generation of Sub Queries at Query Time


See "Parallel Querying and Index Partitioning" for more information.
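The query-time flow described above (one sub-query per partition, run in parallel, followed by a merge of the result sets) can be sketched as follows. This is an illustrative model only, not Oracle SES code: `search_partition` and `parallel_search` are hypothetical names, each partition is modeled as an in-memory dictionary of document text, and the score is a simple term count.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def _rank(hit):
    """Sort key: higher score first, then document ID for stable ties."""
    score, doc_id = hit
    return (-score, doc_id)


def search_partition(partition, term):
    """Hypothetical per-partition query processor.

    `partition` maps document IDs to text. Returns (score, doc_id)
    hits ordered best-first, where score is the term's occurrence count.
    """
    hits = [
        (text.count(term), doc_id)
        for doc_id, text in partition.items()
        if term in text
    ]
    return sorted(hits, key=_rank)


def parallel_search(partitions, term, limit=10):
    """Run one sub-query per partition in parallel and merge the results."""
    workers = max(1, len(partitions))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        result_sets = list(
            pool.map(lambda p: search_partition(p, term), partitions)
        )
    # Each result set is already ordered best-first, so a k-way merge
    # yields one globally ordered result set without a full re-sort.
    merged = heapq.merge(*result_sets, key=_rank)
    return list(islice(merged, limit))
```

A thread pool is used here because the sub-queries in this model stand in for I/O-bound work against separate disks; a different executor could be substituted without changing the fan-out and merge structure.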

Note:

In previous releases, the base path of Oracle SES was referred to as ORACLE_HOME. In Oracle SES release 11g, the base path is referred to as ORACLE_BASE. This is the Software Location that you specify when you install Oracle SES.

ORACLE_HOME now refers to the path ORACLE_BASE/seshome.

For more information about ORACLE_BASE, see "Conventions".