Tuning Search Performance

Oracle SES includes many features that optimize search performance. This section suggests ways to improve the response time and throughput of Oracle SES, and it identifies the most common ways to improve search quality.

Adding Suggested Links

Suggested links enable you to direct users to a designated Web site for particular query keywords. For example, when users search for "Oracle Secure Enterprise Search documentation" or "Enterprise Search documentation" or "Search documentation", you could suggest http://www.oracle.com/technology.

Suggested link keywords are rules that determine which suggested links are returned (as suggestions) for a query. A rule can include query terms and logical operators, for example, "secure AND search". With this rule, the corresponding suggested link is returned for the query "secure enterprise search", but it is not returned for the query "secure database".
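As a rough illustration of how an AND rule behaves, the following Python sketch checks whether every term in a rule appears in a query. The function name matches_and_rule is hypothetical, and the whitespace tokenization is an assumption. Oracle SES evaluates these rules with its own rule language, which this sketch does not reproduce.

```python
import re


def matches_and_rule(rule: str, query: str) -> bool:
    """Return True when every AND-ed term of the rule occurs in the query.

    Hypothetical helper: handles only the AND operator and plain terms.
    """
    # Split the rule on the AND operator, ignoring case and extra spaces.
    terms = [t.strip().lower() for t in re.split(r"\s+and\s+", rule, flags=re.IGNORECASE)]
    # Assumption: the query is tokenized on whitespace.
    query_words = set(query.lower().split())
    return all(term in query_words for term in terms if term)


print(matches_and_rule("secure AND search", "secure enterprise search"))  # True
print(matches_and_rule("secure AND search", "secure database"))           # False
```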

The rule language used for the indexed queries supports the following operators:

Table 12-1 Suggested Link Keyword Operators

Operator     Example
ABOUT        about(dogs)
AND          dog and cat
NEAR         dog ; cat
OR           dog or cat
PHRASE       dog sled
STEM         $dog
THESAURUS    SYN(dog)


Note:

Do not use special characters, such as #, $, =, and &, in keywords.

Suggested links appear at the top of the search result list. Oracle SES can display up to two suggested links for each query.

This feature is especially useful for providing links to important Web pages that are not crawled by Oracle Secure Enterprise Search. Add or edit suggested links on the Search - Suggested Links page in the Oracle SES Administration GUI.

Parallel Querying and Index Partitioning

Parallel querying significantly improves search performance and facilitates searches of very large data sources. The query architecture is based on Oracle Database partitioning and enhancements in Oracle Text.

To make the best use of this feature, Oracle recommends that you run Oracle SES on a server with a 4-Core CPU, with at least 8GB of RAM and multiple fast disk drives.

Parallel querying is automatically implemented on Oracle SES when the partitioning option is enabled. Partitioning can only be enabled on a newly installed Oracle SES instance.

To enable partitioning: 

  1. Log in as eqsys and execute the following SQL commands:

    exec eq_adm.use_instance(1)
    exec eq_par.enable_partition
    
  2. Next, configure the partition by setting up the storage areas and partition rules. You can do this using the admin API. See Oracle Secure Enterprise Search Administration API Guide for more information.

  3. Define the data sources and start the crawl process.

Note:

Once enabled, the partitioning option cannot be disabled. As a result, parallel querying cannot be disabled either.

Storage Areas

A storage area in Oracle SES corresponds to a physical disk. To make optimum use of the parallel querying feature, you must create as many storage areas as there are physical disks.

A storage area is a user-defined object with the following attributes:

  • name

  • description (can be updated)

  • locations (Oracle SES 11g supports only a single location)

  • usage (can be SYSTEM, CRAWLER, CACHE FILE, or PARTITION)

For each location, you can provide the following details:

  • path

  • preAllocatedSpace (in MB, can be updated)

  • device (can be updated)

  • quota (in MB, can be updated)

  • currentSize (in MB). This element also carries the lastRefreshDate attribute, which indicates when currentSize was calculated.

You can create, export, update, and delete storage areas. Use the admin API to perform these operations and manage storage areas. See Oracle Secure Enterprise Search Administration API Guide for more information.

Note the following about the various operations:

  • You can create and delete only those storage areas that have the usage type set to PARTITION.

  • Only the following fields can be updated: description, preAllocatedSpace, device, and quota.
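The two restrictions above can be expressed as simple checks. The following Python sketch is illustrative only: the function names are hypothetical and it does not call the Oracle SES admin API. It shows how a script might validate a request before submitting it.

```python
# Fields that the text above lists as updatable.
UPDATABLE_FIELDS = {"description", "preAllocatedSpace", "device", "quota"}


def rejected_fields(update: dict) -> list:
    """Return the fields in an update request that cannot be changed."""
    return sorted(name for name in update if name not in UPDATABLE_FIELDS)


def can_create_or_delete(usage: str) -> bool:
    """Only storage areas with usage type PARTITION can be created or deleted."""
    return usage == "PARTITION"


print(rejected_fields({"quota": 2048, "path": "/u02/ses"}))  # ['path']
print(can_create_or_delete("SYSTEM"))                         # False
```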

Storage Area Schema

The storage area schema is defined as follows:

<xsd:element name = "storageAreas" minOccurs = "0" maxOccurs = "1">
 <xsd:complexType>
  <xsd:sequence>
   <xsd:element name = "storageArea" minOccurs = "0" maxOccurs = "unbounded">
    <xsd:complexType>
     <xsd:all>
      <xsd:element name = "name" type = "xsd:string" minOccurs = "1" maxOccurs = "1" />
      <xsd:element name = "description" type = "xsd:string" minOccurs = "1" maxOccurs = "1" />
      <xsd:element name = "usage" type = "xsd:string" minOccurs = "1" maxOccurs = "1" />
      <xsd:element name = "locations" minOccurs = "1" maxOccurs = "1">
       <xsd:complexType>
        <xsd:sequence>
         <xsd:element name = "location" minOccurs = "1" maxOccurs = "1">
          <xsd:complexType>
           <xsd:all>
            <xsd:element name = "path" type = "xsd:string" minOccurs = "1" maxOccurs = "1"/>
            <xsd:element name = "device" type = "xsd:string" minOccurs = "0" maxOccurs = "1"/>
            <xsd:element name = "preAllocatedSpace" type = "xsd:int" minOccurs = "0" maxOccurs = "1"/>
            <xsd:element name = "quota" type = "xsd:int" minOccurs = "0" maxOccurs = "1"/>
            <xsd:element name = "currentSize" minOccurs = "0" maxOccurs = "1">
             <xsd:complexType>
              <xsd:simpleContent>
               <xsd:extension base = "xsd:string">
                <xsd:attribute name = "lastRefreshDate" type = "xsd:string" />
               </xsd:extension>
              </xsd:simpleContent>
             </xsd:complexType>
            </xsd:element>
           </xsd:all>
          </xsd:complexType>
         </xsd:element>
        </xsd:sequence>
       </xsd:complexType>
      </xsd:element>
     </xsd:all>
    </xsd:complexType>
   </xsd:element>
  </xsd:sequence>
 </xsd:complexType>
</xsd:element>

For example,

<search:storageArea>
 <search:name>Cache directory</search:name>
 <search:description>The path where SES stores cache files</search:description>
 <search:usage>SYSTEM</search:usage>
 <search:locations>
  <search:location>
   <search:path>/oracle/work/regress/</search:path>
   <search:device>default</search:device>
  </search:location>
 </search:locations>
</search:storageArea>

Configuring a Partition

Configuring a partition includes updating partition attributes, updating partition rules, and exporting configurations. A partition can typically include multiple storage areas. For example, the following command configures a hash partition over six storage areas.

$ORACLE_HOME/bin/searchadmin -u eqsys -p eqsys_password update partitionConfig -i /scratch/configHashPartition.xml

where configHashPartition.xml is:

<search:config productVersion="11.1.1.0.0">
 <search:partitionConfig>
  <search:partitionRules>
   <search:partitionRule>
    <search:partitionValue>EQ_DEFAULT</search:partitionValue>
    <search:valueType>META</search:valueType>
    <search:ruleType>HASH</search:ruleType>
    <search:ruleSetting/>
    <search:storageArea>SA1, SA2, SA3, SA4, SA5, SA6</search:storageArea>
   </search:partitionRule>
  </search:partitionRules>
 </search:partitionConfig>
</search:config>

With this partition configuration, all documents are hash partitioned and evenly distributed across storage areas SA1 through SA6.
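To see why hashing tends to spread documents evenly, consider the following Python sketch. It is a conceptual model only: MD5 stands in for whatever hash function Oracle SES actually uses (an assumption), and the document URLs and the function name assign_storage_area are hypothetical.

```python
import hashlib
from collections import Counter

STORAGE_AREAS = ["SA1", "SA2", "SA3", "SA4", "SA5", "SA6"]


def assign_storage_area(doc_key: str, areas=STORAGE_AREAS) -> str:
    """Map a document key to a storage area with a stable hash.

    Assumption: MD5 is a stand-in, not the hash Oracle SES uses.
    """
    digest = hashlib.md5(doc_key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer bucket selector.
    bucket = int.from_bytes(digest[:8], "big") % len(areas)
    return areas[bucket]


# Distribute 6000 hypothetical documents and count how many land in each area.
counts = Counter(assign_storage_area(f"http://example.com/doc/{i}") for i in range(6000))
for area in STORAGE_AREAS:
    print(area, counts[area])
```

Because the same key always hashes to the same bucket, a document stays in one storage area, while a large set of keys spreads roughly uniformly over all of them.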

partitionConfig Schema

The partition configuration schema is defined as follows:

<!-- Partition Configuration -->
<xsd:element name = "partitionConfig" minOccurs = "0" maxOccurs = "1">
 <xsd:complexType>
  <xsd:sequence>
   <xsd:element name = "partitionRules" minOccurs = "0" maxOccurs = "1">
    <xsd:complexType>
     <xsd:sequence>
      <xsd:element name = "partitionRule" minOccurs = "0" maxOccurs = "unbounded">
       <xsd:complexType>
        <xsd:all>
         <xsd:element name = "partitionValue" type = "xsd:string" minOccurs = "1" maxOccurs = "1"/>
         <xsd:element name = "valueType" type = "xsd:string" minOccurs = "1" maxOccurs = "1"/>
         <xsd:element name = "ruleType" type = "xsd:string" minOccurs = "1" maxOccurs = "1"/>
         <xsd:element name = "ruleSetting" type = "xsd:string" minOccurs = "0" maxOccurs = "1"/>
         <xsd:element name = "storageArea" type = "xsd:string" minOccurs = "0" maxOccurs = "1"/>
        </xsd:all>
       </xsd:complexType>
      </xsd:element>
     </xsd:sequence>
    </xsd:complexType>
   </xsd:element>
  </xsd:sequence>
 </xsd:complexType>
</xsd:element>

The different elements are:

  • search:partitionConfig: Contains partition configuration rules.

  • search:partitionRules: Contains one or more partition rules.

  • search:partitionRule: Describes a partition rule. It consists of the following elements:

    • search:partitionValue: Specify the system defined special value EQ_DEFAULT.

    • search:valueType: Type of partition value. Enter META in this field.

    • search:ruleType: Type of partition rule. Enter HASH in this field.

    • search:ruleSetting: Do not specify any value.

    • search:storageArea: A comma-separated list of storage areas included in the partition.
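If you maintain several environments, you might generate the configuration file rather than edit it by hand. The following Python sketch builds the XML from a list of storage area names, using the fixed values described above (EQ_DEFAULT, META, HASH, and an empty ruleSetting). The function name build_hash_partition_config is hypothetical, and the sketch only produces text; it does not validate against the schema or call searchadmin.

```python
from xml.sax.saxutils import escape


def build_hash_partition_config(storage_areas, product_version="11.1.1.0.0"):
    """Return partitionConfig XML for one hash rule over the given storage areas."""
    if not storage_areas:
        raise ValueError("at least one storage area is required")
    # storageArea takes a comma-separated list of storage area names.
    area_list = escape(", ".join(storage_areas))
    lines = [
        f'<search:config productVersion="{product_version}">',
        " <search:partitionConfig>",
        "  <search:partitionRules>",
        "   <search:partitionRule>",
        "    <search:partitionValue>EQ_DEFAULT</search:partitionValue>",
        "    <search:valueType>META</search:valueType>",
        "    <search:ruleType>HASH</search:ruleType>",
        "    <search:ruleSetting/>",
        f"    <search:storageArea>{area_list}</search:storageArea>",
        "   </search:partitionRule>",
        "  </search:partitionRules>",
        " </search:partitionConfig>",
        "</search:config>",
    ]
    return "\n".join(lines)


print(build_hash_partition_config(["SA1", "SA2", "SA3"]))
```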

Managing Index Fragmentation

The ideal goal for any search engine is to auto-manage index fragmentation. With semi-automatic index fragmentation management, Oracle SES comes close to achieving this goal. Some garbage collection is still required infrequently, perhaps once a month. For this reason, the Oracle SES administrator can still schedule index optimizations to run during off-peak hours.

The new index fragmentation management feature is implemented on top of an enhancement in Oracle Text, which allows the search engine index to be updated while Oracle SES is executing searches. This is achieved by temporarily saving index changes to an in-memory index and periodically merging them with the larger disk-based search engine index. This reduces fragmentation, and leads to faster response times.

The new index fragmentation management is implemented automatically on Oracle SES, but it can be tuned by configuring Oracle Text, where you can turn index fragmentation management on and off, and specify the frequency of index merges.

Modifying the KEEP Pool Size

Modifying the KEEP pool size involves tuning Oracle Database to get the most benefit from the indexing option in Oracle Text.

By default, when you install Oracle SES, the indexing option, Staging Text Index, is enabled. This automatically sets up the KEEP pool of the database because the DR$EQ$DOC_PATH_IDX$G table that temporarily stages the index is stored in the KEEP pool.

By default, Oracle Database allocates 10% of the default buffer pool size to the KEEP pool. The DR$EQ$DOC_PATH_IDX$G table expands and shrinks in real time depending on the volume of indexing activity. Thus, if there is a high volume of indexing activity, then the average size of the DR$EQ$DOC_PATH_IDX$G table is likely to exceed the size of the KEEP pool. This can result in slower query response time. To prevent this, you can allocate more space to the KEEP pool.

Note:

Do not attempt to modify the KEEP pool size if you are not familiar with database tuning operations. Ideally, only the database administrator should modify the KEEP pool size.

Determining if the KEEP Pool Size is Sufficient

If the KEEP pool size is not sufficient, then you are likely to see high physical reads from the DR$EQ$DOC_PATH_IDX$G table, the DR$EQ$DOC_PATH_IDX$H segments, or both in the AWR (Automatic Workload Repository) report or the V$SEGSTAT view. If you observe this, then consider increasing the KEEP pool size.

Increasing the KEEP Pool Buffer Size

Use SQL*Plus to modify the size of the KEEP pool. For example, to allocate 400 MB to the pool, execute the following:

SQL> alter system set DB_KEEP_CACHE_SIZE=400M scope=both;

To check the current KEEP pool size, query the view V$SGA_DYNAMIC_COMPONENTS. Use the following command:

SQL> select current_size from v$sga_dynamic_components where component = 'KEEP buffer cache';

The output is similar to the following:

CURRENT_SIZE
------------
419430400
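The value is reported in bytes, so 400M appears as 400 × 1024 × 1024. The following Python sketch shows the conversion; the helper name megabytes_to_bytes is hypothetical and is included only to make the arithmetic explicit.

```python
def megabytes_to_bytes(megabytes: int) -> int:
    """Convert a size in megabytes (1 MB = 1024 * 1024 bytes) to bytes."""
    return megabytes * 1024 * 1024


print(megabytes_to_bytes(400))  # 419430400
```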

See Also:

Oracle Database Performance Tuning Guide for more information about the KEEP pool buffer.

Optimizing the Index

Optimizing the index reduces fragmentation, and it can significantly increase the speed of searches. Schedule index optimization on a regular basis. Also, optimize the index after the crawler has made substantial updates or if fragmentation is more than 50%. Verify that index optimization is scheduled during off-peak hours. Optimization of a very large index could take several hours.

You can see the fragmentation level and run index optimization on the Global Settings - Index Optimization page in the Oracle SES Administration GUI. You can specify a maximum number of hours for the optimization to run, but for best performance, run the optimization until completion. Oracle SES uses a faster optimization method and creates a more compact copy of the index when no time limit is set.

Adjusting the Indexing Parameters

To improve indexing performance, adjust the following parameters on the Global Settings - Set Indexing Parameters page of the Oracle SES Administration GUI:

Indexing Batch Size

When the crawled data in the cache directory reaches Indexing Batch Size, Oracle SES starts indexing. The bigger the batch size, the longer it takes to start indexing each batch. Only indexed data can be searched; data in the cache cannot. The default size is 250M.

Document fetching and indexing run concurrently. While indexing is running, the Oracle SES crawler continues to fetch documents and store them in the cache directory.
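The batching behavior can be pictured with a small model. The following Python sketch groups crawled documents into batches once the cached total reaches the batch size. It is a simplified illustration, not the crawler's actual logic: the function name batches_to_index and the document sizes are hypothetical, and it ignores the concurrency between fetching and indexing.

```python
def batches_to_index(document_sizes_mb, batch_size_mb=250):
    """Group crawled documents into indexing batches.

    A batch is handed to the indexer once the cached data reaches batch_size_mb.
    Returns the completed batches and the documents still waiting in the cache.
    """
    batches, cache, cached_mb = [], [], 0
    for size in document_sizes_mb:
        cache.append(size)
        cached_mb += size
        if cached_mb >= batch_size_mb:
            # Threshold reached: this batch becomes searchable after indexing.
            batches.append(cache)
            cache, cached_mb = [], 0
    return batches, cache


batches, waiting = batches_to_index([100, 100, 100, 200, 60, 40])
print(len(batches), waiting)  # 2 [40]
```

In this model, the last 40 MB document stays in the cache and is not searchable until a later batch is indexed, which is why a larger batch size delays searchability.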

Indexing Memory Size

This is the upper limit of memory used for indexing before flushing the index to disk.

A large amount of memory improves indexing performance because it reduces I/O. It also improves query performance because the index is less fragmented from the start, although a fragmented index can be optimized later. Set this parameter as high as possible without causing memory paging.

A smaller amount of memory might be useful when indexing progress should be tracked or when run-time memory is scarce. The default size is 275M. In general, increasing the Indexing Memory Size parameter can reduce fragmentation.

Parallel Indexing Degree

The number of concurrent threads used for indexing. This parameter is disabled in the current version of Oracle SES; it is always set to 1.

Checking the Search Statistics

See the Home - Statistics page in the Oracle SES Administration GUI for lists of the most popular queries, failed queries, and ineffective queries. This information can lead to the following actions:

  • Refer users to a particular Web site for failed queries on the Search - Suggested Links page.

  • Fix common errors that users make in searching on the Search - Alternate Words page.

  • Make important documents easier to find on the Search - Relevancy Boosting page.

Note that every hour, Oracle SES automatically summarizes logged queries. If there are a large number of logged queries, then the summarizing task can consume significant server resources and affect query performance. This issue shows up in stress tests where several queries are executed every second. In such cases, the best solution is to disable the query statistics option.

To do this, from the Home page, click Global Settings, Query Configuration. Under Query Statistics, select No for the Enable Query Statistics option.

Relevancy Boosting

Relevancy boosting lets administrators influence the order of documents in the result list for a particular search. You might want to override the default results for the following reasons:

  • For a highly popular search, direct users to the best results

  • For a search that returns no results, direct users to some results

  • For a search that has no click-throughs, direct users to better results

In a search, each result is assigned a score that indicates how relevant the result is to the search; that is, how good a result it is. Sometimes you know the documents that are highly relevant to some search. For example, your company Web site could have a home page for XML (http://example.com/XML-is-great.htm), which you want to appear high in the results of any search for XML. You would boost the score of the XML home page to 100 for an XML search.

For searches that are not among the boosted queries, the document's score is computed as usual.

Two methods can help you locate URLs for relevancy boosting: locate by search and manual URL entry.

Relevancy boosting, like end user searching, is case-insensitive. For example, a document with a boosted score for Oracle is boosted for oracle.
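The effect of a boost on result order can be sketched as follows. This Python model is an assumption about how boosted scores interact with computed scores, not a description of the Oracle SES ranking engine. The function name rank_results, the second URL, and the computed scores (72 and 35) are hypothetical.

```python
def rank_results(query, scored_results, boosts):
    """Sort (url, score) pairs by score, applying any boost defined for the query.

    boosts maps a lowercased query to {url: boosted_score}.
    """
    # Lowercase the query so that boosting is case-insensitive.
    query_boosts = boosts.get(query.lower(), {})
    adjusted = [(url, query_boosts.get(url, score)) for url, score in scored_results]
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)


boosts = {"xml": {"http://example.com/XML-is-great.htm": 100}}
results = [
    ("http://example.com/xml-faq.htm", 72),
    ("http://example.com/XML-is-great.htm", 35),
]
print(rank_results("XML", results, boosts)[0][0])  # http://example.com/XML-is-great.htm
```

For a query with no boost entry, the computed scores are used unchanged, so the same document can rank lower in other searches.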

Increasing the JVM Heap Size

If you expect heavy loads on the Oracle SES server, then configure the Java Virtual Machine (JVM) heap size for better performance.

The heap size is defined in the ORACLE_HOME/search/config/searchctl.conf file. By default, the following values are given:

COMMON_MEM_ARGS = -Xmx2048m -Xms512m

Increase the value of these parameters appropriately for your system configuration. The -Xmx value should not exceed the physical memory size.

Then restart the middle tier:

searchctl restart
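Before editing the heap settings, it can help to check the values against the memory available on the host. The following Python sketch parses a COMMON_MEM_ARGS value and compares -Xmx with a physical memory figure that you supply. The function names are hypothetical, the sketch handles only the m and g suffixes, and it does not read searchctl.conf itself.

```python
import re


def parse_heap_args(mem_args: str) -> dict:
    """Extract -Xmx and -Xms values, in MB, from a JVM argument string.

    Assumption: sizes use an m or g suffix, as in the default setting.
    """
    factors = {"m": 1, "g": 1024}
    sizes = {}
    for flag, number, unit in re.findall(r"-(Xmx|Xms)(\d+)([mgMG])", mem_args):
        sizes[flag] = int(number) * factors[unit.lower()]
    return sizes


def heap_fits(mem_args: str, physical_memory_mb: int) -> bool:
    """Return True when -Xmx does not exceed the physical memory size."""
    return parse_heap_args(mem_args).get("Xmx", 0) <= physical_memory_mb


print(parse_heap_args("-Xmx2048m -Xms512m"))  # {'Xmx': 2048, 'Xms': 512}
print(heap_fits("-Xmx2048m -Xms512m", 8192))  # True
```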

Increasing the Oracle Undo Space

Heavy query load should not coincide with heavy crawl activity, especially when there are large-scale changes on the target site. If it does, such as when a crawl is scheduled around the clock, then increase the size of the Oracle undo tablespace with the UNDO_RETENTION parameter.

See Also:

Oracle Database SQL Language Reference and Oracle Database Administrator's Guide on Oracle Technology Network for more information about increasing the Oracle undo space.

Optimizing Query Application Performance

If you plan to use the Oracle SES default query user interface and have an Oracle Application Server Web Cache installation, then you can use its compression utility to compress the content Oracle SES sends over the network. For example, the utility can compress results.jsp from 980K to 72K. Compression provides the greatest benefit to users connecting over the Internet.

Use these Web cache compression rules:

/search/search?(.*)
/search/results.jsp?(.*)

OracleAS Web Cache does not benefit custom querying applications.