Oracle® Secure Enterprise Search Administrator's Guide 11g Release 1 (11.1.2.2) Part Number E21605-01
Oracle SES includes many features that optimize search performance. This section suggests ways to improve the response time and throughput of Oracle SES, and it identifies the most common ways to improve search quality.
See Also:
"Searching on Date Attributes"
Suggested links enable you to direct users to a designated Web site for particular query keywords. For example, when users search for "Oracle Secure Enterprise Search documentation", "Enterprise Search documentation", or "Search documentation", you could suggest http://www.oracle.com/technology.
Suggested link keywords are rules that determine which suggested links are returned (as suggestions) for a query. A rule can include query terms and logical operators, for example, "secure AND search". With this rule, the corresponding suggested link is returned for the query "secure enterprise search", but not for the query "secure database".
The rule language used for the indexed queries supports the following operators:
Table 12-1 Suggested Link Keyword Operators
Operator | Example |
---|---|
ABOUT | about(dogs) |
AND | dog and cat |
NEAR | dog ; cat |
OR | dog or cat |
PHRASE | dog sled |
STEM | $dog |
THESAURUS | SYN(dog) |
Note:
Do not use special characters, such as #, $, =, and &, in keywords.
Suggested links appear at the top of the search result list. Oracle SES can display up to two suggested links for each query.
This feature is especially useful for providing links to important Web pages that are not crawled by Oracle Secure Enterprise Search. Add or edit suggested links on the Search - Suggested Links page in the Oracle SES Administration GUI.
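To make the rule semantics concrete, the following is a minimal sketch of how a keyword rule such as "secure AND search" could decide whether a query matches. It is a hypothetical, simplified model, not the Oracle Text rule engine that Oracle SES uses: it assumes AND binds tighter than OR, matches whole words case-insensitively, and ignores ABOUT, NEAR, STEM, THESAURUS, and phrases. The function name `rule_matches` is illustrative only.

```python
def rule_matches(rule: str, query: str) -> bool:
    """Simplified model of a suggested-link keyword rule (hypothetical helper).

    The rule is treated as OR-separated clauses of AND-separated terms,
    so "a AND b OR c" means (a AND b) OR c. Matching is whole-word and
    case-insensitive; other operators are not modeled.
    """
    query_terms = set(query.lower().split())
    for clause in rule.lower().split(" or "):
        terms = [term.strip() for term in clause.split(" and ")]
        # A clause matches only if every one of its terms occurs in the query.
        if all(term in query_terms for term in terms):
            return True
    return False


print(rule_matches("secure AND search", "secure enterprise search"))  # True
print(rule_matches("secure AND search", "secure database"))           # False
```

The two calls reproduce the example in the text: the link is suggested for "secure enterprise search" but not for "secure database".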
Parallel querying significantly improves search performance and facilitates searches of very large data sources. The query architecture is based on Oracle Database partitioning and enhancements in Oracle Text.
To make the best use of this feature, Oracle recommends that you run Oracle SES on a server with a four-core CPU, at least 8 GB of RAM, and multiple fast disk drives.
Parallel querying is automatically implemented on Oracle SES when the partitioning option is enabled. Partitioning can only be enabled on a newly installed Oracle SES instance.
To enable partitioning:
Log in as eqsys and execute the following SQL commands:
exec eq_adm.use_instance(1)
exec eq_par.enable_partition
Next, configure the partition by setting up the storage areas and partition rules. You can do this using the Administration API. See Oracle Secure Enterprise Search Administration API Guide for more information.
Define the data sources and start the crawl process.
Note:
Once enabled, the partitioning option cannot be disabled. Consequently, parallel querying cannot be disabled either.
A storage area in Oracle SES corresponds to a physical disk. To make optimum use of the parallel querying feature, create as many storage areas as there are physical disks.
A storage area is a user-defined object with the following attributes:
name
description (can be updated)
locations (Oracle SES 11g supports only a single location)
usage (can be SYSTEM, CRAWLER, CACHE FILE, or PARTITION)
For each location, you can provide the following details:
path
preAllocatedSpace (in MB, can be updated)
device (can be updated)
quota (in MB, can be updated)
currentSize (in MB). It also carries a lastRefreshDate attribute, which indicates when currentSize was calculated.
You can create, export, update, and delete storage areas. Use the Administration API to perform these operations and manage storage areas. See Oracle Secure Enterprise Search Administration API Guide for more information.
Note the following about the various operations:
You can create and delete only those storage areas whose usage type is PARTITION.
Only the following fields can be updated: description, preAllocatedSpace, device, and quota.
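The operation rules above are simple enough to check on the client side before calling the Administration API. The following is a hypothetical sketch: the function names are illustrative, the usage types and updatable fields are taken from the lists in this section, and the Administration API itself remains the authority on what it accepts.

```python
USAGE_TYPES = {"SYSTEM", "CRAWLER", "CACHE FILE", "PARTITION"}
UPDATABLE_FIELDS = {"description", "preAllocatedSpace", "device", "quota"}


def can_create_or_delete(usage: str) -> bool:
    """Only storage areas with usage PARTITION may be created or deleted."""
    if usage not in USAGE_TYPES:
        raise ValueError(f"unknown usage type: {usage!r}")
    return usage == "PARTITION"


def rejected_update_fields(changes: dict) -> set:
    """Return the fields in a proposed update that are not updatable."""
    return set(changes) - UPDATABLE_FIELDS


print(can_create_or_delete("PARTITION"))                          # True
print(rejected_update_fields({"quota": 2048, "path": "/disk2"}))  # {'path'}
```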
The storage area schema is defined as follows:
<xsd:element name = "storageAreas" minOccurs = "0" maxOccurs = "1">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name = "storageArea" minOccurs = "0" maxOccurs = "unbounded">
        <xsd:complexType>
          <xsd:all>
            <xsd:element name = "name" type = "xsd:string" minOccurs = "1" maxOccurs = "1" />
            <xsd:element name = "description" type = "xsd:string" minOccurs = "1" maxOccurs = "1" />
            <xsd:element name = "usage" type = "xsd:string" minOccurs = "1" maxOccurs = "1" />
            <xsd:element name = "locations" minOccurs = "1" maxOccurs = "1">
              <xsd:complexType>
                <xsd:sequence>
                  <xsd:element name = "location" minOccurs = "1" maxOccurs = "1">
                    <xsd:complexType>
                      <xsd:all>
                        <xsd:element name = "path" type = "xsd:string" minOccurs = "1" maxOccurs = "1"/>
                        <xsd:element name = "device" type = "xsd:string" minOccurs = "0" maxOccurs = "1"/>
                        <xsd:element name = "preAllocatedSpace" type = "xsd:int" minOccurs = "0" maxOccurs = "1"/>
                        <xsd:element name = "quota" type = "xsd:int" minOccurs = "0" maxOccurs = "1"/>
                        <xsd:element name = "currentSize" minOccurs = "0" maxOccurs = "1">
                          <xsd:complexType>
                            <xsd:simpleContent>
                              <xsd:extension base = "xsd:string">
                                <xsd:attribute name = "lastRefreshDate" type = "xsd:string" />
                              </xsd:extension>
                            </xsd:simpleContent>
                          </xsd:complexType>
                        </xsd:element>
                      </xsd:all>
                    </xsd:complexType>
                  </xsd:element>
                </xsd:sequence>
              </xsd:complexType>
            </xsd:element>
          </xsd:all>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
For example:
<search:storageArea>
  <search:name>Cache directory</search:name>
  <search:description>The path where SES stores cache files</search:description>
  <search:usage>SYSTEM</search:usage>
  <search:locations>
    <search:location>
      <search:path>/oracle/work/regress/</search:path>
      <search:device>default</search:device>
    </search:location>
  </search:locations>
</search:storageArea>
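If you export storage areas and want to inspect them with a script, the standard Python ElementTree parser is enough. This is a sketch under one assumption: the fragment above does not declare the search prefix, so the example wraps it in a search:config root that uses the namespace URI from the partition configuration example in this section. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the search:config example in this section; the
# storageArea fragment itself does not declare it, so we wrap it (assumption).
NS = {"search": "http://xmlns.oracle.com/search"}

EXAMPLE = """<search:config xmlns:search="http://xmlns.oracle.com/search">
  <search:storageArea>
    <search:name>Cache directory</search:name>
    <search:description>The path where SES stores cache files</search:description>
    <search:usage>SYSTEM</search:usage>
    <search:locations>
      <search:location>
        <search:path>/oracle/work/regress/</search:path>
        <search:device>default</search:device>
      </search:location>
    </search:locations>
  </search:storageArea>
</search:config>"""


def summarize_storage_areas(xml_text: str) -> list:
    """Return (name, usage, path) for every storageArea element."""
    root = ET.fromstring(xml_text)
    summary = []
    for area in root.iterfind(".//search:storageArea", NS):
        name = area.findtext("search:name", namespaces=NS)
        usage = area.findtext("search:usage", namespaces=NS)
        path = area.findtext(
            "search:locations/search:location/search:path", namespaces=NS
        )
        summary.append((name, usage, path))
    return summary


print(summarize_storage_areas(EXAMPLE))
# [('Cache directory', 'SYSTEM', '/oracle/work/regress/')]
```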
Configuring a partition includes updating partition attributes, updating partition rules, and exporting configurations. A partition can typically include multiple storage areas. For example, the following command configures a hash partition over six storage areas.
$ORACLE_HOME/bin/searchadmin -u eqsys -p eqsys_password update partitionConfig -i /scratch/configHashPartition.xml
where configHashPartition.xml is:
<search:config productVersion="11.1.2.2.0" xmlns:search="http://xmlns.oracle.com/search">
  <search:partitionConfig>
    <search:partitionRules>
      <search:partitionRule>
        <search:partitionValue>EQ_DEFAULT</search:partitionValue>
        <search:valueType>META</search:valueType>
        <search:ruleType>HASH</search:ruleType>
        <search:ruleSetting/>
        <search:storageArea>SA1, SA2, SA3, SA4, SA5, SA6</search:storageArea>
      </search:partitionRule>
    </search:partitionRules>
  </search:partitionConfig>
</search:config>
With this partition configuration, all documents are hash partitioned and evenly distributed across storage areas SA1 through SA6.
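When the number of storage areas changes, generating the configuration file can be less error-prone than editing it by hand. The following sketch builds XML equivalent to configHashPartition.xml with Python's standard ElementTree. The function names are hypothetical, and the fixed values (EQ_DEFAULT, META, HASH, empty ruleSetting) are copied from the example above.

```python
import xml.etree.ElementTree as ET

SEARCH_NS = "http://xmlns.oracle.com/search"
ET.register_namespace("search", SEARCH_NS)


def tag(local: str) -> str:
    """Qualify a local element name with the search namespace."""
    return f"{{{SEARCH_NS}}}{local}"


def build_hash_partition_config(storage_areas, product_version="11.1.2.2.0") -> str:
    """Build a hash partition config over the given storage areas."""
    config = ET.Element(tag("config"), {"productVersion": product_version})
    partition_config = ET.SubElement(config, tag("partitionConfig"))
    rules = ET.SubElement(partition_config, tag("partitionRules"))
    rule = ET.SubElement(rules, tag("partitionRule"))
    ET.SubElement(rule, tag("partitionValue")).text = "EQ_DEFAULT"
    ET.SubElement(rule, tag("valueType")).text = "META"
    ET.SubElement(rule, tag("ruleType")).text = "HASH"
    ET.SubElement(rule, tag("ruleSetting"))  # intentionally left empty
    ET.SubElement(rule, tag("storageArea")).text = ", ".join(storage_areas)
    return ET.tostring(config, encoding="unicode")


xml_text = build_hash_partition_config([f"SA{i}" for i in range(1, 7)])
print(xml_text)
```

You could save the output as /scratch/configHashPartition.xml and pass it to the searchadmin update partitionConfig command shown above.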
The partition configuration schema is defined as follows:
<!-- Partition Configuration -->
<xsd:element name = "partitionConfig" minOccurs = "0" maxOccurs = "1">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name = "partitionRules" minOccurs = "0" maxOccurs = "1">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name = "partitionRule" minOccurs = "0" maxOccurs = "unbounded">
              <xsd:complexType>
                <xsd:all>
                  <xsd:element name = "partitionValue" type = "xsd:string" minOccurs = "1" maxOccurs = "1"/>
                  <xsd:element name = "valueType" type = "xsd:string" minOccurs = "1" maxOccurs = "1"/>
                  <xsd:element name = "ruleType" type = "xsd:string" minOccurs = "1" maxOccurs = "1"/>
                  <xsd:element name = "ruleSetting" type = "xsd:string" minOccurs = "0" maxOccurs = "1"/>
                  <xsd:element name = "storageArea" type = "xsd:string" minOccurs = "0" maxOccurs = "1"/>
                </xsd:all>
              </xsd:complexType>
            </xsd:element>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
The different elements are:
search:partitionConfig: Contains partition configuration rules.
search:partitionRules: Contains one or more partition rules.
search:partitionRule: Describes a partition rule. It consists of the following elements:
search:partitionValue: Specify the system-defined special value EQ_DEFAULT.
search:valueType: Type of partition value. Enter META in this field.
search:ruleType: Type of partition rule. Enter HASH in this field.
search:ruleSetting: Do not specify any value.
search:storageArea: A comma-delimited list of storage areas included in the partition.
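The element descriptions above impose fixed values on each partition rule, so a configuration file can be checked before you submit it. The following hypothetical sketch reports any rule that deviates from those values; it is a convenience pre-check only and does not replace validation by searchadmin.

```python
import xml.etree.ElementTree as ET

NS = {"search": "http://xmlns.oracle.com/search"}

# Values required by the element descriptions for search:partitionRule.
EXPECTED = {"partitionValue": "EQ_DEFAULT", "valueType": "META", "ruleType": "HASH"}


def check_partition_rules(xml_text: str) -> list:
    """Return human-readable problems found in each search:partitionRule."""
    root = ET.fromstring(xml_text)
    problems = []
    rules = root.iterfind(".//search:partitionRule", NS)
    for index, rule in enumerate(rules, start=1):
        for field, expected in EXPECTED.items():
            actual = rule.findtext(f"search:{field}", default="", namespaces=NS).strip()
            if actual != expected:
                problems.append(f"rule {index}: {field} is {actual!r}, expected {expected!r}")
        setting = rule.findtext("search:ruleSetting", default="", namespaces=NS)
        if setting.strip():
            problems.append(f"rule {index}: ruleSetting must be empty")
        listed = rule.findtext("search:storageArea", default="", namespaces=NS)
        areas = [name.strip() for name in listed.split(",") if name.strip()]
        if not areas:
            problems.append(f"rule {index}: no storage areas listed")
    return problems
```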
Ideally, a search engine manages index fragmentation automatically. With semi-automatic index fragmentation management, Oracle SES comes close to this goal. Some garbage collection is still required on an infrequent basis, perhaps once a month, so the Oracle SES administrator should still schedule index optimization to run during off-peak hours.
The new index fragmentation management feature is built on an enhancement in Oracle Text that allows the search engine index to be updated while Oracle SES is executing searches. Index changes are saved temporarily to an in-memory index and periodically merged into the larger disk-based search engine index. This reduces fragmentation and leads to faster response times.
The new index fragmentation management is enabled automatically on Oracle SES, but you can tune it by configuring Oracle Text: you can turn index fragmentation management on and off, and specify how often index merges occur.
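The staging idea described above can be illustrated with a toy two-level inverted index: new postings go into a small in-memory index that is searched together with the main index and merged into it after a threshold. This is a conceptual sketch only; the class name and the document-count merge threshold are hypothetical, and it does not reflect how Oracle Text actually stores or merges its index.

```python
from collections import defaultdict


class StagedIndex:
    """Toy inverted index with an in-memory staging level (conceptual only)."""

    def __init__(self, merge_threshold: int = 3):
        self.main = defaultdict(set)     # stands in for the disk-based index
        self.staging = defaultdict(set)  # stands in for the in-memory index
        self.merge_threshold = merge_threshold
        self._pending_docs = 0

    def add(self, doc_id: int, text: str) -> None:
        """Index a document into staging; merge once enough documents pile up."""
        for term in text.lower().split():
            self.staging[term].add(doc_id)
        self._pending_docs += 1
        if self._pending_docs >= self.merge_threshold:
            self.merge()

    def merge(self) -> None:
        """Fold staged postings into the main index in one pass."""
        for term, doc_ids in self.staging.items():
            self.main[term] |= doc_ids
        self.staging.clear()
        self._pending_docs = 0

    def search(self, term: str) -> set:
        """Staged documents are searchable before they are merged."""
        term = term.lower()
        return self.main.get(term, set()) | self.staging.get(term, set())
```

Because searches consult both levels, newly indexed documents are visible immediately, while the main index receives changes in larger, less fragmenting batches.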
You can tune Oracle Database to get the full benefit of the Oracle Text indexing option.
By default, when you install Oracle SES, the indexing option Staging Text Index is enabled. This automatically sets up the KEEP pool of the database, because the DR$EQ$DOC_PATH_IDX$G table, which temporarily stages the index, is stored in the KEEP pool.
By default, Oracle Database allocates 10% of the default buffer pool size to the KEEP pool. The DR$EQ$DOC_PATH_IDX$G table expands and shrinks in real time depending on the volume of indexing activity. Thus, if there is a high volume of indexing activity, then the average size of the DR$EQ$DOC_PATH_IDX$G table is likely to exceed the size of the KEEP pool. This can slow query response time. To prevent this, allocate more space to the KEEP pool.
Note:
Do not attempt to modify the KEEP pool size if you are not familiar with database tuning operations. Ideally, only the database administrator should modify the KEEP pool size.
If the KEEP pool is too small, then you are likely to see high physical reads on the DR$EQ$DOC_PATH_IDX$G table and the DR$EQ$DOC_PATH_IDX$H segments in the Automatic Workload Repository (AWR) report or the V$SEGSTAT view. If you observe high physical reads on the DR$EQ$DOC_PATH_IDX$G and DR$EQ$DOC_PATH_IDX$H segments, then consider increasing the KEEP pool size.
Use SQL*Plus to modify the size of the KEEP pool. For example, to allocate 400 MB to the pool, execute the following:
SQL> alter system set DB_KEEP_CACHE_SIZE=400M scope=both;
To find the current KEEP pool size, query the V$SGA_DYNAMIC_COMPONENTS view:
SQL> select current_size from v$sga_dynamic_components where component = 'KEEP buffer cache';
The output is similar to the following:
CURRENT_SIZE
------------
   419430400
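CURRENT_SIZE is reported in bytes, so the 419430400 shown above is the 400 MB set by the earlier ALTER SYSTEM command. The sketch below does that conversion and applies the sizing rule of thumb from this section: the KEEP pool is undersized when the staging segments outgrow it. The function names are hypothetical, and the staging segment size is an input you would obtain yourself (for example, from the BYTES column of DBA_SEGMENTS for the DR$EQ$DOC_PATH_IDX$G and DR$EQ$DOC_PATH_IDX$H segments).

```python
BYTES_PER_MB = 1024 * 1024


def bytes_to_mb(num_bytes: int) -> float:
    """Convert a byte count such as CURRENT_SIZE to megabytes."""
    return num_bytes / BYTES_PER_MB


def keep_pool_too_small(keep_pool_bytes: int, staging_segment_bytes: int) -> bool:
    """Rule of thumb: the staging segments should fit in the KEEP pool."""
    return staging_segment_bytes > keep_pool_bytes


print(bytes_to_mb(419430400))  # 400.0
```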
See Also:
Oracle Database Performance Tuning Guide for more information about the KEEP buffer pool.
Optimizing the index reduces fragmentation and can significantly increase the speed of searches. Schedule index optimization on a regular basis. Also, optimize the index after the crawler has made substantial updates or if fragmentation is more than 50%. Make sure that index optimization is scheduled during off-peak hours, because optimizing a very large index can take several hours.
You can see the fragmentation level and run index optimization on the Global Settings - Index Optimization page in the Oracle SES Administration GUI. You can specify a maximum number of hours for the optimization to run, but for best performance, run the optimization until completion. Oracle SES uses a faster optimization method and creates a more compact copy of the index when no time limit is set.
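The scheduling guidance above reduces to two checks: whether an optimization is due, and whether the current time falls in an off-peak window. The sketch below encodes both. It is hypothetical and not connected to any Oracle SES API; the 50% threshold comes from this section, while the off-peak window is site-specific and must be supplied by you. The window check also handles windows that wrap past midnight.

```python
from datetime import time


def should_optimize(fragmentation_pct: float, substantial_crawl_updates: bool) -> bool:
    """Optimize after substantial crawler updates or above 50% fragmentation."""
    return substantial_crawl_updates or fragmentation_pct > 50


def in_off_peak_window(now: time, start: time, end: time) -> bool:
    """True if `now` is in [start, end); the window may wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


print(should_optimize(62.5, False))                                # True
print(in_off_peak_window(time(23, 30), time(22, 0), time(4, 0)))  # True
```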
To improve indexing performance, adjust the following parameters on the Global Settings - Set Indexing Parameters page of the Oracle SES Administration GUI:
Indexing Batch Size
When the crawled data in the cache directory reaches Indexing Batch Size, Oracle SES starts indexing. The bigger the batch size, the longer it takes to start indexing each batch. Only indexed data can be searched; data in the cache cannot be searched. The default size is 250M.
Document fetching and indexing run concurrently. While indexing is running, the Oracle SES crawler continues to fetch documents and store them in the cache directory.
Indexing Memory Size
This is the upper limit of memory used for indexing before the index is flushed to disk.
A large amount of memory improves indexing performance because it reduces I/O. It also improves query performance because the resulting index is less fragmented from the start, although a fragmented index can be optimized later. Set this parameter as high as possible without causing memory paging.
A smaller amount of memory might be useful when you need to track indexing progress or when run-time memory is scarce. The default size is 275M. In general, increasing Indexing Memory Size can reduce fragmentation.
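The effect of Indexing Batch Size can be seen in a small model: crawled data accumulates in the cache, and a batch is handed to indexing only when the cached volume reaches the threshold; anything left below the threshold is not yet searchable. This hypothetical sketch models only that threshold. It ignores the concurrency between fetching and indexing, and any other event that might trigger indexing (such as the end of a crawl), which this section does not describe.

```python
def simulate_indexing(doc_sizes_mb, batch_size_mb=250):
    """Model the Indexing Batch Size threshold.

    Returns (indexed_batches_mb, still_cached_mb): each batch is the cached
    volume at the moment it reached the threshold; the remainder stays in
    the cache and is not yet searchable.
    """
    batches = []
    cached = 0
    for size in doc_sizes_mb:
        cached += size
        if cached >= batch_size_mb:
            batches.append(cached)
            cached = 0
    return batches, cached


print(simulate_indexing([100, 100, 100, 100]))  # ([300], 100)
```

With the default 250M batch size, 400 MB of crawled data yields one indexed batch, and the last 100 MB waits in the cache until more data arrives.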
See the Home - Statistics page in the Oracle SES Administration GUI for lists of the most popular queries, failed queries, and ineffective queries. This information can lead to the following actions:
Refer users to a particular Web site for failed queries on the Search - Suggested Links page.
Fix common errors that users make in searching on the Search - Alternate Words page.
Make important documents easier to find on the Search - Relevancy Boosting page.
Every hour, Oracle SES automatically summarizes logged queries. If there are a large number of logged queries, the summarizing task can consume significant server resources and degrade query performance. This is most noticeable in stress tests, where several queries are executed every second. In such cases, the best solution is to disable the query statistics option.
To disable the query statistics option:
From the Home page, click Global Settings, then Query Configuration.
Under Query Statistics, select No for the Enable Query Statistics option.
Relevancy boosting lets administrators influence the order of documents in the result list for a particular search. You might want to override the default results for the following reasons:
For a highly popular search, direct users to the best results
For a search that returns no results, direct users to some results
For a search that has no click-throughs, direct users to better results
In a search, each result is assigned a score that indicates how relevant the result is to the search; that is, how good a result it is. Sometimes you know which documents are highly relevant to a particular search. For example, your company Web site could have a home page for XML (http://example.com/XML-is-great.htm), which you want to appear high in the results of any search for XML. You would boost the score of the XML home page to 100 for an XML search.
For searches that are not among its boosted queries, the document keeps its normally computed score.
Two methods can help you locate URLs for relevancy boosting: locate by search and manual URL entry.
Relevancy boosting, like end-user searching, is case-insensitive. For example, a document with a boosted score for Oracle is also boosted for oracle.
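The behavior described above, a boosted score for specific queries, case-insensitive query matching, and the normally computed score everywhere else, can be modeled as a lookup table keyed by the normalized query. This is a hypothetical sketch to illustrate the semantics, not how Oracle SES stores boosts; the class and method names are illustrative.

```python
class BoostTable:
    """Hypothetical per-query score overrides with case-insensitive keys."""

    def __init__(self):
        self._boosts = {}  # normalized query -> {url: boosted score}

    def boost(self, query: str, url: str, score: int) -> None:
        self._boosts.setdefault(query.casefold(), {})[url] = score

    def score(self, query: str, url: str, computed_score: int) -> int:
        """Return the boosted score if one exists, else the computed score."""
        return self._boosts.get(query.casefold(), {}).get(url, computed_score)


table = BoostTable()
table.boost("XML", "http://example.com/XML-is-great.htm", 100)
print(table.score("xml", "http://example.com/XML-is-great.htm", 37))   # 100
print(table.score("json", "http://example.com/XML-is-great.htm", 37))  # 37
```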
Note:
Relevancy boosting is disabled for searches that use the otext syntax.
If you expect heavy loads on the Oracle SES server, then configure the Java Virtual Machine (JVM) heap size for better performance.
The heap size is defined in the ORACLE_HOME/search/config/searchctl.conf file. By default, the following values are given:
COMMON_MEM_ARGS = -Xmx2048m -Xms512m
Increase these values as appropriate for your system configuration. The -Xmx value should not exceed the physical memory size.
Then restart the middle tier:
searchctl restart
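Before restarting, you can sanity-check an edited COMMON_MEM_ARGS line. The sketch below parses the -Xmx and -Xms values (the JVM accepts k, m, and g suffixes) and checks that -Xmx does not exceed physical memory and that -Xms does not exceed -Xmx, since the JVM refuses to start with an initial heap larger than the maximum. The function names are hypothetical, and you supply the physical memory size yourself.

```python
import re

_UNIT_BYTES = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_heap_args(line: str) -> dict:
    """Parse -Xmx/-Xms sizes (in bytes) from a line like COMMON_MEM_ARGS = ..."""
    _, _, value = line.partition("=")
    sizes = {}
    for flag, number, unit in re.findall(r"-(Xmx|Xms)(\d+)([kKmMgG]?)", value):
        sizes[flag] = int(number) * _UNIT_BYTES[unit.lower()]
    return sizes


def heap_settings_ok(sizes: dict, physical_memory_bytes: int) -> bool:
    """-Xmx must not exceed physical memory, and -Xms must not exceed -Xmx."""
    xmx = sizes["Xmx"]
    return xmx <= physical_memory_bytes and sizes.get("Xms", 0) <= xmx


sizes = parse_heap_args("COMMON_MEM_ARGS = -Xmx2048m -Xms512m")
print(heap_settings_ok(sizes, 8 * 1024 ** 3))  # True on an 8 GB server
```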
Heavy query load should not coincide with heavy crawl activity, especially when there are large-scale changes on the target site. If it does, such as when a crawl is scheduled around the clock, then increase the Oracle undo space: raise the UNDO_RETENTION parameter and make sure the undo tablespace is large enough to support it.
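A commonly cited rule of thumb for sizing undo space is UNDO_RETENTION (in seconds) multiplied by the peak number of undo blocks generated per second and by the database block size; it ignores overhead, so treat the result as a lower bound. The peak undo block rate is an input you measure yourself (for example, from the UNDOBLKS column of V$UNDOSTAT, which is reported per 10-minute interval). The 8 KB default block size and the function name are assumptions for illustration.

```python
def estimate_undo_bytes(undo_retention_s: int, undo_blocks_per_s: float,
                        db_block_size: int = 8192) -> float:
    """Rule-of-thumb undo space: UNDO_RETENTION x undo blocks/sec x block size."""
    return undo_retention_s * undo_blocks_per_s * db_block_size


# Example: 15 minutes of retention at a peak of 100 undo blocks per second.
needed = estimate_undo_bytes(900, 100)
print(f"{needed / 1024 ** 2:.0f} MB")  # 703 MB
```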
See Also:
Oracle Database SQL Language Reference and Oracle Database Administrator's Guide on Oracle Technology Network for more information about increasing the Oracle undo space.
If you plan to use the Oracle SES default query user interface and have an Oracle Application Server Web Cache installation, then you can use its compression utility to compress the content that Oracle SES sends over the network. For example, the utility can compress results.jsp from 980K to 72K. Compression provides the greatest benefit to users connecting over the Internet.
Use these Web cache compression rules:
/search/search?(.*)
/search/results.jsp?(.*)
OracleAS Web Cache does not benefit custom querying applications.