3.4 Configuring the Search Index

This section covers the following topics:

"Variances in Indexing Tools and Methods"
"Working with the Search Index"
"Text File Full-Text Indexing"
"Managing Zone Text Fields"
"Indexing with Databases"
"Searching Content Using the Oracle Query Optimizer Component"

3.4.1 Variances in Indexing Tools and Methods

Content Server interfaces with a variety of indexing tools such as commercial search engines and databases. The indexing tool to use is chosen before installation based on the purpose and environment in which the content server performs.

Each indexing tool provides full-text indexing and metadata-only indexing. Full-text indexing means that every word in a file is indexed, not only its metadata. Full-text indexing takes longer than metadata indexing; however, it can return a more comprehensive result set. Metadata-only indexing means that every word in the stored content information is indexed. Metadata-only indexing is faster than full-text indexing. By default Content Server is configured to use metadata-only indexing.

3.4.2 Working with the Search Index

This section covers these topics:

"About the Search Index"

"Updating the Search Index"
"Rebuilding the Collection"
"Configuring the Update or Rebuild"
"Disabling Full-Text Indexing"

3.4.2.1 About the Search Index

The Repository Manager utility provides an Indexer tab which administrators can use to perform actions on the search index.

To access the Repository Manager, in the Administration tray click Admin Applets, then click Repository Manager. You can also access the Repository Manager as a standalone application. See "Running Administration Applications in Standalone Mode" for details.

The Indexer tab on the Repository Manager screen enables administrators (not subadministrators) to perform these actions:

Update the Search Index: Incrementally updates the index database. This is usually not necessary because the index is automatically updated approximately every five minutes by the server.
Rebuild the Collection: The search index is entirely rebuilt, and the old index collection is replaced with a new index collection.
Suspend an Update or a Rebuild: Stops the update or rebuild temporarily. You can restart the process by clicking the appropriate Start button.
Cancel Update Search: Index update process terminates, and only files processed to that point are accessible to the search engine.
Cancel Rebuild Collection: Index rebuild process terminates, and the previous index database continues to be used by the search engine.

For more information about managing the repository, see Oracle Fusion Middleware Application Administrator's Guide for Content Server.

3.4.2.2 Updating the Search Index

On the Repository Manager screen, click the Indexer tab.
Click Start in the Automatic Update Cycle area.

3.4.2.3 Rebuilding the Collection

On the Repository Manager screen, click the Indexer tab.
Click Start in the Collection Rebuild Cycle area.

Note:

OracleTextSearch provides a Fast Rebuild function, which you can use through the Repository Manager Indexer function if your site uses the OracleTextSearch feature. For details, see "Fast Rebuild".

3.4.2.4 Configuring the Update or Rebuild

To set the parameters for a search index update or collection rebuild:

On the Repository Manager screen, click the Indexer tab.
Click Configure in either the Automatic Update Cycle portion of the screen or the Collection Rebuild Cycle portion.

Either the Automatic Update Cycle Screen or the Collection Rebuild Cycle Screen is displayed.
Specify the number of content items (files) per indexer batch. This is the maximum number of files that the search index will process simultaneously.
Specify the content items (files) per checkpoint. This is the number of files that will go through all relevant indexing states at a time. You can have multiple batches of files indexed per checkpoint.
Specify the indexer debug level. This is the amount of information pertaining to each file to display in the server window.
Click OK.

3.4.2.5 Disabling Full-Text Indexing

You might want to disable full-text indexing if, for example, you want to conserve file space or if you do not require full-text searching for specific content types. Even if you disable full-text indexing, metadata is still indexed.

To disable full-text indexing on specific files:

Define a format in the Configuration Manager screen named application/noindex.
Enable the Allow Override Format on Check In setting. See "Configuring General Options".
When a user checks in a file that they do not want to be indexed, they should select the application/noindex format. This applies to standard files, batch loads, and archived revisions.

3.4.3 Text File Full-Text Indexing

If you have configured the content server to use DATABASE.FULLTEXT or OracleTextSearch as your indexing engine, Content Server uses the Outside In Content Access module to export content to a text file upon check-in. The text file is then passed to the full-text indexer for full-text indexing.

Note:

When the Outside In Content Access module converts a PostScript file, the conversion process produces text that contains extra characters. Unfortunately, this creates a file that is full-text indexed but cannot be full-text searched.

If you use DATABASE.FULLTEXT, a full-text search can be problematic on large documents. By default, the maximum document size that is indexed is 10MB. This can be changed by setting the MaxIndexableFileSize configuration variable in the Content Server repository. The default is MaxIndexableFileSize=10485760. If larger documents require full-text indexing, the value of MaxIndexableFileSize should be increased.

3.4.4 Managing Zone Text Fields

The functionality described in this section is only available if you have installed and enabled the Database Search Contains Operator feature.

Note:

This feature is not required in OracleTextSearch.

This section covers these topics:

"About Zone Text Fields"

"Enabling and Disabling Zone Text Fields"
"Changing the MinTextFullFieldLength Variable"
"Disabling Database Search Contains Operator"

3.4.4.1 About Zone Text Fields

The Database Search Contains Operator feature enables you to use the Contains search operator to search text fields when performing Database and Database Full Text searches on SQL Server and Oracle. You must first enable the text fields that can be queried using the Contains search operator. These text fields are called zone text fields.

When a text field is added as a zone text field, the text within the field is parsed and a full-text index for the field is created in the database. The database performs all the work of creating the index, and the index is dropped from the database if the text field is disabled as a zone text field. Therefore, there is no need to rebuild the collection after enabling or disabling text fields as a zone text fields.

Important:

Changing a text field to a zone text field can be a very time-consuming operation. The amount of time it takes to parse the text and create the full-text index depends on the number of content items in the content server and the amount of text stored in the text field. However, after the text field has been indexed, you should not experience significant performance issues when updating and adding content items.

When a text field has been enabled as a zone text field, the Contains search operator is available for the text field on the Advanced Search page. It is represented as the Has Word option in the list next to the text field.

Figure 3-1 Has Word option

Description of "Figure 3-1 Has Word option"

3.4.4.2 Enabling and Disabling Zone Text Fields

To enable or disable zone text fields, complete the following steps:

Log in to Content Server as an administrator.
Select Zone Fields Configuration from the Administration menu or the Admin Applets page. The Zone Fields Configuration Page is displayed.
Select the search engine from the list.
To enable text fields as zone text fields, complete the following steps:
1. Select the text fields in the Text Fields list. You can use the [Ctrl] and [Shift] keys on your keyboard to select multiple fields.

By default, text fields with a field length of 20 characters or less are not included in the Text Fields list. You can change this setting by modifying the MinFullTextFieldLength configuration variable. For details, see "Changing the MinTextFullFieldLength Variable".

Click the left arrow button to move the text fields to the Zone Text Fields list.

Click Update.

Important:

Changing a text field to a zone text field can be a very time-consuming operation. The amount of time it takes to parse the text and create the full-text index depends on the number of content items in the content server and the amount of text stored in the text field. However, when the text field has been indexed, you should not experience significant performance issues when updating and adding content items.

To disable zone text fields, complete the following steps:
1. Select the zone text fields in the Zone Text Fields list. You can use the [Ctrl] and [Shift] keys on your keyboard to select multiple fields.
2. Click the right arrow button to move the text fields to the Text Fields list.
3. Click Update.
When enabling and disabling zone text fields, consider the following:
- If you start making changes to the lists and you then want to revert to the last saved lists, click Reset.
- Custom text fields (the Comments field and any customer created text fields) are shared between the Database and Database search engines, and therefore changing the status of these text fields for one search engine also applies the changes to the other search engine.
- Standard text fields (Author, Content ID, Content Type, Title, and so on) can be enabled or disabled independently for each search engine.
- The database performs all the work of creating the indexes, and the index are dropped from the database if the text fields are disabled as zone text fields. Therefore, there is no need to rebuild the collection after enabling or disabling text fields as a zone text fields.
- You must disable a zone text field before the field can be deleted from the content server using Configuration Manager. If you delete an enabled zone text field using Configuration Manager and then click Update Database Design, you will receive an error.
  
  Disabling the zone text field drops the index for the field from the database, allowing the field to be deleted from the database. As an alternative to disabling the zone text field, you could log in to the database and issue a command to drop the index for the field, and then delete the field.
- You might want to disable all zone text fields before uninstalling the feature. Otherwise, you will not be able to delete the zone text fields from the content server unless you reinstall the feature to disable the zone text fields or drop the indexes for the zone text fields from the database manually.

3.4.4.3 Changing the MinTextFullFieldLength Variable

By default, text fields with a field length of 20 characters or less are not included in the Text Fields list. You can change this setting by modifying the MinFullTextFieldLength configuration variable. To change this variable, complete the following steps:

Using a text editor, open the config.cfg file located in the IntradocDir/config/ directory.
Add the MinFullTextFieldLength configuration variable, and set its value (the default value is 21). For example:
```
MinFullTextFieldLength=16
```
Save your changes to the config.cfg file.
Restart the content server.

3.4.4.4 Disabling Database Search Contains Operator

Before disabling the feature, you might want to disable all zone text fields. The database contains an index for each enabled zone text field (the indexes are dropped when the zone text fields are disabled). If the database contains an index for a field, it will not let you delete the field from your content server using Configuration Manager. For more information, see "Enabling and Disabling Zone Text Fields".

If you disable the feature and later want to delete a field that is enabled as a zone text field, you can use one of the following options:

Reinstall the feature, disable the zone text field, use Configuration Manager to delete the field, and uninstall the feature.
Log in to the database and issue a command to drop the index for the field, and then use Configuration Manager to delete the field.

3.4.5 Indexing with Databases

If your system is set up to provide indexing and searching capabilities with databases, your system integrator would have added one of the following lines in IntradocDir/config/config.cfg:

Metadata Searching Only:
```
SearchIndexerEngineName=DATABASE.METADATA
```
DATABASE.METADATA is supported in all databases supported by Oracle Fusion Middleware 11g Release 1 (11.1.1).
Full-text Searching:
```
SearchIndexerEngineName=ORACLETEXTSEARCH
```
ORACLETEXTSEARCH is supported in Oracle Database version 11.1.0.7 and above.
Full-text Searching:
```
SearchIndexerEngineName=DATABASE.FULLTEXT
```
DATABASE.FULLTEXT is supported in SQL Server, and in Oracle Database (all supported versions).

The dbfulltextsearch script appropriate for the supported database would then be run.

By default, full-text indexing is applied to all converted files.
By default, the content server full-text indexes files that are passed through or converted to any of the following formats:

Oracle Supported Formats
- pdf
- html
- htm
- xls
- hcsp
- text
- txt
- doc
- rtf
- ppt
MS SQL Supported Formats
- text
- txt
- htm
- html
- doc
- msword
- ms-word
- ms-powerpoint
- ppt
- ms-excel
- xls
For example, if you want to convert your Microsoft Word (.doc) files to text files instead of PDF, you can specify this in the Configuration Manager. That is, when you use the File Formats option to map the .doc file extension to a text format, then this defines how the file is converted to a Web viewable format. In this case, the text file is fully indexed before it is passed to the Web site.

For more information about the Configuration Manager's File Formats option, see the Oracle Fusion Middleware Application Administrator's Guide for Content Server.
You can enable contributors to specify whether to full-text index a file by enabling the format override feature in System Properties. (See "Configuring General Options".)

For example, if you have used the Configuration Manager's File Formats option to map Corel WordPerfect (.wpd) files to use a text format and a contributor selects the use default option in the Format field on the checkin page, the file will be converted to text and full-text indexed. If the contributor selects Corel WordPerfect Document, the file will be passed through in its native format and will not be full-text indexed.

For more information about the Configuration Manager's File Formats option, see the Oracle Content Server Application Administrator's Guide for Content Server.
When you use full-text searching, a search is case sensitive for metadata, and case insensitive for full text. For Content ID, however, lower case letters are converted to upper case, so Content ID can not be searched with lower case letters.

3.4.5.1 Database-Supported File Formats

If you define a file format to PASSTHRU in the native format, and the format name contains one of the types listed above (such as application/ms-excel.native), the passed through native file will be full-text indexed by default.

Alternatively, you can use configuration variables to control whether a document is full-text indexed. To manage the full-text indexing and search of specific document format types, add applicable entries to IntradocDir/config/config.cfg, and save the file. Full-text indexing configuration variables include:

FormatMap
ExceptionFormatMap

3.4.5.1.1 FormatMap

The FormatMap configuration variable controls whether files of a specific format should be included in the full-text search index. It is a comma-delimited list of all the formats that will be full-text indexed. The decision is made by taking the MIME type assigned to a file, splitting the MIME type apart at any slash (/) or period (.), and then checking if that value is in the FormatMap list.

For example, application/vnd.msword will turn into a list of three items:

application
vnd
msword

If FormatMap has msword in its list, then the indexer engine will attempt to full-text index the file. the comparison test is not case sensitive.

3.4.5.1.2 ExceptionFormatMap

The ExceptionFormatMap configuration variable is used to exclude document formats from the FormatMap test. Any format that satisfies the ExceptionFormatMap test will not be full-text indexed. This test is done after splitting the MIME format at slashes (/), but not periods(.). For example, if msword is included in the exceptions list, then the MIME format application/msword is excluded but not application/vnd.mssword.

3.4.6 Searching Content Using the Oracle Query Optimizer Component

The Oracle Query Optimizer component is installed (enabled) by default with Content Server. The functionality only works with the Oracle database.

This section covers these topics:

"About The Oracle Query Optimizer Component"
"Query Optimization Process"
"How Reformatted Queries Optimize Searches"
"Types of Recognized Hints"
"Query Hints Syntax"
"Additional Supported Sort Constructs"
"The Hint Rules Table"
"The Hint Cache"

"Using Hint Rules"
"Adding and Enabling New Hint Rules"
"Editing Existing Hint Rules"
"Disabling Hint Rules"
"Enabling Hint Rules"
"Removing Hint Rules"
"Using the Query Converter"
"Converting a Data Source"
"Editing a Converted Data Source or Query"
"Updating the Hint Cache"
"Checking the Hint Cache from a Data Source"
"Modifying an Existing Hint Cache Query Using Data Source"
"Modifying an Existing Hint Cache Using a Query"
"Removing a Hint Cache Data Source Entry"
"Removing a Hint Cache Query"

3.4.6.1 About The Oracle Query Optimizer Component

Oracle database does not automatically select the best execution plan for certain types of user queries. To counter this, the Oracle Query Optimizer adds hints to queries that force Oracle database to perform searches more efficiently.

The hints are based on an intrinsic knowledge of Content Server's table data distribution and its index selectivity. To take advantage of this knowledge, Oracle Query Optimizer uses a pre-defined hint rules table to analyze the database query and then add appropriate hints to the query. In turn, the added hints improve Oracle's search performance.

Oracle Query Optimizer takes advantage of Content Server's data distribution in database tables and its index selection preferences. Based on these characteristics, the hint rules table included with Oracle Query Optimizer contains pre-defined rules. The feature uses these rules to analyze a database query and to add one or more appropriate hints to the query to optimize the search performance.

In very large collections containing millions of content items, Content Server generally has a difficult time selecting an appropriate optimization strategy to resolve even simple queries. To counteract this problem, Oracle Query Optimizer examines the submitted query and, based on its analysis, reformats the query by adding appropriate hints to optimize the search process. To add hints, the feature uses Content Server hints, The Hint Rules Table, and The Hint Cache.

3.4.6.2 Query Optimization Process

The stages of the optimization process are completed in the following sequence:

The submitted query is analyzed to verify if it contains one or more hints and, if so, determine the type of hint; see "Stage 1: Query Analysis".
If the query's WHERE clause does not contain a hint, the optimization feature must parse out the WHERE clause; see "Stage 2: Parsing".
After parsing, each condition in the query's WHERE clause is evaluated against the hint rules table in an attempt to qualify the condition and normalize the query; see "Stage 3: Normalization".
After the WHERE clause conditions are qualified and the query is normalized, a hint is selected or retrieved from the hint cache; see "Stage 4: Select Hint".
The query is reformatted using the selected hint; see "Stage 5: Reformat Query".

Figure 3-2 Optimization Process Sequence

Description of "Figure 3-2 Optimization Process Sequence"

3.4.6.2.1 Stage 1: Query Analysis

In this stage, a query is checked for both Oracle (native) and Content Server hints. This is determined based on the hint syntax: Query Hints Syntax. A query that contains Oracle hints is passed through. A query that contains Content Server hints bypasses Stage 2: Parsing and Stage 3: Normalization. If a query contains multiple Content Server hints, the best hint is chosen. Queries that do not contain any hints must be parsed and normalized.

3.4.6.2.2 Stage 2: Parsing

In this stage, a query that does not contain any hints is sent through the query parser and the WHERE clause is parsed out. A WHERE clause consists of one or more conditions joined with either AND or OR conjunctions. For each condition, the field name, operator, and field value are extracted. The AND/OR conjunctions of the clause are preserved; the parentheses are dropped. Conditions must use the following format:

fieldname operator value

For example, a properly formatted condition would be dID = 3. An incorrect condition would be 3 = dID.

3.4.6.2.3 Stage 3: Normalization

In this stage, normalization simplifies conditions, finalizes query operators, and provides a stable view of the WHERE clause for additional steps. The result of the normalization process produces a base for generating the cache key and the list of fields to use to search for hints.

Note:

To establish which database tables and columns have indexes, the hint rules table is defined on Content Server resources and on the running system.

Qualifying WHERE Clause Conditions:

Each condition in the WHERE clause is checked against The Hint Rules Table. If a condition's field name is included in the hint rules table, then it is qualified and the condition is considered to be normalized. The condition contains its table name and alias. Then the normalized conditions are sorted to ensure that the same set of conditions is always listed consistently.
Discarding WHERE Clause Conditions During Normalization:

During normalization, the following conditions are not considered relevant and are eliminated from further processing:
- Join conditions.
- Conditions that contain subqueries.
- Conditions whose field names do not have entries in the hint rules table and cannot be qualified.
- OR conditions that contain more than one field. For example:
```
(dSecurityGroup = 'Secure' or dDocAccount LIKE 'prj%')
```
- Conditions that contain the LIKE operator whose value begins with a wildcard.
Reformatting WHERE Clause Conditions:

In the normalization step, the query conditions are rewritten to consolidate complex query conditions. OR conditions are reevaluated as follows:
- If all the fields are the same and all the operators are equal (or all the operators are LIKE and no values begin with a wildcard), the conditions are combined and changed to an IN query.
- If the fields are the same but have different operators, the conditions are combined and the generic operator is assigned.
- If the fields are different, the conditions are dropped.
For example, during normalization, the following condition is reformatted:
```
(dReleaseState = 'Y' OR dReleaseState = 'O")
```
It is reformatted as follows:
```
dReleaseState IN ('Y', 'O')
```
Finding Potential Range Queries:

The parsed query is analyzed to find potential range queries that are then consolidated during the normalization process. For example, the conditions dIndate > date1 and dInDate < date2 are changed to one condition with the operator 'range'.

3.4.6.2.4 Stage 4: Select Hint

In this stage, the normalized conditions are checked against the hint cache. If one or more conditions have applicable hints in the cache, they are included. If applicable hints are not found in the cache, the conditions are analyzed and the preference orders are compared to determine the best possible hint.

3.4.6.2.5 Stage 5: Reformat Query

In this stage, the query is reformatted by adding in the selected hint. For more information about how reformatting queries with hints helps to optimize searches and some examples of reformatted queries, see "How Reformatted Queries Optimize Searches".

3.4.6.3 How Reformatted Queries Optimize Searches

The majority of queries in Content Server involve a small, targeted set of content items or return a hundred rows, at most. Content Server can easily scale to millions of content items. However, testing on an Oracle database with a collection containing 10 million content items indicates that the execution plan that Oracle selects is not the most efficient. Oracle generally does not choose the best optimization strategies to resolve many queries, even some that are trivial. The following examples explain this issue:

"Example 1: Reformatting a Query by Adding a Single Hint"
"Example 2: Reformatting a Query by Adding Multiple Hints"

3.4.6.3.1 Example 1: Reformatting a Query by Adding a Single Hint

In the environment described above, Oracle does not resolve the following query as efficiently as possible:

SELECT *
FROM Revisions, Documents, DocMeta
WHERE Revisions.dID = Documents.dID
    AND Revisions.dID = DocMeta.dID
    AND Revisions.dRevClassID = 333
Order By Revisions.dID

Because a fairly selective index is available (dRevClassID_2 for Revisions.dRevClassID), this query should access dRevClassID_2 and perform a sort on the rows that match the dRevClassID. However, in this query example, Oracle chooses to use the Revisions.dID index.

This choice is actually worse than performing a full table scan on the Revisions table because it does a full index scan and accesses the table to obtain the dRevClassID for each row. Obviously, resolving the query using this execution plan does not work well when the Content Server has over 10 million content items. In this case, it requires approximately 500 seconds to return the results.

However, the performance improves dramatically when the query is modified by adding a hint as follows:

SELECT /*+ INDEX(Revisions dRevClassID_2)*/ *
FROM Revisions, Documents, DocMeta
WHERE Revisions.dID = Documents.dID
    AND Revisions.dID = DocMeta.dID
    AND Revisions.dRevClassID = 333
Order By Revisions.dID

The query is modified by adding the following hint to the SELECT clause:

/*+ INDEX(Revisions dRevClassID_2)*/

This forces Oracle database to choose the dRevClassID_2 index instead of the index for Revisions.dID. Because no more than a few content items share dRevClassID in this example, the modified query returns the results instantly.

3.4.6.3.2 Example 2: Reformatting a Query by Adding Multiple Hints

In a typical content server instance, most documents have a 'Y' (released) status for the dReleaseState with a dInDate earlier than the current date. However, only a few documents have an 'N' (new, not yet indexed) status for the dReleaseState. The following query is searching for content items that have not yet been released:

SELECT dID
FROM Revisions
WHERE Revisions.dReleaseState = N'N' AND Revisions.dStatus in
    (N'DONE', N'RELEASED', N'DELETED')
    AND Revisions.dInDate<={ts '2005-02-23 17:46:38.321'}

The optimized result for the query uses the index for dReleaseState:

SELECT/*+ LEADING(Revisions) INDEX (Revisions dReleaseState)*/
    dID
FROM Revisions 
WHERE Revisions.dReleaseState = N'N' AND Revisions.dStatus in
    (N'DONE', N'RELEASED', N'DELETED')
    AND Revisions.dInDate<={ts '2005-02-23 17:46:38.321'}

3.4.6.4 Types of Recognized Hints

Content Server queries can be static queries defined in various resources, data sources with additional dynamic WHERE clauses, and dynamic queries that are ad-hoc or defined in the application such as Archiver. Static queries can be updated with Oracle database hints. However, it is nearly impossible to predefine hints for ad-hoc queries and dynamic WHERE clauses.

Content Server hints use a database-neutral hint syntax that supports multiple hints in the same query. A Content Server hint can be used in any query, data source, and WHERE clause. However, it cannot be combined with an Oracle database hint. If a query contains both types of hints, Oracle Query Optimizer will retain the Oracle database hint and ignore the Content Server hint.

3.4.6.5 Query Hints Syntax

During the optimization processing stages, Oracle Query Optimizer recognizes the distinct syntaxes of both types of hints and correspondingly processes the submitted query. For more detailed information, see the "Query Optimization Process".

3.4.6.5.1 Oracle Hint Syntax

An Oracle hint uses the following format:

/*+ hint */

For example:

/*+ Index(Revisions dID)*/

3.4.6.5.2 Content Server Hint Syntax

The Content Server hint syntax is database neutral and can support multiple Content Server hints in the same query. During the optimization process, Content Server hints are evaluated and the best hints are formatted and added back to the query.

During the optimization process, a query that includes one or more Content Server hints is not parsed. Only Content Server hints are considered when choosing indexes.

Content Server Hint Syntax:

When a query undergoes the optimization process, Content Server hints are added to the reformatted query using the following syntax:
```
/*$tableName[ aliasName]:columnName[:operator [:<value>]][, …]*/ 
```
Where:
- Values enclosed in angle brackets (<value>) are required.
- Values enclosed in brackets ([value]) are optional.
- Ellipses (…) indicates a repetition of the previous expression(s).

Query Before Optimization Process:

SELECT * 
FROM Revisions, DocTypes, RoleDefinition 
WHERE /*$Revisions:dStatus*/(Revisions.dStatus<>'DELETED' AND Revisions.dStatus<>'EXPIRED' AND Revisions.dStatus<>'RELEASED') AND Revisions.dDocType = DocTypes.dDocType AND /*$Revisions:dReleaseState*/Revisions.dReleaseState<>'E' AND (Revisions.dSecurityGroup = RoleDefinition.dGroupName AND RoleDefinition.dRoleName = ? AND RoleDefinition.dPrivilege > 0)

Reformatted Query with Content Server Hints Added:

After the query has undergone the optimization process, both indexes are used and are added to the native indexes.

SELECT/*+ LEADING(revisions) INDEX (revisions dStatus dReleaseState)*/ * 
FROM Revisions, DocTypes, RoleDefinition 
WHERE (Revisions.dStatus<>'DELETED' AND Revisions.dStatus<>'EXPIRED' AND Revisions.dStatus<>'RELEASED') AND Revisions.dDocType = DocTypes.dDocType AND Revisions.dReleaseState<>'E' AND (Revisions.dSecurityGroup = RoleDefinition.dGroupName AND RoleDefinition.dRoleName = ? AND RoleDefinition.dPrivilege > 0)

3.4.6.6 Additional Supported Sort Constructs

Using Oracle sort constructs in search query clauses allows users greater flexibility when performing a query. Sort constructs specify the row data in two or more tables to be extracted, sorted, and combined. Essentially, the sort constructs serve the purpose of limiting the number of rows that are returned. Oracle Query Optimizer recognizes the following sort constructs:

Group by: Sorts a set of records and specifies how to group the results.
Order by: Sorts a set of records and specifies whether the results are to be returned in ascending or descending order.
Inner join: Sorts a set of records by looking for and returning those that match.
Outer join: Sorts a set of records by looking for and returning those that do not match.

3.4.6.7 The Hint Rules Table

The hint rules table contains the rules that the optimization feature uses to determine the proper hints to add to dynamic queries or data sources during the optimization process. Using the Hint Rule Editor, a hint rule can be defined for a particular field and operator. A hint rule can also be defined based on values or date/number ranges. The hint rule table is extensible by other components, and can be updated while the content server is running.

Figure 3-3 Example Hint Rules Table

Description of "Figure 3-3 Example Hint Rules Table"

Several default hint rules included with Oracle Query Optimizer are described in the following text. For more detailed descriptions of the table columns, see "Hint Rules Table Column Descriptions". The content of the hint rules table is available on the Hint Rules Configuration Page that is accessed through the Administration tray.

The hint rules table is scheduled to reload every night, and when a rule is added or modified. The hint value is recalculated at each reload.

Important:

Although the hint rules table includes a column allowing multiple indexes to be used with each other, in Oracle only the bitmap index can be combined. This is because the hint rules table was designed for core Content Server functionality.

Therefore, it might not be sufficient for a system with components that create additional tables or add additional metadata fields, or both. However, the hint rules table can be extended or overwritten by other components to provide knowledge of additional tables, indexes and fields.

Explanation of First Hint Rule:

For this rule, if the WHERE clause contains the following condition the PK_Revisions index is used and added as a hint to the optimized query:
```
Revisions.dID = some_value
```
Explanation of Second Hint Rule:

For this rule, if the WHERE clause contains either of the following conditions the dDocName index is used and added as a hint to the optimized query:
```
Revisions.dDocName = some_value
Revisions.dDocName LIKE 'some_value'
```
Explanation of Third Hint Rule:

For this rule, if the WHERE clause contains the following condition the condition does not meet the requirements and cannot be qualified:
```
dStatus = 'DONE'
```
However, if the WHERE clause contains the following condition the dStatus index is used and added as a hint to the optimized query.:
```
dStatus = 'RELEASED'
```

3.4.6.8 Hint Rules Table Column Descriptions

This section describes the following columns in the hint rules table:

Key
Table
Column
Operators
Index
Order
Values
AllowMultiple
Disabled

3.4.6.8.1 Key

This column contains the unique name to identify the rule. A component can use the unique key to overwrite a particular rule. This key is usually identical to its index name because the index name is unique in the same database schema.

By default, Oracle uses a B+ Tree (binary tree) as the indexing structure to provide efficient access to logical records. B+ Tree indexes are most useful for queries involving a small number of result rows or when the user needs to execute queries using varying criteria (such as equality and range conditions). Because B+ Tree indexes store the indexed data values, these indexes are useful as sources of data if the requested value is the stored value.

However, bitmapped indexes offer substantial performance improvements with minimal storage cost compared to the default B+ Tree indexes. Bitmapped indexes are particularly effective for searching columns with poor selectivity due to having very few distinct values. Also, a bitmap is built for each value including the NULL value (which means the NULL is indexed). Overall, using bitmapped indexes is very efficient because the index lookup process is a bit-level operation and allows access to multiple indexes.

Note:

Because hint rules can be overwritten, Oracle Query Optimizer does not allow you to add a hint rule using an existing key. Therefore, it is important when you are creating your bitmapped indexes for columns that you assign unique keys.

Oracle recommends that you use bitmapped indexes for the table columns listed below, and set the index name to the corresponding column name.

Revisions table:
- dIndexerState
- dReleaseState
- dProcessingState
- dIsCheckedOut
- dSecurityGroup
- dStatus
WorkflowDocuments table:
- dWfDocState

3.4.6.8.2 Table

This column identifies the specific database table.

3.4.6.8.3 Column

This column identifies the specific column within the database table listed in the Table column.

3.4.6.8.4 Operators

This column is a comma-delimited list of allowable operators. See the Operators field and menu on the Edit Query Hint Rules Table for more information about the valid operator options. The hint rule's operator is important in the decision of whether a hint rule will be applied to a condition.

For example, if the WHERE clause contains the following condition using the PK_Revisions index would be a very valuable hint to include in an optimized query:

Revisions.dID = 3

However, if the WHERE clause contains the following condition then using the PK_Revisions index would not be useful:

Revisions.dID > 3

3.4.6.8.5 Index

This column identifies the specific index to use in the optimized query if the condition meets the hint rule requirements.

3.4.6.8.6 Order

This column contains the preferred order to use when the rule is included in the hint rules table. The highest ordered rules in a query are given precedence when deciding which hint to use.

The order values include:

5: This value indicates that the specified index is unique or does not match more than 50 rows for any value. For example, specifying dID with the Revisions, Documents, or DocMeta tables.
4: This value indicates that the specified index should be somewhat less selective. The specified value should typically match a few rows and, at the very most, several hundred rows. For example, specifying dDocTitle with the Revisions table.
3: This value indicates that the specified index matches less than a thousand rows. For example, specifying dInDate or dOutDate.
2: This value indicates that the specified index matches less than ten thousand rows.
1: This value indicates that the specified index matches more than ten thousand rows.

3.4.6.8.7 Values

This column is Idoc scriptable. This column can only be defined when the Operators column value is one of the following:

in or notIn: When you use either of these operators, the value should be a comma-delimited list enclosed in parenthesis.

range: When you use this operator, the value must use one of the following formats:

Format 1:

([<lowValue>],range[,<highDateValue>])

Examples of acceptable values include:

('Y', 'O')
(,7d)
({ts '2004-12-11 12:03:23.000'}, 2d, <$dateCurrent()$>)

Format 2:

#[d|h]

For example, a range of five days is 5d and seven hours is 7h.

Tip:

The operators in or notIn can substitute for the operators equal and notEqual, respectively, along with their matching values. For more information about operator options, see the Operators field and menu on the Hint Rule Editor.

The following use cases demonstrate how this column provides additional flexibility to the hint rules:

Use Case 1: State or Status Table Columns

Table columns that indicate a state or status such as dReleaseState or dStatus are biased regarding the finished states. For example, dReleaseState is predisposed for 'Y' (released) or 'O' (old version). Likewise, dStatus is predisposed for RELEASED. Therefore, in WHERE clauses, conditions such as dReleaseState = Y or dStatus = RELEASED match the majority of rows in the Revisions table. Thus, indexes for these two columns are almost useless. Conversely, the condition dReleaseState = N (new, not yet indexed) matches only a few rows. Consequently, indexes on this column would be very helpful.
Use Case 2: Date or Number Table Columns

Table columns that indicate a date or number exhibit similar behavior to state or status. For example, the condition dInDate < <$dateCurrent()$> matches most of the table rows and makes indexes on this field irrelevant. However, the combined conditions dInDate < <$dateCurrent()$> AND dInDate > <$dateCurrent(-1)$> usually match only a small set of rows and would benefit from using the corresponding index as a hint.

3.4.6.8.8 AllowMultiple

This column indicates whether the defined index is used with other indexes. In Oracle, only the bitmap index can be combined.

3.4.6.8.9 Disabled

This column indicates whether a hint rule has been disabled. Any rule in the table can be enabled/disabled. If you disable a hint rule, a value of 'Y' is displayed. Existing rules can be disabled to match the current Content Server state.

For example, if a Content Server instance contains only a few distinct content revClasses, each revClass may have thousands of revisions. Therefore, the dRevClass_2 index is not very effective. In this case, this corresponding hint rule should be disabled and you should add one or more new rules with different preference orders.

Note:

Although any rule in the table can be enabled/disabled, only the rules that are added using the Hint Rule Editor can be removed. The default hint rules that are included with the Oracle Query Optimization feature can only be disabled; they cannot be removed.

3.4.6.9 Hint Rule Editor

The Hint Rule Editor provides a way to add, remove, enable, or disable rules using the Edit Query Hint Rules Table. You can add a new rule to reflect new tables and indexes. Existing rules can be removed or disabled to match the current state of Content Server. If you select a hint rule from the hint rule table, the Hint Rule Editor fields are automatically populated with the applicable values.

The Edit Query Hint Rules Table is accessed by clicking one of the Show hint rule editor toggles on the Hint Rules Configuration Page and is displayed below the hint rules configuration table.

The hint rules configuration table is scheduled to reload every night and whenever a new rule is added or an existing rule is modified. The hint value is recalculated at each reload.

Although any rule in the table can be enabled/disabled, only the rules that are added through the Hint Rule Editor can be removed. The default hint rules that are included with the Oracle Query Optimizer component can only be disabled; they cannot be removed.

3.4.6.10 The Hint Cache

Oracle Query Optimizer also contains a hint cache to store dynamically generated hints. For example, a hint derived from a parsed query or data source is cached to maintain persistence. In this way, the hint cache provides stability for queries and data sources.

The hint cache is used during the optimization process to select hints for queries that do not contain Oracle or Content Server hints. The hint cache provides a mechanism to fine tune query hints. In addition, administrator can check/edit cache and change hint for queries at run time.

The hint cache is stored to disk every two hours and is reloaded when the content server instance is started.

The characteristics of the hint cache include:

Reusing Hint Cache Entries
Hint Cache Management
Default Capacity Algorithm
Origin of Hint Cache Keys
Hint Cache Persistence

3.4.6.10.1 Reusing Hint Cache Entries

The same query matches the same cache entry regardless of its values unless the new value does not satisfy the hint rule conditions. Two examples are included below to demonstrate how the same hint cache entry can and cannot be used for multiple queries.

Example 1: Using Similar Hint Cache Entries

In the following two queries, the same hint cache entry is used because both queries match the hint rule requirements.

QueryA:

SELECT * 
FROM Revisions
WHERE dDocName = 'name1'

QueryB:

SELECT * 
FROM Revisions
WHERE dDocName = 'name2'

Example 2: Using Different Hint Cache Entries

In the following two queries, the same hint cache entry cannot be used because QueryB violates the requirements for the dReleaseState hint rule. The dReleaseState hint rule requires that the dReleaseState values are neither Y (released) nor O (old revision).

QueryA:

SELECT * 
FROM Revisions
WHERE dReleaseState = 'U' AND dStatus = 'DONE'

QueryB:

SELECT * 
FROM Revisions
WHERE dReleaseState = 'Y' AND dStatus = 'DONE'

3.4.6.10.2 Hint Cache Management

In the hint cache, you can add a new entry, edit an existing entry, or remove an existing entry using the Hint Cache Updater Page. When adding or editing hint cache entries, you must use the Content Server Hint Syntax. The ability to manage the hint cache is very useful for fine tuning query hints. The example below demonstrates the benefits of fine tuning a hint cache entry.

Example: Batchloading Unindexed Content

If you have just batchloaded 100K content items into the Content Server and they are not yet indexed, the index-based query used above (Example 2: Using Different Hint Cache Entries) would match all of the batchloaded documents.

QueryA:

If most of the batchloaded documents have not been indexed, the dReleaseState index that is used in this query is not the best choice. For the best results in this case, you should fine tune the hint cache entry to use both the dReleaseState and the dStatus indexes. Use the Hint Cache Updater Page to update hint cache entries.
```
SELECT dID
FROM Revisions
WHERE Revisions.dReleaseState = N'N' AND Revisions.dStatus in (N'DONE', N'RELEASED', N'DELETED') AND Revisions.dInDate<={ts '2005-02-23 17:46:38.321'}
```

QueryB:

After updating the hint cache entry, the new optimized query is:

SELECT/*+ LEADING(revisions) INDEX (revisions dReleaseState dStatus)*/ dID
FROM Revisions
WHERE Revisions.dReleaseState = N'N' AND Revisions.dStatus in (N'DONE', N'RELEASED', N'DELETED') AND Revisions.dInDate<={ts '2005-02-23 17:46:38.321'}

3.4.6.10.3 Default Capacity Algorithm

By default, the hint cache has a maximum capacity of 1000 hints. The hint cache uses the midpoint insertion least-recently-used (LRU) algorithm which is similar to the one used by Oracle and mySQL. A new entry is inserted into the middle of the queue and each subsequent execution moves the entry up one spot.

When the number of hints in the cache exceed the maximum capacity, the entry at the bottom of the queue is removed from the cache. Thus, the LRU algorithm ensures that the most recently executed query hints are in the upper levels of the queue.

3.4.6.10.4 Origin of Hint Cache Keys

The hint cache key is generated from the normalized query; see "Stage 3: Normalization". It consists of the qualified columns (columns that are qualified by table/alias names) and columns that have a hint rule defined. The cache key excludes conditions that contain joins or subqueries.

The following example illustrates how the cache key is generated from a given query:

SELECT DocMeta.*, Documents.*, Revisions.* 
FROM DocMeta, Documents, Revisions 
WHERE DocMeta.dID = Revisions.dID AND Revisions.dID=Documents.dID AND Revisions.dDocName='abc' AND Revisions.dStatus<>'DELETED' AND (Revisions.dReleaseState='U' OR Revisions.dReleaseState='I' OR Revisions.dReleaseState='Y') AND Documents.dIsPrimary<>0

The generated cache key is as follows:

documents.disprimary:notequal:documents|revisions.ddocname:equal:revisions|revisions.dreleasestate:in:revisions|revisions.dstatus:notequal:revisions

3.4.6.10.5 Hint Cache Persistence

The hint cache is designed to be persistent. To ensure the persistence, the hint cache is saved to the file system every two hours. The persisted hint cache is reloaded when the content server instance is started.

3.4.6.11 Using Hint Rules

The following tasks are involved in using hint rules:

Adding and Enabling New Hint Rules
Editing Existing Hint Rules
Disabling Hint Rules
Enabling Hint Rules
Removing Hint Rules

To access the Hint Rules Configuration page:

Open the Administration tray.
Click the Hint Rules Configuration link.

The Hint Rules Configuration Page is displayed.
Click the Show hint rule editor toggle switch on the Hint Rules Configuration page.

The Edit Query Hint Rules Table is displayed.

3.4.6.11.1 Adding and Enabling New Hint Rules

To add a new hint rule to the hint rules table:

Click the Show hint rule editor toggle switch on the Hint Rules Configuration page.

The Edit Query Hint Rules Table is displayed.
Complete the fields as desired. For more detailed explanations of each field, see "The Hint Rules Table" and the "Edit Query Hint Rules Table".
Click the Add button.

The new hint rule is added to the hint rules table and is effective immediately.

3.4.6.11.2 Editing Existing Hint Rules

To edit an existing hint rule in the hint rules table:

Select the desired hint rule in the hint rules table.

The Edit Query Hint Rules Table is displayed and all of the applicable fields are populated with the hint rule's values.
Edit the fields as desired. For more detailed explanations of each field, see "The Hint Rules Table" and the "Edit Query Hint Rules Table".
Change the key.
Click the Add button.

The hint rules table is refreshed and the new hint rule is added. The modifications are effective immediately.
Delete the old hint rule.

3.4.6.11.3 Disabling Hint Rules

Although any rule in the table can be enabled/disabled, only the rules that are added through the Edit Query Hint Rules Table can be removed. The default hint rules that are included with the Oracle Query Optimization feature can only be disabled; they cannot be removed.

To disable a hint rule in the hint rules table:

Select the desired hint rule in the hint rules table.

The Edit Query Hint Rules Table is displayed and all of the applicable fields are populated with the hint rule's values.
Click the Disable button.

The hint rules table is refreshed and 'Y' is displayed in the Disabled column indicating that the hint rule is deactivated.

3.4.6.11.4 Enabling Hint Rules

To enable a disabled hint rule in the hint rules table:

Select the desired hint rule in the hint rules table.

The Edit Query Hint Rules Table is displayed and all of the applicable fields are populated with the hint rule's values.
Click the Enable button.

The hint rules table is refreshed and the Disabled column is clear indicating that the hint rule is reactivated.

3.4.6.11.5 Removing Hint Rules

To delete a hint rule from the hint rules table:

Select the desired hint rule in the hint rules table.

The Edit Query Hint Rules Table is displayed and all of the applicable fields are populated with the hint rule's values.
Ensure that the hint rule is enabled. If the hint rule is disabled it cannot be removed. To reactivate a disabled hint rule, see "Enabling Hint Rules".
Click the Remove button.

The hint rules table is refreshed and the selected hint rule is removed.

3.4.6.12 Using the Query Converter

The following tasks are involved when you use the Query Converter:

Converting a Data Source
Converting a Query
Editing a Converted Data Source or Query

To access the Query Converter page, select Administration, then Oracle Query Optimizer, then click Query Converter.

3.4.6.12.1 Converting a Data Source

To convert a data source query.

If applicable, select the Use Data Source check box.

The data source-related fields are displayed on the Query Converter page.
Select the desired data source from the DS Name menu.

The data source query is displayed in below the DS Name field.
Enter the applicable information for additional parameters and WHERE clauses.
Click Convert Query.

The data source is converted and the results are displayed in a text area above the Use Data Source check box. To view an example of a converted data source query, see Figure 3-4.

3.4.6.12.2 Converting a Query

To convert a query:

If applicable, clear the Use Data Source check box.

The data source-related fields are hidden from the Query Converter page.
Enter the applicable information for the query.
Click Convert Query.

The query is converted and the results are displayed in a text area above the Use Data Source check box. To view an example of a converted query, see Figure 3-5.

Figure 3-4 Example of Converted Data Source Screen

Description of "Figure 3-4 Example of Converted Data Source Screen"

Figure 3-5 Example of Converted Query Screen

Description of "Figure 3-5 Example of Converted Query Screen"

3.4.6.12.3 Editing a Converted Data Source or Query

After the data source or query is converted, the results are displayed above the Use Data Source check box. Because the conversion process clears the fields, the converted query can only be modified by entering new information in the fields. To edit information for a data source or query, see the applicable sections in "Converting a Data Source".

3.4.6.13 Updating the Hint Cache

The following tasks are involved when updating the hint cache:

Accessing the Hint Cache Updater Page
Checking the Hint Cache from a Data Source
Checking from a Query
Modifying an Existing Hint Cache Query Using Data Source
Modifying an Existing Hint Cache Using a Query
Removing a Hint Cache Data Source Entry
Removing a Hint Cache Query

3.4.6.13.1 Accessing the Hint Cache Updater Page

To access the Hint Cache Updater page, select Administration, then Oracle Query Optimizer, then click Hint Cache Updater.

3.4.6.13.2 Checking the Hint Cache from a Data Source

To check the hint cache using a data source:

On the Hint Cache Update page, select the Use Data Source check box.

The data source-related fields are displayed on the Hint Cache Updater page.
Select the desired data source from the DS Name menu.

The data source query is displayed in below the DS Name field.
Enter the applicable information for the additional parameters, WHERE clause, and hints.
Click Check Cache.

The results are displayed above the Use Data Source check box. To view an example of an unsuccessful hint search, see Figure 3-6.

Figure 3-6 Example of Hint Cache Updater Results with Data Source

Description of "Figure 3-6 Example of Hint Cache Updater Results with Data Source"

3.4.6.13.3 Checking from a Query

To check the hint cache using a query:

On the Hint Cache Update page, ensure the Use Data Source check box is clear.

The data source-related fields are hidden from the Query Converter page.
Enter the applicable information.
Click Check Cache.

The results are displayed above the Use Data Source check box. To view an example of an unsuccessful hint search, see Figure 3-7. To view an example of a successful hint search, see Figure 3-8.

Figure 3-7 Example of Hint Cache Updater Results without Data Source

Description of "Figure 3-7 Example of Hint Cache Updater Results without Data Source"

Figure 3-8 Hint found in hint cache

Description of "Figure 3-8 Hint found in hint cache"

3.4.6.13.4 Modifying an Existing Hint Cache Query Using Data Source

To modify a hint cache query using a data source:

On the Hint Cache Updater page, select the Use Data Source check box.

The data source-related fields are displayed on the Hint Cache Updater page.
Select the desired data source from the DS Name menu.

The data source query is displayed in below the DS Name field.
Enter the applicable information for the additional parameters, WHERE clause, and hints.
Click Update Cache to overwrite the previous hint cache.

The results are displayed in a text box above the Use Data Source check box. To see an example of successfully adding a new hint to a query and updating the hint cache, see the screen capture included in this section.

3.4.6.13.5 Modifying an Existing Hint Cache Using a Query

To modify a hint cache using a query:

On the Hint Cache Page, ensure that the Use Data Source check box is clear.

The data source-related fields are hidden from the Query Converter page.
Enter the applicable information.
Click Update Cache to overwrite the previous hint cache.

The results are displayed above the Use Data Source check box. In the screen capture note that the new hint was added and the hint cache was updated.

Figure 3-9 New hint added, hint cache updated

Description of "Figure 3-9 New hint added, hint cache updated"

3.4.6.13.6 Removing a Hint Cache Data Source Entry

To remove a hint cache data source entry:

On the Hint Cache Updater page, select the Use Data Source check box.

The data source-related fields are displayed on the Hint Cache Updater page.
Select the desired data source from the DS Name menu.

The data source query is displayed below the DS Name field.
Enter the applicable information for the additional parameters, WHERE clause, and hints.
Click Remove.

The information entered into the fields is removed. To see an example of successfully removing a hint from a query and the hint cache, see the screen capture included in this section.

3.4.6.13.7 Removing a Hint Cache Query

To remove a hint cache query:

On the Hint Cache Updater page, ensure that the Use Data Source check box is clear.

The data source-related fields are hidden from the Query Converter page.
Enter the applicable information for the query and hints.
Click Remove.

The results are displayed above the Use Data Source check box. In the screen capture note that the previously added hint was deleted from the query and hint cache.

Figure 3-10 Example of a deleted hint from cache

Description of "Figure 3-10 Example of a deleted hint from cache"