Oracle® Fusion Middleware System Administrator's Guide for Content Server
11g Release 1 (11.1.1)
E10792-01

6.1 OracleTextSearch

If you have a license to use OracleTextSearch (in Oracle Database 11g), the OracleTextSearch component enables the use of Oracle Text 11g as the primary full-text search engine for Oracle Universal Content Management (Oracle UCM). Oracle Text 11g offers state-of-the-art indexing capabilities and provides the underlying search capabilities for Oracle Secure Enterprise Search (Oracle SES). However, Oracle Text 11g has its own query syntax, which is intended more for use by applications or information professionals than by casual end users.

OracleTextSearch enables administrators to specify certain metadata fields to be optimized for the search index and to customize additional fields. This feature also enables a fast index rebuild and index optimization.

6.1.1 Considerations

The following items are important when considering use of OracleTextSearch:

  • Oracle Universal Content Management (Oracle UCM) version 11g Release 1 (11.1.1) supports all languages supported by Oracle Text 11g.

  • Oracle Text 11g runs on Oracle Database 11g. The Oracle UCM system database can be Oracle Database 11g, Microsoft SQL Server, or other databases as listed in the UCM 11g Release 1 (11.1.1) Certification Matrix. However, if the system database is not Oracle Database 11g, then an external provider for OracleTextSearch must be configured. See "Configuring OracleTextSearch for Content Server".

  • When using OracleTextSearch, Oracle Database version 11.1.0.7.0 or higher is required, and any SDATA field is limited to a maximum of 249 characters. All Optimized Fields are SDATA fields, which by default include dDocName, dDocTitle, dDocType, and dSecurityGroup. The total number of SDATA fields is limited to 32. Note that without Folders_g enabled, the dDocTitle field is limited to 80 characters by default.

  • While Oracle UCM provides numerous search options using a variety of databases (Oracle, Microsoft SQL Server, IBM DB2), by default the database that serves as the search index is the same system database used by Oracle UCM to manage metadata and other configuration information (users, security groups, and so on). The OracleTextSearch feature enables Oracle Text 11g as a separate search collection instance on Oracle Database 11g for Oracle UCM, which allows the search collection to reside on a separate computer and not compete with Oracle UCM for processors and memory. This can improve indexing and search response time.

  • The OracleTextSearch collection instance can be installed on a different platform than the Oracle UCM installation.

  • If OracleTextSearch is installed and running, and metadata fields are pushed into Content Server either by the administrator or by a component (requiring that Content Server be restarted), then the OracleTextSearch index must be rebuilt before content using the new metadata fields can be checked in to Content Server.

6.1.2 Configuring OracleTextSearch for Content Server

If the Oracle UCM system database used with OracleTextSearch is not Oracle Database 11g, then an external provider for OracleTextSearch must be configured.

  1. Open the config.cfg file for the Content Server instance in a text editor.

  2. Set the following property values:

    SearchIndexerEngineName=OracleTextSearch
    
    IndexerDatabaseProviderName=SystemDatabase
    
    AdditionalEscapeChars=-:#
    

    Note:

    You can specify a separate Oracle Database as the value of IndexerDatabaseProviderName, instead of SystemDatabase. However, before OracleTextSearch can function properly with the separate Oracle Database, you need to manually copy the ojdbc14.jar file from the ECM_ORACLE_HOME/ucm/idc/shared/classes folder to the UCM_DOMAIN/config/lib folder.

  3. Save the file.

  4. Restart Content Server.

  5. Rebuild the search index.

    For more information on rebuilding the index, see "Working with the Search Index". For more information on configuring Content Server and OracleTextSearch during installation, see Oracle Fusion Middleware Installation Guide for Oracle Enterprise Content Management Suite.
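As an illustration of step 2, the following Python sketch rewrites or appends the three property values in the text of a config.cfg file. It is a minimal sketch and not an Oracle-supplied tool: the function name set_config_values and the sample file content are hypothetical, and the code assumes that config.cfg holds one key=value entry per line and that lines beginning with # are comments.

```python
def set_config_values(text, updates):
    """Return config text with each key in `updates` set to its new value.

    Existing key=value lines are rewritten in place; keys that are not
    present yet are appended at the end. All other lines are kept as-is.
    """
    remaining = dict(updates)
    out = []
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        is_comment = line.lstrip().startswith("#")  # assumed comment syntax
        if "=" in line and not is_comment and key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    # Append any settings that were not already present in the file.
    for key, value in remaining.items():
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


# The three property values listed in step 2.
ORACLE_TEXT_SETTINGS = {
    "SearchIndexerEngineName": "OracleTextSearch",
    "IndexerDatabaseProviderName": "SystemDatabase",
    "AdditionalEscapeChars": "-:#",
}

if __name__ == "__main__":
    # Hypothetical existing file content, for illustration only.
    sample = "IDC_Name=idc\nSearchIndexerEngineName=DATABASE.METADATA\n"
    print(set_config_values(sample, ORACLE_TEXT_SETTINGS), end="")
```

The function works on text rather than on a file path, so the result can be reviewed before it is written back. Restarting Content Server and rebuilding the search index (steps 4 and 5) are still required afterward.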

6.1.3 Benefits and Features of Using Oracle Text 11g

6.1.3.1 Indexing and Query Speeds and Techniques

Using Oracle Text 11g, Oracle UCM offers a significant increase in index speeds. Oracle Text indexing is transactional. Content Server sends a batch of documents to Oracle Text, commits the batch, then starts the Oracle Text indexer. Content Server is notified of which documents failed to index, and only those documents are resubmitted to be indexed. Content Server also supports the use of parallel indexing with the database, which can leverage multiple CPUs on the database server. This parallel indexing option can be enabled by the following Content Server configuration variable in the config.cfg file:

OracleTextIndexingParallelDegree=1

Search query response times are improved by increased indexing speeds and additional capabilities in Content Server to optimize the search collection. These capabilities include an automatic Fast Optimization for every 5,000 documents added to the Content Server instance, and a Full Optimization for every 50,000 documents or 20% growth of the repository.
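The optimization thresholds described above can be pictured with a short Python sketch. Content Server performs these optimizations automatically; the code only restates the stated thresholds as a decision rule. The function name and its three counters are hypothetical bookkeeping, and the exact way Content Server tracks document counts and repository growth is an assumption.

```python
FAST_OPTIMIZE_EVERY = 5_000          # documents added (from the text)
FULL_OPTIMIZE_EVERY = 50_000         # documents added (from the text)
FULL_OPTIMIZE_GROWTH_PERCENT = 20    # repository growth (from the text)


def optimization_due(added_since_fast, added_since_full, size_at_last_full):
    """Return "full", "fast", or None for the given counters.

    added_since_fast:  documents added since the last Fast Optimization
    added_since_full:  documents added since the last Full Optimization
    size_at_last_full: repository size when that Full Optimization ran
    """
    # Integer arithmetic avoids rounding issues at the 20% boundary.
    grew_enough = (
        size_at_last_full > 0
        and added_since_full * 100
        >= FULL_OPTIMIZE_GROWTH_PERCENT * size_at_last_full
    )
    if added_since_full >= FULL_OPTIMIZE_EVERY or grew_enough:
        return "full"
    if added_since_fast >= FAST_OPTIMIZE_EVERY:
        return "fast"
    return None
```

In this model a small repository reaches the 20% growth rule long before it reaches 50,000 new documents, while a very large repository reaches the 50,000-document rule first.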

Oracle UCM uses some of the newest Oracle Text 11g features. For example, Content Server automatically creates a new search index zone for each text information field in order to provide better search speed. Using information zones enables Content Server to query data as if it were full-text data. All text-based information fields (text, long text, and memo) are automatically added as separate zones. In addition to the zones created for text information fields, Content Server provides an extra zone named IdcContent, which enables custom components, Inbound Refinery components, applications, or users to create XML content with tags that will be indexed as full-text metadata fields.

Oracle UCM uses the SDATA section feature in Oracle Text 11g to index important text, date, and integer fields and define them as Optimized Fields. The SDATA section is a separate XML structure managed by the Oracle Text engine that allows the engine to respond rapidly to requests involving date and integer ranges. Content Server can have up to 32 Optimized Fields, which include date fields, integer fields, standard Content Server fields such as dInDate and dOutDate, and fields selected to be optimized. All Optimized Fields are SDATA fields, which by default include dDocName, dDocTitle, dDocType, and dSecurityGroup.


Note:

If you want to change the set of Optimized Fields defined in Oracle Text 11g, the maximum allowed number of Optimized Fields is 32.
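When planning a change to the set of Optimized Fields, it can help to check the documented limits beforehand. The following Python sketch checks the 32-field limit and the 249-character SDATA limit given in "Considerations". The function names are hypothetical, and the starting list contains only the four default fields named in this section; a real system may count further standard fields (such as dInDate and dOutDate) toward the limit.

```python
MAX_OPTIMIZED_FIELDS = 32   # maximum number of Optimized (SDATA) fields
MAX_SDATA_LENGTH = 249      # maximum characters in any SDATA field value

# Default Optimized Fields named in this section.
DEFAULT_OPTIMIZED = ("dDocName", "dDocTitle", "dDocType", "dSecurityGroup")


def plan_optimized_fields(extra_fields, current=DEFAULT_OPTIMIZED):
    """Return the combined field list, or raise ValueError if it exceeds 32."""
    planned = list(current)
    for name in extra_fields:
        if name not in planned:      # ignore duplicates
            planned.append(name)
    if len(planned) > MAX_OPTIMIZED_FIELDS:
        raise ValueError(
            f"{len(planned)} Optimized Fields requested; "
            f"the maximum is {MAX_OPTIMIZED_FIELDS}"
        )
    return planned


def fits_sdata(value):
    """Return True if a field value is within the SDATA length limit."""
    return len(value) <= MAX_SDATA_LENGTH
```

For example, plan_optimized_fields(["xDepartment"]) returns the four defaults followed by xDepartment, while a request that would bring the total to 33 fields raises ValueError.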

6.1.3.2 Fast Rebuild

OracleTextSearch provides an Indexer Rebuild Screen when you use the Collection Rebuild Cycle Screen on the Repository Manager: Indexer Tab. The Fast Rebuild feature allows the search engine to add new information to the search collection without requiring a full collection rebuild. A Fast Rebuild is required in the following cases:

  • Adding or removing information fields

  • Changing any Optimized Field

  • Changing an information field to be an Optimized Field

A Fast Rebuild does not cause all the information (metadata and full-text) to be re-indexed. It adds the changes throughout the collection and updates it. Content Server search functionality is not affected during a Fast Rebuild cycle.

6.1.3.3 Query Syntax

Queries defined in Universal Query Syntax are supported and generally do not need any modification. This includes queries saved by users, queries defined in custom components, and queries defined in Site Studio pages.

6.1.3.4 Search Operators

OracleTextSearch supports the following search operators by default:

  • CONTAINS

  • MATCHES

  • Has Word Prefix

  • Range searches for dates and integers

The Oracle Text 11g engine supports additional search operators and functions that are not exposed in the user interface by default, but they can be exposed through customization that adds to the operator definition HDA table. For details and examples of these operators, see Oracle Text Reference.

6.1.3.4.1 Search Thesaurus

Certain queries, such as stem and Related Term, may be more effective if you use an Oracle Text thesaurus. Oracle Text enables you to create case-sensitive or case-insensitive thesauri that define synonym and hierarchical relationships between words and phrases. You can then search and retrieve documents that contain relevant text by expanding queries to include similar or related terms as defined in the thesaurus. For example, you can populate a thesaurus with specific product names, associated models, associated features, and so forth.

  • Default thesaurus: If you do not specify a thesaurus by name in a query, by default, the thesaurus operators use a thesaurus named DEFAULT. However, Oracle Text does not provide a DEFAULT thesaurus.

    As a result, if you want to use a default thesaurus for the thesaurus operators, you must create a thesaurus named DEFAULT. You can create the thesaurus through any of the thesaurus creation methods supported by Oracle Text:

    • CTX_THES.CREATE_THESAURUS (PL/SQL)

    • ctxload utility

  • Supplied thesaurus: Oracle Text does not provide a default thesaurus, but Oracle Text does supply a thesaurus, in the form of a file that you load with ctxload, that can be used to create a general-purpose, English-language thesaurus.

    The thesaurus load file can be used to create a default thesaurus for Oracle Text, or it can be used as the basis for creating thesauri tailored to a specific subject or range of subjects.


Note:

See the Oracle Text Reference to learn more about using ctxload and the CTX_THES package, and see the chapter, "Working With a Thesaurus in Oracle Text," in the Oracle Text Application Developer's Guide.

6.1.3.5 Case Sensitivity and Stemming Rules

Content Server automatically ensures that queries are executed as case-insensitive. By default, all full-text and text field search queries are case-insensitive. Content Server also handles case-insensitive search queries for information stored as Optimized Fields.

Content Server does not apply any stemming rules by default for Oracle Text 11g, but stemming rules can be applied by using the stem() function. Stemming rules may be used to have searches account for plurals, verb forms, and so forth. Other methods for implementing stemming rules include modifying the standard query definition in the searchindexerrules configuration file and making configuration changes in the Oracle Text engine (Oracle Database).

Content Server handles content in non-English languages by using the WORLD_LEXER feature in the Oracle Text engine. This enables Oracle Text to automatically identify the language and apply the proper tokenization rules.

6.1.3.6 Search Results Data Clustering

With OracleTextSearch, Content Server retrieves additional information about a search result list and displays it in a new menu bar on the Search Results page. This information summarizes how many documents are attached to specific values in specific information fields. Content Server supports data clustering for up to four information fields (the default fields are Security Group and Document Type).

This can be useful if you have a query that returns many items. For example, a result set could include 200 content items, including 100 documents that belong to the Public security group, 75 that belong to the Sales group, and 25 that belong to the Marketing group. The menu option for Security Group will show you the list of values and how many documents belong to each value. You can select one of the values (Public, Sales, Marketing) from the menu and it will list only those documents in the result set that belong to that value.
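The counting behind this menu can be sketched in a few lines of Python. This is an illustration of the idea only, not how Content Server or Oracle Text computes the counts; the function names are hypothetical, the sample data mirrors the 200-item example above, and the dDocType values are made up.

```python
from collections import Counter


def cluster_counts(results, fields=("dSecurityGroup", "dDocType")):
    """Count how many result items carry each value of each field."""
    return {
        field: Counter(item[field] for item in results if field in item)
        for field in fields
    }


def filter_by(results, field, value):
    """Keep only the result items whose field has the selected value."""
    return [item for item in results if item.get(field) == value]


def sample_results():
    """Build the 200-item result set from the example (illustrative data)."""
    return (
        [{"dSecurityGroup": "Public", "dDocType": "Document"}] * 100
        + [{"dSecurityGroup": "Sales", "dDocType": "Document"}] * 75
        + [{"dSecurityGroup": "Marketing", "dDocType": "Presentation"}] * 25
    )


if __name__ == "__main__":
    results = sample_results()
    for field, counts in cluster_counts(results).items():
        print(field, dict(counts))
    print(len(filter_by(results, "dSecurityGroup", "Sales")))
```

Selecting Sales from the Security Group menu corresponds to filter_by(results, "dSecurityGroup", "Sales"), which keeps the 75 items in that group.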

6.1.3.7 Snippets

Content Server can retrieve document snippets as part of search results to show the occurrence of search terms in the context of their usage. This feature is disabled by default, and enabling it can affect search query performance. To enable the feature, set the following configuration entry in the config.cfg file:

OracleTextDisableSearchSnippit=false

6.1.3.8 Additional Changes

Additional changes because of the use of Oracle Text 11g include:

  • XML content is automatically indexed.

  • There are no visible changes in the Search user interface other than removal of Substring as a search operator option. The default search operators are CONTAINS, MATCHES, and HAS WORD PREFIX. Substring-based queries will still work.

  • Queries using the MATCHES operator on a non-optimized field will behave like a CONTAINS query. For example, if xDepartment is not optimized, then the query xDepartment MATCHES 'Marketing' will behave like xDepartment CONTAINS 'Marketing' and return hits on documents that have an xDepartment value of 'Marketing Services' or 'Product Marketing'.

  • Relevancy ranking can be changed in Oracle Text 11g through use of an operator called DEFINESCORE. This operator can be added through a component to the WhereClause value of OracleTextSearch in the SearchQueryDefinition table (in the searchindexerrules configuration file). More information about this operator is available in the Oracle Text Reference document.

  • Complicated queries that previously could be placed into the full-text search box should now be placed in the advanced options on the Query Builder Form. The Query Builder Form is documented in the Oracle Fusion Middleware User's Guide for Content Server.

  • If you need to specify an escape character, use the configuration variable AdditionalEscapeChars=. The default setting is:

    AdditionalEscapeChars=_:#,-:#
    

    The default sets an underscore (_) and a hyphen (-) as escape characters.

  • The PDF Highlighting feature has been disabled.

  • The Spell Checking feature can be enabled, but it requires a custom component just as it did with Autonomy VDK.

6.1.4 Managing OracleTextSearch

6.1.4.1 Determining Fields to Optimize

Consider the following when determining the fields to optimize:

  • Do you want an exact match in a query?

  • Do you want that match to work faster in a search?

  • Do you want to sort search results by field?

By default the OracleTextSearch feature optimizes the Content ID and Document Title metadata fields.

A maximum of 32 fields can be defined as Optimized Fields with the OracleTextSearch feature. This total includes date fields, integer fields, standard Content Server fields such as dInDate and dOutDate, and fields selected to be optimized. All Optimized Fields are SDATA fields, which by default include dDocName, dDocTitle, dDocType, and dSecurityGroup.

The display of integer fields is dynamic and depends on the Content Server system configuration.

6.1.4.2 Assigning/Editing Optimized Fields

To select metadata Non-Optimized Fields and assign them to be Optimized Fields for search purposes, or to edit Optimized Fields and make them Non-Optimized, complete these steps:

  1. Log on to Content Server as system administrator.

  2. Click Administration in the navigation bar.

  3. Click Admin Applets.

  4. Click Configuration Manager, then the Information Fields tab, then Advanced Search Design.

    For more information on the Configuration Manager applet, see Oracle Fusion Middleware Application Administrator's Guide for Content Server.

  5. To make a metadata field Optimized, click Edit Fields. In the Advanced Options for "metadata_field" screen, select the Is Optimized check box.

  6. To edit an Optimized Field and make it Non-Optimized, click Edit Fields. In the Advanced Options for "metadata_field" screen, deselect the Is Optimized check box.

  7. When you have completed moving fields, use Index Fast Rebuild in Repository Manager to update the search collection to use the new and modified fields.


Note:

The Fast Rebuild does not function if a search collection rebuild is in progress.

6.1.4.3 Performing a Fast Rebuild

The Fast Rebuild feature allows the search engine to add new information to the search collection without requiring a full collection rebuild. A Fast Rebuild is required in the following cases:

  • Adding or removing information fields

  • Changing any Optimized Field

  • Changing an information field to be an Optimized Field

To perform a Fast Rebuild, complete these steps:

  1. Log on to Content Server as system administrator.

  2. Click Administration in the navigation bar.

  3. Click Admin Applets, then Repository Manager, then the Indexer tab.

    The Repository Manager: Indexer Tab is displayed.

  4. On the Collection Rebuild Cycle Screen, click Start.

    The Indexer Rebuild Screen is displayed with a warning that rebuilding the search index is a time-consuming process. If you do not want to start a rebuild now, click Cancel; otherwise, continue with this procedure.

  5. On the Indexer Rebuild Screen, click OK.

    A Fast Rebuild of the search collection is performed.


Note:

A Fast Rebuild cannot be performed if a rebuild of the search collection is in progress.

6.1.4.4 Modifying the Fields Displayed on Search Results

The OracleTextSearch feature provides default menu options on the Search Results page (set by the Oracle Database configuration script):

DrillDownFields=dDocType, dSecurityGroup

Administrators can add one more option from the list of Optimized Fields to further customize the search results. Edit the configuration to add the option to the list of DrillDownFields.


Note:

A Fast Rebuild must be performed after making any change in the DrillDownFields setting.

6.1.5 Searching with OracleTextSearch

Performing a search with OracleTextSearch is generally the same as performing a search without it, except for the following:

  • There are no visible changes in the Search:Expanded Form page other than removal of Substring as a search operator option. The default search operator is CONTAINS. Substring-based queries still work.

  • Queries using the MATCHES operator on a non-optimized field behave like a CONTAINS query. For example, if xDepartment is not optimized, then the query xDepartment MATCHES 'Marketing' behaves like xDepartment CONTAINS 'Marketing' and returns hits on documents that have an xDepartment value of 'Marketing Services' or 'Product Marketing'.
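The difference between the two MATCHES behaviors can be illustrated with a deliberately simplified Python model. It treats CONTAINS as a case-insensitive whole-word match and MATCHES on an Optimized Field as a case-insensitive whole-value match. This is an assumption made for illustration: the function names are hypothetical, and the code does not reproduce Oracle Text tokenization or query evaluation.

```python
def contains(field_value, term):
    """Simplified CONTAINS: the term is one of the words in the value."""
    return term.lower() in field_value.lower().split()


def matches(field_value, term, optimized):
    """Simplified MATCHES.

    On an Optimized Field the whole value must equal the term.
    On a non-optimized field the query falls back to CONTAINS behavior.
    """
    if not optimized:
        return contains(field_value, term)
    return field_value.lower() == term.lower()


if __name__ == "__main__":
    values = ["Marketing", "Marketing Services", "Product Marketing"]
    for optimized in (False, True):
        hits = [v for v in values if matches(v, "Marketing", optimized)]
        print("optimized" if optimized else "non-optimized", hits)
```

In this model, all three xDepartment values are hits while the field is not optimized, and only the exact value 'Marketing' is a hit once the field is optimized.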

6.1.6 Search Results with OracleTextSearch

When users run a search using the Search:Expanded Form, the Search Results page displays an additional menu bar with options that enable users to selectively view search results. The options represent categories used to filter the search results. The options can be context-sensitive, so if only one content item is returned for an option, then the menu itself shows that one result, as shown in Figure 6-1. The default set of options includes content type, security group, and account.


Note:

Two default menu options on the OracleTextSearch menu bar can be replaced by customized menu options: Security Group and Document Type.

If more than one content item is found for an option, an arrow is displayed next to the option name. When you move your cursor over the option name, a popup will display the list of the categories found in the search results for that option and the number of content items for each of the categories. You can click any category name in the popup to change the search results page to list only those items that match the category, as shown in Figure 6-2 where the Security Group lists the following categories and number of items found: Administration- (3), Marketing- (1), Public- (14), Secure- (5), Production- (1).

Figure 6-1 Search results with OracleTextSearch default menu

Figure 6-2 Search results with snippets display and expanded OracleTextSearch menu

The menu bar elements are described as follows:

  • Filter by Category: Displays the categories used to filter the search results; for example, Content Type, Security Group, Account.

  • Content Type (Default): Lists the content types and the number of content items of each type in the search results. Clicking one of the content type names changes the search results list to show only those items that match the content type.

  • Security Group (Default): Lists the security groups and the number of content items assigned to each group in the search results. Security groups include Administration, Public, and Secure. Clicking one of the security group names changes the search results list to show only those items that match the security group.

  • Account (Default): Lists the account types and the number of items assigned to each account in the search results. Clicking one of the account types changes the search results list to show only those content items that match the account.