10 Managing Search Features

This chapter describes how to configure the OracleTextSearch feature to use Oracle Text 11g as the primary full-text search engine for Oracle WebCenter Content, how to configure Content Server to use Oracle Secure Enterprise Search (Oracle SES), and how to configure full-text database searching.

This chapter covers the following topics:

Section 10.1, "Managing OracleTextSearch"
Section 10.2, "Managing Oracle Secure Enterprise Search"
Section 10.3, "Configuring Full-Text Database Search Index"

10.1 Managing OracleTextSearch

If you have a license to use the OracleTextSearch feature with Oracle Database 11g, then you can configure OracleTextSearch to use the Oracle Text 11g product as the primary full-text search engine for WebCenter Content. Oracle Text 11g offers state-of-the-art indexing capabilities and provides the underlying search capabilities for Oracle Secure Enterprise Search (Oracle SES). However, Oracle Text 11g has its own query syntax, which is intended for use by applications or information professionals rather than casual end users.

OracleTextSearch enables administrators to specify certain metadata fields to be optimized for the search index and to customize additional fields. This feature also enables a fast index rebuild and index optimization.

This section covers the following topics:

Section 10.1.1, "Considerations for Using OracleTextSearch"
Section 10.1.2, "Oracle Text 11g Features and Benefits"
Section 10.1.3, "Configuring OracleTextSearch for Content Server"
Section 10.1.4, "Managing OracleTextSearch"
Section 10.1.5, "Searching with OracleTextSearch"
Section 10.1.6, "Using Metadata Wildcards"
Section 10.1.7, "Using Internet-Style Search Syntax"
Section 10.1.8, "Customizing Search Results with OracleTextSearch"

10.1.1 Considerations for Using OracleTextSearch

The following items are important when considering use of the OracleTextSearch feature:

WebCenter Content version 11g Release 1 (11.1.1) supports all languages supported by Oracle Text 11g. OracleTextSearch can filter and extract content from different document formats in different languages. It supports a large number of document formats, including Microsoft Office file formats, Adobe PDF, HTML, and XML. It can render search results in various formats, including unformatted text, HTML with term highlighting, and original document format.
Oracle Text 11g runs on Oracle Database 11g. The Content Server database can be Oracle Database 11g, Microsoft SQL Server, or other databases as listed in the Oracle WebCenter Content 11g Release 1 (11.1.1) Certification Matrix. However, if the system database is not Oracle Database 11g, then an external provider for OracleTextSearch must be configured. For details on external providers, see Section 10.1.3.
When using OracleTextSearch, Oracle Database version 11.1.0.7.0 or higher is required.
Optimized fields for OracleTextSearch are created as SDATA fields, which have a maximum limit of 249 characters. This limit is imposed by Oracle Database and is reflected in Content Server by the OracleTextSearch component. Default SDATA fields include dDocName, dDocTitle, dDocType, and dSecurityGroup. The total number of SDATA fields is limited to 32 fields. Note that without the Folders_g component enabled, the dDocTitle field is limited to 80 characters by default.
While WebCenter Content provides numerous search options using a variety of databases (Oracle, Microsoft SQL Server, IBM DB2), by default the database that serves as the search index is the same system database used by WebCenter Content to manage metadata and other configuration information (users, security groups, and so on). The OracleTextSearch feature enables Oracle Text 11g as a separate search collection instance on Oracle Database 11g for WebCenter Content, which allows the search collection to reside on a separate computer and not compete with WebCenter Content for processors and memory. This can improve indexing and search response time.
The OracleTextSearch collection instance can be installed on a different platform than the WebCenter Content installation.
If the OracleTextSearch feature is configured and running, and metadata fields are pushed in to the Content Server instance either by the administrator or by a component (requiring that the Content Server instance be restarted), then the OracleTextSearch index must be rebuilt before content using the new metadata fields can be checked in to the Content Server instance.
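The SDATA limits listed above (at most 32 Optimized Fields, at most 249 characters per value) can be checked before a field set is proposed. The following is a minimal sketch only; `check_optimized_fields` and its arguments are hypothetical helper names and are not part of Content Server or Oracle Text.

```python
MAX_SDATA_FIELDS = 32    # limit on Optimized (SDATA) fields stated above
MAX_SDATA_LENGTH = 249   # maximum number of characters in one SDATA value


def check_optimized_fields(field_names, sample_values=None):
    """Return a list of problems; an empty list means no stated limit is exceeded."""
    problems = []
    if len(field_names) > MAX_SDATA_FIELDS:
        problems.append(
            f"{len(field_names)} optimized fields requested; limit is {MAX_SDATA_FIELDS}"
        )
    # sample_values is an optional mapping of field name to a representative value.
    for name, value in (sample_values or {}).items():
        if name in field_names and len(value) > MAX_SDATA_LENGTH:
            problems.append(
                f"{name}: value has {len(value)} characters; limit is {MAX_SDATA_LENGTH}"
            )
    return problems


defaults = ["dDocName", "dDocTitle", "dDocType", "dSecurityGroup"]
print(check_optimized_fields(defaults, {"dDocTitle": "x" * 300}))
```

A check like this only mirrors the documented limits; the database and the OracleTextSearch component remain the authority on what is accepted.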

10.1.2 Oracle Text 11g Features and Benefits

This section covers the following topics:

Section 10.1.2.1, "Indexing and Query Speeds and Techniques"
Section 10.1.2.2, "Fast Rebuild"
Section 10.1.2.3, "Query Syntax"
Section 10.1.2.4, "OracleTextSearch Operators"
Section 10.1.2.5, "Case Sensitivity and Stemming Rules"
Section 10.1.2.6, "Search Results Data Clustering"
Section 10.1.2.7, "Snippets"
Section 10.1.2.8, "Additional Changes"

10.1.2.1 Indexing and Query Speeds and Techniques

Using Oracle Text 11g, WebCenter Content offers a significant increase in index speeds. Oracle Text indexing is transactional. Content Server sends a batch of documents to Oracle Text, commits the batch, then starts the Oracle Text indexer. Content Server is notified of which documents failed to index, and only those documents are resubmitted to be indexed. Additional capabilities include an automatic Fast Optimization for every 5,000 documents added to the Content Server instance, and a Full Optimization for every 50,000 documents or 20% growth of the repository. Note that Content Server metadata-only search queries may degrade in performance when using Oracle Text.
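The optimization thresholds described above can be expressed as a small decision function. This is a sketch under stated assumptions: the function name, the counters, and in particular the way repository growth is measured (documents added since the last Full Optimization, relative to the repository size at that time) are illustrative guesses, not the Content Server implementation.

```python
FAST_EVERY = 5_000    # documents added between Fast Optimizations
FULL_EVERY = 50_000   # documents added between Full Optimizations
FULL_GROWTH = 0.20    # alternative trigger: 20% growth of the repository


def optimization_due(added_since_fast, added_since_full, size_at_last_full):
    """Return 'full', 'fast', or None for the given counters."""
    # Assumption: growth is measured against the size at the last Full Optimization.
    grew_enough = (
        size_at_last_full > 0
        and added_since_full >= FULL_GROWTH * size_at_last_full
    )
    if added_since_full >= FULL_EVERY or grew_enough:
        return "full"
    if added_since_fast >= FAST_EVERY:
        return "fast"
    return None


print(optimization_due(5_000, 5_000, 1_000_000))
print(optimization_due(100, 3_000, 10_000))
```

Checking the Full Optimization condition first reflects the idea that a full pass makes a pending fast pass unnecessary; that ordering is also an assumption.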

WebCenter Content uses some of the newest Oracle Text 11g features. For example, Content Server automatically creates a new search index zone for each text information field in order to provide better search speed. Using information zones enables Content Server to query data as if it were full-text data. All text-based information fields (text, long text, and memo) are automatically added as separate zones. In addition to the zones created for text information fields, Content Server provides an extra zone named IdcContent, which enables custom components, Oracle WebCenter Content: Inbound Refinery components, applications, or users to create XML content with tags that will be indexed as full-text metadata fields.

WebCenter Content uses the SDATA section feature in Oracle Text 11g to index important text, date, and integer fields and define them as Optimized Fields. The SDATA section is a separate XML structure managed by the Oracle Text engine that allows the engine to respond rapidly to requests involving date and integer ranges. Content Server can have up to 32 Optimized Fields, which include date fields, integer fields, standard Content Server fields such as dInDate and dOutDate, and fields selected to be optimized. All Optimized Fields are SDATA fields, which by default include dDocName, dDocTitle, dDocType, and dSecurityGroup.

Note:

If you want to change the set of Optimized Fields defined in Oracle Text 11g, the maximum allowed number of Optimized Fields is 32.

To avoid errors when indexing, do not add non-existent metadata fields to the Configuration Manager DrillDownFields parameter, and do not add memo fields to an SDATA section or to the DrillDownFields parameter. For information on the Configuration Manager, see Oracle Fusion Middleware Managing Oracle WebCenter Content.

10.1.2.2 Fast Rebuild

OracleTextSearch provides an Indexer Rebuild window when you use the Collection Rebuild Cycle window on the Repository Manager application Indexer tab. The Fast Rebuild feature allows the search engine to add new information to the search collection without requiring a full collection rebuild. A Fast Rebuild is required in the following cases:

Adding or removing information fields
Changing any Optimized Field
Changing an information field to be an Optimized Field

A Fast Rebuild does not cause all the information (metadata and full-text) to be re-indexed. It adds the changes throughout the collection and updates it. Content Server search functionality is not affected during a Fast Rebuild cycle.

For information on performing a fast rebuild, see Section 10.1.4.3.

10.1.2.3 Query Syntax

Queries defined in Universal Query Syntax are supported and generally do not need any modification. This includes queries saved by users, queries defined in custom components, and queries defined in Site Studio pages.

10.1.2.4 OracleTextSearch Operators

Oracle Text supports the following defaults:

CONTAINS
MATCHES
Has Word Prefix
Range searches for dates and integers

The Oracle Text 11g engine supports additional search operators and functions which are not exposed in the user interface by default, but can be exposed through customization that adds to the operator definition HDA table. For details and examples of these operators see the Oracle Text Reference.

10.1.2.4.1 Search Thesaurus

Certain queries, such as stem and Related Term, may be more effective if you use an Oracle Text thesaurus. Oracle Text enables you to create case-sensitive or case-insensitive thesauri that define synonym and hierarchical relationships between words and phrases. You can then search and retrieve documents that contain relevant text by expanding queries to include similar or related terms as defined in the thesaurus. For example, you can populate a thesaurus with specific product names, associated models, associated features, and so forth.

Default thesaurus: If you do not specify a thesaurus by name in a query, by default, the thesaurus operators use a thesaurus named DEFAULT. However, Oracle Text does not provide a DEFAULT thesaurus.

As a result, if you want to use a default thesaurus for the thesaurus operators, you must create a thesaurus named DEFAULT. You can create the thesaurus through any of the thesaurus creation methods supported by Oracle Text:
- CTX_THES.CREATE_THESAURUS (PL/SQL)
- ctxload utility
Supplied thesaurus: Oracle Text does not provide a default thesaurus, but Oracle Text does supply a thesaurus, in the form of a file that you load with ctxload, that can be used to create a general-purpose, English-language thesaurus.

The thesaurus load file can be used to create a default thesaurus for Oracle Text, or it can be used as the basis for creating thesauri tailored to a specific subject or range of subjects.
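The lookup behavior described above (a thesaurus named DEFAULT is used when none is named, and nothing is expanded until that thesaurus has been created) can be modeled in a few lines. This is a conceptual sketch in plain Python, not Oracle Text code: `expand_term`, `ThesaurusNotFound`, and the sample entries are hypothetical, and the thesaurus is modeled as a case-insensitive synonym dictionary.

```python
class ThesaurusNotFound(Exception):
    """Raised when the named thesaurus has not been created."""


def expand_term(term, thesauri, name="DEFAULT"):
    """Return the term followed by its synonyms from the named thesaurus."""
    if name not in thesauri:
        raise ThesaurusNotFound(f"thesaurus {name!r} does not exist; create it first")
    synonyms = thesauri[name].get(term.lower(), [])
    return [term] + [s for s in synonyms if s.lower() != term.lower()]


# Illustrative content only; a real thesaurus is created in the database.
thesauri = {"DEFAULT": {"laptop": ["notebook", "portable computer"]}}
print(expand_term("Laptop", thesauri))
```

In Oracle Text itself the thesaurus is created with CTX_THES.CREATE_THESAURUS or loaded with ctxload, as listed above; the dictionary here only stands in for that stored thesaurus.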

Note:

See the Oracle Text Reference to learn more about using ctxload and the CTX_THES package, and see the chapter, "Working With a Thesaurus in Oracle Text," in the Oracle Text Application Developer's Guide.

10.1.2.5 Case Sensitivity and Stemming Rules

Content Server automatically ensures that queries are executed as case-insensitive. By default, all full-text and text field search queries are case-insensitive. Content Server also handles case-insensitive search queries for information stored as Optimized Fields.

Stemming is an Oracle Text feature that uses the stem ($) operator to search for terms that have the same linguistic root as the query term (the syntax is $term). For example, the input $sing would expand a search to include sang, sung, and sing. Stemming rules can be used to have searches account for plurals, verbs, and so forth. Content Server does not apply any stemming rules by default for Oracle Text 11g, but a set of stemming rules can be created by using the stem ($) operator. Other methods for implementing stemming rules include modifying the standard query definition in the searchindexerrules configuration file (which requires a custom component) and making configuration changes in the Oracle Text engine (Oracle Database).

Note:

For more information, see the chapter "Oracle Text CONTAINS Query Operators" in the Oracle Text Reference.

Content Server handles content in non-English languages by using the WORLD_LEXER feature in the Oracle Text engine. This enables Oracle Text to automatically identify the language and apply the proper tokenization rules.

10.1.2.6 Search Results Data Clustering

With the OracleTextSearch feature, Content Server retrieves additional information about a search result list and displays it in a new menu bar on the Search Results page. This information summarizes how many documents are attached to specific values in specific information fields. Content Server supports data clustering for up to four information fields (the default fields are Security Group and Document Type).

This can be useful if you have a query that returns many items. For example, a result set could include 200 content items, including 100 documents that belong to the Public security group, 75 that belong to the Sales group, and 25 that belong to the Marketing group. The menu option for Security Group will show you the list of values and how many documents belong to each value. You can select one of the values (Public, Sales, Marketing) from the menu and it will list only those documents in the result set that belong to that value.
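The clustering described above amounts to counting result items per field value and then filtering on a chosen value. The sketch below reproduces the 200-item example in plain Python; `cluster_counts` and `filter_by` are hypothetical names, and each result is modeled as a dictionary keyed by the field names dSecurityGroup and dDocType.

```python
from collections import Counter


def cluster_counts(results, fields=("dSecurityGroup", "dDocType")):
    """Count how many result items carry each value of each clustered field."""
    return {
        field: Counter(item[field] for item in results if field in item)
        for field in fields
    }


def filter_by(results, field, value):
    """Keep only the result items whose field equals the selected value."""
    return [item for item in results if item.get(field) == value]


# The example result set from the text: 100 Public, 75 Sales, 25 Marketing.
results = (
    [{"dSecurityGroup": "Public", "dDocType": "Document"}] * 100
    + [{"dSecurityGroup": "Sales", "dDocType": "Document"}] * 75
    + [{"dSecurityGroup": "Marketing", "dDocType": "Document"}] * 25
)
counts = cluster_counts(results)
print(counts["dSecurityGroup"])
print(len(filter_by(results, "dSecurityGroup", "Sales")))
```

In the product the counts come back from the search engine with the result list; computing them client-side as shown here is only a way to illustrate what the menu displays.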

10.1.2.7 Snippets

Content Server can retrieve document snippets as part of search results to show search terms in the context of their usage. This feature is disabled by default, and enabling it can affect search query performance. To enable this feature, set the following configuration entry in the config.cfg file:

OracleTextDisableSearchSnippet=false

10.1.2.8 Additional Changes

Additional changes because of the use of Oracle Text 11g include:

XML content is automatically indexed.
There are no visible changes in the Search user interface other than removal of Substring as a search operator option. The default search operators are CONTAINS, MATCHES, and HAS WORD PREFIX. Substring-based queries still work.
Queries using the MATCHES operator on a non-optimized field behave like a CONTAINS query. For example, if xDepartment is not optimized, then the query xDepartment MATCHES 'Marketing' behaves like xDepartment CONTAINS 'Marketing' and returns hits on content items that have an xDepartment value of 'Marketing Services' or 'Product Marketing'.
Relevancy ranking can be changed in Oracle Text 11g through use of an operator called DEFINESCORE. This operator can be added through a component to the WhereClause value of OracleTextSearch in the SearchQueryDefinition table (in the Oracle Text searchindexerrules configuration file). More information about this operator is available in the Oracle Text Reference document.
Complicated queries that previously could be placed into the full-text search box should now be placed in the advanced options on the Query Builder Form. The Query Builder Form is documented in the Oracle Fusion Middleware Using Oracle WebCenter Content.
If you need to specify an escape character, use the configuration variable AdditionalEscapeChars=. The default setting is:
```
AdditionalEscapeChars=_:#,-:#
```
The default sets an underscore (_) and a hyphen (-) as escape characters.
The PDF Highlighting feature has been disabled.
The Spell Checking feature can be enabled, but it requires a custom component just as it did with Autonomy VDK.

10.1.3 Configuring OracleTextSearch for Content Server

If you did not specify OracleTextSearch when first installing Content Server, to configure the feature:

Open the config.cfg file for the Content Server instance in a text editor. For example: MW_HOME/user_projects/domain/servers/ucm/config/config.cfg
Set the following property value:
```
SearchIndexerEngineName=OracleTextSearch
```
Note:

If you are using ACLs, and UseEntitySecurity=true is set with OracleTextSearch as the search engine, then the following must also be set in the config.cfg file for the Content Server instance:
```
ZonedSecurityFields=xClbraUserList,xClbraAliasList
```
If you are using an external data source instead of the system database, change the value SystemDatabase in the following property setting to the external database provider name:
```
IndexerDatabaseProviderName=SystemDatabase
```
Note:

You can specify a separate Oracle Database as the value of IndexerDatabaseProviderName, instead of SystemDatabase.

If the Content Server database used with OracleTextSearch is not Oracle Database 11g, then an external provider for OracleTextSearch must be configured. Obtain the driver and fmwgenerictoken.jar from MW_HOME/oracle_common/modules/oracle.jdbc_11.1.1/ojdbc6dms.jar.
Save the file.
Restart the Content Server instance. For instructions, see Section 3.2.3.
Rebuild the search index.

For more information on rebuilding the index, see Section 11.3. For more information on configuring Content Server and OracleTextSearch during installation, see Oracle Fusion Middleware Installing and Configuring Oracle WebCenter Content.
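The config.cfg edits in the procedure above can also be scripted. The following is a minimal sketch that assumes config.cfg consists of plain `name=value` lines; `set_config_value` is a hypothetical helper, and the starting file content shown is illustrative only.

```python
def set_config_value(text, key, value):
    """Return config text with key=value set, replacing an existing entry or appending one."""
    lines = text.splitlines()
    entry = f"{key}={value}"
    for i, line in enumerate(lines):
        # Compare only the name to the left of the first "=".
        if line.split("=", 1)[0].strip() == key:
            lines[i] = entry
            break
    else:
        lines.append(entry)
    return "\n".join(lines) + "\n"


# Illustrative starting content; a real config.cfg holds many more entries.
cfg = "IDC_Name=example\nSearchIndexerEngineName=DATABASE.METADATA\n"
cfg = set_config_value(cfg, "SearchIndexerEngineName", "OracleTextSearch")
cfg = set_config_value(cfg, "IndexerDatabaseProviderName", "SystemDatabase")
print(cfg)
```

The sketch works on a string so that it can be tried without touching a live installation. Whichever way the file is changed, the restart and index rebuild steps still apply.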

If you originally configured Content Server to use an external provider with OracleTextSearch, but later need to switch to use SystemDatabase, you must manually run the contentprocedures.sql script against your system database schema. The script file is located in the WC_CONTENT_ORACLE_HOME/ucm/idc/database/oracle/admin/ directory.

10.1.4 Managing OracleTextSearch

This section covers the following topics:

Section 10.1.4.1, "Determining Fields to Optimize"
Section 10.1.4.2, "Assigning/Editing Optimized Fields"
Section 10.1.4.3, "Performing a Fast Rebuild"
Section 10.1.4.4, "Modifying the Fields Displayed on Search Results"

10.1.4.1 Determining Fields to Optimize

Consider the following when determining the fields to optimize:

Do you want an exact match in a query?
Do you want that match to work faster in a search?
Do you want to sort search results by field?

By default the OracleTextSearch feature optimizes the Content ID and Document Title metadata fields.

A maximum of 32 fields can be defined as Optimized Fields with the OracleTextSearch feature. This total includes date fields, integer fields, standard Content Server fields such as dInDate and dOutDate, and fields selected to be optimized. All Optimized Fields are SDATA fields, which by default include dDocName, dDocTitle, dDocType, and dSecurityGroup.

The display of integer fields is dynamic and depends on the Content Server configuration.

10.1.4.2 Assigning/Editing Optimized Fields

You can select metadata Non-Optimized Fields and assign them to be Optimized Fields for search purposes, or edit Optimized Fields and make them Non-Optimized.

To assign or edit Optimized fields:

Choose Administration, then Admin Applets.
Select Configuration Manager, then the Information Fields tab, then Advanced Search Design. For more information on the Configuration Manager applet, see Oracle Fusion Middleware Managing Oracle WebCenter Content.
To make a metadata field Optimized, click Edit Fields. In the Advanced Options for "metadata_field" window, select Is Optimized.
To edit an Optimized Field and make it Non-Optimized, click Edit Fields. In the Advanced Options for "metadata_field" window, deselect Is Optimized.
When you have completed moving fields, use Index Fast Rebuild in Repository Manager to update the search collection to use the new and modified fields.

Note:

The Fast Rebuild does not function if a search collection rebuild is in progress.

10.1.4.3 Performing a Fast Rebuild

The Fast Rebuild feature allows the search engine to add new information to the search collection without requiring a full collection rebuild. A Fast Rebuild is required in the following cases:

Adding or removing information fields
Changing any Optimized Field
Changing an information field to be an Optimized Field

To perform a fast rebuild:

Choose Administration, then Admin Applets.
Choose Repository Manager, then select the Indexer tab.
In the Collection Rebuild Cycle part of the Repository Manager application Indexer tab, click Start.

The Indexer Rebuild window opens with a warning that rebuilding the search index is a time-consuming process. If you do not want to start a rebuild now, click Cancel; otherwise, continue with this procedure.
In the Indexer Rebuild window, click OK.

A Fast Rebuild of the search collection is performed.

Note:

A Fast Rebuild is not performed if a rebuild of the search collection is in progress.

Note:

The Fast Rebuild process does not create indexer counter values for Full Text, Meta Only, and Delete. To obtain indexer count statistics, you must perform a full collection rebuild.

10.1.4.4 Modifying the Fields Displayed on Search Results

The OracleTextSearch feature provides default menu options on the Search Results page (set by the Oracle Database configuration script):

DrillDownFields=dDocType, dSecurityGroup

Administrators can add one more option from the list of Optimized Fields to further customize the search results. Edit the configuration to add the option to the list of DrillDownFields. (This function does not support multi-value option lists.)

A Fast Rebuild must be performed after making any change in the DrillDownFields setting.

10.1.5 Searching with OracleTextSearch

Performing a search with OracleTextSearch is generally the same as with other search engines. There are no visible changes in the Search: Expanded Form other than removal of Substring as a search operator option. The default search operator is CONTAINS. Substring-based queries still work.

For details on performing searches, see Oracle Fusion Middleware Using Oracle WebCenter Content.

The following table describes the default search operators.

Operator	Description	Example
CONTAINS	Finds content items with the specified whole word or phrase in the metadata field. This is available only for OracleTextSearch, or for Oracle Database and Microsoft SQL Server database with the optional DBSearchContainsOpSupport component enabled.	When `form` is entered in the Title field, the search returns items with the word `form` in their title, but does not return items with the word `performance` or `reform`.
MATCHES	Finds items with the exact specified value in the metadata field.	When `address change form` is entered in the Title field, the search returns items with the exact title of `address change form`. A query that uses the MATCHES operator on a non-optimized field behaves the same as a query that uses the CONTAINS operator. For example, if the `xDepartment` field is not optimized, then the query `xDepartment MATCHES 'Marketing'` behaves like `xDepartment CONTAINS 'Marketing'`, returning hits on documents that have an `xDepartment` value of `'Marketing Services'` or `'Product Marketing'`.
HAS WORD PREFIX	Finds all content items with the specified word at the beginning of the metadata field. No wildcard character is placed before or after the specified value.	When `form` is entered in the Title field, the search returns all items with the word `form` at the beginning of their title, but does not return an item whose title begins with the word `performance` or `reform`.
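The operator semantics in the table above can be approximated with three small predicates. This is a sketch of the described behavior only, written in plain Python rather than Oracle Text query syntax; the function names and the `optimized` flag are hypothetical, and matching is case-insensitive as described in Section 10.1.2.5.

```python
import re


def _words(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def contains(field_value, query):
    """True if the whole word or phrase occurs anywhere in the field."""
    words, target = _words(field_value), _words(query)
    n = len(target)
    return n > 0 and any(words[i:i + n] == target for i in range(len(words) - n + 1))


def matches(field_value, query, optimized=True):
    """True on an exact value match; on a non-optimized field it behaves like contains."""
    if not optimized:
        return contains(field_value, query)
    return field_value.strip().lower() == query.strip().lower()


def has_word_prefix(field_value, query):
    """True if the field begins with the specified word or words."""
    target = _words(query)
    return len(target) > 0 and _words(field_value)[:len(target)] == target


print(contains("Address change form", "form"))
print(matches("Marketing Services", "Marketing", optimized=False))
print(has_word_prefix("Reform plan", "form"))
```

The `optimized=False` branch mirrors the note in the table: on a field that is not optimized, `xDepartment MATCHES 'Marketing'` also returns 'Marketing Services' and 'Product Marketing'.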

10.1.6 Using Metadata Wildcards

The following wildcards can be used in metadata search fields, even when using the Quick Search field.

An asterisk (*) indicates zero or many alphanumeric characters. For example:
- form* matches form and formula
- *orm matches form and reform
- *form* matches form, formula, reform, and performance
A question mark (?) indicates one alphanumeric character. For example:
- form? matches forms and form1, but not form or formal
- ??form matches reform but not perform

Note:

If you want to search for an asterisk (*) or a question mark (?) without treating it as wildcard, you need to put quotation marks around your search term; for example: "here*"
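The wildcard rules above map directly onto regular expressions. The sketch below is one way to model them in Python for experimentation; `wildcard_to_regex` and `wildcard_match` are hypothetical names, and the search engine does not necessarily implement wildcards this way.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a metadata wildcard pattern into a compiled regular expression."""
    # A term in quotation marks is taken literally, as described in the note above.
    if len(pattern) >= 2 and pattern[0] == pattern[-1] == '"':
        return re.compile(re.escape(pattern[1:-1]), re.IGNORECASE)
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append("[A-Za-z0-9]*")   # zero or many alphanumeric characters
        elif ch == "?":
            parts.append("[A-Za-z0-9]")    # exactly one alphanumeric character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def wildcard_match(pattern, word):
    """True if the whole word matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(word) is not None


print(wildcard_match("form?", "forms"))
print(wildcard_match("??form", "perform"))
```

Using `fullmatch` keeps the pattern anchored to the whole word, which is why `form?` does not match `formal`.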

10.1.7 Using Internet-Style Search Syntax

Search techniques common to the popular Internet search engines are supported in Content Server. For example, entering new product in the Quick Search field will search for new <AND> product, while entering new, product will search for new <OR> product.

To enable this style of search, set the variable DoMetaInternetSearch=True. To disable it, set DoMetaInternetSearch=False, which is the default. For more information, see Oracle Fusion Middleware Configuration Reference for Oracle WebCenter Content.

The following table lists how Content Server interprets common characters.

Character	Interpreted As
Space ( )	AND
Comma (,)	OR
Minus (-)	NOT
Phrases enclosed in double-quotes ("`any phrase`")	Exact match of entered phrase

The following table lists examples of how Content Server interprets Internet-style syntax in a full-text search.

Query	Interpreted As
`new product`	new <AND> product
`(new, product) images`	(new <OR> product) <AND> images
`new product -images`	(new <AND> product) <AND> <NOT> images
`"new product", "new images"`	"new product" <OR> "new images"
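The character interpretations above can be modeled with a small token-based translator. This is a sketch only: `interpret` is a hypothetical function, not the Content Server parser, and unlike the third row of the table it does not add grouping parentheses around terms joined by <AND> (the result is logically equivalent because <AND> is associative).

```python
import re

# A quoted phrase, a parenthesis or comma, or a bare term (which may start with "-").
TOKEN = re.compile(r'"[^"]*"|[(),]|[^\s(),"]+')


def interpret(query):
    """Rewrite an Internet-style query using <AND>, <OR>, and <NOT>."""
    out = []
    pending_or = False
    for tok in TOKEN.findall(query):
        if tok == ",":
            pending_or = True          # the next operand is joined with <OR>
            continue
        if tok == ")":
            out.append(tok)
            continue
        # Join this operand (or opening parenthesis) to whatever precedes it.
        if out and out[-1] != "(":
            out.append("<OR>" if pending_or else "<AND>")
        pending_or = False
        if tok.startswith("-") and len(tok) > 1:
            out.extend(["<NOT>", tok[1:]])
        else:
            out.append(tok)
    return " ".join(out).replace("( ", "(").replace(" )", ")")


print(interpret("(new, product) images"))
print(interpret('"new product", "new images"'))
```

Spaces become <AND> by default, a comma switches the next join to <OR>, and a leading minus inserts <NOT> before the term, which is the mapping given in the character table.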

The following table lists examples of how Content Server interprets Internet-style syntax when searching title metadata using the substring operator.

Query	Interpreted As
`new product`	dDocTitle <substring> 'new' <AND> dDocTitle <substring> 'product'
`new, product`	dDocTitle <substring> 'new' <OR> dDocTitle <substring> 'product'
`new -product`	dDocTitle <substring> 'new' <AND> <NOT> dDocTitle <substring> 'product'
`"new product"`	dDocTitle <substring> 'new product'

10.1.8 Customizing Search Results with OracleTextSearch

When users run a search using the Search: Expanded Form, the Search Results page displays an additional menu bar with options that enable users to selectively view search results. The options represent categories used to filter the search results. The options are context-sensitive: if only one content item is returned for an option, the menu bar shows that single result in place of a menu, as shown in Figure 10-1. The default set of options includes Content Type, Security Group, and Account.

Note:

Two default menu options on the OracleTextSearch menu for Search Results can be replaced by customized menu options: Security Group and Document Type.

Figure 10-1 Search Results with OracleTextSearch Default Menu

If more than one content item is found for an option, an arrow is displayed next to the option name. When you move your cursor over the option name, a menu displays the list of the categories found in the search results for that option and the number of content items for each of the categories. You can click any category name on the menu to change the Search Results page to list only those items that match the category.

Figure 10-2 shows a list of categories under Security Group and the number of items found in each category.

Figure 10-2 Search Results with Snippets Display and Expanded OracleTextSearch Menu

Element	Description
Filter by Category	Displays the categories used to filter the search results, for example: Content Type, Security Group, Account.
Content Type	(Default) Lists the types and the number of each type of content items in the search results. Clicking one of the content type names changes the Search Results to show only those items that match the content type.
Security Group	(Default) Lists the security groups and number of content items assigned to each group in the search results. Security groups include: Administration, Public, and Secure. Clicking one of the security group names changes the Search Results to show only those items that match the security group.
Account	(Default) Lists the account types and number of items assigned to each account in the search results. Clicking one of the account types changes the Search Results to show only those content items that match the account.

10.2 Managing Oracle Secure Enterprise Search

Oracle Secure Enterprise Search (Oracle SES) 11g enables a secure, high-quality, easy-to-use search across all enterprise information assets. If you have a license to use Oracle SES 11g, then you can configure WebCenter Content to use Oracle SES as follows:

The OracleTextSearch feature supports the use of Oracle SES as an external full-text search engine for WebCenter Content. For details on configuring Oracle SES, see Section 10.2.1.
The SESCrawlerExport component enables Oracle SES to search content in a Content Server instance without being the primary search engine. For details on configuring SESCrawlerExport, see Section 10.2.2.

For more information, see the "Cookbook: SES and UCM Setup" blog. For more information about Oracle SES, see Oracle Secure Enterprise Search Administrator's Guide.

10.2.1 Using Oracle SES as an External Full-Text Search Engine

WebCenter Content can be configured with the OracleTextSearch feature to use Oracle Secure Enterprise Search (Oracle SES) 11g as its back-end search engine. With this configuration, users can search multiple Content Server instances for a file.

Section 10.2.1.1, "Configuring Oracle SES for Use with OracleTextSearch"
Section 10.2.1.2, "Reconfiguring the Search Engine to Use Oracle SES with OracleTextSearch"

10.2.1.1 Configuring Oracle SES for Use with OracleTextSearch

To configure Oracle SES for use with the OracleTextSearch option:

Note:

If you are already using a search engine other than Oracle SES with WebCenter Content, such as the engine set up on the Content Server post-configuration page, and you want to change the search engine to Oracle SES, then you must create a new database provider and configure Oracle SES for using that provider. For more information, see Section 10.2.1.2.

After installing Oracle SES, edit the file ORACLE_HOME/network/admin/sqlnet.ora to comment out the following two lines:
```
tcp.invited_nodes
tcp.validnode_checking
```
If Oracle SES is running, shut it down (mid-tier and database):
```
ORACLE_HOME/bin/searchctl stopall
```

Start the database:

ORACLE_HOME/bin/searchctl start_backend

Find database connection information for later use in the following file:
```
ORACLE_HOME/search/webapp/config/search.properties
```
Run the Repository Creation Utility (RCU) against Oracle SES and create the OCSEARCH schema. OCSEARCH sets only the search portion of a database already set up by RCU with Oracle SES.

To create this schema, select Content Server 11g - Search Only on the RCU Select Components window. For more information about running RCU, see Oracle Fusion Middleware Installing and Configuring Oracle WebCenter Content.
Perform a standard WebCenter Content installation and Content Server installation. For instructions, see Oracle Fusion Middleware Installing and Configuring Oracle WebCenter Content.

Caution:

Do not complete the steps on the Content Server post-configuration page, because the page sets up a regular database configuration.
Create a new Data Source (WLS DataSource) on the Oracle WebLogic Server instance to connect to Oracle SES.
1. In the Oracle WebLogic Server Administration Console, use the Services menu to choose JDBC, then Data Sources.
  
  A window listing the Summary of JDBC Data Sources opens.
2. Click New and enter values for the following items on the Create a New Data Source window:
  
  Name: Enter the new Data Source name.
  
  JNDI Name: Enter the new name again.
  
  Database Type: Enter Oracle.
  
  Database Driver: Click *Oracle's Driver (Thin XA) for Instance Connection.
3. Click Next to see the Transaction Options.
4. Click Next to enter the database parameters. As noted earlier in this procedure, you can find database connection information in the ORACLE_HOME/search/webapp/config/search.properties file.
  
  Database Name: Enter the name of the database to connect to; for example, ses.
  
  Host Name: Enter the IP address of the database server.
  
  Port: Enter the database server port number for the database connection.
  
  Database User Name: Enter the database account user name. This is the SchemaOwner name you specified in the RCU creation process.
  
  Password: Enter the database account password to use to create database connections. This is the password you specified in the RCU creation process.
  
  Confirm Password: Enter the database account password again.
5. Click Next.
6. Click Test Configuration. Verify that the message "Connection test succeeded" appears at the top of the page, then click Next.
7. From the list of available target servers, select the target Content Server check box to deploy the new JDBC Data Source. For example, a target Content Server might be named UCM_server1.
8. Click Finish.
In the Content Server post-configuration page, click Select External in Full Text Search options, then enter the Data Source name.
Restart the Content Server instance. For instructions, see Section 3.2.3.
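
The sqlnet.ora edit in the first step of this procedure could be scripted along the following lines. This is a minimal sketch, not an Oracle-supplied tool: the function name and the sample file content are illustrative assumptions, and it handles only single-line parameter entries.

```python
# Parameter names from the first step of the procedure. Oracle Net spells
# the second one TCP.VALIDNODE_CHECKING; names are compared in lowercase.
PARAMS_TO_DISABLE = ("tcp.invited_nodes", "tcp.validnode_checking")


def comment_out_params(sqlnet_text, params=PARAMS_TO_DISABLE):
    """Return sqlnet.ora text with the named parameter lines commented out."""
    result = []
    for line in sqlnet_text.splitlines():
        # The parameter name is whatever precedes the first "=".
        name = line.split("=", 1)[0].strip().lower()
        if name in params:
            result.append("#" + line)
        else:
            # Other lines, including already commented ones, are kept as-is.
            result.append(line)
    return "\n".join(result)


# Hypothetical file content, for illustration only.
sample = "\n".join([
    "NAMES.DIRECTORY_PATH = (TNSNAMES, EZCONNECT)",
    "tcp.invited_nodes = (localhost, 10.0.0.5)",
    "TCP.VALIDNODE_CHECKING = yes",
])
print(comment_out_params(sample))
```

Back up sqlnet.ora before applying any scripted change, and review the result by hand if a parameter value spans several lines.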

10.2.1.2 Reconfiguring the Search Engine to Use Oracle SES with OracleTextSearch

If you are already using a search engine other than Oracle SES with WebCenter Content (such as the engine set up on the Content Server post-configuration page), and you want to change the search engine to Oracle SES, then you must create a new database provider and configure Oracle SES for Content Server using that provider.

After installing Oracle SES, edit the file ORACLE_HOME/network/admin/sqlnet.ora to comment out the following two lines:
```
tcp.invited_nodes
tcp.validnode_checking
```
If Oracle SES is running, shut it down (mid-tier and database):
```
ORACLE_HOME/bin/searchctl stopall
```

Start the database:

ORACLE_HOME/bin/searchctl start_backend

Find database connection information for later use in the following file:
```
ORACLE_HOME/search/webapp/config/search.properties
```
Run the Oracle Repository Creation Utility (RCU) against Oracle SES and create the OCSEARCH schema. OCSEARCH sets only the search portion of a database already set up by RCU with Oracle SES.

To create this schema, select Content Server 11g - Search Only on the RCU Select Components window.

For more information about running RCU, see "Creating Oracle WebCenter Content Schemas with the Repository Creation Utility" in Oracle Fusion Middleware Installing and Configuring Oracle WebCenter Content.
Create a new Data Source (WLS DataSource) on the Content Server instance to connect to Oracle SES.
1. In the Oracle WebLogic Server Administration Console, use the Services menu to choose JDBC, then Data Sources.
  
  A window listing the Summary of JDBC Data Sources opens.
2. Click New and enter values for the following items on the Create a New Data Source window:
  
  Name: Enter the new Data Source name: ExternalSearchProvider
  
  JNDI Name: Enter the new name again.
  
  Database Type: Enter Oracle.
  
  Database Driver: Click *Oracle's Driver (Thin XA) for Instance Connection.
3. Click Next to see the Transaction Options.
4. Click Next and enter the database parameters. As noted earlier in this procedure, you can find database connection information in the ORACLE_HOME/search/webapp/config/search.properties file.
  
  Database Name: Enter the name of the database to connect to; for example, SES.
  
  Host Name: Enter the IP address of the database server.
  
  Port: Enter the database server port number for the database connection.
  
  Database User Name: Enter the database account user name. This is the SchemaOwner name you specified in the RCU creation process.
  
  Password: Enter the database account password to use to create database connections. This is the password you specified in the RCU creation process.
  
  Confirm Password: Enter the database account password again.
5. Click Next.
6. Click Test Configuration. Verify that the message "Connection test succeeded" appears at the top of the page, then click Next.
7. From a list of available target servers, select the target Content Server check box to deploy the new JDBC Data Source. For example, a target Content Server instance might be named UCM_server1.
8. Click Finish.
  
  Note:
  
  You do not have to restart the Oracle WebLogic Server instance.
Change the search (database) provider in Content Server:
1. Choose Administration, then Providers.
2. Click Add in the row to create a new database provider.
3. Enter or verify the new database provider settings:
  
  Provider Name: ExternalSearchProvider.
  
  Provider Description: External Database Provider
  
  Provider Class: intradoc.jdbc.JdbcWorkspace
  
  Connection Class: intradoc.jdbc.JdbcConnection
  
  Database Type: Select ORACLE.
  
  Use Data Source: Check this box.
  
  Data Source: Enter the name of your Data Source; for example, SES.
  
  Test Query: Enter a test query; for example, select * from SES.IDCTEXT
  
  Number of Connections: By default, this is set to 5.
  
  Extra Storage Keys: By default, this is set to system.
4. Click Add.
5. Restart the Content Server instance. For instructions, see Section 3.2.3.
  
  The new database provider name should be included in the list displayed on the Providers page.
Choose Administration, then Admin Server, then General Configuration.
In the Additional Configuration Variables section for General Configuration, enter or verify the following settings:

SearchIndexerEngineName=OracleTextSearch

IndexerDatabaseProviderName=ExternalSearchProvider
Restart the Content Server instance. For instructions, see Section 3.2.3.
Rebuild the search index using the Repository Manager applet.

For more information on the Repository Manager, see Oracle Fusion Middleware Managing Oracle WebCenter Content.
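
Before rebuilding the index, it can help to confirm that both configuration variables from this procedure are in place. The following is a minimal sketch that checks a block of `name=value` text, such as the contents of the Additional Configuration Variables field. The function names are illustrative, and treating lines that start with `#` as comments is an assumption.

```python
# The two settings named in the procedure above.
REQUIRED = {
    "SearchIndexerEngineName": "OracleTextSearch",
    "IndexerDatabaseProviderName": "ExternalSearchProvider",
}


def parse_config(text):
    """Parse name=value lines into a dict, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def missing_settings(text, required=REQUIRED):
    """Return the required settings that are absent or have another value."""
    found = parse_config(text)
    return {k: v for k, v in required.items() if found.get(k) != v}


# Hypothetical configuration text, for illustration only.
config_text = "SearchIndexerEngineName=OracleTextSearch\n"
print(missing_settings(config_text))
```

An empty result means both variables are set as the procedure expects.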

10.2.2 Using SESCrawlerExport for Oracle SES to Search Content Server Content

The SESCrawlerExport component adds functionality as an RSS feed generator to the Content Server instance and enables it to be searched by Oracle Secure Enterprise Search (Oracle SES). The component generates a snapshot of content currently on the Content Server instance and provides it to the Oracle SES Crawler.

The SESCrawlerExport component generates RSS feeds as XML files from its internal indexer, based on indexer activity. The component can access the original WebCenter Content content (for example, a Microsoft Word document), the web-viewable rendition, and all the metadata associated with each document. The component also has a template containing an Idoc script that applies the metadata values from the indexer to generate the XML document.

SESCrawlerExport generates RSS feeds for all documents for the initial crawl, as well as feeds for updated and deleted documents for the incremental crawl. Each document can be an item in the feed, together with the operation on the item (for example: insert, delete, update), its metadata (for example: author, summary), URL links, and so on. The indexer wakes up periodically (about every 30 seconds) and creates a data feed for the documents that were changed.
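
To make the relationship between changed documents and data feeds concrete, the following sketch groups a list of document operations into feeds with a fixed maximum size, in the spirit of the Maximum Items Per Datafeed (sceMaxItemsPerFeed) parameter. It is a conceptual model only. The function name and the shape of each operation record are assumptions, not the component's internal format.

```python
def build_feeds(operations, max_items_per_feed):
    """Split a list of operations into feeds of at most max_items_per_feed."""
    if max_items_per_feed < 1:
        raise ValueError("max_items_per_feed must be at least 1")
    return [
        operations[start:start + max_items_per_feed]
        for start in range(0, len(operations), max_items_per_feed)
    ]


# Hypothetical changes picked up by one indexer cycle.
changes = [
    {"operation": "insert", "dDocName": "DOC_001", "author": "jsmith"},
    {"operation": "update", "dDocName": "DOC_002", "author": "jsmith"},
    {"operation": "delete", "dDocName": "DOC_003", "author": "mlee"},
]

for number, feed in enumerate(build_feeds(changes, 2), start=1):
    print(number, [(item["operation"], item["dDocName"]) for item in feed])
```

With a limit of two items, the three operations above fall into two feeds.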

The Content Server connector for Oracle SES reads the feeds provided by SESCrawlerExport according to the crawling schedule. Oracle SES parses, extracts the metadata information, and fetches the document content using its generic RSS crawler framework.

The SESCrawlerExport component is not affected by what search engine is used in the Content Server instance. SESCrawlerExport does not affect how Oracle SES performs searches.

Note:

The YahooUserInterfaceLibrary component must be enabled on the Content Server instance. This component has JavaScript libraries that SESCrawlerExport uses during the initial crawl to report the status of the feed generation.

Note:

By default, SESCrawlerExport does not support snapshots of DigitalMedia document types, so such documents are not found by an Oracle SES search. The sceCoreFilter configuration parameter on the SESCrawlerExport administration page acts as a pre-filter to the source location script and filters out any DigitalMedia content before it is sent to the sceSourceLocation script. The default parameter setting for sceCoreFilter is:

<$if dDocType and dDocType like 'DigitalMedia'$>#none<$else$>#customScript#<$endif$>

To allow DigitalMedia document types by having the core filter defer to sceSourceLocationScript, change the sceCoreFilter configuration parameter from its default setting to #customScript#.
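
The effect of the two sceCoreFilter settings can be modeled outside Content Server as follows. This is a sketch in Python rather than Idoc Script, and it treats the Idoc `like` comparison as an exact match, which is a simplifying assumption. The function names are illustrative.

```python
CUSTOM_SCRIPT = "#customScript#"  # defer to the source location script
NONE = "#none"                    # do not export the item


def default_core_filter(metadata):
    """Model of the default setting: DigitalMedia items are filtered out."""
    doc_type = metadata.get("dDocType")
    if doc_type and doc_type == "DigitalMedia":
        return NONE
    return CUSTOM_SCRIPT


def permissive_core_filter(metadata):
    """Model of the changed setting: every item goes to the custom script."""
    return CUSTOM_SCRIPT


video = {"dDocName": "VID_001", "dDocType": "DigitalMedia"}
memo = {"dDocName": "DOC_001", "dDocType": "Document"}

print(default_core_filter(video), default_core_filter(memo))
print(permissive_core_filter(video))
```

Under the default filter the DigitalMedia item never reaches the source location script, while the changed setting leaves the decision entirely to that script.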

This section covers the following topics:

Section 10.2.2.1, "Accessing the SESCrawlerExport Component"
Section 10.2.2.2, "Taking a Snapshot of Content"
Section 10.2.2.3, "Configuring SESCrawlerExport Parameters"
Section 10.2.2.4, "Configuring the Content Server Source Location Script"

10.2.2.1 Accessing the SESCrawlerExport Component

To access the SESCrawlerExport component:

Choose Administration, then Admin Server, then Component Manager.
In the Component Manager page, from the list of Integration components select SESCrawlerExport.
Click Update.

The SESCrawlerExport component is enabled.
Choose Administration, then SESCrawlerExport to open the SESCrawlerExport Administration page. Use this page to take a snapshot of content to generate RSS feeds and to access the Configure SESCrawlerExport page.

10.2.2.2 Taking a Snapshot of Content

Taking a snapshot of content on the Content Server instance generates feeds to be provided to Oracle SES Crawler. The snapshot generates a configFile.xml at the location specified by the SESCrawlerExport component FeedLoc parameter. XML feeds are created in the subdirectory with the source name; for example, wikis. Performing a snapshot can take some time depending on the number of items you have stored on the Content Server instance and how many sources you are generating.

To take a snapshot:

Choose Administration, then SESCrawlerExport.
In the SES Crawler Export Administration page, select the source or sources you want to capture in the snapshot from the available menu options.

If you select All Sources from the list of content sources, SESCrawlerExport generates RSS feeds for all defined sources. You can also choose to select individual sources or select a subset of sources to take a snapshot of just those sources. Any update on the configFile.xml document that causes reindexing to occur also generates the feeds in the same location.
Click Take Snapshot.

Note:

The configFile.xml file is generated once for the same configuration, either on the initial snapshot or on the first update of any document, whichever occurs first.
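
The file layout that a snapshot produces can be pictured with a short sketch. It only computes the expected paths from the Feed Location (sceFeedLoc) value and the source names described in this section. The directory `/u01/ses_feeds` and the function name are hypothetical.

```python
from pathlib import PurePosixPath


def feed_layout(feed_loc, source_names):
    """Return where configFile.xml and each source's feeds are expected."""
    root = PurePosixPath(feed_loc)
    return {
        # configFile.xml is written directly in the feed location.
        "config_file": root / "configFile.xml",
        # Feeds for each source go in a subdirectory named after the source.
        "source_dirs": {name: root / name for name in source_names},
    }


layout = feed_layout("/u01/ses_feeds", ["wikis", "default"])
print(layout["config_file"])
for name, path in layout["source_dirs"].items():
    print(name, path)
```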

10.2.2.3 Configuring SESCrawlerExport Parameters

The SESCrawlerExport component has several parameters you can configure to specify the data feed source, content, metadata, the number of items per data feed, and so forth. Changes to parameters take effect immediately; however, you may need to take a new snapshot to propagate the changes.

To configure these parameters:

Choose Administration, then SESCrawlerExport.
In the SES Crawler Export Administration page, click Configure SESCrawlerExport.
Specify or confirm values for the following SESCrawlerExport parameter fields.

Element	Description
Hostname (sceHostname)	The string for the hostname of the Content Server instance that hosts the content to be exported. If the value is blank, the hostname is set to the host that performs the Oracle SES export. This field is Idoc capable.
Feed Location (sceFeedLoc)	Directory to which the configuration file and data feeds are written. The configFile.xml file is generated at this location. Data feeds and content are generated in the subdirectory with the Source Name from this location.
Metadata List (sceMetadataList)	A comma-delineated list of metadata values that are exported to Oracle SES. If the value is blank, the list of metadata values consists of the following fields: dID,dDocName,dRevLabel,dDocType,dDocAccount,dSecurityGroup,dOriginalName,dReleaseDate,dOutDate and all custom metadata fields (those beginning with the letter "x"). If this field is filled with a set of metadata fields, only those fields are exported to Oracle SES. These fields can be standard or custom metadata fields.
Admin Email(s) (sceAdminEmail)	A comma-delineated list of email addresses, user names, and user aliases that are notified by email when crawling errors occur.
Custom Metadata Blacklist (sceCustomMetadataBlacklist)	A comma-delineated list of metadata values that are not exported to Oracle SES. These fields can be standard or custom metadata fields.
Maximum Feeds Pending Consumption by SES per Source (sceMaxFeedsPerSource)	A number that limits the creation of new data feeds when the number of data feeds pending consumption by SES for a source exceeds the specified value. To limit the feeds, set this number to 0 or a positive value. If this number is set to a negative value, there is no limit on the feeds generated.
Maximum Items Per Datafeed (sceMaxItemsPerFeed)	The maximum number of content items for each data feed. (A content item in the feed is an operation. For example: insert, update, or delete a document.)
Core Filter (sceCoreFilter)	Pre-filters content to exclude items from being exported to Oracle SES. Oracle recommends that you leave this value at the default setting.
Crawler Role (sceCrawlerRole)	The Content Server role required for the account that Oracle SES uses to crawl the Content Server instance. By default, the Content Server `admin` role is required. Caution: Do not use the default Oracle WebLogic Server administrator account to crawl from Oracle SES. Instead, use either an administrator account from an external source (such as an LDAP provider) or the local Content Server account. If necessary, you can change the required role `admin` to another role, using this SESCrawlerExport field. For example: On the Content Server instance, create a new role called `sescrawlerrole`. Create a new local user account called `sescrawler` and assign the role `sescrawlerrole` to this user account. On Oracle SES, change your source definition to use the `sescrawler` account to crawl the Content Server instance. On the Content Server instance, add `sceCrawlerRole=sescrawlerrole` in the `config.cfg` file.
Source Name(s) (sceSourceName)	A comma-delineated list of all content sources created on the WebCenter Content Server instance. Each listed source is completely identical (mirrored). By having multiple sources, the content on this instance can be independently consumed by multiple Oracle SES servers. These source names are used as the subdirectory names for the Feed Location directory to hold data feeds and contents. Note: The name "ssSource" is a reserved source name and must not be used in this field.
Disable Secure APIs (sceDisableSecureAPIs)	A Boolean flag that determines whether security for the services provided by the SESCrawlerExport component is handled internally by the component (`false`) or natively by the Content Server (`true`). For more information on Single Sign-On, see Section 10.2.2.3.2.
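
Two of the parameters above lend themselves to a short illustration: how the Metadata List and Custom Metadata Blacklist values might combine, and how the feed limit treats negative values. The sketch below is a model under stated assumptions, not the component's implementation. In particular, applying the blacklist after the metadata list is an assumption, and the function names are illustrative.

```python
# Default standard fields listed for an empty Metadata List value.
DEFAULT_STANDARD_FIELDS = [
    "dID", "dDocName", "dRevLabel", "dDocType", "dDocAccount",
    "dSecurityGroup", "dOriginalName", "dReleaseDate", "dOutDate",
]


def split_list(value):
    """Split a comma-delineated parameter value into trimmed names."""
    return [item.strip() for item in value.split(",") if item.strip()]


def exported_fields(available_fields, metadata_list="", blacklist=""):
    """Return the fields exported for the given parameter values."""
    wanted = split_list(metadata_list)
    if not wanted:
        # Blank list: the defaults plus all custom fields (names start with x).
        custom = [f for f in available_fields if f.startswith("x")]
        wanted = DEFAULT_STANDARD_FIELDS + custom
    blocked = set(split_list(blacklist))  # assumption: blacklist applied last
    return [f for f in wanted if f not in blocked]


def may_create_feed(pending_feeds, max_feeds_per_source):
    """Model of the feed limit: a negative value means no limit."""
    if max_feeds_per_source < 0:
        return True
    # Creation stops once the pending count exceeds the configured value.
    return pending_feeds <= max_feeds_per_source


print(exported_fields(["xDepartment", "dDocTitle"], blacklist="dOutDate"))
print(may_create_feed(pending_feeds=3, max_feeds_per_source=2))
```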

10.2.2.3.1 Configuring Content Server Source in Oracle SES

The Content Server connector enables Oracle SES to search the Content Server instance in WebCenter Content. The connector reads the feeds provided by the Content Server instance according to a crawling schedule. To crawl data from Oracle SES, you must create a source of type Content Server. For instructions on installing the connector patch and creating the Content Server source, see the Oracle Secure Enterprise Search Administrator's Guide.

The following parameters are used in setting up the Content Server source:

Configuration URL:
```
http://host_name/instance_name/idcplg?IdcService=SES_CRAWLER_DOWNLOAD_CONFIG&source=source_name
```
The parameter represented by source_name must be equal to one of the strings used in the SESCrawlerExport component Source Name (sceSourceName) parameter. This parameter points to one of the content sources on the Content Server instance. For example:
```
http://stahz16/ucm/idcplg?IdcService=SES_CRAWLER_DOWNLOAD_CONFIG&source=cs
```
HTTP endpoint for authentication and authorization: You are prompted for the HTTP endpoint values during the Oracle WebCenter Content identity plug-in activation and authorization manager configuration. The two values are usually the same on the same Content Server instance and usually take the form http://host_name/instance_name/idcplg; for example, http://host.example.com/ucm/idcplg. This value is used as the endpoint for any service call to the Content Server instance. You can also find the value by choosing Administration, then Admin Server, then Internet Configuration. Use the current URL (without URL parameters) as the HTTP endpoint.
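
The two values above follow a fixed pattern, so they can be assembled programmatically. The following minimal sketch builds them from a host name, instance name, and source name, and rejects a source name that is not in the configured Source Name list. The function names are illustrative.

```python
from urllib.parse import urlencode


def http_endpoint(host_name, instance_name):
    """Return the HTTP endpoint in the form shown above."""
    return f"http://{host_name}/{instance_name}/idcplg"


def configuration_url(host_name, instance_name, source_name, configured_sources):
    """Return the Configuration URL for one configured content source."""
    if source_name not in configured_sources:
        raise ValueError(f"{source_name!r} is not a configured source name")
    query = urlencode({
        "IdcService": "SES_CRAWLER_DOWNLOAD_CONFIG",
        "source": source_name,
    })
    return f"{http_endpoint(host_name, instance_name)}?{query}"


print(configuration_url("stahz16", "ucm", "cs", ["cs"]))
print(http_endpoint("host.example.com", "ucm"))
```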

10.2.2.3.2 Configuring Content Server Source with Oracle Single Sign-On

When the Content Server instance is secured with Oracle Single Sign-On (OSSO), the SESCrawlerExport component configuration must be changed to allow Oracle SES access to the services provided by SESCrawlerExport. Go to the Configure SESCrawlerExport page to disable the internal security mechanisms by setting the Disable Secure APIs parameter to true.

10.2.2.3.3 Configuring Content Server Source with Other Single Sign-On

When the Content Server instance is secured with a single sign-on solution other than Oracle Single Sign-On (OSSO), some changes must be made to allow Oracle SES access to the services provided by the SESCrawlerExport component.

Configuration: When using a single sign-on solution other than Oracle Single Sign-On, the security for the services provided by the SESCrawlerExport component is provided by the component itself. Go to the Configure SESCrawlerExport page to enable the internal SESCrawlerExport security mechanisms by setting the Disable Secure APIs parameter to false.
Web Server: Access to the services provided by the SESCrawlerExport component must bypass single sign-on because Oracle SES is not compatible with the single sign-on solutions. Depending on the selected single sign-on solution, creating a bypass might be as simple as configuring a web server module to allow access to a subset of services.

If you set up an additional web server on the Content Server instance, the web server must run on a different port than the standard Content Server port (that is, something other than port 80). Configure this additional web server to not have any single sign-on protection at all. Also, set up Access Control Lists to allow only Oracle SES access to this web server. In the Oracle SES configuration, use this additional web server port in the configuration URLs for the Content Server source.

10.2.2.4 Configuring the Content Server Source Location Script

The Content Server source location script is a fully customizable Idoc script that evaluates against a content item's metadata and returns the source(s) to which this content item should be sent.

To access the page where you can create or update the source location script:

Choose Administration, then SESCrawlerExport.
In the SES Crawler Export Administration page, click Configure SESCrawlerExport.
In the Configure SESCrawlerExport page, click Configure Source Location Script.
Enter the Idoc Script in the provided area.

By default, the source location script is set to #all, which sends every content item flagged as Latest Released to all sources (see the Source Name parameter) configured on the Content Server instance. The #all source name is a reserved keyword that indicates that all sources receive the content item.

Similarly, the #none source name is also a reserved keyword, but it indicates that the content item should be sent to no sources (basically, the content item is not exported to Oracle SES).
Click Update.

If you want to remove the source location script, click Reset.
To test the source location script, enter a content item's Document Name (dDocName) in the field provided, then click Test.

If there are syntax errors in the script, the errors are either displayed on the page or in the server output, depending on the type of syntax error. Logic errors can be corrected on the SESCrawlerExport Source Location Script page and the test can be run again immediately.

If the script returns a source name that does not exist, an error is generated in the server output. The invalid source name is removed and recorded in the logs, and the item(s) continue to be processed. You can correct this problem either by removing the source name from the script or by adding a new Source Name parameter value for your Content Server instance.

You can return multiple source names in the script by separating them with commas.

Example

In the following example, the source location script is set up to send all content items that have a Document Type (dDocType) of ADACCT into a source named accounting, and everything else falls into the source named default. The accounting and default sources must be set up separately by adding those names into the Source Name parameter on the Configure SESCrawlerExport page.

<$if dDocType like "ADACCT" $>
accounting
<$else$>
default
<$endif$>
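
The way the script's return value is interpreted can be modeled in a few lines. The sketch below is written in Python rather than Idoc Script and covers the behaviors described in this section: comma-separated names, the reserved #all and #none keywords, and removal of names that are not configured sources. Treating `like` as an exact match and the handling of #none mixed with other names are assumptions. The function names are illustrative.

```python
ALL = "#all"
NONE = "#none"


def example_script(metadata):
    """Model of the example script above (exact match assumed for like)."""
    return "accounting" if metadata.get("dDocType") == "ADACCT" else "default"


def resolve_sources(script_output, configured_sources):
    """Return (sources that receive the item, invalid names to be logged)."""
    names = [n.strip() for n in script_output.split(",") if n.strip()]
    if ALL in names:
        return list(configured_sources), []
    names = [n for n in names if n != NONE]  # #none contributes no source
    valid = [n for n in names if n in configured_sources]
    invalid = [n for n in names if n not in configured_sources]
    return valid, invalid


sources = ["accounting", "default"]
print(resolve_sources(example_script({"dDocType": "ADACCT"}), sources))
print(resolve_sources("accounting, archive", sources))
```

In the second call, `archive` is not a configured source, so it is reported separately while the item still goes to `accounting`.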

10.3 Configuring Full-Text Database Search Index

To set up and use full-text database searching and indexing for SQL Server and other databases:

Install WebCenter Content with the Content Server instance and configure it to work with the database.
Add the following entry to the DomainHomeName\ucm\cs\config\config.cfg file and save the file:
```
SearchIndexerEngineName=DATABASE.FULLTEXT
```
Restart the Content Server instance. For instructions, see Section 3.2.3.
Rebuild the search index using the Repository Manager.

For information about the Repository Manager, see Oracle Fusion Middleware Managing Oracle WebCenter Content.
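
The config.cfg change in this procedure can also be applied with a small script. The following minimal sketch sets or replaces a single `name=value` entry in the file's text. The function name and the sample content are illustrative assumptions, and reading and writing the actual file are left to the caller.

```python
def set_config_entry(config_text, key, value):
    """Return config text with key set to value, replacing any old entry."""
    lines = config_text.splitlines()
    entry = f"{key}={value}"
    for index, line in enumerate(lines):
        # Compare the name before the first "=" so other entries are untouched.
        if line.split("=", 1)[0].strip() == key:
            lines[index] = entry
            break
    else:
        # No existing entry was found, so append a new one.
        lines.append(entry)
    return "\n".join(lines) + "\n"


# Hypothetical config.cfg content, for illustration only.
original = "IDC_Name=cs1\nSearchIndexerEngineName=DATABASE.METADATA\n"
print(set_config_entry(original, "SearchIndexerEngineName", "DATABASE.FULLTEXT"))
```

As in the procedure, the Content Server instance still has to be restarted and the index rebuilt after the file is saved.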

Note:

If you have difficulty rebuilding the full-text database search index after importing the OCS schema, the message Unable to create Oracle text collection 'IdcText1' might be displayed. If this occurs, log in to the database as the Content Server database administrator and drop the tables IdcText1 and IdcText2.

For more information, see "Backup and Recovery Recommendations for Oracle WebCenter Content" in Oracle Fusion Middleware Administrator's Guide.