Oracle® WebCenter Content System Administrator's Guide for Content Server
11g Release 1 (11.1.1)

Part Number E10792-04

7 Managing Search Tools

This chapter describes concepts and tasks for managing Oracle WebCenter Content Server (Content Server) search tools, including OracleTextSearch and Oracle Secure Enterprise Search (Oracle SES).

7.1 OracleTextSearch

If you have a license to use the OracleTextSearch feature (with Oracle Database 11g), then you can configure OracleTextSearch to use Oracle Text 11g as the primary full-text search engine for Oracle WebCenter Content (WebCenter Content). Oracle Text 11g offers state-of-the-art indexing capabilities and provides the underlying search capabilities for Oracle Secure Enterprise Search (Oracle SES). However, Oracle Text 11g has its own query syntax, which is intended more for applications and information professionals than for casual end users.

OracleTextSearch enables administrators to specify certain metadata fields to be optimized for the search index and to customize additional fields. This feature also enables a fast index rebuild and index optimization.


7.1.1 Considerations for Using OracleTextSearch

The following items are important when considering use of the OracleTextSearch feature:

  • WebCenter Content version 11g Release 1 (11.1.1) supports all languages supported by Oracle Text 11g. OracleTextSearch can filter and extract content from different document formats in different languages. It supports a large number of document formats, including Microsoft Office file formats, Adobe PDF, HTML, and XML. It can render search results in various formats, including unformatted text, HTML with term highlighting, and original document format.

  • Oracle Text 11g runs on Oracle Database 11g. The Content Server system database can be Oracle Database 11g, Microsoft SQL Server, or other databases as listed in the UCM 11g Release 1 (11.1.1) Certification Matrix. However, if the system database is not Oracle Database 11g, then an external provider for OracleTextSearch must be configured. See Section 7.1.2, "Configuring OracleTextSearch for Content Server."

  • When using OracleTextSearch, Oracle Database version 11.1.0.7.0 or higher is required, and any SDATA field is limited to a maximum of 249 characters. All Optimized Fields are SDATA fields, which by default include dDocName, dDocTitle, dDocType, and dSecurityGroup. The total number of SDATA fields is limited to thirty-two (32) fields. Note that without the Folders_g component enabled, the dDocTitle field is limited to 80 characters by default.

  • While WebCenter Content provides numerous search options using a variety of databases (Oracle, Microsoft SQL Server, IBM DB2), by default the database that serves as the search index is the same system database used by WebCenter Content to manage metadata and other configuration information (users, security groups, and so on). The OracleTextSearch feature enables Oracle Text 11g as a separate search collection instance on Oracle Database 11g for WebCenter Content, which allows the search collection to reside on a separate computer and not compete with WebCenter Content for processors and memory. This can improve indexing and search response time.

  • The OracleTextSearch collection instance can be installed on a different platform than the WebCenter Content installation.

  • If the OracleTextSearch feature is configured and running, and metadata fields are pushed into the Content Server instance either by the administrator or by a component (requiring that the Content Server instance be restarted), then the OracleTextSearch index must be rebuilt before content using the new metadata fields can be checked in to the Content Server instance.
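The SDATA limits listed above (at most 32 SDATA fields, at most 249 characters per SDATA value) can be checked before a configuration change or a bulk check-in. The following Python sketch is illustrative only: the function name and the dictionary input are assumptions made for this example and are not part of Content Server.

```python
# Illustrative pre-flight check for the SDATA limits described above.
# The two limits come from this section; the helper and its input shape
# are hypothetical, not a Content Server API.
MAX_SDATA_FIELDS = 32
MAX_SDATA_CHARS = 249


def check_sdata_limits(optimized_values):
    """Return a list of problems for a mapping of Optimized Field -> value."""
    problems = []
    if len(optimized_values) > MAX_SDATA_FIELDS:
        problems.append(
            f"{len(optimized_values)} Optimized Fields exceed the limit of {MAX_SDATA_FIELDS}"
        )
    for field, value in optimized_values.items():
        length = len(str(value))
        if length > MAX_SDATA_CHARS:
            problems.append(
                f"{field}: {length} characters exceed the limit of {MAX_SDATA_CHARS}"
            )
    return problems
```

An empty list means the values fit within both limits. The sketch does not model the separate 80-character default for dDocTitle that applies without the Folders_g component.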

7.1.2 Configuring OracleTextSearch for Content Server

If you did not specify OracleTextSearch when first installing Content Server, use this procedure to configure the feature:

  1. Open the config.cfg file for the Content Server instance in a text editor. For example: MW_HOME/user_projects/domain/servers/ucm/config/config.cfg

  2. Set the following property value:

    SearchIndexerEngineName=OracleTextSearch
    

    Note:

    If you are using ACLs, and UseEntitySecurity=true is set with OracleTextSearch as the search engine, then the following must also be set in the config.cfg file for the Content Server instance:

    ZonedSecurityFields=xClbraUserList,xClbraAliasList
    
  3. If you are using an external data source instead of the system database, change the value SystemDatabase in the following property setting to the external database provider name:

    IndexerDatabaseProviderName=SystemDatabase
    

    Note:

    You can specify a separate Oracle Database as the value of IndexerDatabaseProviderName, instead of SystemDatabase.

    If the Content Server system database used with OracleTextSearch is not Oracle Database 11g, then an external provider for OracleTextSearch must be configured. The driver jar ojdbc6.jar is provided by Oracle in the MW_HOME/wlserver_10.3/server/lib directory.

  4. Save the file.

  5. Restart the Content Server instance.

  6. Rebuild the search index.

    For more information on rebuilding the index, see Section 4.2.3, "Working with the Search Index." For more information on configuring Content Server and OracleTextSearch during installation, see Oracle WebCenter Content Installation Guide.
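Steps 2 and 3 of the procedure above amount to setting key=value lines in config.cfg. As a rough sketch, assuming the file is plain key=value text with # comment lines, the edit could be scripted as follows. The helper is hypothetical and is not an Oracle-supplied tool, and the sample file content is invented for illustration.

```python
# Sketch: apply key=value settings to the text of a config.cfg file.
# The property names are the ones shown in this procedure; the helper
# itself is a hypothetical convenience.
def set_config_values(config_text, updates):
    """Replace existing key=value lines and append any keys not yet present."""
    remaining = dict(updates)
    out = []
    for line in config_text.splitlines():
        key = line.split("=", 1)[0].strip()
        is_setting = "=" in line and not line.lstrip().startswith("#")
        if is_setting and key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    # Keys that were not already in the file are added at the end.
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


# Invented sample content, for illustration only.
example = "IDC_Name=demo\nSearchIndexerEngineName=PLACEHOLDER\n"
print(set_config_values(example, {
    "SearchIndexerEngineName": "OracleTextSearch",
    "IndexerDatabaseProviderName": "SystemDatabase",
}))
```

Editing the file this way does not replace steps 5 and 6: the Content Server instance still has to be restarted and the search index rebuilt.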

If you originally configured Content Server to use an external provider with OracleTextSearch, but later need to switch to use SystemDatabase, you must manually run the contentprocedures.sql script against your system database schema. The script file is located in the WC_HOME/ucm/idc/database/oracle/admin/ directory.

7.1.3 Oracle Text 11g Features and Benefits


7.1.3.1 Indexing and Query Speeds and Techniques

Using Oracle Text 11g, WebCenter Content offers a significant increase in index speeds. Oracle Text indexing is transactional. The Content Server system sends a batch of documents to Oracle Text, commits the batch, then starts the Oracle Text indexer. The Content Server system is notified of which documents failed to index, and only those documents are resubmitted to be indexed. Content Server software also supports the use of parallel indexing with the database, which can leverage multiple CPUs on the database server.

Search query response times are improved by increased indexing speeds and additional capabilities in the Content Server system to optimize the search collection. These capabilities include an automatic Fast Optimization for every 5,000 documents added to the Content Server instance, and a Full Optimization for every 50,000 documents or 20% growth of the repository.
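The optimization thresholds quoted above can be read as a simple decision rule. The following Python sketch models that rule for illustration only. The counters, the function, and the choice to let a Full Optimization take precedence over a Fast Optimization are assumptions; Content Server tracks this internally and its actual bookkeeping may differ.

```python
# Illustrative model of the optimization thresholds quoted above.
# Only the three numbers come from this section; everything else is
# a hypothetical simplification.
FAST_EVERY_DOCS = 5_000
FULL_EVERY_DOCS = 50_000
FULL_GROWTH_RATIO = 0.20


def optimization_due(docs_since_fast, docs_since_full, size_at_last_full):
    """Return 'full', 'fast', or None for the given counters."""
    grew_enough = (
        size_at_last_full > 0
        and docs_since_full / size_at_last_full >= FULL_GROWTH_RATIO
    )
    # Assumption: a due Full Optimization supersedes a Fast Optimization.
    if docs_since_full >= FULL_EVERY_DOCS or grew_enough:
        return "full"
    if docs_since_fast >= FAST_EVERY_DOCS:
        return "fast"
    return None
```

For a small repository the 20% growth rule is reached well before the 50,000-document rule: with 10,000 items at the last Full Optimization, 2,000 new documents are enough.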

WebCenter Content uses some of the newest Oracle Text 11g features. For example, the Content Server system automatically creates a new search index zone for each text information field in order to provide better search speed. Using information zones enables the Content Server system to query data as if it were full-text data. All text-based information fields (text, long text, and memo) are automatically added as separate zones. In addition to the zones created for text information fields, the Content Server system provides an extra zone named IdcContent, which enables custom components, Oracle WebCenter Content: Inbound Refinery components, applications, or users to create XML content with tags that will be indexed as full-text metadata fields.

WebCenter Content uses the SDATA section feature in Oracle Text 11g to index important text, date, and integer fields and define them as Optimized Fields. The SDATA section is a separate XML structure managed by the Oracle Text engine that allows the engine to respond rapidly to requests involving date and integer ranges. The Content Server system can have up to 32 Optimized Fields, which include date fields, integer fields, standard Content Server fields like dInDate and dOutDate, and fields selected to be optimized. All Optimized Fields are SDATA fields, which by default include dDocName, dDocTitle, dDocType, and dSecurityGroup.

Note:

If you want to change the set of Optimized Fields defined in Oracle Text 11g, the maximum allowed number of Optimized Fields is 32.

To avoid errors when indexing, do not add non-existent metadata fields to the Configuration Manager DrillDownFields parameter, and do not add memo fields to an SDATA section or to the DrillDownFields parameter. For information on the Configuration Manager, see Oracle WebCenter Content Application Administrator's Guide for Content Server.

7.1.3.2 Fast Rebuild

OracleTextSearch provides an Indexer Rebuild Screen when you use the Collection Rebuild Cycle Screen on the Repository Manager: Indexer Tab. The Fast Rebuild feature allows the search engine to add new information to the search collection without requiring a full collection rebuild. A Fast Rebuild is required in the following cases:

  • Adding or removing information fields

  • Changing any Optimized Field

  • Changing an information field to be an Optimized Field

A Fast Rebuild does not cause all the information (metadata and full-text) to be re-indexed. It adds the changes throughout the collection and updates it. Content Server search functionality is not affected during a Fast Rebuild cycle.

7.1.3.3 Query Syntax

Queries defined in Universal Query Syntax are supported and generally do not need any modification. This includes queries saved by users, queries defined in custom components, and queries defined in Site Studio pages.

7.1.3.4 OracleTextSearch Operators

OracleTextSearch exposes the following search operators by default:

  • CONTAINS

  • MATCHES

  • HAS WORD PREFIX

  • Range searches for dates and integers

The Oracle Text 11g engine supports additional search operators and functions which are not exposed in the user interface by default, but can be exposed through customization that adds to the operator definition HDA table. For details and examples of these operators see the Oracle Text Reference.

7.1.3.4.1 Search Thesaurus

Certain queries, such as stem and Related Term, may be more effective if you use an Oracle Text thesaurus. Oracle Text enables you to create case-sensitive or case-insensitive thesauri that define synonym and hierarchical relationships between words and phrases. You can then search and retrieve documents that contain relevant text by expanding queries to include similar or related terms as defined in the thesaurus. For example, you can populate a thesaurus with specific product names, associated models, associated features, and so forth.

  • Default thesaurus: If you do not specify a thesaurus by name in a query, by default, the thesaurus operators use a thesaurus named DEFAULT. However, Oracle Text does not provide a DEFAULT thesaurus.

    As a result, if you want to use a default thesaurus for the thesaurus operators, you must create a thesaurus named DEFAULT. You can create the thesaurus through any of the thesaurus creation methods supported by Oracle Text:

    • CTX_THES.CREATE_THESAURUS (PL/SQL)

    • ctxload utility

  • Supplied thesaurus: Oracle Text does not provide a default thesaurus, but Oracle Text does supply a thesaurus, in the form of a file that you load with ctxload, that can be used to create a general-purpose, English-language thesaurus.

    The thesaurus load file can be used to create a default thesaurus for Oracle Text, or it can be used as the basis for creating thesauri tailored to a specific subject or range of subjects.

Note:

See the Oracle Text Reference to learn more about using ctxload and the CTX_THES package, and see the chapter, "Working With a Thesaurus in Oracle Text," in the Oracle Text Application Developer's Guide.

7.1.3.5 Case Sensitivity and Stemming Rules

The Content Server system automatically ensures that queries are executed as case-insensitive. By default, all full-text and text field search queries are case-insensitive. Content Server also handles case-insensitive search queries for information stored as Optimized Fields.

Content Server does not apply any stemming rules by default for Oracle Text 11g, but stemming rules can be applied by using the stem() function. Stemming rules may be used to have searches account for plurals, verb forms, and so forth. Other ways to implement stemming rules are to modify the standard query definition in the searchindexerrules configuration file or to make configuration changes in the Oracle Text engine (Oracle Database).

Content Server handles content in non-English languages by using the WORLD_LEXER feature in the Oracle Text engine. This enables Oracle Text to automatically identify the language and apply the proper tokenization rules.

7.1.3.6 Search Results Data Clustering

With the OracleTextSearch feature, the Content Server system retrieves additional information about a search result list and displays it in a new menu bar on the Search Results page. This information summarizes how many documents are attached to specific values in specific information fields. Content Server supports data clustering for up to four information fields (the default fields are Security Group and Document Type).

This can be useful if you have a query that returns many items. For example, a result set could include 200 content items, including 100 documents that belong to the Public security group, 75 that belong to the Sales group, and 25 that belong to the Marketing group. The menu option for Security Group will show you the list of values and how many documents belong to each value. You can select one of the values (Public, Sales, Marketing) from the menu and it will list only those documents in the result set that belong to that value.
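The per-value counts described above are a form of faceted counting. As a minimal sketch of the idea (not of how Content Server or Oracle Text computes it), the following Python code counts values per field over an in-memory result list and then narrows the list to one value. The result dictionaries and the dDocType value "Document" are invented for illustration.

```python
from collections import Counter


def cluster_counts(results, fields=("dSecurityGroup", "dDocType")):
    """Count how many result items carry each value of each clustered field."""
    return {
        field: Counter(item[field] for item in results if field in item)
        for field in fields
    }


def filter_by(results, field, value):
    """Keep only the result items whose field has the selected value."""
    return [item for item in results if item.get(field) == value]


# Invented result set mirroring the example above: 100 + 75 + 25 = 200 items.
results = (
    [{"dSecurityGroup": "Public", "dDocType": "Document"}] * 100
    + [{"dSecurityGroup": "Sales", "dDocType": "Document"}] * 75
    + [{"dSecurityGroup": "Marketing", "dDocType": "Document"}] * 25
)

counts = cluster_counts(results)
print(counts["dSecurityGroup"].most_common())
# [('Public', 100), ('Sales', 75), ('Marketing', 25)]
print(len(filter_by(results, "dSecurityGroup", "Sales")))  # 75
```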

7.1.3.7 Snippets

Content Server can retrieve document snippets as part of search results to show search terms in the context of their usage. This feature is disabled by default, and enabling it can affect search query performance. To enable snippets, set the following configuration entry in the config.cfg file:

OracleTextDisableSearchSnippet=false

7.1.3.8 Additional Changes

Additional changes because of the use of Oracle Text 11g include:

  • XML content is automatically indexed.

  • There are no visible changes in the Search user interface other than removal of Substring as a search operator option. The default search operators are CONTAINS, MATCHES, and HAS WORD PREFIX. Substring-based queries still work.

  • Queries using the MATCHES operator on a non-optimized field behave like a CONTAINS query. For example, if xDepartment is not optimized, then the query xDepartment MATCHES 'Marketing' behaves like xDepartment CONTAINS 'Marketing' and returns hits on content items that have an xDepartment value of 'Marketing Services' or 'Product Marketing'.

  • Relevancy ranking can be changed in Oracle Text 11g through use of an operator called DEFINESCORE. This operator can be added through a component to the WhereClause value of OracleTextSearch in the SearchQueryDefinition table (in the searchindexerrules configuration file). More information about this operator is available in the Oracle Text Reference document.

  • Complicated queries that previously could be placed into the full-text search box should now be placed in the advanced options on the Query Builder Form. The Query Builder Form is documented in the Oracle WebCenter Content User's Guide for Content Server.

  • If you need to specify an escape character, use the configuration variable AdditionalEscapeChars=. The default setting is:

    AdditionalEscapeChars=_:#,-:#
    

    The default sets an underscore (_) and a hyphen (-) as escape characters.

  • The PDF Highlighting feature has been disabled.

  • The Spell Checking feature can be enabled, but it requires a custom component just as it did with Autonomy VDK.

7.1.4 Managing OracleTextSearch


7.1.4.1 Determining Fields to Optimize

Consider the following when determining the fields to optimize:

  • Do you want an exact match in a query?

  • Do you want that match to work faster in a search?

  • Do you want to sort search results by field?

By default the OracleTextSearch feature optimizes the Content ID and Document Title metadata fields.

A maximum of 32 fields can be defined as Optimized Fields with the OracleTextSearch feature. This total includes date fields, integer fields, standard Content Server fields like dInDate and dOutDate, and fields selected to be optimized. All Optimized Fields are SDATA fields, which by default include dDocName, dDocTitle, dDocType, and dSecurityGroup.

The display of integer fields is dynamic and depends on the Content Server system configuration.

7.1.4.2 Assigning/Editing Optimized Fields

You can select Non-Optimized metadata fields and assign them to be Optimized Fields for search purposes, or edit Optimized Fields to make them Non-Optimized.

To assign or edit Optimized fields:

  1. Log in to the Content Server instance as system administrator.

  2. Click Administration in the navigation bar.

  3. Click Admin Applets.

  4. Click Configuration Manager, then the Information Fields tab, then Advanced Search Design.

    For more information on the Configuration Manager applet, see Oracle WebCenter Content Application Administrator's Guide for Content Server.

  5. To make a metadata field Optimized, click Edit Fields. In the Advanced Options for "metadata_field" screen, select Is Optimized.

  6. To edit an Optimized Field and make it Non-Optimized, click Edit Fields. In the Advanced Options for "metadata_field" screen, deselect Is Optimized.

  7. When you have completed moving fields, use Index Fast Rebuild in Repository Manager to update the search collection to use the new and modified fields.

Note:

The Fast Rebuild does not function if a search collection rebuild is in progress.

7.1.4.3 Performing a Fast Rebuild

The Fast Rebuild feature allows the search engine to add new information to the search collection without requiring a full collection rebuild. A Fast Rebuild is required in the following cases:

  • Adding or removing information fields

  • Changing any Optimized Field

  • Changing an information field to be an Optimized Field

To perform a fast rebuild:

  1. Log in to the Content Server instance as system administrator.

  2. Click Administration in the navigation bar.

  3. Click Admin Applets, then Repository Manager, then the Indexer tab.

    The Repository Manager: Indexer Tab is displayed.

  4. On the Collection Rebuild Cycle Screen, click Start.

    The Indexer Rebuild Screen is displayed with a warning that rebuilding the search index is a time-consuming process. If you do not want to start a rebuild now, click Cancel; otherwise, continue with this procedure.

  5. On the Indexer Rebuild Screen, click OK.

    A Fast Rebuild of the search collection is performed.

Note:

A Fast Rebuild is not performed if a rebuild of the search collection is in progress.

Note:

The Fast Rebuild process does not create indexer counter values for Full Text, Meta Only, and Delete. To obtain indexer count statistics, you must perform a full collection rebuild.

7.1.4.4 Modifying the Fields Displayed on Search Results

The OracleTextSearch feature provides default menu options on the Search Results page (set by the Oracle Database configuration script):

DrillDownFields=dDocType, dSecurityGroup

Administrators can add one more option from the list of Optimized Fields to further customize the search results. Edit the configuration to add the option to the list of DrillDownFields. (This function does not support multi-value option lists.)

A Fast Rebuild must be performed after making any change in the DrillDownFields setting.
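Because a bad DrillDownFields entry can cause indexing errors (non-existent fields and memo fields are not allowed, and options are meant to come from the Optimized Fields), it can help to validate a proposed setting before applying it. The following Python sketch shows one way to do that. The function names, the field-type mapping, and the custom fields xDepartment and xComments are assumptions made for this example.

```python
# Sketch: validate a proposed DrillDownFields setting.
# The rules come from this chapter; the helpers and inputs are hypothetical.
def parse_drilldown_fields(line):
    """Split 'DrillDownFields=a, b' into a list of field names."""
    key, _, value = line.partition("=")
    if key.strip() != "DrillDownFields":
        raise ValueError("not a DrillDownFields setting")
    return [name.strip() for name in value.split(",") if name.strip()]


def check_drilldown_fields(names, field_types, optimized):
    """field_types maps field name -> type string such as 'text' or 'memo'."""
    problems = []
    for name in names:
        if name not in field_types:
            problems.append(f"{name}: field does not exist")
        elif field_types[name] == "memo":
            problems.append(f"{name}: memo fields are not allowed")
        elif name not in optimized:
            problems.append(f"{name}: not an Optimized Field")
    return problems


# Invented metadata model, for illustration only.
field_types = {"dDocType": "text", "dSecurityGroup": "text",
               "xDepartment": "text", "xComments": "memo"}
optimized = {"dDocName", "dDocTitle", "dDocType", "dSecurityGroup"}

names = parse_drilldown_fields("DrillDownFields=dDocType, dSecurityGroup, xComments")
print(check_drilldown_fields(names, field_types, optimized))
```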

7.1.5 Searching with OracleTextSearch

Performing a search with OracleTextSearch is generally the same as with other search engines. The only visible change in the Search: Expanded Form is the removal of Substring as a search operator option. The default search operator is CONTAINS. Substring-based queries still work.

For details on performing searches, see Oracle WebCenter Content User's Guide for Content Server.

The following table describes the default search operators.

Operator: CONTAINS

  Description: Finds content items with the specified whole word or phrase in the metadata field. This is available only for OracleTextSearch, or for Oracle Database and Microsoft SQL Server database with the optional DBSearchContainsOpSupport component enabled.

  Example: When form is entered in the Title field, the search returns items with the word form in their title, but does not return items with the word performance or reform.

Operator: MATCHES

  Description: Finds items with the exact specified value in the metadata field.

  Example: When address change form is entered in the Title field, the search returns items with the exact title of address change form.

  A query that uses the MATCHES operator on a nonoptimized field behaves the same as a query that uses the CONTAINS operator. For example, if the xDepartment field is not optimized, then the query xDepartment MATCHES 'Marketing' behaves like xDepartment CONTAINS 'Marketing', returning hits on documents that have an xDepartment value of 'Marketing Services' or 'Product Marketing'.

Operator: HAS WORD PREFIX

  Description: Finds all content items with the specified word at the beginning of the metadata field. No wildcard character is placed before or after the specified value.

  Example: When form is entered in the Title field, the search returns all items with the word form at the beginning of their title, but does not return an item whose title begins with the word performance or reform.
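The operator descriptions above can be mimicked on plain strings to make the differences concrete. The following Python sketch is a simplified model, not the Oracle Text implementation: it assumes words are runs of letters, digits, and underscores, and it compares case-insensitively, as this chapter says Content Server queries do. The function names are invented for the example.

```python
import re


def _words(text):
    """Lowercased word tokens (simplified tokenization for illustration)."""
    return re.findall(r"\w+", text.lower())


def contains(value, phrase):
    """True if phrase occurs in value as a run of whole words."""
    words, target = _words(value), _words(phrase)
    n = len(target)
    return n > 0 and any(
        words[i:i + n] == target for i in range(len(words) - n + 1)
    )


def matches(value, phrase, optimized=True):
    """Exact match on an optimized field; behaves like contains otherwise."""
    if not optimized:
        return contains(value, phrase)
    return value.strip().lower() == phrase.strip().lower()


def has_word_prefix(value, phrase):
    """True if value begins with the given whole word or words."""
    words, target = _words(value), _words(phrase)
    return len(target) > 0 and words[:len(target)] == target


print(contains("Address change form", "form"))                        # True
print(contains("Performance review", "form"))                         # False
print(matches("Marketing Services", "Marketing"))                     # False
print(matches("Marketing Services", "Marketing", optimized=False))    # True
print(has_word_prefix("Tax form", "form"))                            # False
```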


7.1.6 Metadata Wildcards

The following wildcards can be used in metadata search fields, even when using the Quick Search field.

  • An asterisk (*) indicates zero or many alphanumeric characters. For example:

    • form* matches form and formula

    • *orm matches form and reform

    • *form* matches form, formula, reform, and performance

  • A question mark (?) indicates one alphanumeric character. For example:

    • form? matches forms and form1, but not form or formal

    • ??form matches reform but not perform

Note:

If you want to search for an asterisk (*) or a question mark (?) without treating it as wildcard, you need to put quotation marks around your search term; for example: "here*"
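The wildcard rules above map directly onto regular expressions. The following Python sketch translates a wildcard pattern into a regex under the stated definitions (an asterisk is zero or more alphanumeric characters, a question mark is exactly one) and treats a term in quotation marks as literal text. It is an illustration of the rules, not the code Content Server uses, and the function names are invented.

```python
import re

ALNUM = "[A-Za-z0-9]"


def wildcard_to_regex(pattern):
    """Translate a metadata wildcard pattern into a compiled regex."""
    # A quoted term is taken literally, wildcards included.
    if len(pattern) >= 2 and pattern[0] == pattern[-1] == '"':
        return re.compile(re.escape(pattern[1:-1]), re.IGNORECASE)
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(ALNUM + "*")   # zero or many alphanumeric characters
        elif ch == "?":
            parts.append(ALNUM)         # exactly one alphanumeric character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def wildcard_match(pattern, word):
    """True if the whole word matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(word) is not None


print(wildcard_match("*form*", "performance"))  # True
print(wildcard_match("form?", "formal"))        # False
print(wildcard_match("??form", "perform"))      # False
print(wildcard_match('"here*"', "here*"))       # True
```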

7.1.7 Search Results with OracleTextSearch

When users run a search using the Search: Expanded Form, the Search Results page displays an additional menu bar with options that enable users to selectively view search results. The options represent categories used to filter the search results. The options can be context-sensitive, so if only one content item is returned for an option, then it shows only the one result in the menu itself, as shown in Figure 7-1. The default set of options includes Content Type, Security Group, and Account.

Note:

Two default menu options on the OracleTextSearch menu for Search Results can be replaced by customized menu options: Security Group and Document Type.

Figure 7-1 Search results with OracleTextSearch default menu


If more than one content item is found for an option, an arrow is displayed next to the option name. When you move your cursor over the option name, a menu displays the list of the categories found in the search results for that option and the number of content items for each of the categories. You can click any category name on the menu to change the Search Results page to list only those items that match the category.

Figure 7-2 shows a list of categories under Security Group and the number of items found in each category.

Figure 7-2 Search results with snippets display and expanded OracleTextSearch menu


Element: Filter by Category

  Description: Displays the categories used to filter the search results, for example: Content Type, Security Group, Account.

Element: Content Type

  Description: (Default) Lists the types and the number of each type of content items in the search results. Clicking one of the content type names changes the Search Results to show only those items that match the content type.

Element: Security Group

  Description: (Default) Lists the security groups and number of content items assigned to each group in the search results. Security groups include: Administration, Public, and Secure. Clicking one of the security group names changes the Search Results to show only those items that match the security group.

Element: Account

  Description: (Default) Lists the account types and number of items assigned to each account in the search results. Clicking one of the account types changes the Search Results to show only those content items that match the account.


7.2 Oracle Secure Enterprise Search

Oracle Secure Enterprise Search (Oracle SES) 11g enables a secure, high quality, easy-to-use search across all enterprise information assets. If you have a license to use Oracle SES 11g, then you can configure Oracle WebCenter Content (WebCenter Content) to use Oracle SES as described in this section.

For more information about Oracle SES, see Oracle Secure Enterprise Search Administrator's Guide.

7.2.1 Using Oracle SES as an External Full-Text Search Engine

WebCenter Content can be configured with the OracleTextSearch feature to use Oracle Secure Enterprise Search (Oracle SES) 11g as its back-end search engine. With this configuration, users can search multiple Content Server instances for a file.

7.2.1.1 Configuring Oracle SES for Use with OracleTextSearch

To configure Oracle SES for use with the OracleTextSearch option, complete the following procedure.

Note:

If you are already using a search engine other than Oracle SES with WebCenter Content, such as the engine set up on the Content Server post configuration page, and you want to change the search engine to Oracle SES, then you must create a new database provider and configure Oracle SES for using that provider. See Section 7.2.1.2, "Reconfiguring the Search Engine to Use Oracle SES with OracleTextSearch."

  1. After installing Oracle SES, edit the file ORACLE_HOME/network/admin/sqlnet.ora to comment out the following two lines:

    tcp.invited_nodes
    tcp.validate_checking
    
  2. If Oracle SES is running, shut it down (mid-tier and database):

    ORACLE_HOME/bin/searchctl stopall
    
  3. Start the database:

    ORACLE_HOME/bin/searchctl start_backend
    
  4. Find database connection information for later use in the following file:

    ORACLE_HOME/search/webapp/config/search.properties
    
  5. Run the Repository Creation Utility (RCU) against Oracle SES and create the OCSEARCH schema. OCSEARCH sets only the search portion of a database already set up by RCU with Oracle SES.

    To create this schema, select Content Server 11g - Search Only on the RCU Select Components screen. For more information about running RCU, see Oracle WebCenter Content Installation Guide.

  6. Perform a standard WebCenter Content installation and Content Server installation.

    Note:

    Do not complete the steps on the Content Server post configuration page, because the page sets up a regular database configuration. For instructions on performing the Oracle WebCenter Content installation and Content Server configuration, see Oracle WebCenter Content Installation Guide.

  7. Create a new Data Source (WLS DataSource) on the Oracle WebLogic Server instance to connect to Oracle SES.

    1. On the Oracle WebLogic Server Administration Console, use the Services menu to choose JDBC then Data Sources. A screen listing the Summary of JDBC Data Sources is displayed.

    2. Click New and enter values for the following items on the Create a New Data Source screen:

      Name: Enter the new Data Source name.

      JNDI Name: Enter the new name again.

      Database Type: Enter Oracle.

      Database Driver: Click *Oracle's Driver (Thin XA) for Instance Connection.

    3. Click Next to see the Transaction Options.

    4. Click Next to enter the database parameters. As mentioned in step 4, you can find database connection information in the search.properties file.

      Database Name: Enter the name of the database to connect to; for example, ses.

      Host Name: Enter the IP address of the database server.

      Port: Enter the database server port number for the database connection.

      Database User Name: Enter the database account user name. This is the SchemaOwner name you specified in the RCU creation process.

      Password: Enter the database account password to use to create database connections. This is the password you specified in the RCU creation process.

      Confirm Password: Enter the database account password again.

    5. Click Next.

    6. Click Test Configuration. Verify that the message "Connection test succeeded" appears at the top of the page, then click Next.

    7. From the list of available target servers, select the target Content Server checkbox to deploy the new JDBC Data Source. For example, a target Content Server might be named UCM_server1.

    8. Click Finish.

  8. On the Content Server post configuration page, click Select External in Full Text Search options, then enter the Data Source name.

  9. Restart the Content Server instance.
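Step 1 of the procedure above asks you to comment out two parameters in sqlnet.ora. As a rough sketch of that edit, assuming each parameter sits on a single line and that # starts a comment, the change could be scripted as follows. The helper and the sample file content are invented for illustration; a parameter whose value continues onto following lines would need manual attention.

```python
# Sketch: comment out the two sqlnet.ora parameters named in step 1.
# The helper is hypothetical and handles single-line parameters only.
PARAMS_TO_DISABLE = ("tcp.invited_nodes", "tcp.validate_checking")


def comment_out(sqlnet_text, params=PARAMS_TO_DISABLE):
    """Prefix matching parameter lines with '#', leaving other lines as-is."""
    out = []
    for line in sqlnet_text.splitlines():
        stripped = line.lstrip()
        # Parameter names are compared case-insensitively.
        name = stripped.split("=", 1)[0].strip().lower()
        if name in params and not stripped.startswith("#"):
            out.append("#" + line)
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Invented sample content, for illustration only.
sample = (
    "tcp.validate_checking = yes\n"
    "tcp.invited_nodes = (host1, host2)\n"
    "SQLNET.EXPIRE_TIME = 10\n"
)
print(comment_out(sample))
```

Keep a backup of the original sqlnet.ora before applying any scripted change, and then continue with step 2 to stop and restart Oracle SES.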

7.2.1.2 Reconfiguring the Search Engine to Use Oracle SES with OracleTextSearch

If you are already using a search engine other than Oracle SES with WebCenter Content (such as the engine set up on the Content Server post configuration page), and you want to change the search engine to Oracle SES, then you must create a new database provider and configure Oracle SES for Content Server using that provider.

  1. After installing Oracle SES, edit the file ORACLE_HOME/network/admin/sqlnet.ora to comment out the following two lines:

    tcp.invited_nodes
    tcp.validate_checking
    
  2. If Oracle SES is running, shut it down (mid-tier and database):

    ORACLE_HOME/bin/searchctl stopall
    
  3. Start the database:

    ORACLE_HOME/bin/searchctl start_backend
    
  4. Find database connection information for later use in the following file:

    ORACLE_HOME/search/webapp/config/search.properties
    
  5. Run the Oracle Repository Creation Utility (RCU) against Oracle SES and create the OCSEARCH schema. OCSEARCH sets only the search portion of a database already set up by RCU with Oracle SES.

    To create this schema, select Content Server 11g - Search Only on the RCU Select Components screen. For more information about running RCU, see "Creating Oracle WebCenter Content Schemas" in Oracle WebCenter Content Installation Guide.

  6. Create a new Data Source (WLS DataSource) on the Oracle WebLogic Server instance to connect to Oracle SES.

    1. On the Oracle WebLogic Server Administration Console, use the Services menu to choose JDBC then Data Sources. A screen listing the Summary of JDBC Data Sources is displayed.

    2. Click New and enter values for the following items on the Create a New Data Source screen:

      Name: Enter the new Data Source name: ExternalSearchProvider

      JNDI Name: Enter the new name again.

      Database Type: Enter Oracle.

      Database Driver: Click *Oracle's Driver (Thin XA) for Instance Connection.

    3. Click Next to see the Transaction Options.

    4. Click Next and enter the database parameters. As mentioned in step 4, you can find database connection information in the search.properties file.

      Database Name: Enter the name of the database to connect to; for example, SES.

      Host Name: Enter the IP address of the database server.

      Port: Enter the database server port number for the database connection.

      Database User Name: Enter the database account user name. This is the SchemaOwner name you specified in the RCU creation process.

      Password: Enter the database account password to use to create database connections. This is the password you specified in the RCU creation process.

      Confirm Password: Enter the database account password again.

    5. Click Next.

    6. Click Test Configuration. Verify that the message "Connection test succeeded" appears at the top of the page, then click Next.

    7. From a list of available target servers, select the target Content Server checkbox to deploy the new JDBC Data Source. For example, a target Content Server might be named UCM_server1.

    8. Click Finish.

      Note:

      You do not have to restart the Oracle WebLogic Server instance.

  7. Change the search (database) provider in Content Server:

    1. Log in to the Content Server instance.

    2. Choose Administration then Providers.

    3. Click Add in the row to create a new database provider.

    4. Enter or verify the new database provider settings.

      Provider Name: ExternalSearchProvider.

      Provider Description: External Database Provider

      Provider Class: intradoc.jdbc.JdbcWorkspace

      Connection Class: intradoc.jdbc.JdbcConnection

      Database Type: Select ORACLE.

      Use Data Source: Check this box.

      Data Source: Enter the name of your Data Source; for example, SES.

      Test Query: Enter a test query; for example, select * from SES.IDCTEXT

      Number of Connections: By default, this is set to 5.

      Extra Storage Keys: By default, this is set to system.

    5. Click Add.

    6. Restart the Content Server instance. The new database provider name should be included in the list displayed on the Providers screen.

  8. On the Content Server interface, choose Administration, then Admin Server, then General Configuration.

  9. In the Additional Configuration Variables section for General Configuration, enter or verify the following settings:

    SearchIndexerEngineName=OracleTextSearch

    IndexerDatabaseProviderName=ExternalSearchProvider

  10. Restart the Content Server instance.

  11. Rebuild the search index using the Repository Manager applet.

    For more information on the Repository Manager, see Oracle WebCenter Content Application Administrator's Guide for Content Server.
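
Step 1 of the procedure above (commenting out two sqlnet.ora parameters) can be scripted. The following Python sketch is illustrative only: it assumes each parameter sits on a single line and that parameter names are matched case-insensitively, and the function name is hypothetical. It works on text, so you decide how the file is read, backed up, and written.

```python
import re

# Parameters named in step 1. Case-insensitive matching is an assumption.
PARAMS = ("tcp.invited_nodes", "tcp.validate_checking")


def comment_out_sqlnet_params(text, params=PARAMS):
    """Return sqlnet.ora text with the given parameter lines commented out."""
    pattern = re.compile(
        r"^\s*(?:" + "|".join(re.escape(p) for p in params) + r")\b",
        re.IGNORECASE,
    )
    out = []
    for line in text.splitlines():
        # Prefix active matching lines with '#'; lines that are already
        # commented do not match and are left unchanged.
        out.append("#" + line if pattern.match(line) else line)
    return "\n".join(out) + ("\n" if text.endswith("\n") else "")
```

Running the function a second time on its own output changes nothing, so it is safe to reapply. Values that span several lines are not handled by this sketch.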

7.2.2 Using SESCrawlerExport for Oracle SES to Search Content Server Content

The Content Server SESCrawlerExport component adds functionality as an RSS feed generator to the Content Server instance and enables it to be searched by Oracle Secure Enterprise Search (Oracle SES). The component generates a snapshot of content currently on the Content Server instance and provides it to the Oracle SES Crawler.

The SESCrawlerExport component generates RSS feeds as XML files from its internal indexer, based on indexer activity. The component can access the original WebCenter Content content (for example, a Microsoft Word document), the web-viewable rendition, and all the metadata associated with each document. The component also has a template containing an Idoc script that applies the metadata values from the indexer to generate the XML document.

SESCrawlerExport generates RSS feeds for all documents for the initial crawl, as well as feeds for updated and deleted documents for the incremental crawl. Each document can be an item in the feed, together with the operation on the item (for example: insert, delete, update), its metadata (for example: author, summary), URL links, and so on. The indexer wakes up periodically (around 30 seconds) and creates a data feed for the documents that were changed.
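
The idea of a feed as a list of per-document operations can be pictured with a small Python sketch. This is a conceptual model only, not the component's implementation or its XML format: the FeedItem type, its field names, and split_into_feeds are hypothetical, and the per-feed cap mirrors the Maximum Items Per Datafeed (sceMaxItems) parameter.

```python
from dataclasses import dataclass, field


@dataclass
class FeedItem:
    """One entry in a data feed: an operation on a document plus its metadata."""
    operation: str            # "insert", "update", or "delete"
    doc_name: str             # hypothetical identifier for the document
    metadata: dict = field(default_factory=dict)


def split_into_feeds(items, max_items):
    """Group changed items into consecutive feeds of at most max_items each."""
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    return [items[i:i + max_items] for i in range(0, len(items), max_items)]
```

For example, three changed documents with a cap of two items per feed would be split into one feed of two items and one feed of one item.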

The Content Server connector for Oracle SES reads the feeds provided by SESCrawlerExport according to the crawling schedule. Oracle SES parses the feeds, extracts the metadata information, and fetches the document content using its generic RSS crawler framework.

The YahooUserInterfaceLibrary component must be enabled on the Content Server instance. This component has JavaScript libraries that SESCrawlerExport uses during the initial crawl to report the status of the feed generation.

Note:

The SESCrawlerExport component is not affected by what search engine is used in the Content Server instance. SESCrawlerExport does not affect how Oracle SES performs searches.

7.2.2.1 Accessing the SESCrawlerExport Component

To access the SESCrawlerExport component:

  1. Log in to the Content Server instance.

  2. Choose Administration then Admin Server.

  3. On the Component Manager page, from the list of Integration components, select SESCrawlerExport.

  4. Click Update.

    The SESCrawlerExport component is enabled.

  5. Choose Administration then SESCrawlerExport to display the SESCrawlerExport Administration page. Use this page to take a snapshot of content to generate RSS feeds and to access the Configure SESCrawlerExport page.

7.2.2.2 Taking a Snapshot of Content Server Content

Taking a snapshot of content on the Content Server instance generates feeds to be provided to Oracle SES Crawler. The snapshot generates a configFile.xml at the location specified by the SESCrawlerExport component FeedLoc parameter. XML feeds are created in the subdirectory with the source name; for example, wikis. Performing a snapshot can take some time depending on the number of items you have stored on the Content Server instance and how many sources you are generating.

To take a snapshot:

  1. Choose Administration then SESCrawlerExport.

  2. On the SES Crawler Export Administration page, select the source or sources you want to capture in the snapshot from the available menu options.

    If you select All Sources from the list of content sources, SESCrawlerExport generates RSS feeds for all defined sources. You can also choose to select individual sources or select a subset of sources to take a snapshot of just those sources. Any update on the configFile.xml document that causes reindexing to occur also generates the feeds in the same location.

  3. Click Take Snapshot.

    Note:

    The configFile.xml file is generated once for the same configuration, either on the initial snapshot or on the first update of any document, whichever occurs first.
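
To check where a snapshot should have written its output, the directory layout described in this section can be expressed as a short Python sketch. The function name and the sample feed location are hypothetical; only the configFile.xml name and the per-source subdirectory rule come from the text.

```python
from pathlib import PurePosixPath


def feed_layout(feed_loc, source_names):
    """Return the expected config file path and per-source feed directories."""
    root = PurePosixPath(feed_loc)
    return {
        # configFile.xml is written directly in the feed location.
        "config_file": str(root / "configFile.xml"),
        # Each source gets a subdirectory named after the source.
        "feed_dirs": {name: str(root / name) for name in source_names},
    }
```

Calling feed_layout("/u01/feeds", ["wikis"]) (an assumed path) yields /u01/feeds/configFile.xml for the configuration file and /u01/feeds/wikis for the feeds of the wikis source.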

7.2.2.3 Configuring SESCrawlerExport Parameters

The SESCrawlerExport component has several parameters you can configure to specify the data feed source, content, metadata, the number of items per data feed, and so forth. Changes to parameters take effect immediately; however, you may need to take a new snapshot to propagate the changes.

To configure these parameters:

  1. Choose Administration then SESCrawlerExport.

  2. On the SES Crawler Export Administration page, click Configure SESCrawlerExport.

    The configuration page is displayed.

  3. Specify or confirm values for the following SESCrawlerExport parameter fields.

Element Description

Hostname

(sceHostname)

The string for the hostname of the Content Server instance that hosts the content to be exported. If the value is blank, the hostname is set to the host that performs the Oracle SES export. This field is Idoc capable.

Feed Location

(sceFeedLoc)

Directory to which the configuration file and data feeds are written. The configFile.xml file is generated at this location. Data feeds and content are generated in the subdirectory with the Source Name from this location.

Metadata List

(sceMetadataList)

A comma-delineated list of metadata values that are exported to Oracle SES. If the value is blank, the list of metadata values consists of the following fields: dID,dDocName,dRevLabel,dDocType,dDocAccount,dSecurityGroup,dOriginalName,dReleaseDate,dOutDate and all custom metadata fields (those beginning with the letter "x").

If this field is filled with a set of metadata fields, only those fields are exported to Oracle SES. These fields can be standard or custom metadata fields.

Admin Email(s)

(sceAdminEmail)

A comma-delineated list of e-mail addresses, user names, and user aliases that are notified by e-mail when crawling errors occur.

Custom Metadata Blacklist

(sceCustomMetadataBlacklist)

A comma-delineated list of metadata values that are not exported to Oracle SES. These fields can be standard or custom metadata fields.

Maximum Feeds Pending Consumption by SES per Source

(sceMaxFeedsPerSource)

A number that limits the creation of new datafeeds. New datafeeds are not created for a source if the number of its datafeeds pending consumption by SES exceeds the specified value.

To limit the feeds, this number must be set to 0 or a positive value. If this number is set to a negative value, there is no limit on the feeds generated.

Maximum Items Per Datafeed

(sceMaxItems)

The maximum number of content items for each data feed. (A content item in the feed is an operation. For example: insert, update, or delete a document.)

Core Filter

(sceCoreFilter)

Pre-filters content to exclude items from being exported to Oracle SES. Oracle recommends that you leave this value at the default setting.

Crawler Role

(sceCrawlerRole)

The Content Server role required for the account that Oracle SES uses to crawl the Content Server instance. By default, the Content Server admin role is required.

Caution: Do not use the default Oracle WebLogic Server administrator account to crawl from Oracle SES. Instead use either an administrator account from an external source (such as an LDAP provider) or the local Content Server account. If necessary, you can change the required role admin to another role, using this SESCrawlerExport field. For example:

  1. On the Content Server instance, create a new role called sescrawlerrole.

  2. Create a new local user account called sescrawler and assign the role sescrawlerrole to this user account.

  3. On Oracle SES, change your source definition to use the sescrawler account to crawl the Content Server instance.

  4. On the Content Server instance, add sceCrawlerRole=sescrawlerrole in the config.cfg file.

Source Name(s)

(sceSourceName)

A comma-delineated list of all content sources created on the Content Server instance. Each listed source is completely identical (mirrored). By having multiple sources, the content on this instance can be independently consumed by multiple Oracle SES servers.

These source names are used as the subdirectory names for the Feed Location directory to hold data feeds and contents.

Note: The name "ssSource" is a reserved source name and must not be used in this field.

Disable Secure APIs

(sceDisableSecureAPIs)

A Boolean flag that determines whether security for the services provided by the SESCrawlerExport component is handled internally by the component (false) or natively by the Content Server (true). For more information, see Section 7.2.2.3.2, "Configuring a Content Server Source with Oracle Single Sign-On."
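
Two of the rules in the parameter list above, the Metadata List default and the Maximum Feeds Pending Consumption limit, can be restated as a Python sketch. This is a reading of the descriptions, not the component's code: the function names are hypothetical, and the Custom Metadata Blacklist is left out because the text does not say how it combines with the Metadata List.

```python
DEFAULT_STANDARD_FIELDS = [
    "dID", "dDocName", "dRevLabel", "dDocType", "dDocAccount",
    "dSecurityGroup", "dOriginalName", "dReleaseDate", "dOutDate",
]


def exported_fields(metadata_list, available_fields):
    """Pick the metadata fields to export, following the Metadata List rule."""
    listed = [f.strip() for f in metadata_list.split(",") if f.strip()]
    if listed:
        # A non-blank list means only those fields are exported.
        return listed
    # Blank list: the standard defaults plus every custom ("x"-prefixed) field.
    custom = [f for f in available_fields if f.startswith("x")]
    return DEFAULT_STANDARD_FIELDS + custom


def may_create_feed(pending_feeds, max_feeds_per_source):
    """Apply the Maximum Feeds Pending Consumption rule for one source."""
    if max_feeds_per_source < 0:
        return True  # negative value: no limit
    # New feeds are held back once the pending count exceeds the limit.
    return pending_feeds <= max_feeds_per_source
```

Under this reading, a limit of 0 stops new feeds as soon as one feed is waiting to be consumed, and a negative limit never stops them.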


7.2.2.3.1 Configuring a Content Server Source in Oracle SES

The Content Server connector enables Oracle SES to search the Content Server instance in WebCenter Content. The connector reads the feeds provided by the Content Server instance according to a crawling schedule. To crawl data from Oracle SES, you must create a source of type Content Server. For detailed instructions on installing the connector patch and creating the Content Server source, see Oracle Secure Enterprise Search Administrator's Guide.

The following parameters are used in setting up the Content Server source:

  • Configuration URL:

    http://host_name/instance_name/idcplg?IdcService=SES_CRAWLER_DOWNLOAD_CONFIG&source=source_name
    

    The parameter represented by source_name must be equal to one of the strings used in SESCrawlerExport component Source Name (sceSourceName) parameter. This parameter points to one of the content sources on the Content Server instance. For example:

    http://stahz16/ucm/idcplg?IdcService=SES_CRAWLER_DOWNLOAD_CONFIG&source=cs
    
  • HTTP endpoint for authentication and authorization: You are prompted for the HTTP endpoint values during the WebCenter Content identity plug-in activation and authorization manager configuration. The two values are usually the same on the same Content Server instance and are usually in the form of http://host_name/instance_name/idcplg. For example, http://host.example.com/ucm/idcplg. This value is used as the endpoint for any service call to the Content Server instance. You can also find the value by choosing Administration, then Admin Server, then Internet Configuration. Use the current URL (without URL parameters) as the HTTP endpoint.
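
When several sources must be defined in Oracle SES, the two values above can be generated rather than typed. The following Python sketch assembles them from the patterns shown in this section; the function names are hypothetical, and the sketch assumes plain HTTP on the default port, as in the examples.

```python
from urllib.parse import urlencode


def build_config_url(host_name, instance_name, source_name):
    """Build the Configuration URL for a Content Server source."""
    query = urlencode({
        "IdcService": "SES_CRAWLER_DOWNLOAD_CONFIG",
        "source": source_name,  # must match a Source Name (sceSourceName) value
    })
    return f"http://{host_name}/{instance_name}/idcplg?{query}"


def http_endpoint(host_name, instance_name):
    """Build the HTTP endpoint used for authentication and authorization."""
    return f"http://{host_name}/{instance_name}/idcplg"
```

Using urlencode keeps the URL valid if a source name contains characters that need escaping.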

7.2.2.3.2 Configuring a Content Server Source with Oracle Single Sign-On

When the Content Server instance is secured with Oracle Single Sign-On (OSSO), the SESCrawlerExport component configuration must be changed to allow Oracle SES access to the services provided by SESCrawlerExport. Go to the Configure SESCrawlerExport page to disable the internal security mechanisms by setting the Disable Secure APIs parameter to true.

7.2.2.3.3 Configuring a Content Server Source with Other Single Sign-On Solutions

When the Content Server instance is secured with a single sign-on solution other than Oracle Single Sign-On (OSSO), some changes must be made to allow Oracle SES access to the services provided by the SESCrawlerExport component.

  • Configuration: When using a single sign-on solution other than Oracle Single Sign-On, the security for the services provided by the SESCrawlerExport component is provided by the component itself. Go to the Configure SESCrawlerExport page to enable the internal SESCrawlerExport security mechanisms by setting the Disable Secure APIs parameter to false.

  • Web Server: Access to the services provided by the SESCrawlerExport component must bypass single sign-on because Oracle SES is not compatible with the single sign-on solutions. Depending on the selected single sign-on solution, creating a bypass might be as simple as configuring a web server module to allow access to a subset of services.

    If you set up an additional web server on the Content Server instance, the web server must run on a different port than the standard Content Server port (that is, something other than port 80). Configure this additional web server to have no single sign-on protection at all. Also, set up Access Control Lists so that only Oracle SES can access this web server. In the Oracle SES configuration, use this additional web server port in the configuration URLs for the Content Server source.

7.2.2.4 Configuring the Content Server Source Location Script

The Content Server source location script is a fully customizable Idoc script that evaluates against a content item's metadata and returns the source(s) to which this content item should be sent.

To access the page where you can create or update the source location script:

  1. Choose Administration then SESCrawlerExport.

  2. On the SES Crawler Export Administration page, click Configure SESCrawlerExport.

    The configuration page is displayed.

  3. On the Configure SESCrawlerExport page, click Configure Source Location Script.

    The Configure Source Location Script page is displayed.

  4. Enter the Idoc script in the provided area.

    By default, the source location script is set to #all, which sends every content item flagged as Latest Released to all sources (see the Source Name parameter) configured on the Content Server instance. The #all source name is a reserved keyword that indicates that all sources receive the content item.

    Similarly, the #none source name is also a reserved keyword, but it indicates that the content item should be sent to no sources (basically, the content item is not exported to Oracle SES).

  5. Click Update.

    If you want to remove the source location script, then click Reset.

  6. To test the source location script, enter a content item's Document Name (dDocName) in the field provided, then click Test.

    If there are syntax errors in the script, the errors are either displayed on the page or in the server output, depending on the type of syntax error. Logic errors can be corrected on the SESCrawlerExport Source Location Script page and the test can be run again immediately.

    If the script returns a source name that does not exist, an error is generated in the server output. The invalid source name is removed and recorded in the logs, and the item(s) continue to be processed. You can correct this problem either by removing the source name from the script or by adding a new Source Name parameter value for your Content Server instance.

    You can return multiple source names in the script by separating them with commas.

Example

In the following example, the source location script is set up to send all content items that have a Document Type (dDocType) of ADACCT into a source named accounting, and everything else falls into the source named default. The accounting and default sources must be set up separately by adding those names into the Source Name parameter on the Configure SESCrawlerExport page.

<$if dDocType like "ADACCT" $>
accounting
<$else$>
default
<$endif$>
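
The way a script result is turned into target sources can be modeled with a short Python sketch. It follows the rules described in this section (comma-separated names, the reserved #all and #none keywords, and removal of unknown names), but it is not the component's code. The function name is hypothetical, and treating a reserved keyword as overriding any other names in the same result is an assumption.

```python
def resolve_sources(script_output, configured_sources):
    """Map a source location script result to (targets, invalid_names)."""
    names = [n.strip() for n in script_output.split(",") if n.strip()]
    # Assumption: a reserved keyword anywhere in the result wins outright.
    if "#none" in names:
        return [], []
    if "#all" in names:
        return list(configured_sources), []
    # Unknown names are dropped from the targets and reported separately,
    # mirroring the "removed and logged" behavior described above.
    targets = [n for n in names if n in configured_sources]
    invalid = [n for n in names if n not in configured_sources]
    return targets, invalid
```

With the sources accounting and default configured, the example script above resolves to accounting for an ADACCT item and to default for everything else.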

7.3 Full-Text Database Search

Use the following procedure to set up and use full-text database searching and indexing for SQL Server and other databases.

  1. Install the Content Server and configure it to work with the database.

  2. Add the following entry to the DomainHomeName\ucm\cs\config\config.cfg file and save the file:

    SearchIndexerEngineName=DATABASE.FULLTEXT
    
  3. Restart the Content Server.

  4. Rebuild the search index using the Repository Manager.
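
Step 2 of this procedure can be scripted when many instances need the same change. The following Python sketch sets a name=value entry in config.cfg text, replacing an existing active entry or appending a new one. It is a minimal sketch that assumes one entry per line; the function name is hypothetical, and reading, backing up, and writing the file are left to the caller.

```python
def set_config_entry(cfg_text, key, value):
    """Return config.cfg text with key=value set, replacing any existing entry."""
    lines = cfg_text.splitlines()
    entry = f"{key}={value}"
    replaced = False
    for i, line in enumerate(lines):
        # Match only active entries for this exact key; commented lines
        # start with '#', so their name part never equals the key.
        if "=" in line and line.split("=", 1)[0].strip() == key:
            lines[i] = entry
            replaced = True
    if not replaced:
        lines.append(entry)
    return "\n".join(lines) + "\n"
```

For this section, the call would be set_config_entry(text, "SearchIndexerEngineName", "DATABASE.FULLTEXT"), followed by the restart and index rebuild in steps 3 and 4.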