10 Managing Search Features

This chapter describes how to configure the OracleTextSearch feature to use Oracle Text as the primary full-text search engine for Oracle WebCenter Content, how to configure Content Server to use Oracle Secure Enterprise Search (Oracle SES), and how to configure full-text database searching.

This chapter also describes how to configure Elasticsearch, which significantly reduces index rebuild time.

10.1 Managing OracleTextSearch

If you have a license to use the OracleTextSearch feature with Oracle Database 12c, then you can configure OracleTextSearch to use the Oracle Text product as the primary full-text search engine for WebCenter Content. Oracle Text offers state-of-the-art indexing capabilities and provides the underlying search capabilities for Oracle Secure Enterprise Search (Oracle SES). However, Oracle Text has its own query syntax, which is intended more for applications and information professionals than for casual end users.

OracleTextSearch enables administrators to specify certain metadata fields to be optimized for the search index and to customize additional fields. This feature also enables a fast index rebuild and index optimization.

10.1.1 Considerations for Using OracleTextSearch

The following items are important when considering use of the OracleTextSearch feature:

  • WebCenter Content version 12c supports all languages supported by Oracle Text. OracleTextSearch can filter and extract content from different document formats in different languages. It supports a large number of document formats, including Microsoft Office file formats, Adobe PDF, HTML, and XML. It also supports archive and compression file formats such as zip, zipx, and gz. It can render search results in various formats, including unformatted text, HTML with term highlighting, and original document format.

  • Oracle Text runs on Oracle Database 12c. The Content Server database can be Oracle Database 12c, Microsoft SQL Server, or other databases as listed in the Oracle WebCenter Content 12c Certification Matrix. However, if the system database is not Oracle Database 12c, then an external provider for OracleTextSearch must be configured. For details on external providers, see Configuring OracleTextSearch for Content Server.

  • When using OracleTextSearch, Oracle Database version 11.1.0.7.0 or higher is required.

  • Optimized fields for OracleTextSearch are created as SDATA fields, which have a maximum limit of 249 characters. This limit is imposed by Oracle Database and is reflected in Content Server by the OracleTextSearch component. Default SDATA fields include dDocName, dDocTitle, dDocType, and dSecurityGroup. The total number of SDATA fields is limited to 32 fields.

  • While WebCenter Content provides numerous search options using a variety of databases (Oracle, Microsoft SQL Server, IBM DB2), by default the database that serves as the search index is the same system database used by WebCenter Content to manage metadata and other configuration information (users, security groups, and so on). The OracleTextSearch feature enables Oracle Text as a separate search collection instance on Oracle Database 12c for WebCenter Content, which allows the search collection to reside on a separate computer and not compete with WebCenter Content for processors and memory. This can improve indexing and search response time.

  • The OracleTextSearch collection instance can be installed on a different platform than the WebCenter Content installation.

  • If the OracleTextSearch feature is configured and running, and metadata fields are pushed into the Content Server instance either by the administrator or by a component (requiring that the Content Server instance be restarted), then the OracleTextSearch index must be rebuilt before content using the new metadata fields can be checked in to the Content Server instance.

10.1.2 Oracle Text Features and Benefits

10.1.2.1 Indexing and Query Speeds and Techniques

Using Oracle Text, WebCenter Content offers a significant increase in index speeds. Oracle Text indexing is transactional. Content Server sends a batch of documents to Oracle Text, commits the batch, then starts the Oracle Text indexer. Content Server is notified of which documents failed to index, and only those documents are resubmitted for indexing. Additional capabilities include an automatic Fast Optimization for every 5,000 documents added to the Content Server instance, and a Full Optimization for every 50,000 documents or 20% growth of the repository. Note that Content Server metadata-only search queries may degrade in performance when using Oracle Text.
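
The optimization thresholds described above can be illustrated with a short Python sketch. It is not Content Server code: the function name, its parameters, and the way growth is measured (against the repository size at the last Full Optimization) are assumptions made only for illustration.

```python
# Thresholds taken from the text above.
FAST_EVERY = 5_000      # documents added between Fast Optimizations
FULL_EVERY = 50_000     # documents added between Full Optimizations
FULL_GROWTH = 0.20      # repository growth that also triggers a Full Optimization


def optimization_due(added_since_fast, added_since_full, size_at_last_full):
    """Return "full", "fast", or None for the given (hypothetical) counters.

    Assumption: growth is measured relative to the repository size recorded
    at the last Full Optimization. The real bookkeeping is not documented here.
    """
    grew_enough = (
        size_at_last_full > 0
        and added_since_full >= FULL_GROWTH * size_at_last_full
    )
    if added_since_full >= FULL_EVERY or grew_enough:
        return "full"
    if added_since_fast >= FAST_EVERY:
        return "fast"
    return None
```

In this sketch a Full Optimization takes precedence when both thresholds are reached at the same time, which is a design choice of the example rather than documented behavior.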

WebCenter Content uses some of the newest Oracle Text features. For example, Content Server automatically creates a new search index zone for each text information field in order to provide better search speed. Using information zones enables Content Server to query data as if it were full-text data. All text-based information fields (text, long text, and memo) are automatically added as separate zones. In addition to the zones created for text information fields, Content Server provides an extra zone named IdcContent, which enables custom components, Oracle WebCenter Content: Inbound Refinery components, applications, or users to create XML content with tags that will be indexed as full-text metadata fields.

WebCenter Content uses the SDATA section feature in Oracle Text to index important text, date, and integer fields and define them as Optimized Fields. The SDATA section is a separate XML structure managed by the Oracle Text engine that allows the engine to respond rapidly to requests involving date and integer ranges. Content Server can have up to 32 Optimized Fields, a total that includes date fields, integer fields, standard Content Server fields such as dInDate and dOutDate, and fields selected to be optimized. All Optimized Fields are SDATA fields, which by default include dDocName, dDocTitle, dDocType, and dSecurityGroup.

Note:

If you want to change the set of Optimized Fields defined in Oracle Text, the maximum allowed number of Optimized Fields is 32.

To avoid errors when indexing, do not add non-existent metadata fields to the Configuration Manager DrillDownFields parameter, and do not add memo fields to an SDATA section or to the DrillDownFields parameter. See Understanding Management Tools in Managing Oracle WebCenter Content.
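
The rules in this section (at most 32 Optimized Fields, no memo fields in an SDATA section or in DrillDownFields, and no non-existent fields in DrillDownFields) can be checked before a change is applied. The following Python sketch shows one way to express such a check. The Field class, the function name, and the field name xComments are hypothetical and are not part of any Content Server API.

```python
from dataclasses import dataclass

MAX_OPTIMIZED_FIELDS = 32  # limit stated in this section


@dataclass(frozen=True)
class Field:
    """Hypothetical description of one metadata field."""
    name: str
    field_type: str          # for example "text", "long text", "memo", "date", "integer"
    optimized: bool = False  # True if the field is an Optimized (SDATA) field


def check_search_design(fields, drill_down_fields):
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    by_name = {f.name: f for f in fields}

    optimized = [f for f in fields if f.optimized]
    if len(optimized) > MAX_OPTIMIZED_FIELDS:
        problems.append(
            f"{len(optimized)} Optimized Fields exceed the limit of {MAX_OPTIMIZED_FIELDS}"
        )
    for f in optimized:
        if f.field_type == "memo":
            problems.append(f"memo field {f.name} must not be an SDATA field")

    for name in drill_down_fields:
        field = by_name.get(name)
        if field is None:
            problems.append(f"DrillDownFields names non-existent field {name}")
        elif field.field_type == "memo":
            problems.append(f"memo field {name} must not be in DrillDownFields")
    return problems


fields = [
    Field("dDocType", "text", optimized=True),
    Field("dSecurityGroup", "text", optimized=True),
    Field("xComments", "memo"),  # hypothetical custom memo field
]
```

Calling check_search_design(fields, ["dDocType", "dSecurityGroup"]) returns an empty list, while adding xComments or an unknown field name to the second argument reports one problem for each.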

10.1.2.2 Fast Rebuild

OracleTextSearch provides an Indexer Rebuild window when you use the Collection Rebuild Cycle window on the Repository Manager application Indexer tab. The Fast Rebuild feature allows the search engine to add new information to the search collection without requiring a full collection rebuild. A Fast Rebuild is required in the following cases:

  • Adding or removing information fields

  • Changing any Optimized Field

  • Changing an information field to be an Optimized Field

A Fast Rebuild does not cause all the information (metadata and full-text) to be re-indexed. It adds the changes throughout the collection and updates it. Content Server search functionality is not affected during a Fast Rebuild cycle.

For information on performing a fast rebuild, see Performing a Fast Rebuild.

10.1.2.3 Query Syntax

Queries defined in Universal Query Syntax are supported and generally do not need any modification. This includes queries saved by users, queries defined in custom components, and queries defined in Site Studio pages.

10.1.2.4 OracleTextSearch Operators

Oracle Text supports the following default search operators:

  • CONTAINS

  • MATCHES

  • HAS WORD PREFIX

  • Range searches for dates and integers

10.1.2.4.1 Search Thesaurus

Certain queries, such as stem and Related Term, may be more effective if you use an Oracle Text thesaurus. Oracle Text enables you to create case-sensitive or case-insensitive thesauri which define synonym and hierarchical relationships between words and phrases. You can then search and retrieve documents that contain relevant text by expanding queries to include similar or related terms as defined in the thesaurus. For example, you can populate a thesaurus with specific product names, associated models, associated features, and so forth.

  • Default thesaurus: If you do not specify a thesaurus by name in a query, by default, the thesaurus operators use a thesaurus named DEFAULT. However, Oracle Text does not provide a DEFAULT thesaurus.

    As a result, if you want to use a default thesaurus for the thesaurus operators, you must create a thesaurus named DEFAULT. You can create the thesaurus through any of the thesaurus creation methods supported by Oracle Text:

    • CTX_THES.CREATE_THESAURUS (PL/SQL)

    • ctxload utility

  • Supplied thesaurus: Oracle Text does not provide a default thesaurus, but Oracle Text does supply a thesaurus, in the form of a file that you load with ctxload, that can be used to create a general-purpose, English-language thesaurus.

    The thesaurus load file can be used to create a default thesaurus for Oracle Text, or it can be used as the basis for creating thesauri tailored to a specific subject or range of subjects.

Note:

See the Oracle Text Reference to learn more about using ctxload and the CTX_THES package, and see the chapter, "Working With a Thesaurus in Oracle Text," in the Oracle Text Application Developer's Guide.

10.1.2.5 Case Sensitivity and Stemming Rules

Content Server automatically ensures that queries are executed as case-insensitive. By default, all full-text and text field search queries are case-insensitive. Content Server also handles case-insensitive search queries for information stored as Optimized Fields.

Stemming is an Oracle Text feature that uses the stem ($) operator to search for terms that have the same linguistic root as the query term (the syntax is $term). For example, the input $sing expands a search to include sang, sung, and sing. Stemming rules can be used to have searches account for plurals, verbs, and so forth. Content Server does not apply any stemming rules by default for Oracle Text, but a set of stemming rules can be created by using the stem ($) operator. Other methods for implementing stemming rules include modifying the standard query definition in the searchindexerrules configuration file (which requires a custom component) and making configuration changes in the Oracle Text engine (Oracle Database).

Note:

For more information, see the chapter "Oracle Text CONTAINS Query Operators" in the Oracle Text Reference.

Content Server handles content in non-English languages by using the WORLD_LEXER feature in the Oracle Text engine. This enables Oracle Text to automatically identify the language and apply the proper tokenization rules.

10.1.2.6 Search Results Data Clustering

With the OracleTextSearch feature, Content Server retrieves additional information about a search result list and displays it in a new menu bar on the Search Results page. This information summarizes how many documents are attached to specific values in specific information fields. Content Server supports data clustering for up to four information fields (the default fields are Security Group and Document Type).

This can be useful if you have a query that returns many items. For example, a result set could include 200 content items, including 100 documents that belong to the Public security group, 75 that belong to the Sales group, and 25 that belong to the Marketing group. The menu option for Security Group will show you the list of values and how many documents belong to each value. You can select one of the values (Public, Sales, Marketing) from the menu and it will list only those documents in the result set that belong to that value.
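
The counting and filtering behavior in this example can be sketched in a few lines of Python. This is only an illustration of the idea, not how Content Server computes the menu: the function names are hypothetical, and each search result is assumed to be a simple dictionary of metadata values.

```python
from collections import Counter


def cluster_counts(results, field):
    """Count how many result items carry each value of the given field."""
    return Counter(item[field] for item in results)


def filter_by_value(results, field, value):
    """Keep only the result items whose field equals the selected value."""
    return [item for item in results if item[field] == value]


# The result set from the example above: 200 items in three security groups.
results = (
    [{"dSecurityGroup": "Public"}] * 100
    + [{"dSecurityGroup": "Sales"}] * 75
    + [{"dSecurityGroup": "Marketing"}] * 25
)
counts = cluster_counts(results, "dSecurityGroup")
sales_only = filter_by_value(results, "dSecurityGroup", "Sales")
```

Here counts holds the per-group totals that the menu would show, and sales_only corresponds to the list displayed after selecting Sales.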

10.1.2.7 Snippets

Content Server can retrieve document snippets as part of search results to show search terms in the context of their usage. This feature is disabled by default. Enabling it can affect search query performance. To enable the feature, set the following configuration entry in the config.cfg file:

OracleTextDisableSearchSnippet=false

10.1.2.8 Additional Changes

Additional changes because of the use of Oracle Text include:

  • XML content is automatically indexed.

  • There are no visible changes in the Search user interface other than removal of Substring as a search operator option. The default search operators are CONTAINS, MATCHES, and HAS WORD PREFIX. Substring-based queries still work.

  • Queries using the MATCHES operator on a non-optimized field behave like a CONTAINS query. For example, if xDepartment is not optimized, then the query xDepartment MATCHES 'Marketing' behaves like xDepartment CONTAINS 'Marketing' and returns hits on content items that have an xDepartment value of 'Marketing Services' or 'Product Marketing'.

  • Relevancy ranking can be changed in Oracle Text through use of an operator called DEFINESCORE. This operator can be added through a component to the WhereClause value of OracleTextSearch in the SearchQueryDefinition table (in the Oracle Text searchindexerrules configuration file). More information about this operator is available in the Oracle Text Reference document.

  • Complicated queries that previously could be placed into the full-text search box should now be placed in the advanced options on the Query Builder Form. The Query Builder Form is documented in Using Oracle WebCenter Content.

  • If you need to specify an escape character, use the configuration variable AdditionalEscapeChars=. The default setting is:

    AdditionalEscapeChars=_:#,-:#
    

    The default sets an underscore (_) and a hyphen (-) as escape characters.

  • The PDF Highlighting feature has been disabled.

  • The Spell Checking feature can be enabled, but it requires a custom component just as it did with Autonomy VDK.

10.1.3 Configuring OracleTextSearch for Content Server

If you did not specify OracleTextSearch when you first installed Content Server, configure the feature as follows:

  1. Open the config.cfg file for the Content Server instance in a text editor. For example: MW_HOME/user_projects/domain/servers/ucm/config/config.cfg
  2. Set the following property value:
    SearchIndexerEngineName=OracleTextSearch

    Note:

    If you are using ACLs, and UseEntitySecurity=true is set with OracleTextSearch as the search engine, then the following must also be set in the config.cfg file for the Content Server instance:

    ZonedSecurityFields=xClbraUserList,xClbraAliasList
  3. If you are using an external data source instead of the system database, change the value SystemDatabase in the following property setting to the external database provider name:
    IndexerDatabaseProviderName=SystemDatabase
    

    Note:

    You can specify a separate Oracle Database as the value of IndexerDatabaseProviderName, instead of SystemDatabase.

    If the Content Server database used with OracleTextSearch is not Oracle Database, then an external provider for OracleTextSearch must be configured. Obtain the driver and fmwgenerictoken.jar from MW_HOME/oracle_common/modules/oracle.jdbc_11.1.1/ojdbc6dms.jar.

  4. Save the file.
  5. Restart the Content Server instance. For instructions, see Restarting Content Server or Inbound Refinery Using Fusion Middleware Control.
  6. Rebuild the search index.

    For more information on rebuilding the index, see Working with the Search Index. For more information on configuring Content Server and OracleTextSearch during installation, see FullText Search Option in WebCenter Content Configuration Page in Installing and Configuring Oracle WebCenter Content.

If you originally configured Content Server to use an external provider with OracleTextSearch, but later need to switch to use SystemDatabase, you must manually run the contentprocedures.sql script against your system database schema. The script file is located in the WC_CONTENT_ORACLE_HOME/ucm/idc/database/oracle/admin/ directory.
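
Before restarting Content Server, it can help to confirm that the edited config.cfg contains the settings described in the procedure above. The following Python sketch parses name=value lines and checks the engine name and the ACL-related setting. The function names are hypothetical, and treating lines that begin with # as comments is an assumption about the file format.

```python
REQUIRED_ACL_ZONES = {"xClbraUserList", "xClbraAliasList"}


def parse_cfg(text):
    """Read name=value lines into a dictionary (comment handling is assumed)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        settings[name.strip()] = value.strip()
    return settings


def check_oracle_text_search(settings):
    """Return a list of problems with the OracleTextSearch settings."""
    problems = []
    if settings.get("SearchIndexerEngineName") != "OracleTextSearch":
        problems.append("SearchIndexerEngineName is not OracleTextSearch")
    # With ACLs (UseEntitySecurity=true), both zoned security fields are required.
    if settings.get("UseEntitySecurity", "").lower() == "true":
        zones = {z.strip() for z in settings.get("ZonedSecurityFields", "").split(",")}
        missing = sorted(REQUIRED_ACL_ZONES - zones)
        if missing:
            problems.append("ZonedSecurityFields is missing: " + ", ".join(missing))
    return problems
```

The check covers only the properties named in this section. It does not verify IndexerDatabaseProviderName, because a valid provider name depends on the installation.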

10.1.4 Managing OracleTextSearch

10.1.4.1 Determining Fields to Optimize

Consider the following when determining the fields to optimize:

  • Do you want an exact match in a query?

  • Do you want that match to work faster in a search?

  • Do you want to sort search results by field?

By default the OracleTextSearch feature optimizes the Content ID and Document Title metadata fields.

A maximum of 32 fields can be defined as Optimized Fields with the OracleTextSearch feature. This total includes date fields, integer fields, standard Content Server fields such as dInDate and dOutDate, and fields selected to be optimized. All Optimized Fields are SDATA fields, which by default include dDocName, dDocTitle, dDocType, and dSecurityGroup.

The display of integer fields is dynamic and depends on the Content Server configuration.

10.1.4.2 Assigning/Editing Optimized Fields

You can select metadata Non-Optimized Fields and assign them to be Optimized Fields for search purposes, or edit Optimized Fields and make them Non-Optimized.

To assign or edit Optimized fields:

  1. Choose Administration, then Admin Applets.
  2. Select Configuration Manager, then the Information Fields tab, then Advanced Search Design. For more information on the Configuration Manager applet, see Exporting Auxiliary Metadata Sets in Managing Oracle WebCenter Content.
  3. To make a metadata field Optimized, click Edit Fields. In the Advanced Options for "metadata_field" window, select Is Optimized.
  4. To edit an Optimized Field and make it Non-Optimized, click Edit Fields. In the Advanced Options for "metadata_field" window, deselect Is Optimized.
  5. When you have completed moving fields, use Index Fast Rebuild in Repository Manager to update the search collection to use the new and modified fields.

Note:

The Fast Rebuild does not function if a search collection rebuild is in progress.

10.1.4.3 Performing a Fast Rebuild

The Fast Rebuild feature allows the search engine to add new information to the search collection without requiring a full collection rebuild. A Fast Rebuild is required in the following cases:

  • Adding or removing information fields

  • Changing any Optimized Field

  • Changing an information field to be an Optimized Field

To perform a fast rebuild:

  1. Choose Administration, then Admin Applets.
  2. Choose Repository Manager, then select the Indexer tab.
  3. In the Collection Rebuild Cycle part of the Repository Manager application Indexer tab, click Start.

    The Indexer Rebuild window opens with a warning that rebuilding the search index is a time-consuming process. If you do not want to start a rebuild now, click Cancel; otherwise, continue with this procedure.

  4. In the Indexer Rebuild window, click OK.

    A Fast Rebuild of the search collection is performed.

Note:

A Fast Rebuild is not performed if a rebuild of the search collection is in progress.

Note:

The Fast Rebuild process does not create indexer counter values for Full Text, Meta Only, and Delete. To obtain indexer count statistics, you must perform a full collection rebuild.

10.1.4.4 Modifying the Fields Displayed on Search Results

The OracleTextSearch feature provides default menu options on the Search Results page (set by the Oracle Database configuration script):

DrillDownFields=dDocType, dSecurityGroup

Administrators can add one more option from the list of Optimized Fields to further customize the search results. Edit the configuration to add the option to the list of DrillDownFields. (This function does not support multi-value option lists.)

A Fast Rebuild must be performed after making any change in the DrillDownFields setting.

10.1.5 Searching with OracleTextSearch

Performing a search with OracleTextSearch is generally the same as with other search engines. The only visible change in the Search: Expanded Form is the removal of Substring as a search operator option. The default search operator is CONTAINS. Substring-based queries still work.

See Searching with Oracle Text Search in Using Oracle WebCenter Content.

The following table describes the default search operators.

| Operator | Description | Example |
|---|---|---|
| CONTAINS | Finds content items with the specified whole word or phrase in the metadata field. This is available only for OracleTextSearch, or for Oracle Database and Microsoft SQL Server database with the optional DBSearchContainsOpSupport component enabled. | When form is entered in the Title field, the search returns items with the word form in their title, but does not return items with the word performance or reform. |
| MATCHES | Finds items with the exact specified value in the metadata field. | When address change form is entered in the Title field, the search returns items with the exact title of address change form. A query that uses the MATCHES operator on a non-optimized field behaves the same as a query that uses the CONTAINS operator. For example, if the xDepartment field is not optimized, then the query xDepartment MATCHES 'Marketing' behaves like xDepartment CONTAINS 'Marketing', returning hits on documents that have an xDepartment value of 'Marketing Services' or 'Product Marketing'. |
| HAS WORD PREFIX | Finds all content items with the specified word at the beginning of the metadata field. No wildcard character is placed before or after the specified value. | When form is entered in the Title field, the search returns all items with the word form at the beginning of their title, but does not return an item whose title begins with the word performance or reform. |

Note:

You cannot use the wildcards (? and *) to stand in for special characters when the CONTAINS or HAS WORD PREFIX operator is used. For example, if a content item has the dDocTitle value Webcenter_Content, a search for Webcenter?Content or Webcenter*Content with the CONTAINS or HAS WORD PREFIX operator does not find it.
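
The difference between the three default operators can be illustrated with a toy Python model. This is a sketch of the behavior described in this section for an optimized field, not of how Oracle Text tokenizes or evaluates queries. The function names and the sample titles are hypothetical, and words are assumed to be runs of letters and digits.

```python
import re


def _words(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains(field_value, query):
    """True if the query's words appear as a consecutive run of whole words."""
    words, wanted = _words(field_value), _words(query)
    if not wanted:
        return False
    return any(
        words[i:i + len(wanted)] == wanted
        for i in range(len(words) - len(wanted) + 1)
    )


def matches(field_value, query):
    """True only if the whole field value equals the query (case-insensitive)."""
    return field_value.strip().lower() == query.strip().lower()


def has_word_prefix(field_value, query):
    """True if the field value begins with the query's whole words."""
    words, wanted = _words(field_value), _words(query)
    return bool(wanted) and words[:len(wanted)] == wanted
```

In this model, contains("Address change form", "form") is true while contains("Performance review", "form") is false, because form must appear as a whole word. On a non-optimized field, MATCHES would behave like contains rather than like matches.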

10.1.6 Using Metadata Wildcards

The following wildcards can be used in metadata search fields, even when using the Quick Search field.

  • An asterisk (*) indicates zero or many characters. For example:

    • form* matches form and formula

    • *orm matches form and reform

    • *form* matches form, formula, reform, and performance

  • A question mark (?) indicates one character. For example:

    • form? matches forms and form1, but not form or formal

    • ??form matches reform but not perform

Note:

If you want to search for an asterisk (*) or a question mark (?) without treating it as a wildcard, put quotation marks around your search term; for example: "here*". The question mark wildcard (?) does not work for the Security Group metadata field in OracleTextSearch. Also, metadata field values that contain an underscore (_) do not work with the question mark wildcard (?).
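
The wildcard rules listed in this section can be reproduced with a small Python sketch that converts a wildcard pattern to a regular expression. It models only the two wildcards themselves. It does not model quoting or the Security Group and underscore limitations described in the note, and the function names are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    """Translate * (zero or many characters) and ? (one character) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character matches itself
    return re.compile("".join(parts), re.IGNORECASE)


def wildcard_match(pattern, value):
    """True if the whole value matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(value) is not None
```

For example, wildcard_match("form?", "forms") is true, while wildcard_match("form?", "form") and wildcard_match("form?", "formal") are both false, because the question mark stands for exactly one character.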

10.1.7 Using Internet-Style Search Syntax

Search techniques common to the popular Internet search engines are supported in Content Server. For example, entering new product in the Quick Search field will search for new <AND> product, while entering new, product will search for new <OR> product.

To enable this style of search, set the variable DoMetaInternetSearch=True. To disable this style of search, set the variable DoMetaInternetSearch=False. This is the default. For more information, see DoMetaInternetSearch in Configuration Reference for Oracle WebCenter Content.

The following table lists how Content Server interprets common characters.

| Character | Interpreted As |
|---|---|
| Space ( ) | AND |
| Comma (,) | OR |
| Minus (-) | NOT |
| Phrases enclosed in double-quotes ("any phrase") | Exact match of entered phrase |

The following table lists examples of how Content Server interprets Internet-style syntax in a full-text search.

| Query | Interpreted As |
|---|---|
| new product | new <AND> product |
| (new, product) images | (new <OR> product) <AND> images |
| new product -images | (new <AND> product) <AND> <NOT> images |
| "new product", "new images" | "new product" <OR> "new images" |

The following table lists examples of how Content Server interprets Internet-style syntax when searching title metadata using the substring operator.

| Query | Interpreted As |
|---|---|
| new product | dDocTitle <substring> 'new' <AND> dDocTitle <substring> 'product' |
| new, product | dDocTitle <substring> 'new' <OR> dDocTitle <substring> 'product' |
| new -product | dDocTitle <substring> 'new' <AND> <NOT> 'product' |
| "new product" | dDocTitle <substring> 'new product' |
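
The character mapping for full-text queries can be sketched as a small Python translator. This is a simplified illustration, not the parser that Content Server uses. It handles only flat queries without parentheses, it does not emit the grouping parentheses shown in the full-text examples, and it assumes that a comma separates groups of space-separated terms. The function name is hypothetical.

```python
import re

# A token is a double-quoted phrase, a comma, or a run of other non-space characters.
_TOKEN = re.compile(r'"[^"]*"|,|[^\s,]+')


def translate(query):
    """Translate a flat Internet-style query into <AND>/<OR>/<NOT> form."""
    groups = [[]]  # comma-separated groups, later joined by <OR>
    for token in _TOKEN.findall(query):
        if token == ",":
            groups.append([])                        # comma starts a new OR group
        elif token.startswith("-") and len(token) > 1:
            groups[-1].append("<NOT> " + token[1:])  # leading minus means NOT
        else:
            groups[-1].append(token)                 # plain term or quoted phrase
    # Terms inside a group are joined by <AND>; empty groups are skipped.
    return " <OR> ".join(" <AND> ".join(g) for g in groups if g)
```

With this sketch, translate("new product") gives new <AND> product, and translate('"new product", "new images"') gives "new product" <OR> "new images", which agrees with the full-text examples in this section.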

10.1.8 Adjusting the Score on OracleTextSearch Results

When you use OracleTextSearch with Oracle Text as the search engine in WebCenter Content, the results of a search by Score are sorted based on the relevancy in documents. In theory, the more relevant the search term is to a document, the higher ranked Score it should receive. In practice, it's not entirely clear how the relevancy Score ranks the importance of some documents over others based on the search term. When a word appears a certain number of times within a document, the Score reaches a maximum at 100 and the top results can be difficult to discern from one another.

For example, if you searched for the term "vacation" in a set of documents, out of seven results, six of them might have a Score of "100" which means they are basically ranked the same. Having many documents ranked the same doesn't make the sort by Score very meaningful.

Besides sorting by relevance, you can also tell Oracle Text to sort by occurrence. Sorting by occurrence can provide a much more predictable result in how documents are ranked, and in many cases it can provide a more meaningful sorting of results than relevance.

To tell Oracle Text to sort by occurrence, you must make a small component change to the SearchOperatorMap resource. By default, the query used for full-text searching looks like the following code:

<td>(ORACLETEXTSEARCH)fullText</td>
<td>DEFINESCORE((%V), RELEVANCE * .1)</td>
<td>text</td>

Override this resource so that it uses OCCURRENCE instead of RELEVANCE, and note the change in scale from .1 to .01:

<td>(ORACLETEXTSEARCH)fullText</td>
<td>DEFINESCORE((%V), OCCURRENCE * .01)</td>
<td>text</td>

If you run the same search and sort options as mentioned in the earlier example, the results come out differently and each of the seven documents has a unique Score. This provides a clearer understanding of how the items rank. Generally, if the search term appears three times more often in one document than in another, that document has a better chance of being one you are interested in examining.

Note:

The occurrence ranking also has a maximum count of 100, so if a search term occurs in the document more than that count, the Score result stays at 100.

For your site, relevance ranking may be more useful than occurrence ranking; however, this option provides an alternate method that might work better for your results.

10.1.9 Customizing Search Results with OracleTextSearch

When users run a search using the Search: Expanded Form, the Search Results page displays an additional menu bar with options that enable users to selectively view search results. The options represent categories used to filter the search results. The options can be context-sensitive, so if only one content item is returned for an option, then it shows only the one result in the menu itself, as shown in Figure 10-1. The default set of options includes Content Type, Security Group, and Account.

Note:

Two default menu options on the OracleTextSearch menu for Search Results can be replaced by customized menu options: Security Group and Document Type.

Figure 10-1 Search Results with OracleTextSearch Default Menu

If more than one content item is found for an option, an arrow is displayed next to the option name. When you move your cursor over the option name, a menu displays the list of the categories found in the search results for that option and the number of content items for each of the categories. You can click any category name on the menu to change the Search Results page to list only those items that match the category.

Figure 10-2 shows a list of categories under Security Group and the number of items found in each category.

Figure 10-2 Search Results with Snippets Display and Expanded OracleTextSearch Menu

| Element | Description |
|---|---|
| Filter by Category | Displays the categories used to filter the search results, for example: Content Type, Security Group, Account. |
| Content Type | (Default) Lists the types and the number of each type of content items in the search results. Clicking one of the content type names changes the Search Results to show only those items that match the content type. |
| Security Group | (Default) Lists the security groups and number of content items assigned to each group in the search results. Security groups include: Administration, Public, and Secure. Clicking one of the security group names changes the Search Results to show only those items that match the security group. |
| Account | (Default) Lists the account types and number of items assigned to each account in the search results. Clicking one of the account types changes the Search Results to show only those content items that match the account. |

10.1.9.1 About Batch Load File Records

A batch load file is made up of file records, which are sets of name/value pairs that specify the action to perform, or the metadata for individual content items, or both.

Note:

Field names and parameters are case sensitive. They must appear in the batch load file exactly as they appear in the following sections. For example, dDocName is not the same as ddocname, dDocname, or DDOCNAME.

  • Each file record ends with an <<EOD>> (end of data) marker.

  • A pound sign (#) followed by a space at the beginning of a line indicates a comment. The comment character must be followed by a space. For example: # primaryFile=test.txt works properly, but #primaryFile=test.txt will cause errors.

  • The following is an example of a file record:

    # This is a comment
    Action=insert
    dDocName=Sample1
    dDocType=Document
    dDocTitle=Batch Load record insert example
    dDocAuthor=sysadmin
    dSecurityGroup=Public
    primaryFile=links.doc
    dInDate=8/15/2001
    <<EOD>>
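
The record format described above can be read with a short Python sketch. It is an illustration of the format rules only, not the Batch Loader itself: the function name is hypothetical, and raising an error for a malformed line or a missing end marker is a choice made for this example.

```python
SAMPLE = """\
# This is a comment
Action=insert
dDocName=Sample1
dDocType=Document
dDocTitle=Batch Load record insert example
dDocAuthor=sysadmin
dSecurityGroup=Public
primaryFile=links.doc
dInDate=8/15/2001
<<EOD>>
"""


def parse_batch_load(text):
    """Split batch load text into a list of {name: value} records."""
    records, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("# "):
            continue                      # blank line or comment
        if line.startswith("#"):
            raise ValueError(f"comment character not followed by a space: {line!r}")
        if line == "<<EOD>>":
            records.append(current)       # end-of-data marker closes the record
            current = {}
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a name=value pair: {line!r}")
        current[name] = value             # names are kept exactly as written
    if current:
        raise ValueError("last record is missing its <<EOD>> marker")
    return records


records = parse_batch_load(SAMPLE)
```

Because names are stored exactly as written, records[0] contains the key dDocName but not ddocname, which mirrors the case sensitivity described in the note above.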

10.2 Managing Oracle Secure Enterprise Search

Oracle Secure Enterprise Search (Oracle SES) 12c enables a secure, high quality, easy-to-use search across all enterprise information assets. If you have a license to use Oracle SES 12c, then you can configure WebCenter Content to use Oracle SES as described in this section.

For more information, see the Cookbook: SES and UCM Setup blog. For more information about Oracle SES, see Oracle Secure Enterprise Search Administrator's Guide.

10.2.1 Using Oracle SES as an External Full-Text Search Engine

WebCenter Content can be configured with the OracleTextSearch feature to use Oracle Secure Enterprise Search (Oracle SES) 12c as its back-end search engine. With this configuration, users can search multiple Content Server instances for a file.

By default, Oracle SES full-text indexes files up to 10 MB in size. You can raise this limit in Oracle SES to a maximum file size of 1 GB.

10.2.1.1 Configuring Oracle SES for Use with OracleTextSearch

To configure Oracle SES for use with the OracleTextSearch option:

Note:

If you are already using a search engine other than Oracle SES with WebCenter Content, such as the engine set up on the Content Server post-configuration page, and you want to change the search engine to Oracle SES, then you must create a new database provider and configure Oracle SES for using that provider. For more information, see Reconfiguring the Search Engine to Use Oracle SES with OracleTextSearch.

  1. After installing Oracle SES, edit the file ORACLE_HOME/network/admin/sqlnet.ora to comment out the following two lines:

    tcp.invited_nodes
    tcp.validnode_checking
    
  2. If Oracle SES is running, shut it down (mid-tier and database):

    ORACLE_HOME/bin/searchctl stopall
    
  3. Start the database:

    ORACLE_HOME/bin/searchctl start_backend
    
  4. Find database connection information for later use in the following file:

    ORACLE_HOME/search/webapp/config/search.properties
    
  5. Run the Repository Creation Utility (RCU) against Oracle SES and create the OCSEARCH schema. OCSEARCH sets only the search portion of a database already set up by RCU with Oracle SES.

    To create this schema, select Content Server 12c - Search Only on the RCU Select Components window. See Navigating the RCU Screens to Create the Schemas in Installing and Configuring Oracle WebCenter Content.

  6. Perform a standard WebCenter Content installation and Content Server installation. For instructions, see Installing and Configuring Oracle WebCenter Content.

    Note:

    Do not complete the steps on the Content Server post-configuration page, because the page sets up a regular database configuration.

  7. Create a new Data Source (WLS DataSource) on the Oracle WebLogic Server instance to connect to Oracle SES.

    1. In the Oracle WebLogic Server Administration Console, use the Services menu to choose JDBC, then Data Sources.

      A window listing the Summary of JDBC Data Sources opens.

    2. Click New and enter values for the following items on the Create a New Data Source window:

      Name: Enter the new Data Source name.

      JNDI Name: Enter the new name again.

      Database Type: Enter Oracle.

      Database Driver: Click *Oracle's Driver (Thin XA) for Instance Connection.

    3. Click Next to see the Transaction Options.

    4. Click Next to enter the database parameters. As mentioned in step 4, you can find database connection information in the search.properties file.

      Database Name: Enter the name of the database to connect to; for example, ses.

      Host Name: Enter the IP address of the database server.

      Port: Enter the database server port number for the database connection.

      Database User Name: Enter the database account user name. This is the SchemaOwner name you specified in the RCU creation process.

      Password: Enter the database account password to use to create database connections. This is the password you specified in the RCU creation process.

      Confirm Password: Enter the database account password again.

    5. Click Next.

    6. Click Test Configuration. Verify that the message "Connection test succeeded" appears at the top of the page, then click Next.

    7. From the list of available target servers, select the target Content Server check box to deploy the new JDBC Data Source. For example, a target Content Server might be named UCM_server1.

    8. Click Finish.

  8. In the Content Server post-configuration page, click Select External in Full Text Search options, then enter the Data Source name.

  9. Restart the Content Server instance. For instructions, see Restarting Content Server or Inbound Refinery Using Fusion Middleware Control.

10.2.1.2 Reconfiguring the Search Engine to Use Oracle SES with OracleTextSearch

If you are already using a search engine other than Oracle SES with WebCenter Content (such as the engine set up on the Content Server post-configuration page), and you want to change the search engine to Oracle SES, then you must create a new database provider and configure Oracle SES for Content Server using that provider.

  1. After installing Oracle SES, edit the file ORACLE_HOME/network/admin/sqlnet.ora to comment out the following two lines:

    tcp.invited_nodes
    tcp.validnode_checking
    
  2. If Oracle SES is running, shut it down (mid-tier and database):

    ORACLE_HOME/bin/searchctl stopall
    
  3. Start the database:

    ORACLE_HOME/bin/searchctl start_backend
    
  4. Find database connection information for later use in the following file:

    ORACLE_HOME/search/webapp/config/search.properties
    
  5. Run the Oracle Repository Creation Utility (RCU) against Oracle SES and create the OCSEARCH schema. OCSEARCH sets only the search portion of a database already set up by RCU with Oracle SES.

    To create this schema, select Content Server 12c - Search Only on the RCU Select Components window.

    For more information about running RCU, see Installing and Configuring Oracle WebCenter Content.

  6. Create a new Data Source (WLS DataSource) on the Content Server instance to connect to Oracle SES.

    1. In the Oracle WebLogic Server Administration Console, use the Services menu to choose JDBC, then Data Sources.

      A window listing the Summary of JDBC Data Sources opens.

    2. Click New and enter values for the following items on the Create a New Data Source window:

      Name: Enter the new Data Source name; for example, ExternalSearchProvider.

      JNDI Name: Enter the new name again.

      Database Type: Enter Oracle.

      Database Driver: Click *Oracle's Driver (Thin XA) for Instance Connection.

    3. Click Next to see the Transaction Options.

    4. Click Next and enter the database parameters. As mentioned in step 4, you can find database connection information in the search.properties file.

      Database Name: Enter the name of the database to connect to; for example, SES.

      Host Name: Enter the IP address of the database server.

      Port: Enter the database server port number for the database connection.

      Database User Name: Enter the database account user name. This is the SchemaOwner name you specified in the RCU creation process.

      Password: Enter the database account password to use to create database connections. This is the password you specified in the RCU creation process.

      Confirm Password: Enter the database account password again.

    5. Click Next.

    6. Click Test Configuration. Verify that the message "Connection test succeeded" appears at the top of the page, then click Next.

    7. From a list of available target servers, select the target Content Server check box to deploy the new JDBC Data Source. For example, a target Content Server instance might be named UCM_server1.

    8. Click Finish.

      Note:

      You do not have to restart the Oracle WebLogic Server instance.

  7. Change the search (database) provider in Content Server:

    1. Choose Administration, then Providers.

    2. Click Add in the row to create a new database provider.

    3. Enter or verify the new database provider settings:

      Provider Name: ExternalSearchProvider.

      Provider Description: External Database Provider

      Provider Class: intradoc.jdbc.JdbcWorkspace

      Connection Class: intradoc.jdbc.JdbcConnection

      Database Type: Select ORACLE.

      Use Data Source: Check this box.

      Data Source: Enter the name of your Data Source; for example, SES.

      Test Query: Enter a test query; for example, select * from SES.IDCTEXT

      Number of Connections: By default, this is set to 5.

      Extra Storage Keys: By default, this is set to system.

    4. Click Add.

    5. Restart the Content Server instance. For instructions, see Restarting Content Server or Inbound Refinery Using Fusion Middleware Control.

      The new database provider name should be included in the list displayed on the Providers page.

  8. Choose Administration, then Admin Server, then General Configuration.

  9. In the Additional Configuration Variables section for General Configuration, enter or verify the following settings:

    SearchIndexerEngineName=OracleTextSearch

    IndexerDatabaseProviderName=ExternalSearchProvider

  10. Restart the Content Server instance. For instructions, see Restarting Content Server or Inbound Refinery Using Fusion Middleware Control.

  11. Rebuild the search index using the Repository Manager applet.

    See Starting the Repository Manager in Managing Oracle WebCenter Content.
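The two settings above are plain name=value entries. The following Python sketch shows one way to add or update them in a block of configuration text without creating duplicate entries. It assumes the settings are held as name=value lines, as in config.cfg. The helper name set_config_values is hypothetical, and editing the file directly instead of using the General Configuration page is an assumption made for illustration.

```python
def set_config_values(cfg_text, updates):
    """Return cfg_text with each name in updates set exactly once."""
    written = set()
    out = []
    for line in cfg_text.splitlines():
        stripped = line.strip()
        name = stripped.split("=", 1)[0].strip()
        is_setting = "=" in stripped and not stripped.startswith("#")
        if is_setting and name in updates:
            if name not in written:
                out.append(f"{name}={updates[name]}")  # replace the first occurrence
                written.add(name)
            # Later duplicates of the same name are dropped.
        else:
            out.append(line)  # unrelated lines and comments are kept as-is
    for name, value in updates.items():
        if name not in written:
            out.append(f"{name}={value}")  # append settings that were missing
    return "\n".join(out) + "\n"


existing = "IntradocDir=/x\nSearchIndexerEngineName=DATABASE.METADATA\n"
print(set_config_values(existing, {
    "SearchIndexerEngineName": "OracleTextSearch",
    "IndexerDatabaseProviderName": "ExternalSearchProvider",
}))
```

Whichever way the values are entered, the Content Server instance still has to be restarted and the index rebuilt, as the remaining steps describe.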

10.2.2 Using SESCrawlerExport for Oracle SES to Search Content Server Content

The SESCrawlerExport component adds RSS feed generator functionality to the Content Server instance and enables it to be searched by Oracle Secure Enterprise Search (Oracle SES). The component generates a snapshot of content currently on the Content Server instance and provides it to the Oracle SES Crawler.

The SESCrawlerExport component generates RSS feeds as XML files from its internal indexer, based on indexer activity. The component can access the original content item in WebCenter Content (for example, a Microsoft Word document), the web-viewable rendition, and all the metadata associated with each document. The component also has a template containing an Idoc script that applies the metadata values from the indexer to generate the XML document.

SESCrawlerExport generates RSS feeds for all documents for the initial crawl, as well as feeds for updated and deleted documents for the incremental crawl. Each document can be an item in the feed, together with the operation on the item (for example: insert, delete, update), its metadata (for example: author, summary), URL links, and so on. The indexer wakes up periodically (about every 30 seconds) and creates a data feed for the documents that were changed.

The Content Server connector for Oracle SES reads the feeds provided by SESCrawlerExport according to the crawling schedule. Oracle SES parses the feeds, extracts the metadata information, and fetches the document content using its generic RSS crawler framework.

The SESCrawlerExport component is not affected by what search engine is used in the Content Server instance. SESCrawlerExport does not affect how Oracle SES performs searches.

Note:

The YahooUserInterfaceLibrary component must be enabled on the Content Server instance. This component has JavaScript libraries that SESCrawlerExport uses during the initial crawl to report the status of the feed generation.

Note:

By default, SESCrawlerExport does not support snapshots of DigitalMedia document types, and such a document will not be found with an SES search. The sceCoreFilter configuration parameter in the SESCrawlerExport administration page acts as a pre-filter to the source location script and filters out any DigitalMedia content before it is sent to the sceSourceLocation script. The default parameter setting for sceCoreFilter is:

<$if dDocType and dDocType like 'DigitalMedia'$>#none<$else$>#customScript#<$endif$>

To allow DigitalMedia document types by having the core filtering defer to sceSourceLocationScript, change the sceCoreFilter configuration parameter from its default value to #customScript#.

This section covers the following topics:

10.2.2.1 Accessing the SESCrawlerExport Component

To access the SESCrawlerExport component:

  1. Choose Administration, then Admin Server, then Component Manager.
  2. In the Component Manager page, from the list of Integration components select SESCrawlerExport.
  3. Click Update.

    The SESCrawlerExport component is enabled.

  4. Choose Administration, then SESCrawlerExport to open the SESCrawlerExport Administration page. Use this page to take a snapshot of content to generate RSS feeds and to access the Configure SESCrawlerExport page.

10.2.2.2 Taking a Snapshot of Content

Taking a snapshot of content on the Content Server instance generates feeds to be provided to Oracle SES Crawler. The snapshot generates a configFile.xml at the location specified by the SESCrawlerExport component FeedLoc parameter. XML feeds are created in the subdirectory with the source name; for example, wikis. Performing a snapshot can take some time depending on the number of items you have stored on the Content Server instance and how many sources you are generating.

To take a snapshot:

  1. Choose Administration, then SESCrawlerExport.
  2. In the SES Crawler Export Administration page, select the source or sources you want to capture in the snapshot from the available menu options.

    If you select All Sources from the list of content sources, SESCrawlerExport generates RSS feeds for all defined sources. You can also select an individual source or a subset of sources to take a snapshot of just those sources. Any update on the configFile.xml document that causes reindexing to occur also generates the feeds in the same location.

  3. Click Take Snapshot.

    Note:

    The configFile.xml file is generated once for the same configuration, either on the initial snapshot or on the first update of any document, whichever occurs first.

10.2.2.3 Configuring SESCrawlerExport Parameters

The SESCrawlerExport component has several parameters you can configure to specify the data feed source, content, metadata, the number of items per data feed, and so forth. Changes to parameters take effect immediately; however, you may need to take a new snapshot to propagate the changes.

To configure these parameters:

  1. Choose Administration, then SESCrawlerExport.

  2. In the SES Crawler Export Administration page, click Configure SESCrawlerExport.

  3. Specify or confirm values for the following SESCrawlerExport parameter fields.

Element Description

Hostname

(sceHostname)

The string for the hostname of the Content Server instance that hosts the content to be exported. If the value is blank, the hostname is set to the host that performs the Oracle SES export. This field is Idoc capable.

Feed Location

(sceFeedLoc)

Directory to which the configuration file and data feeds are written. The configFile.xml file is generated at this location. Data feeds and content are generated in the subdirectory with the Source Name from this location.

Ensure that the SES feed location for a cluster is placed in a shared file location. If it is not in a shared file location, the SES Indexer will ignore the files.

Metadata List

(sceMetadataList)

A comma-delineated list of metadata values that are exported to Oracle SES. If the value is blank, the list of metadata values consists of the following fields:

  • dID

  • dDocName

  • dRevLabel

  • dDocType

  • dDocAccount

  • dSecurityGroup

  • dOriginalName

  • dReleaseDate

  • dOutDate

  • dDocCreator

  • dDocLastModifier

  • dDocCreatedDate

  • dDocFunction

  • fParentGUID

  • fApplicationGUID

  • all custom metadata fields (those beginning with the letter "x")

If this list is filled with a set of metadata fields, only those fields are exported to Oracle SES. These fields can be standard or custom metadata fields.

Admin Email(s)

(sceAdminEmail)

A comma-delineated list of email addresses, user names, and user aliases that are notified by email when crawling errors occur.

Custom Metadata Blacklist

(sceCustomMetadataBlacklist)

A comma-delineated list of metadata values that are not exported to Oracle SES. These fields can be standard or custom metadata fields.

Maximum Feeds Pending Consumption by SES per Source

(sceMaxFeedsPerSource)

A number that limits the creation of new data feeds: no new data feeds are created for a source while the number of its data feeds pending consumption by SES exceeds the specified value.

To limit the feeds, this number must be set to 0 or a positive value. If this number is set to a negative value, there is no limit on the feeds generated.

Maximum Items Per Datafeed

(sceMaxItemsPerFeed)

The maximum number of content items for each data feed. (A content item in the feed is an operation. For example: insert, update, or delete a document.)

Core Filter

(sceCoreFilter)

Pre-filters content to exclude items from being exported to Oracle SES. Oracle recommends that you leave this value at the default setting.

Crawler Role

(sceCrawlerRole)

The Content Server role required for the account that Oracle SES uses to crawl the Content Server instance. By default, the Content Server admin role is required.

Caution: Do not use the default Oracle WebLogic Server administrator account to crawl from Oracle SES. Instead use either an administrator account from an external source (such as an LDAP provider) or the local Content Server account. If necessary, you can change the required role admin to another role, using this SESCrawlerExport field. For example:

  1. On the Content Server instance, create a new role called sescrawlerrole.

  2. Create a new local user account called sescrawler and assign the role sescrawlerrole to this user account.

  3. On Oracle SES, change your source definition to use the sescrawler account to crawl the Content Server instance.

  4. On the Content Server instance, add sceCrawlerRole=sescrawlerrole in the config.cfg file.

Source Name(s)

(sceSourceName)

A comma-delimited list of all content sources created on the WebCenter Content Server instance. Each listed source is completely identical (mirrored). By having multiple sources, the content on this instance can be independently consumed by multiple Oracle SES servers.

These source names are used as the subdirectory names for the Feed Location directory to hold data feeds and contents.

Note: The name "ssSource" is a reserved source name and must not be used in this field.

Disable Secure APIs

(sceDisableSecureAPIs)

A Boolean flag that determines whether security for the services provided by the SESCrawlerExport component is handled internally by the component (false) or natively by the Content Server (true). For more information on Single Sign-On, see Configuring Content Server Source with Oracle Single Sign-On.

10.2.2.3.1 Configuring Content Server Source in Oracle SES

The Content Server connector enables Oracle SES to search the Content Server instance in WebCenter Content. The connector reads the feeds provided by the Content Server instance according to a crawling schedule. To crawl data from Oracle SES, you must create a source of type Content Server. For instructions on installing the connector patch and creating the Content Server source, see the Oracle Secure Enterprise Search Administrator's Guide.

The following parameters are used in setting up the Content Server source:

  • Configuration URL:

    http://host_name/instance_name/idcplg?IdcService=SES_CRAWLER_DOWNLOAD_CONFIG&source=source_name
    

    The parameter represented by source_name must be equal to one of the strings used in SESCrawlerExport component Source Name (sceSourceName) parameter. This parameter points to one of the content sources on the Content Server instance. For example:

    http://stahz16/ucm/idcplg?IdcService=SES_CRAWLER_DOWNLOAD_CONFIG&source=cs
    
  • HTTP endpoint for authentication and authorization: You are prompted for the HTTP endpoint values during the Oracle WebCenter Content identity plug-in activation and authorization manager configuration. The two values are usually the same for a given Content Server instance and usually take the form http://host_name/instance_name/idcplg; for example, http://host.example.com/ucm/idcplg. This value is used as the endpoint for any service call to the Content Server instance. You can also find the value by choosing Administration, then Admin Server, then Internet Configuration. Use the current URL (without URL parameters) as the HTTP endpoint.
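The Configuration URL pattern above can be assembled programmatically, which also makes it easy to check that the source name is one of the configured Source Name values. This is a minimal Python sketch. The function name crawler_config_url and its parameters are hypothetical. Only the URL pattern and the SES_CRAWLER_DOWNLOAD_CONFIG service name come from the text.

```python
from urllib.parse import urlencode


def crawler_config_url(host_name, instance_name, source_name,
                       configured_sources, scheme="http"):
    """Build the Configuration URL for a Content Server source."""
    # The source must match one of the sceSourceName values.
    if source_name not in configured_sources:
        raise ValueError(f"unknown source name: {source_name!r}")
    query = urlencode({
        "IdcService": "SES_CRAWLER_DOWNLOAD_CONFIG",
        "source": source_name,
    })
    return f"{scheme}://{host_name}/{instance_name}/idcplg?{query}"


print(crawler_config_url("stahz16", "ucm", "cs", ["cs", "wikis"]))
```

Using urlencode keeps the query string correctly escaped if a source name ever contains characters that are not URL-safe.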

10.2.2.3.2 Configuring Content Server Source with Oracle Single Sign-On

When the Content Server instance is secured with Oracle Single Sign-On (OSSO), the SESCrawlerExport component configuration must be changed to allow Oracle SES access to the services provided by SESCrawlerExport. Go to the Configure SESCrawlerExport page to disable the internal security mechanisms by setting the Disable Secure APIs parameter to true.

10.2.2.3.3 Configuring Content Server Source with Oracle Access Manager

When the Content Server instance is secured with Oracle Access Manager (OAM), some changes must be made to allow Oracle SES access to the services provided by the SESCrawlerExport component.

  1. Open the config.cfg file for the Content Server instance in a text editor. For example: DomainHomeName/ucm/cs/config/config.cfg.

  2. Set the following property value:

    HttpServerAddress=<OHSHost>:7778
    

    where OHSHost is the Oracle HTTP Server (OHS) host name and 7778 is the OHS port.

  3. Restart the Content Server instance. For instructions, see Restarting Content Server or Inbound Refinery Using Fusion Middleware Control.

  4. Choose Administration, then SESCrawlerExport.

  5. In the SES Crawler Export Administration page, click Configure SESCrawlerExport.

  6. In the Configure SESCrawlerExport page, click Host Name and set the host name to <OHSHost>.

  7. Configure Oracle SES to access the Content Server instance secured with OAM. See Oracle Secure Enterprise Search Administrator's Guide.

10.2.2.3.4 Configuring Content Server Source with Other Single Sign-On

When the Content Server instance is secured with a single sign-on solution other than Oracle Single Sign-On (OSSO), some changes must be made to allow Oracle SES access to the services provided by the SESCrawlerExport component.

  • Configuration: When using a single sign-on solution other than Oracle Single Sign-On, the security for the services provided by the SESCrawlerExport component is provided by the component itself. Go to the Configure SESCrawlerExport page to enable the internal SESCrawlerExport security mechanisms by setting the Disable Secure APIs parameter to false.

  • Web Server: Access to the services provided by the SESCrawlerExport component must bypass single sign-on because Oracle SES is not compatible with the single sign-on solutions. Depending on the selected single sign-on solution, creating a bypass might be as simple as configuring a web server module to allow access to a subset of services.

    If you set up an additional web server on the Content Server instance, the web server must run on a different port than the standard Content Server port (that is, something other than port 80). Configure this additional web server to not have any single sign-on protection at all. Also, set up Access Control Lists to allow only Oracle SES access to this web server. In the Oracle SES configuration, use this additional web server port in the configuration URLs for the Content Server source.

10.2.2.4 Configuring the Content Server Source Location Script

The Content Server source location script is a fully customizable Idoc script that evaluates against a content item's metadata and returns the source(s) to which this content item should be sent.

To access the page where you can create or update the source location script:

  1. Choose Administration, then SESCrawlerExport.

  2. In the SES Crawler Export Administration page, click Configure SESCrawlerExport.

  3. In the Configure SESCrawlerExport page, click Configure Source Location Script.

  4. Enter the Idoc Script in the provided area.

    By default, the source location script is set to #all, which sends every content item flagged as Latest Released to all sources (see the Source Name parameter) configured on the Content Server instance. The #all source name is a reserved keyword that indicates that all sources receive the content item.

    Similarly, the #none source name is also a reserved keyword, but it indicates that the content item should be sent to no sources (basically, the content item is not exported to Oracle SES).

    Note:

    When using IdocScript to filter contents based on content metadata, ensure that the following fields are not used: dCreateDate, dReleaseDate, dDocLastModifiedDate, dOutDate, dInDate. This is because these fields are not formatted when processed by the script and will result in an error. Instead, use the following fields: dCreateDateStdFmt, dReleaseDateStdFmt, dDocLastModifiedDateStdFmt, dOutDateStdFmt, dInDateStdFmt.

  5. Click Update.

    If you want to remove the source location script, click Reset.

  6. To test the source location script, enter a content item's Document Name (dDocName) in the field provided, then click Test.

    If there are syntax errors in the script, the errors are either displayed on the page or in the server output, depending on the type of syntax error. Logic errors can be corrected on the SESCrawlerExport Source Location Script page and the test can be run again immediately.

    If the script returns a source name that does not exist, an error is generated in the server output. The invalid source name is removed and the item(s) continue to be processed, but it is recorded in the logs. You can correct this problem either by removing the source name from the script or by adding a new Source Name parameter value for your Content Server instance.

    You can return multiple source names in the script by separating them with commas.

Example 10-1 Source Location Script Example

In the following example, the source location script is set up to send all content items that have a Document Type (dDocType) of ADACCT into a source named accounting, and everything else falls into the source named default. The accounting and default sources must be set up separately by adding those names into the Source Name parameter on the Configure SESCrawlerExport page.

<$if dDocType like "ADACCT" $>
accounting
<$else$>
default
<$endif$>
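The routing rules described in this section (the #all and #none keywords, comma-separated source names, and removal of unknown names) can be modeled outside Content Server to reason about what a script will do. The Python sketch below is a hypothetical illustration, not how SESCrawlerExport is implemented. The function names are invented, the Idoc like test is approximated with plain equality, and giving #none precedence over #all is an assumption.

```python
def example_script(metadata):
    """Stand-in for the Idoc script in Example 10-1 (equality approximates 'like')."""
    return "accounting" if metadata.get("dDocType") == "ADACCT" else "default"


def resolve_sources(script_output, configured_sources):
    """Return (sources that receive the item, invalid names that are dropped)."""
    names = [n.strip() for n in script_output.split(",") if n.strip()]
    if "#none" in names:
        return [], []  # item is not exported (assumed to win over #all)
    if "#all" in names:
        return list(configured_sources), []  # every configured source gets the item
    valid = [n for n in names if n in configured_sources]
    invalid = [n for n in names if n not in configured_sources]  # would be logged
    return valid, invalid


sources = ["accounting", "default"]
print(resolve_sources(example_script({"dDocType": "ADACCT"}), sources))
print(resolve_sources("accounting, archive", sources))
```

In the second call, archive is not a configured source, so it appears in the invalid list. This corresponds to the case where the invalid name is removed and recorded in the logs while the item continues to be processed.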

10.3 Configuring Full-Text Database Search Index

To set up and use full-text database searching and indexing for SQL Server and other databases:

  1. Install WebCenter Content with the Content Server instance and configure it to work with the database.
  2. Add the following entry to the DomainHomeName\ucm\cs\config\config.cfg file and save the file:
    SearchIndexerEngineName=DATABASE.FULLTEXT
    
  3. Restart the Content Server instance. For instructions, see Restarting Content Server or Inbound Refinery Using Fusion Middleware Control.
  4. Rebuild the search index using the Repository Manager.

    See Starting the Repository Manager in Managing Oracle WebCenter Content.

Note:

If you have difficulty rebuilding the full-text database search index after importing the OCS schema, the message Unable to create Oracle text collection 'IdcText1' might be displayed. If this occurs, log in as the Content Server database administrator and drop the tables IdcText1 and IdcText2.

See Recovering Oracle WebCenter Content in Administering Oracle Fusion Middleware.

10.4 Managing Elasticsearch

This section describes how to manage Elasticsearch with WebCenter Content. WebCenter Content communicates with Elasticsearch through the REST APIs that Elasticsearch provides.

WebCenter Content supports a variety of search indexer engines including DATABASE.METADATA, DATABASE.FULLTEXT, and ORACLETEXTSEARCH. Of these, ORACLETEXTSEARCH provides rich search capabilities, including full-text searches with relevancy ranking, complex query structures, and improved performance compared to DATABASE.FULLTEXT. However, in a large enterprise setup where content items number in the millions and the ingestion rate is high, customers find rebuilding the ORACLETEXTSEARCH index to be time-consuming.

The WebCenter Content APIs and services exposed to users remain the same. While the APIs and user interfaces remain mostly unchanged with Elasticsearch, rebuild time is significantly reduced. Users also experience improved, near real-time search response.

This section covers the following topics:

10.4.1 Elasticsearch Features and Benefits

Elasticsearch has features such as fast rebuild, full rebuild, reindex, sorting, facets, search operators, and searching.

This section covers the following topics:

10.4.1.1 How the Rebuild Feature Works in Elasticsearch

Elasticsearch provides a new Rebuild option, Elasticsearch Reindex.

OracleTextSearch in WebCenter Content lets you perform Fast Rebuild or Full Rebuild (With extraction). So, now users can choose from Fast Rebuild, Full Rebuild (With extraction), and Elasticsearch Reindex (Full Rebuild from Elasticsearch).

With Elasticsearch, the Indexer Rebuild dialog has two check boxes: Use fast rebuild and Full rebuild with content extraction. You can access this dialog box through Repository Manager by selecting Indexer, then Collection Rebuild Cycle, and then Start.

10.4.1.2 Fast Rebuild

The Fast Rebuild feature allows the search engine to add new information to the search collection without requiring a full collection rebuild.

A Fast Rebuild is required when adding or removing searchable fields. To perform a fast rebuild, open the Collection Rebuild Cycle window, select the Use fast rebuild check box, and click OK.

10.4.1.3 Full Rebuild

The Full Rebuild option rebuilds the search index.

It extracts content and pushes it to the new index in the Elasticsearch server using the metadata. This is a time-consuming task, so use it with extreme caution.

To perform a full rebuild, open the Collection Rebuild Cycle window, select the Full rebuild with content extraction check box, and click OK.

10.4.1.4 Elasticsearch ReIndex

The Elasticsearch ReIndex option uses the Elasticsearch API to reindex an existing collection to a new collection.

For reindexing, it reuses already extracted content and metadata available in the active collection. Since this option doesn’t need to extract content, it’s a faster alternative to Full Rebuild.

To perform an Elasticsearch ReIndex, open the Collection Rebuild Cycle window, leave both options cleared, and click OK.

You can also invoke Elasticsearch ReIndex, instead of a full rebuild with extraction, by selecting Administration, then Admin Actions, then Collection Rebuild Cycle (section), and then Start. In the current version, Indexer Counters are not implemented for Elasticsearch ReIndex. Also, note that the Cancel and Suspend buttons might not work.
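The relationship between the two check boxes in the Collection Rebuild Cycle window and the three rebuild types can be summarized as a small decision function. This Python sketch is only a way to restate the rules in this section. The function name rebuild_mode is hypothetical, and rejecting the case where both boxes are selected is an assumption, because the text does not describe that combination.

```python
def rebuild_mode(use_fast_rebuild, full_rebuild_with_extraction):
    """Map the two Collection Rebuild Cycle check boxes to a rebuild type."""
    if use_fast_rebuild and full_rebuild_with_extraction:
        # Assumption: this combination is not described, so treat it as invalid.
        raise ValueError("select at most one rebuild option")
    if use_fast_rebuild:
        return "Fast Rebuild"
    if full_rebuild_with_extraction:
        return "Full Rebuild (With extraction)"
    return "Elasticsearch ReIndex"  # neither box selected


for boxes in [(True, False), (False, True), (False, False)]:
    print(boxes, "->", rebuild_mode(*boxes))
```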

10.4.1.5 Sorting

Elasticsearch accepts any existing searchable field as SortField, so search results can be sorted by any searchable field.

You don’t have to rebuild if you make a field sortable or not-sortable. Changing sortability of a field is required only for sorting results on the user interface. Even if you don’t make a field sortable from Configuration Manager, if the field is passed as SortField, Elasticsearch sorts the search results by that field.

10.4.1.6 Facets

With WebCenter Content Elasticsearch, the default number of drill-down values is 50.

It is configurable via MaxElasticSearchDrillDownValues in configuration or can be passed in the binder. MaxElasticSearchDrillDownValues can be any positive integer.

10.4.1.7 Search Operators and Searching

The Search user interface now includes more search operators. The default search operators are: Contains, Matches, Has Word Prefix, Starts, Ends, Substring, and Not Matches.

Searching
  • All search features supported with OracleTextSearch are supported with Elasticsearch as well.
  • Elasticsearch does not have Optimized and Zone fields.
  • With Elasticsearch, metadata field names are expected to be case-sensitive during search, but the QueryText is case-insensitive.
  • Queries using the MATCHES operator perform a case-insensitive exact match of the query text on all searchable fields.
  • Elasticsearch does not throw any error if a non-existing field or metadata is searched for. Instead, it shows zero results.
  • With Elasticsearch, WebCenter Content gives valid results without ignoring any special characters.
  • For a search performed from the WebCenter Content user interface, WebCenter Content trims trailing spaces and uses the trimmed value as the query text. In the WebCenter Content user interface, spaces at the start or end of the query text can lead to different results than with OracleTextSearch. With RIDC, Elasticsearch returns search results that take trailing spaces into account.
  • Text within HTML tags such as <script>..</script>, <style>..</style>, and <!-- --> is not tokenized and is therefore not searchable.
  • WebCenter Content does not allow searching on non-existent or non-searchable fields. It returns the error message "<fieldname> is not a searchable field".

Searching Stop Words

Stop words are commonly used words that are excluded from searches so that web pages can be indexed and parsed faster. Elasticsearch does not create an index entry for stop words.

  • The following list of stop words is derived from the OracleTextSearch (OTS) stop words.

    "Mr","Mrs","Ms","a","all","almost","also","although","an","and","any","are","as","at","be","because","been","both","but","by","can","could","d","did","do","does","either","for","from","had","has","have","having","he","her","here","hers","him","his","how","however","i","if","in","into","is","it","its","just","ll","me","might","my","no","non","nor","not","of","on","one","only","onto","or","our","ours","s","shall","she","should","since","so","some","still","such","t","than","that","the","their","them","then","there","therefore","these","they","this","those","though","through","thus","to","too","until","ve","very","was","we","were","what","when","where","whether","which","while","who","whose","why","will","with","would","yet","you","your","yours".

  • When you search with a stop word, Elasticsearch treats the query as if you searched with an empty string instead of that word.
  • Stop words apply only to the following search queries: Full-Text Search, Quick Search, Contains, and Has Word Prefix.
  • A Full-Text Search, Quick Search, or Contains query composed of a stop word, or of a phrase that contains only stop words, returns all results, as if it were an empty search. For example, a query on the word this returns all hits because this is defined as a stop word.
  • A Has Word Prefix query composed of a stop word, or of a phrase that contains only stop words, returns no results. For example, a Has Word Prefix query on the word this returns no hits because this is defined as a stop word.
  • You can query on phrases that contain both stop words and non-stop words. In such cases, the phrase is searched as if the stop words in the phrase did not exist. For example, a query on the phrase this title returns hits as if you searched only for the word title, because this is a stop word.
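
The stop-word rules above can be modeled in a few lines of Python. This is only an illustrative sketch of the described behavior, not how Elasticsearch or WebCenter Content implements it. The helper name effective_query, the abbreviated stop list, and the case-insensitive comparison are assumptions made for the example.

```python
# Abbreviated stop list for illustration only; the full list is given above.
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "this", "to", "with"}


def effective_query(query_text: str) -> str:
    """Return the query text with stop words removed (hypothetical helper).

    Words are compared case-insensitively, which is an assumption of this sketch.
    """
    kept = [word for word in query_text.split() if word.lower() not in STOP_WORDS]
    return " ".join(kept)


# A query made only of stop words collapses to an empty string.
print(repr(effective_query("this")))
# A mixed phrase is searched as if the stop word were absent.
print(repr(effective_query("this title")))
```

An empty result corresponds to the empty-search case described above: a Full-Text Search, Quick Search, or Contains query then returns all results, while a Has Word Prefix query returns none.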
10.4.1.8 Stemming

Stemming applies only to text queries: Contains, Has Word Prefix, Full-Text Search, and Quick Search.

Stemming results differ between OracleTextSearch and Elasticsearch because the search engines use different dictionaries internally. For example, in OracleTextSearch, a search query for the word “find” returns found, finds, and finding, and a query for the word “make” returns make, made, makes, and making. In Elasticsearch, the search result for “find” shows find, finds, and finding, and the result for “make” shows make, makes, and making. “Found” and “made” are not shown in Elasticsearch results, but they are shown in OracleTextSearch results.

10.4.1.9 Snippets

You can enable the Snippets feature with Elasticsearch by setting the following configuration entry in the config.cfg file: ElasticSearchDisableSearchSnippet=false.

Keep in mind that this feature can affect search query performance. The look and feel of snippets displayed with Elasticsearch differs from that of OracleTextSearch snippets. With Elasticsearch, one complete sentence is equal to one snippet.

If a document appears in an Elasticsearch result only because of a metadata match, and not because of a match in the extracted content of the document, only that metadata value is shown as the snippet.

10.4.1.10 Highlighting

Elasticsearch highlights the search keywords but does not give pointers to the previous and next match.

OracleTextSearch highlights the search keywords along with pointers to next and previous match.

Elasticsearch highlighting returns the extracted content of a document only when there is a match in the extracted content. Highlighting shows metadata of the document only if there is a match with that particular metadata value.

If the match is limited to metadata of the document, only the matched metadata fields are listed but not the extracted content.

10.4.2 Configuring Elasticsearch

In this section, you'll learn how you can configure Elasticsearch for WebCenter Content. Before configuring Elasticsearch for WebCenter Content, you'll need to secure nodes of the cluster, secure Elasticsearch, and start the Base node first and then other nodes.

To configure Elasticsearch for WebCenter Content, follow these steps:

Note:

Indices in Elasticsearch are stored as files on disk, so Elasticsearch requires a large amount of free disk space to work. For more information, contact Oracle Support.
  1. Download and unzip 7.6 or newer 7.x versions of Elasticsearch from https://www.elastic.co/downloads/past-releases#elasticsearch.
  2. Navigate to <IdcHomeDir>/components/ElasticSearch/scripts.

    Note:

    WebCenter Content provides a script, SecureES.sh or SecureES.cmd, that automates the steps to secure one or more nodes of an Elasticsearch cluster. The script assumes that Elasticsearch is installed on all the nodes of the cluster. The cluster can also be a single-node cluster. A multi-node cluster should have at least 3 master-eligible nodes.
  3. Run the script on all the nodes of the cluster. Secure each node before you start it. Start the base node first and then the other nodes.
To download an Elasticsearch client JAR, follow these steps:
  1. Go to https://repo1.maven.org/maven2/org/elasticsearch/client/elasticsearch-rest-client/ and browse for the relevant version.
  2. Download the required version of the JAR file to <IdcHomeDir>/components/ElasticSearch/lib/.

This section covers the following topics:

10.4.2.1 Updating ESnode.properties

The ESnode.properties file must be updated before you set up the Elasticsearch nodes that will be secured as part of the initial cluster setup.

Update the configuration for all the nodes that are going to be part of the setup before securing them. The ESnode.properties file must be in the same folder as the script file. Follow these steps:
  • Configure individual nodes: Configure all the nodes that are planned for the initial cluster setup. Provide one set of entries for each node (node1, node2, node3, ..., node{n}) that is created as part of the setup.
    node{n}_ES_HOME         
    node{n}_node_name
    node{n}_http_port 
    Where {n} is the nth node in the setup. For example:
    ##Node1 (BASE NODE)
    node1_ES_HOME=/ESuser/elasticsearch-7.6.0_1
    node1_node_name=nodeA
    node1_http_port=9201
    ##Node2
    node2_ES_HOME=/ESuser/elasticsearch-7.6.0_2
    node2_node_name=nodeB
    node2_http_port=9202
  • Common configuration for all nodes:
    • BASE_ES_HOME: This should be the same as node1_ES_HOME, or a location where config/{certificate_name} and config/elasticsearch.keystore are accessible to all nodes. For example, BASE_ES_HOME=/ESuser/elasticsearch-7.6.0_1.
    • cluster_name: Name of the cluster. For example, cluster_name=wcc-elasticsearch.
    • certificate_name: Name of the certificate (the extension must be .p12) with which the cluster will be secured. For example, certificate_name=elastic-certificates.p12.
    • wcc_es_admin_user: User with which WebCenter Content will communicate with Elasticsearch. For example, wcc_es_admin_user=wccesadmin.
    • cluster_initial_master_nodes: All node names that are part of the initial cluster setup. For example, cluster_initial_master_nodes=["nodeA","nodeB","nodeC",…,"node{N}"].
    • discovery_seed_hosts: All hostnames where these nodes are going to be configured. This is mandatory only if the Elasticsearch cluster is horizontal. For example, discovery_seed_hosts=["host1.example.com","host2.example.com","host3.example.com",…,"host{n}.example.com"].
    • WINDOWS_CURL_HOME: Required on Windows, and only for the base node (node1). For example, if curl is located at C:\curl-7.72.0_5-win64-mingw\bin\curl.exe, set WINDOWS_CURL_HOME=C:\curl-7.72.0_5-win64-mingw.
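
Before running the script, it can help to check that every node listed in ESnode.properties has all three per-node entries. The following Python sketch is one way to do that, assuming the file uses plain key=value lines with # comments as in the example above. The function names parse_properties and missing_node_keys are hypothetical and are not part of WebCenter Content.

```python
import re

# Per-node entries described above: node{n}_ES_HOME, node{n}_node_name, node{n}_http_port.
REQUIRED_SUFFIXES = ("ES_HOME", "node_name", "http_port")


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def missing_node_keys(props: dict) -> list:
    """Return the node{n}_* keys that are absent for any node number that appears."""
    numbers = set()
    for key in props:
        match = re.match(r"node(\d+)_", key)
        if match:
            numbers.add(int(match.group(1)))
    return [
        f"node{n}_{suffix}"
        for n in sorted(numbers)
        for suffix in REQUIRED_SUFFIXES
        if f"node{n}_{suffix}" not in props
    ]


SAMPLE = """
##Node1 (BASE NODE)
node1_ES_HOME=/ESuser/elasticsearch-7.6.0_1
node1_node_name=nodeA
node1_http_port=9201
##Node2
node2_ES_HOME=/ESuser/elasticsearch-7.6.0_2
node2_node_name=nodeB
"""

# node2_http_port is deliberately left out of SAMPLE to show a reported gap.
print(missing_node_keys(parse_properties(SAMPLE)))
```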
10.4.2.2 Using SecureES.sh on Unix

The script automates the steps to secure Elasticsearch cluster nodes on Unix.

Usage:

For help:

./SecureES.sh -h or --help
To run script:
./SecureES.sh -n <nodenumber> 

For example, if you have 3 nodes to secure, you must run the script on the first node before you run it on the other nodes.

./SecureES.sh -n 1 
./SecureES.sh -n 2 
./SecureES.sh -n 3 
10.4.2.3 Using SecureES.cmd on Windows

The script automates the steps to secure Elasticsearch cluster nodes on Windows.

Usage:

To run the script:

SecureES.cmd -n <nodenumber> 
For example, if you have 3 nodes to secure, you must run the script on the first node before you run it on the other nodes.
SecureES.cmd -n 1   
SecureES.cmd -n 2
SecureES.cmd -n 3
10.4.2.4 Securing Elasticsearch
Follow these steps to secure the first (base) node:
  1. Navigate to <ELASTIC_COMPONENT_DIR>/scripts.
  2. Run the script. For Windows, run SecureES.cmd -n <nodenumber> and for Unix, run ./SecureES.sh -n <nodenumber>.
  3. You will be asked to enter the name of the certificate. If you don't enter a name, the default name elastic-certificates.p12 is used. The certificate must have the extension .p12.
    Certificate password

    Give a password for the certificate.

    Enter password
  4. Add the certificate password to the keystore. If an Elasticsearch keystore is not present, you are asked to create one. Press y to create the keystore and proceed. Note that choosing N here will not secure the node.
    Add certificate password to Keystore

    You will be asked to enter the password 4 times. Enter the certificate password that you used above.

    Enter password 4 times
  5. Set up the password for the reserved user, elastic. Enter a password for the user elastic. This password is used in a later step to create a user to communicate with WebCenter Content.
    Enter bootstrap password
  6. Create a user to communicate with WebCenter Content. You will be asked to enter a user name and password. Enter the name or press ENTER to use the default name wccesadmin.
    Enter password

    Enter the password that you set for the user elastic.

    Password for the user elastic
  7. Once the setup is done, you will see the setup complete message.
  8. Do not start the node now.
10.4.2.5 Securing Other Nodes of Cluster

You need to run the script to secure the nodes of a cluster.

Follow these steps to secure the other nodes of the cluster:
  1. Navigate to <ELASTIC_COMPONENT_DIR>/scripts.
  2. Run the script. The cluster name must be the same for all the nodes. The node names must be unique.

    For Unix:

    ./SecureES.sh -n 2

    For Windows:

    SecureES.cmd -n 2
  3. Once the setup is done, you will see the setup complete message.
  4. Do not start the node now.
10.4.2.6 Start Elasticsearch Cluster

After securing or configuring all the nodes of the cluster, you can start all the nodes.

Go to <ES_HOME>/bin of each node and run:
./elasticsearch

Start the base node (node1) first and then start the other nodes.

After the nodes are started, you can access each node as the user wccesadmin (the default user name) at:
https://<hostname>:<nodeport>
10.4.2.7 Configuring Elasticsearch for WebCenter Content

Before you configure Elasticsearch for WebCenter Content, you must complete the mandatory initial configuration settings and enable the Elasticsearch search indexer.

To configure Elasticsearch for WebCenter Content, follow these steps:
  1. Start the WebCenter Content managed server.
  2. Select Administration, then Elasticsearch, and then Elasticsearch Configuration.
  3. In the Elasticsearch Configuration page, enter the values for the following fields as shown in the figure below:
    • Elasticsearch Nodes to connect - comma-separated list of Elasticsearch nodes of a cluster
    • Username - user name to connect to Elasticsearch
    • Password - user password
    • Certificate Path - absolute path of the certificate with which the cluster is secured
    • Password - certificate password
    Elasticsearch configuration details
10.4.2.8 Monitoring Elasticsearch Cluster Health

For WebCenter Content to function properly, the Elasticsearch cluster must be in good health.

WebCenter Content monitors Elasticsearch health at an interval of 1 hour. If the Elasticsearch health status is Red or the connection is down, an alert is added and the health is checked every minute until the status turns Green or Yellow. Once the status turns Green or Yellow, the health alert is removed automatically and monitoring continues every hour thereafter.

The figure below shows the alert that appears when the Elasticsearch connection is temporarily down.
Red alert showing connection is down
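
The monitoring schedule described above amounts to a simple rule: check hourly while the cluster is healthy, and check every minute with an alert raised while it is not. The following Python sketch models only that rule. The function name next_check and the lowercase status strings, including "down" for a failed connection, are assumptions for illustration, not WebCenter Content identifiers.

```python
HOURLY_SECONDS = 60 * 60  # normal monitoring interval (1 hour)
ALERT_SECONDS = 60        # interval while an alert is active (1 minute)


def next_check(status: str) -> tuple:
    """Return (alert_active, seconds_until_next_check) for a health status.

    `status` is expected to be "green", "yellow", "red", or "down"; these
    values are assumptions made for this sketch.
    """
    healthy = status.lower() in ("green", "yellow")
    return (not healthy, HOURLY_SECONDS if healthy else ALERT_SECONDS)


for status in ("green", "yellow", "red", "down"):
    print(status, next_check(status))
```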
10.4.2.9 Configuring Index Settings

You can configure shards and replicas for different indexes as per the required data.

This feature lets you customize the shard and replica counts for each Elasticsearch index. By design, each index in Elasticsearch is mapped to a security group in WebCenter Content. The indexes are created during:
  • server startup
  • addition of a new security group to the system
  • collection rebuild or reindex
  • migration from other search engines to Elasticsearch

Shards and replicas are allotted to the indexes when they are created in the system, based on the user configuration. Any updates to these settings take effect only after the next Full Rebuild or Reindex cycle. The shard and replica counts are subject to the following limits.

Shards count: It should be an integer value ranging from 5 to 300. The default value is 5.

Replicas count: It should be either 1 or 2. The default value is 1.
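
The limits above can be expressed as a small validation check. The following Python sketch assumes only the ranges and defaults stated in this section. The function name validate_index_settings is hypothetical, and the actual validation performed by the Index Settings page may differ.

```python
def validate_index_settings(shards: int = 5, replicas: int = 1) -> list:
    """Return a list of problems; an empty list means both values are in range.

    Defaults mirror the documented defaults (5 shards, 1 replica).
    """
    problems = []
    if not (isinstance(shards, int) and 5 <= shards <= 300):
        problems.append("shards must be an integer from 5 to 300")
    if replicas not in (1, 2):
        problems.append("replicas must be 1 or 2")
    return problems


print(validate_index_settings())        # defaults are valid
print(validate_index_settings(4, 3))    # both values out of range
```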

If connection with Elasticsearch is not established and no indexes are created yet, an additional optional alert will appear along with the existing Elasticsearch alerts.
alert message

When you click this alert message, you are redirected to the ElasticSearch Index Settings page, where you can customize shards and replicas for each security group (index) existing in the system.

Indexes with these customized settings will be created when successful connection with Elasticsearch is established. In case of migration to Elasticsearch from other search engines, migration needs to be successful for these indexes to get created with the customized settings.

For already configured Elasticsearch instances, the indexes are created with the default index settings.

To configure index settings:
  1. Select Administration, then ElasticSearch, and then ElasticSearch Index Settings.
  2. In the Configure Index Settings page, as an administrator, configure the indexes with the desired shard and replica counts. The updated shard and replica settings take effect after:
    • next Full Rebuild or Reindex cycle
    • establishing successful connection in a fresh instance
    • migration if you are switching over from a different search engine
    You cannot update specific indexes individually. When you click the Update button, all the records are updated.
    Configure Index Settings page
  3. To view all the active indexes and their shard and replica settings retrieved from the Elasticsearch server, select the Active Index Settings tab.
    View Active Index Settings

Adding New Security Group

If a new security group is added after successful connection to the Elasticsearch server from WebCenter Content, its corresponding index will be created in Elasticsearch with default shard (5) and replica (1) counts.

If you want to customize its settings, you can do so from the ElasticSearch Index Settings page, but the changes take effect only after the next rebuild or reindex cycle.

10.4.3 Migrating Existing Search Indexes to Elasticsearch Server

When you migrate from the active search index to the Elasticsearch server, the active index is changed to es1.

Note:

During the migration of 5 million records from OTS to Elasticsearch, for every text field, you need to create 4 types of mappings for various search operations in Elasticsearch. Elasticsearch considers these mappings as different fields. For example, a text field dDocTitle has the mappings dDocTitle, dDocTitle.normalize, dDocTitle.keyword, and dDocTitle.stem, and they are considered as 4 fields, not one field. So, if you have 250 text fields, Elasticsearch considers them as 250*4 = 1000 fields. For metadata other than text fields, there is only one mapping. After deleting unwanted metadata fields, you will be able to perform the migration activity.
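
The field arithmetic in the note above can be checked with a short calculation. The following Python sketch assumes only what the note states: 4 mappings per text field and 1 mapping per non-text field. The function name mapped_field_count is hypothetical. For context, the Elasticsearch setting index.mapping.total_fields.limit defaults to 1000. Whether WebCenter Content relies on that default is not stated here.

```python
# Each text field maps to 4 fields: the base field plus .normalize, .keyword, and .stem.
MAPPINGS_PER_TEXT_FIELD = 4


def mapped_field_count(text_fields: int, other_fields: int = 0) -> int:
    """Return the number of fields Elasticsearch sees for the given metadata fields."""
    return text_fields * MAPPINGS_PER_TEXT_FIELD + other_fields


# 250 text fields alone are counted as 250 * 4 fields.
print(mapped_field_count(250))
# Non-text metadata fields add one mapping each.
print(mapped_field_count(250, other_fields=30))
```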

If an existing WebCenter Content instance is configured to use the ORACLETEXTSEARCH (OTS) search engine, then the active index ots1/ots2 is used to fetch the already extracted content. A successful migration activity changes the active search index to the Elasticsearch server index, es1.

Select Administration and then Configuration for <hostname with port>. The page that is displayed shows ots1/ots2 as the active index, as shown below:

Active Index ots1

To migrate, select Administration and then ElasticSearch. The ElasticSearch Migration page is displayed. Select the appropriate search engine from the Search Engine to Migrate drop-down menu as shown below:

ElasticSearch Migration page

Migration Batch Size determines the number of documents batched together to push to the Elasticsearch server. Choose the batch size carefully because, for full-text search engines like ORACLETEXTSEARCH, each batch also includes the text-extracted content of the documents along with their metadata.

Migrate Metadata Only indicates whether only the metadata, without the text-extracted content, is pushed to the Elasticsearch server. For full-text search engines like ORACLETEXTSEARCH, always set this to False so that the text-extracted content is also pushed to the Elasticsearch server.

When you start a migration activity, a table of all recent migration jobs and their status details is listed as shown below:

Migration jobs

You can pause or resume an ongoing migration activity and retry the latest failed migration activity, if any. The details of a completed migration activity are shown below:

Completed status

A successful migration activity will switch active index to es1 as shown below:

Migrated Active Index to es1

Note:

A successful migration activity will remove the migration alert banner.

10.5 Managing OpenSearch

Let's learn about managing OpenSearch with WebCenter Content.

Oracle Cloud Infrastructure (OCI) Search Service with OpenSearch is an insight engine offered as an Oracle-managed service. Without any downtime, Oracle automates patching, updating, upgrading, backing up, and resizing the service. You can store, search, and analyze large volumes of data quickly and see results in near real-time.

WebCenter Content communicates with OpenSearch through REST APIs. WebCenter Content APIs or services exposed to the users remain the same.

This section covers the following topics:

10.5.1 OpenSearch Features and Benefits

OpenSearch has features such as fast rebuild, full rebuild, reindex, sorting, facets, search operators, and searching.

This section covers the following topics:

10.5.1.1 How the Rebuild Feature Works in OpenSearch?

OpenSearch provides a new Rebuild option, OpenSearch Reindex.

OpenSearch in WebCenter Content lets you perform Fast Rebuild or Full Rebuild (With extraction). So, users can now choose from Fast Rebuild, Full Rebuild (With extraction), and OpenSearch Reindex (Full Rebuild from OpenSearch).

With OpenSearch, the Indexer Rebuild dialog has two check boxes: Use fast rebuild and Full rebuild with content extraction. You can access this dialog box through Repository Manager by selecting Indexer, then Collection Rebuild Cycle, and then Start.

10.5.1.2 Fast Rebuild

The Fast Rebuild feature allows the search engine to add new information to the search collection without requiring a full collection rebuild.

A Fast Rebuild is required when adding or removing searchable fields. To do a fast rebuild, open the Collection Rebuild Cycle window, select the Use fast rebuild check box, and click OK.

10.5.1.3 Full Rebuild

The Full Rebuild option rebuilds the search index.

It extracts content and pushes it to the new index in the OpenSearch server using the metadata. This is a time-consuming task, so use it with extreme caution.

To do a full rebuild, open the Collection Rebuild Cycle window, select the Full rebuild with content extraction check box, and click OK.

10.5.1.4 OpenSearch ReIndex

The OpenSearch ReIndex option uses the OpenSearch API to reindex an existing collection to a new collection.

For reindexing, it reuses already extracted content and metadata available in the active collection. Since this option doesn’t need to extract content, it’s a faster alternative to Full Rebuild.

To do an OpenSearch ReIndex, open the Collection Rebuild Cycle window, leave both options unselected, and click OK.

Alternatively, you can perform OpenSearch ReIndex instead of a full rebuild with extraction by selecting Administration, then Admin Actions, then Collection Rebuild Cycle (section), and then Start. In the current version, Indexer Counters are not implemented for OpenSearch ReIndex. Also, note that the Cancel and Suspend buttons might not work.

10.5.1.5 Sorting

OpenSearch accepts any existing searchable field as SortField, so search results can be sorted by any searchable field.

You don’t have to rebuild the index when you make a field sortable or non-sortable. Marking a field as sortable only controls whether results can be sorted by it in the user interface. Even if a field is not marked sortable in Configuration Manager, OpenSearch sorts the search results by that field when it is passed as SortField.

10.5.1.6 Facets

With WebCenter Content OpenSearch, the default number of drilldown values is 50.

You can change this number by setting MaxOpenSearchDrillDownValues in the configuration or by passing it in the binder. MaxOpenSearchDrillDownValues can be any positive integer.

10.5.1.7 Search Operators and Searching

The Search user interface now includes more search operators. The default search operators are: Contains, Matches, Has Word Prefix, Starts, Ends, Substring, and Not Matches.

Searching
  • All search features supported with OracleTextSearch are supported with OpenSearch as well.
  • OpenSearch does not have Optimized and Zone fields.
  • With OpenSearch, metadata field names are expected to be case-sensitive during search, but the QueryText is case-insensitive.
  • Queries that use the MATCHES operator return case-insensitive exact matches of the query text on all searchable fields.
  • OpenSearch does not throw any error if a non-existing field or metadata is searched for. Instead, it shows zero results.
  • With OpenSearch, WebCenter Content gives valid results without ignoring any special characters.
  • For searches performed from the WebCenter Content user interface, WebCenter Content trims the trailing spaces and uses the trimmed value as the query text. As a result, spaces at the start or end of the query text can lead to different results compared to OracleTextSearch. With RIDC, OpenSearch also considers trailing spaces when returning search results.
  • Text within HTML tags such as <script>..</script>, <style>..</style>, and <!-- --> is not tokenized and is therefore not searchable.
  • WebCenter Content does not allow searching on non-existent or non-searchable fields. It returns the error message "<fieldname> is not a searchable field".

Searching Stop Words

Stop words are commonly used words that are excluded from searches so that web pages can be indexed and parsed faster. OpenSearch does not create an index entry for stop words.

  • The following list of stop words is derived from the OracleTextSearch (OTS) stop words.

    "Mr","Mrs","Ms","a","all","almost","also","although","an","and","any","are","as","at","be","because","been","both","but","by","can","could","d","did","do","does","either","for","from","had","has","have","having","he","her","here","hers","him","his","how","however","i","if","in","into","is","it","its","just","ll","me","might","my","no","non","nor","not","of","on","one","only","onto","or","our","ours","s","shall","she","should","since","so","some","still","such","t","than","that","the","their","them","then","there","therefore","these","they","this","those","though","through","thus","to","too","until","ve","very","was","we","were","what","when","where","whether","which","while","who","whose","why","will","with","would","yet","you","your","yours".

  • When you search with a stop word, OpenSearch treats the query as if you searched with an empty string instead of that word.
  • Stop words apply only to the following search queries: Full-Text Search, Quick Search, Contains, and Has Word Prefix.
  • A Full-Text Search, Quick Search, or Contains query composed of a stop word, or of a phrase that contains only stop words, returns all results, as if it were an empty search. For example, a query on the word this returns all hits because this is defined as a stop word.
  • A Has Word Prefix query composed of a stop word, or of a phrase that contains only stop words, returns no results. For example, a Has Word Prefix query on the word this returns no hits because this is defined as a stop word.
  • You can query on phrases that contain both stop words and non-stop words. In such cases, the phrase is searched as if the stop words in the phrase did not exist. For example, a query on the phrase this title returns hits as if you searched only for the word title, because this is a stop word.
10.5.1.8 Stemming

Stemming applies only to text queries: Contains, Has Word Prefix, Full-Text Search, and Quick Search.

Stemming results differ between OracleTextSearch and OpenSearch because the search engines use different dictionaries internally. For example, in OracleTextSearch, a search query for the word “find” returns found, finds, and finding, and a query for the word “make” returns make, made, makes, and making. In OpenSearch, the search result for “find” shows find, finds, and finding, and the result for “make” shows make, makes, and making. “Found” and “made” are not shown in OpenSearch results, but they are shown in OracleTextSearch results.

10.5.1.9 Snippets

You can enable the Snippets feature with OpenSearch by setting the following configuration entry in the config.cfg file: OpenSearchDisableSearchSnippet=false.

Keep in mind that this feature can affect search query performance. The look and feel of snippets displayed with OpenSearch differs from that of OracleTextSearch snippets. With OpenSearch, one complete sentence is equal to one snippet.

If a document appears in an OpenSearch result only because of a metadata match, and not because of a match in the extracted content of the document, only that metadata value is shown as the snippet.

10.5.1.10 Highlighting

OpenSearch highlights the search keywords but does not give pointers to the previous and next match.

OracleTextSearch highlights the search keywords along with pointers to next and previous match.

OpenSearch highlighting returns the extracted content of a document only when there is a match in the extracted content. Highlighting shows metadata of the document only if there is a match with that particular metadata value.

If the match is limited to metadata of the document, only the matched metadata fields are listed but not the extracted content.

10.5.2 Configuring OpenSearch

In this section, you'll learn how to configure OpenSearch for WebCenter Content, monitor cluster health, and configure index settings.

WebCenter Content connects to an existing OCI OpenSearch cluster.

This section covers the following topics:

10.5.2.1 Configuring OpenSearch for WebCenter Content

Before you configure OpenSearch for WebCenter Content, you must complete the mandatory initial configuration settings and enable the OpenSearch search indexer.

The initial configuration settings are shown in the figure below:


Search option

If the above step has not been done, stop the WebCenter Content managed server and set the following parameter in the config.cfg file:
SearchIndexerEngineName=OPENSEARCH

Now, start the WebCenter Content managed server.

To configure OpenSearch for WebCenter Content, follow these steps:
  1. Start the WebCenter Content managed server.
  2. Select Administration, then OpenSearch, and then OpenSearch Configuration.
  3. In the OpenSearch Configuration page, enter the values for the following fields as shown in the figure below:
    • OpenSearch Cluster - comma-separated list of OpenSearch nodes of a cluster
    • OpenSearch Certificate Type to connect - certificate type to connect to OpenSearch
    • Root Certificate Path - absolute path of the root certificate
    • Authorization - method to communicate with OpenSearch


    OpenSearch configuration details

10.5.2.2 Configuring OpenSearch for WebCenter Content with OCI
To configure OpenSearch for WebCenter Content with OCI, follow these steps:
  1. On the WebCenter Content instance, open a shell as the user that owns the WebCenter Content domain files and directories (typically the user oracle).
  2. Change the directory to <WCC domain path>.
  3. To get the OpenSearch certificate, run the following command in the shell on the WebCenter Content instance:
    openssl s_client -showcerts -connect <OpenSearch private IP>:9200 </dev/null |
    sed -n -e '/-.BEGIN/,/-.END/ p' > cert.pem
  4. To test the connection from the WebCenter Content instance to the OpenSearch cluster, run the following command:
    /usr/bin/curl https://<OpenSearch private IP>:9200 --insecure
    This is merely a simple test to see whether the WebCenter Content instance can reach the OpenSearch cluster. If successful, it returns output similar to the following:
    [oracle@wcctestinstance ~]$ /usr/bin/curl https://<OpenSearch private IP>:9200
    {
      "name" : "opensearch-master-2",
      "cluster_name" : "opensearch",
      "cluster_uuid" : "kN6M0YIeSxWTBFFu0zQH1g",
      "version" : {
        "distribution" : "opensearch",
        "number" : "1.2.4",
        "build_type" : "tar",
        "build_hash" : "unknown",
        "build_date" : "2022-10-19T18:30:04.947648Z",
        "build_snapshot" : false,
        "lucene_version" : "8.10.1",
        "minimum_wire_compatibility_version" : "6.8.0",
        "minimum_index_compatibility_version" : "6.0.0-beta1"
      },
      "tagline" : "The OpenSearch Project: https://opensearch.org/"
    }
  5. In the shell, change the directory to <WCC domain path>/ucm/cs/config. If WebCenter Content is clustered, the config.cfg file is located under the file share used by WebCenter Content.
  6. Edit the config.cfg file. Add the following entry:
    SearchIndexerEngineName=OPENSEARCH

    If SearchIndexerEngineName is set to OracleTextSearch or DATABASE.METADATA, either delete or comment out those lines.

  7. Save and exit the file.
  8. Restart the WebCenter Content managed server(s).
  9. Open the WebCenter Content page.
  10. Select Administration, then OpenSearch, and then OpenSearch Configuration.
  11. In the OpenSearch Configuration page, enter the values for the fields as explained in Configuring OpenSearch for WebCenter Content. Click the Update button.
If WebCenter Content connects to OpenSearch, it shows one of the following statuses:
  • Green: OpenSearch was configured for three master and data nodes.
  • Yellow: OpenSearch was configured for a single-node cluster. This status occurs because a single node cannot distribute its replica shards. It can be ignored because it does not affect indexing or searches.
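
The sed filter in step 3 above keeps only the lines from each BEGIN marker through the matching END marker in the openssl output. The following Python sketch shows the same filtering idea for readers less familiar with sed ranges. It is an approximation for illustration, not a replacement for the documented command, and the function name extract_pem_blocks is hypothetical.

```python
def extract_pem_blocks(s_client_output: str) -> str:
    """Keep only the lines from each BEGIN marker through its END marker."""
    kept = []
    inside = False
    for line in s_client_output.splitlines():
        if "-----BEGIN" in line:
            inside = True
        if inside:
            kept.append(line)
        if "-----END" in line:
            inside = False
    return "\n".join(kept) + ("\n" if kept else "")


# Shortened, made-up sample standing in for real openssl s_client output.
sample = (
    "CONNECTED(00000003)\n"
    "-----BEGIN CERTIFICATE-----\n"
    "MIIB\n"
    "-----END CERTIFICATE-----\n"
    "Server certificate\n"
)
print(extract_pem_blocks(sample), end="")
```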

The initial configuration for OpenSearch doesn't require an initial collection rebuild. Once the parameters in the OpenSearch Configuration page are completed and the WebCenter Content is connected to OpenSearch, a collection rebuild isn't required.

As part of the configuration, the OpenSearch indexes (based on WebCenter Content security groups) are created. Items can then be checked in and searched for. Items that were checked in earlier are also searchable.

Note:

If a new metadata field is to be created, or if fields from another WebCenter Content instance will be migrated using CMU, run the Fast Rebuild immediately after the field creation or CMU import.
Until the Fast Rebuild is run:
  • Do not check in new content with the new field value populated.
  • Do not archive import content that has the field value populated.
The Fast Rebuild takes a very long time to complete if it has to index values for fields that are not already in the index. For more details, see ElasticSearch Fast Rebuild Takes a Long Time to Complete.

10.5.2.3 Monitoring OpenSearch Cluster Health

For WebCenter Content to function properly, the OpenSearch cluster must be healthy.

WebCenter Content checks OpenSearch health every hour. If the health status is Red or the connection is down, an alert is added and the health is checked every minute until the status turns Green or Yellow. Once the status is Green or Yellow, the alert is removed automatically and monitoring returns to the hourly interval.
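The monitoring schedule described above can be summarized as a small decision function. This is only a sketch of the stated behavior, not the product's implementation: the function name is hypothetical, and using `None` to represent a failed connection is a convention chosen for the example.

```python
from typing import Optional, Tuple

HOURLY_INTERVAL_S = 60 * 60  # normal monitoring interval
ALERT_INTERVAL_S = 60        # interval while a health alert is active


def next_health_check(status: Optional[str]) -> Tuple[bool, int]:
    """Return (alert_active, seconds_until_next_check).

    `status` is "green", "yellow", or "red"; None stands for a connection
    that is down (an assumption of this sketch).
    """
    if status in ("green", "yellow"):
        # Healthy: no alert, go back to the hourly check.
        return False, HOURLY_INTERVAL_S
    # Red or unreachable: keep the alert and re-check every minute.
    return True, ALERT_INTERVAL_S


if __name__ == "__main__":
    for observed in ("green", "red", None, "yellow"):
        alert, wait = next_health_check(observed)
        print(f"{observed!s:>6}: alert={alert}, next check in {wait}s")
```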

10.5.2.4 Configuring Index Settings

You can configure shards and replicas for each index according to your data requirements.

This feature lets you customize the shard and replica counts for each OpenSearch index. By design, each index in OpenSearch maps to a security group in WebCenter Content. The indexes are created during:
  • server startup
  • addition of a new security group to the system
  • collection rebuild or reindex
  • migration from other search engines to OpenSearch

Shards and replicas are allotted to the indexes when they are created, based on the user configuration. Any updates to these settings take effect only after the next Full Rebuild or Reindex cycle. The shard and replica counts are limited as follows:

Shards count: It should be an integer value ranging from 5 to 300. The default value is 5.

Replicas count: It should be either 1 or 2. The default value is 1.
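To make the limits concrete, the following sketch checks a pair of values against the ranges stated above. The function and constant names are hypothetical; the settings page applies its own validation.

```python
from typing import List

MIN_SHARDS, MAX_SHARDS = 5, 300   # allowed shard range, default 5
ALLOWED_REPLICAS = (1, 2)         # allowed replica counts, default 1


def check_index_settings(shards: int = 5, replicas: int = 1) -> List[str]:
    """Return a list of problems; an empty list means the values are valid."""
    problems = []
    if not isinstance(shards, int) or not MIN_SHARDS <= shards <= MAX_SHARDS:
        problems.append(
            f"shards must be an integer from {MIN_SHARDS} to {MAX_SHARDS}"
        )
    if replicas not in ALLOWED_REPLICAS:
        problems.append("replicas must be 1 or 2")
    return problems


if __name__ == "__main__":
    print(check_index_settings())        # defaults are valid
    print(check_index_settings(4, 3))    # both values out of range
```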

If the connection with OpenSearch is not yet established and no indexes have been created, an additional optional alert appears along with the existing OpenSearch alerts.

Clicking the alert message redirects you to the OpenSearch Index Settings page, where you can customize shards and replicas for each security group (index) in the system.

Indexes with these customized settings are created once a connection with OpenSearch is successfully established. When migrating to OpenSearch from another search engine, the migration must succeed for the indexes to be created with the customized settings.

For already configured OpenSearch instances, the indexes are created with the default index settings.

To configure index settings:
  1. Select Administration, then OpenSearch, and then OpenSearch Index Settings.
  2. In the Configure Index Settings page, as an administrator, configure the indexes with the desired shard and replica counts. The updated shard and replica settings take effect after:
    • the next Full Rebuild or Reindex cycle
    • establishing a successful connection in a fresh instance
    • migration, if you are switching over from a different search engine
    You cannot update individual indexes. Clicking the Update button updates all records.

    Configure Index Settings page

  3. To view all the active indexes and their shard and replica settings retrieved from the OpenSearch server, select the Active Index Settings tab.

    View Active Index Settings
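The same shard and replica values can be read directly from OpenSearch with its get-settings API (`GET /<index>/_settings`), which returns `number_of_shards` and `number_of_replicas` as strings under `settings.index` for each index. The sketch below extracts them from an already parsed response. The index names `public` and `secure` are placeholders, since the naming that WebCenter Content uses for its indexes is not covered here.

```python
from typing import Dict, Tuple


def shard_replica_counts(settings_response: dict) -> Dict[str, Tuple[int, int]]:
    """Map each index name to (shards, replicas) from a `_settings` response."""
    counts = {}
    for index_name, body in settings_response.items():
        index_settings = body["settings"]["index"]
        counts[index_name] = (
            int(index_settings["number_of_shards"]),
            int(index_settings["number_of_replicas"]),
        )
    return counts


if __name__ == "__main__":
    # Response fragment invented for the example; real responses carry
    # additional keys such as uuid and creation_date.
    sample = {
        "public": {"settings": {"index": {"number_of_shards": "5",
                                          "number_of_replicas": "1"}}},
        "secure": {"settings": {"index": {"number_of_shards": "10",
                                          "number_of_replicas": "2"}}},
    }
    for name, (shards, replicas) in shard_replica_counts(sample).items():
        print(f"{name}: shards={shards}, replicas={replicas}")
```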

Adding New Security Group

If a new security group is added after WebCenter Content has successfully connected to the OpenSearch server, its index is created in OpenSearch with the default shard (5) and replica (1) counts.

You can customize its settings from the OpenSearch Index Settings page, but the changes take effect only after the next rebuild or reindex cycle.

10.5.3 Migrating Existing Search Indexes to OpenSearch

If the WebCenter Content server was previously configured with another search engine (such as OTS, FULLTEXT, or Elasticsearch) and the search engine has now been changed to OpenSearch, the existing content should be migrated.

To migrate, select Administration, then OpenSearch, and then OpenSearch Migration. The following figure shows the migration from Elasticsearch to OpenSearch. When migrating from Elasticsearch to OpenSearch, only the METADATA option is available in the Search Engine to Migrate drop-down menu.


OpenSearch migration

Migration Batch Size determines the number of documents that are pushed to the OpenSearch server together as one batch. Choose the batch size carefully, because each batch includes the text-extracted content of the documents along with their metadata.
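The effect of the batch size can be pictured with a generic batching helper. This sketch only illustrates the idea of grouping documents into fixed-size batches; it is not how the migration is implemented, and the function name and sample documents are invented for the example.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # Each hypothetical document carries metadata plus extracted text,
    # so the payload of a batch grows with both the count and the text size.
    docs = [{"dID": i, "text": "x" * 1000} for i in range(5)]
    for batch in batches(docs, 2):
        payload = sum(len(d["text"]) for d in batch)
        print(f"{len(batch)} document(s), about {payload} characters of text")
```

A larger batch means fewer requests but a bigger payload per request, which is why documents with large extracted text may call for a smaller batch size.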

Migrate Metadata Only indicates whether only metadata, without the text-extracted content, is pushed to the OpenSearch server. For full-text search engines such as OpenSearch, always set this to False so that the text-extracted content is also pushed to the OpenSearch server.

When a migration is started, a table lists all recent migration jobs and their status details.