Managing Search Features
This chapter describes how to configure the OracleTextSearch feature to use Oracle Text as the primary full-text search engine for Oracle WebCenter Content and how to configure full-text database searching.
This chapter also describes how to configure Elasticsearch, which significantly reduces index rebuild time, and how to configure OpenSearch.
This chapter covers the following topics:
Managing OracleTextSearch
If you have a license to use the OracleTextSearch feature with Oracle Database, then you can configure OracleTextSearch to use the Oracle Text product as the primary full-text search engine for WebCenter Content. Oracle Text offers state-of-the-art indexing capabilities. Oracle Text has its own query syntax, which is intended more for use by applications or information professionals rather than casual end-users.
OracleTextSearch enables administrators to specify certain metadata fields to be optimized for the search index and to customize additional fields. This feature also enables a fast index rebuild and index optimization.
This section covers the following topics:
Considerations for Using OracleTextSearch
The following items are important when considering use of the OracleTextSearch feature:
- 
    WebCenter Content version 12c supports all languages supported by Oracle Text. OracleTextSearch can filter and extract content from different document formats in different languages. It supports a large number of document formats, including Microsoft Office file formats, Adobe PDF, HTML, and XML. It also supports archive and compression file formats such as zip, zipx, and gz. It can render search results in various formats, including unformatted text, HTML with term highlighting, and original document format. 
- 
    Oracle Text runs on Oracle Database 12c. The Content Server database can be Oracle Database 12c, Microsoft SQL Server, or other databases as listed in the Oracle WebCenter Content 12c Certification Matrix. However, if the system database is not Oracle Database 12c, then an external provider for OracleTextSearch must be configured. For details on external providers, see Configuring OracleTextSearch for Content Server. 
- 
    When using OracleTextSearch, Oracle Database version 11.1.0.7.0 or higher is required. 
- 
    Optimized fields for OracleTextSearch are created as SDATA fields, which have a maximum limit of 249 characters. This limit is imposed by Oracle Database and is reflected in Content Server by the OracleTextSearch component. Default SDATA fields include dDocName, dDocTitle, dDocType, and dSecurityGroup. The total number of SDATA fields is limited to 32 fields. 
- 
    While WebCenter Content provides numerous search options using a variety of databases (Oracle, Microsoft SQL Server, IBM DB2), by default the database that serves as the search index is the same system database used by WebCenter Content to manage metadata and other configuration information (users, security groups, and so on). The OracleTextSearch feature enables Oracle Text as a separate search collection instance on Oracle Database 12c for WebCenter Content, which allows the search collection to reside on a separate computer and not compete with WebCenter Content for processors and memory. This can improve indexing and search response time. 
- 
    The OracleTextSearch collection instance can be installed on a different platform than the WebCenter Content installation. 
- 
    If the OracleTextSearch feature is configured and running, and metadata fields are pushed in to the Content Server instance either by the administrator or by a component (requiring that the Content Server instance be restarted), then the OracleTextSearch index must be rebuilt before content using the new metadata fields can be checked in to the Content Server instance. 
Oracle Text Features and Benefits
This section covers the following topics:
Indexing and Query Speeds and Techniques
Using Oracle Text, WebCenter Content offers a significant increase in index speeds. Oracle Text indexing is transactional. Content Server sends a batch of documents to Oracle Text, commits the batch, then starts the Oracle Text indexer. Content Server is notified of which documents failed to index, and only those documents are resubmitted for indexing. Additional capabilities include an automatic Fast Optimization for every 5,000 documents added to the Content Server instance, and a Full Optimization for every 50,000 documents or 20% growth of the repository. Note that Content Server metadata-only search queries may degrade in performance when using Oracle Text.
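The optimization thresholds described above can be sketched as a small decision function. This is a hypothetical model for illustration only: the function name, the counters, and the way growth is measured are assumptions, and Content Server's actual scheduling logic is not documented here.

```python
FAST_OPTIMIZE_EVERY = 5_000        # documents, per the text above
FULL_OPTIMIZE_EVERY = 50_000       # documents, per the text above
FULL_OPTIMIZE_GROWTH_PERCENT = 20  # repository growth, per the text above


def optimization_due(docs_since_fast: int, docs_since_full: int,
                     repo_size_at_last_full: int) -> str:
    """Return 'full', 'fast', or 'none' for the given counters (hypothetical model)."""
    # Assumption: growth is measured against the repository size at the last
    # Full Optimization. Integer arithmetic avoids floating-point rounding.
    grew_enough = (
        repo_size_at_last_full > 0
        and docs_since_full * 100 >= FULL_OPTIMIZE_GROWTH_PERCENT * repo_size_at_last_full
    )
    if docs_since_full >= FULL_OPTIMIZE_EVERY or grew_enough:
        return "full"
    if docs_since_fast >= FAST_OPTIMIZE_EVERY:
        return "fast"
    return "none"
```

In this model a Full Optimization takes precedence whenever both thresholds are reached at the same time.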
WebCenter Content uses some of the newest Oracle Text features. For example, Content Server automatically creates a new search index zone for each text information field in order to provide better search speed. Using information zones enables Content Server to query data as if it were full-text data. All text-based information fields (text, long text, and memo) are automatically added as separate zones. In addition to the zones created for text information fields, Content Server provides an extra zone named IdcContent, which enables custom components, Oracle WebCenter Content: Inbound Refinery components, applications, or users to create XML content with tags that will be indexed as full-text metadata fields.
WebCenter Content uses the SDATA section feature in Oracle Text to index important text, date, and integer fields and define them as Optimized Fields. The SDATA section is a separate XML structure managed by the Oracle Text engine that allows the engine to respond rapidly to requests involving date and integer ranges. Content Server can have up to 32 Optimized Fields. This total includes date fields, integer fields, standard Content Server fields such as dInDate and dOutDate, and fields selected to be optimized. All Optimized Fields are SDATA fields, which by default include dDocName, dDocTitle, dDocType, and dSecurityGroup.
Note:
If you want to change the set of Optimized Fields defined in Oracle Text, the maximum allowed number of Optimized Fields is 32.
To avoid errors when indexing, do not add non-existent metadata fields to the Configuration Manager DrillDownFields parameter, and do not add memo fields to an SDATA section or to the DrillDownFields parameter. See Understanding Management Tools in Managing Oracle WebCenter Content.
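The rules in this note can be checked before a configuration change is applied. The following is a minimal sketch, assuming you already have a mapping of field names to field types; the function name and the type labels (such as "memo") are hypothetical and do not correspond to a Content Server API.

```python
MAX_OPTIMIZED_FIELDS = 32  # limit stated in the text above


def check_search_design(field_types: dict[str, str],
                        optimized: list[str],
                        drill_down: list[str]) -> list[str]:
    """Return a list of problems found in a proposed search design."""
    problems = []
    if len(optimized) > MAX_OPTIMIZED_FIELDS:
        problems.append(
            f"{len(optimized)} Optimized Fields exceed the limit of {MAX_OPTIMIZED_FIELDS}"
        )
    # DrillDownFields must only name metadata fields that exist.
    for name in drill_down:
        if name not in field_types:
            problems.append(f"DrillDownFields names a non-existent field: {name}")
    # Memo fields must not be optimized (SDATA) or used for drill-down.
    for name in dict.fromkeys(optimized + drill_down):
        if field_types.get(name) == "memo":
            problems.append(f"memo field cannot be optimized or used for drill-down: {name}")
    return problems
```

An empty list means the proposed design passes these three checks; it does not guarantee that indexing will succeed.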
Multithreaded Indexing
The MultiThreadIndexer component introduces parallelism to the legacy indexer, which is a single-threaded process designed to run on only one of the servers configured in a cluster, and so increases indexing throughput.
The MultiThreadIndexer component can currently be used only with the ORACLETEXTSEARCH (OTS) search engine. It allows text extraction and OTS index building to run in parallel. The feature is transparent to end users, and all functionality, including the user interface, remains the same.
To use the new MultiThreadIndexer feature:
- 
    Enable the MultiThreadIndexer component. 
- 
    Ensure that the following settings are present in the config.cfg file:
    SearchIndexerEngineName=ORACLETEXTSEARCH
    EnableMultiThreadIndexer=true
- 
    Restart the WebCenter Content servers. 
Note:
- You can set the ForceMetadataOnlyIndexing configuration variable to true to force the indexer to perform metadata-only indexing regardless of the configured search engine. This improves indexing performance when you want to use a full-text search engine such as OracleTextSearch or OpenSearch but do not need full-text indexing.
- You can set the ValidateMaxIndexableFileSizeEarly configuration variable to true to check, while indexing, whether an uploaded document's file size exceeds the configured MaxIndexableFileSize limit. The default value is 10 MB. A document larger than the MaxIndexableFileSize limit is not downloaded to the file system because it would not go through text extraction anyway, which improves indexing performance.
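Before restarting the servers, it can help to confirm that the required entries are actually present in config.cfg. The sketch below is an illustration under stated assumptions: it treats the file as simple name=value lines, skips lines that start with # or <, and compares values case-insensitively. None of these parsing rules are taken from the Content Server documentation, and the function names are hypothetical.

```python
REQUIRED = {
    "SearchIndexerEngineName": "ORACLETEXTSEARCH",
    "EnableMultiThreadIndexer": "true",
}


def parse_cfg(text: str) -> dict[str, str]:
    """Parse simple name=value lines (assumed format)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines, '#' comments, and '<?...?>' headers carry no settings.
        if not line or line.startswith(("#", "<")) or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip()
    return settings


def missing_multithread_settings(text: str) -> list[str]:
    """Return the required entries that are absent or set to another value."""
    settings = parse_cfg(text)
    return [
        f"{name}={value}"
        for name, value in REQUIRED.items()
        # Assumption: values are compared case-insensitively.
        if settings.get(name, "").lower() != value.lower()
    ]
```

An empty result means both entries were found; any returned entries are the lines that still need to be added or corrected.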
Fast Rebuild
OracleTextSearch provides an Indexer Rebuild window when you use the Collection Rebuild Cycle window on the Repository Manager application Indexer tab. The Fast Rebuild feature allows the search engine to add new information to the search collection without requiring a full collection rebuild. A Fast Rebuild is required in the following cases:
- 
    Adding or removing information fields 
- 
    Changing any Optimized Field 
- 
    Changing an information field to be an Optimized Field 
A Fast Rebuild does not cause all the information (metadata and full-text) to be re-indexed. It adds the changes throughout the collection and updates it. Content Server search functionality is not affected during a Fast Rebuild cycle.
For information on performing a fast rebuild, see Performing a Fast Rebuild.
Query Syntax
Queries defined in Universal Query Syntax are supported and generally do not need any modification. This includes queries saved by users, queries defined in custom components, and queries defined in Site Studio pages.
OracleTextSearch Operators
Oracle Text supports the following default search operators:
- 
    CONTAINS 
- 
    MATCHES 
- 
    Has Word Prefix 
- 
    Range searches for dates and integers 
Search Thesaurus
Certain queries, such as stem and Related Term, may be more effective if you use an Oracle Text thesaurus. Oracle Text enables you to create case-sensitive or case-insensitive thesauri that define synonym and hierarchical relationships between words and phrases. You can then search and retrieve documents that contain relevant text by expanding queries to include similar or related terms as defined in the thesaurus. For example, you can populate a thesaurus with specific product names, associated models, associated features, and so forth.
- 
    Default thesaurus: If you do not specify a thesaurus by name in a query, by default, the thesaurus operators use a thesaurus named DEFAULT. However, Oracle Text does not provide a DEFAULT thesaurus. As a result, if you want to use a default thesaurus for the thesaurus operators, you must create a thesaurus named DEFAULT. You can create the thesaurus through any of the thesaurus creation methods supported by Oracle Text:
    - 
        CTX_THES.CREATE_THESAURUS (PL/SQL)
    - 
        ctxload utility
- 
    Supplied thesaurus: Oracle Text does not provide a default thesaurus, but Oracle Text does supply a thesaurus, in the form of a file that you load with ctxload, that can be used to create a general-purpose, English-language thesaurus. The thesaurus load file can be used to create a default thesaurus for Oracle Text, or it can be used as the basis for creating thesauri tailored to a specific subject or range of subjects. 
Note:
See the Oracle Text Reference to learn more about using ctxload and the CTX_THES package, and see the chapter, “Working With a Thesaurus in Oracle Text,” in the Oracle Text Application Developer’s Guide.
Case Sensitivity and Stemming Rules
Content Server automatically ensures that queries are executed as case-insensitive. By default, all full-text and text field search queries are case-insensitive. Content Server also handles case-insensitive search queries for information stored as Optimized Fields.
Stemming is an Oracle Text feature that uses the stem ($) operator to search for terms that have the same linguistic root as the query term (the syntax is $term). For example, the input $sing would expand a search to include sang, sung, and sing. Stemming rules can be used to have searches account for plurals, verb forms, and so forth. Content Server does not apply any stemming rules by default for Oracle Text, but a set of stemming rules can be created by using the stem ($) operator. Other methods for implementing stemming rules include modifying the standard query definition in the searchindexerrules configuration file (which requires a custom component) and making configuration changes in the Oracle Text engine (Oracle Database).
Note:
For more information, see the chapter “Oracle Text CONTAINS Query Operators” in the Oracle Text Reference.
Content Server handles content in non-English languages by using the WORLD_LEXER feature in the Oracle Text engine. This enables Oracle Text to automatically identify the language and apply the proper tokenization rules.
Search Results Data Clustering
With the OracleTextSearch feature, Content Server retrieves additional information about a search result list and displays it in a new menu bar on the Search Results page. This information summarizes how many documents are attached to specific values in specific information fields. Content Server supports data clustering for up to four information fields (the default fields are Security Group and Document Type).
This can be useful if you have a query that returns many items. For example, a result set could include 200 content items, including 100 documents that belong to the Public security group, 75 that belong to the Sales group, and 25 that belong to the Marketing group. The menu option for Security Group will show you the list of values and how many documents belong to each value. You can select one of the values (Public, Sales, Marketing) from the menu and it will list only those documents in the result set that belong to that value.
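The clustering behavior in this example amounts to counting result items per field value and then filtering on the selected value. The following sketch reproduces the numbers from the example above with plain Python data; the result list and the function name are illustrative assumptions, not Content Server objects.

```python
from collections import Counter


def cluster_counts(results: list[dict], field: str) -> Counter:
    """Count how many result items carry each value of the given field."""
    return Counter(item[field] for item in results)


# Illustrative result set matching the example: 200 items in three security groups.
results = (
    [{"dSecurityGroup": "Public"}] * 100
    + [{"dSecurityGroup": "Sales"}] * 75
    + [{"dSecurityGroup": "Marketing"}] * 25
)

counts = cluster_counts(results, "dSecurityGroup")

# Selecting a value from the menu narrows the list to matching items only.
public_only = [item for item in results if item["dSecurityGroup"] == "Public"]
```

Here `counts` holds the per-group totals shown in the menu, and `public_only` is the narrowed result list after choosing Public.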
Snippets
Content Server can retrieve document snippets as part of search results to show the occurrence of search terms in the context of their usage. This feature is disabled by default, and enabling it can affect search query performance. To enable this feature, set the following configuration entry in the config.cfg file:
OracleTextDisableSearchSnippet=false
Additional Changes
Additional changes because of the use of Oracle Text include:
- 
    XML content is automatically indexed. 
- 
    There are no visible changes in the Search user interface other than removal of Substring as a search operator option. The default search operators are CONTAINS, MATCHES, and HAS WORD PREFIX. Substring-based queries still work. 
- 
    Queries using the MATCHES operator on a non-optimized field behave like a CONTAINS query. For example, if xDepartment is not optimized, then the query xDepartment MATCHES 'Marketing' behaves like xDepartment CONTAINS 'Marketing' and returns hits on content items that have an xDepartment value of 'Marketing Services' or 'Product Marketing'.
- 
    Relevancy ranking can be changed in Oracle Text through use of an operator called DEFINESCORE. This operator can be added through a component to the WhereClause value of OracleTextSearch in the SearchQueryDefinition table (in the Oracle Text searchindexerrules configuration file). More information about this operator is available in the Oracle Text Reference.
- 
    Complicated queries that previously could be placed into the full-text search box should now be placed in the advanced options on the Query Builder Form. The Query Builder Form is documented in Using Oracle WebCenter Content. 
- 
    If you need to specify an escape character, use the AdditionalEscapeChars configuration variable. The default setting is:
    AdditionalEscapeChars=_:#,-:#
    The default sets an underscore (_) and a hyphen (-) as escape characters. 
- 
    The PDF Highlighting feature has been disabled. 
- 
    The Spell Checking feature can be enabled, but it requires a custom component just as it did with Autonomy VDK. 
Configuring OracleTextSearch for Content Server
If you did not specify OracleTextSearch when you first installed Content Server, configure the feature as follows:
- 
    Open the config.cfg file for the Content Server instance in a text editor. For example:
    MW_HOME/user_projects/domain/servers/ucm/config/config.cfg
- 
    Set the following property value:
    SearchIndexerEngineName=OracleTextSearch
    Note: If you are using ACLs, and UseEntitySecurity=true is set with OracleTextSearch as the search engine, then the following must also be set in the config.cfg file for the Content Server instance:
    ZonedSecurityFields=xClbraUserList,xClbraAliasList
- 
    If you are using an external data source instead of the system database, change the value SystemDatabase in the following property setting to the external database provider name:
    IndexerDatabaseProviderName=SystemDatabase
    Note: You can specify a separate Oracle Database as the value of IndexerDatabaseProviderName, instead of SystemDatabase. If the Content Server database used with OracleTextSearch is not Oracle Database, then an external provider for OracleTextSearch must be configured. Obtain the driver and fmwgenerictoken.jar from MW_HOME/oracle_common/modules/oracle.jdbc_11.1.1/ojdbc6dms.jar.
- 
    Save the file. 
- 
    Restart the Content Server instance. For instructions, see Restarting Content Server or Inbound Refinery Using Fusion Middleware Control. 
- 
    Rebuild the search index. For more information on rebuilding the index, see Working with the Search Index. For more information on configuring Content Server and OracleTextSearch during installation, see FullText Search Option in WebCenter Content Configuration Page in Installing and Configuring Oracle WebCenter Content. 
If you originally configured Content Server to use an external provider with OracleTextSearch, but later need to switch to use SystemDatabase, you must manually run the contentprocedures.sql script against your system database schema. The script file is located in the WC_CONTENT_ORACLE_HOME/ucm/idc/database/oracle/admin/ directory.
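The config.cfg entries from the procedure above can be assembled programmatically, for example when preparing several instances. This is a minimal sketch: the function and its parameters are hypothetical helpers, and only the property names and values come from the steps above.

```python
def ots_config_entries(uses_acl_entity_security: bool = False,
                       provider_name: str = "SystemDatabase") -> list[str]:
    """Return the config.cfg lines described in the configuration steps."""
    entries = ["SearchIndexerEngineName=OracleTextSearch"]
    if uses_acl_entity_security:
        # Needed only when ACLs are used and UseEntitySecurity=true is already set.
        entries.append("ZonedSecurityFields=xClbraUserList,xClbraAliasList")
    # SystemDatabase is the value shown in the steps; replace it with the
    # external database provider name when one is used.
    entries.append(f"IndexerDatabaseProviderName={provider_name}")
    return entries
```

For example, `ots_config_entries(provider_name="ExternalOTSProvider")` uses an invented provider name to show the external-provider case. The instance must still be restarted and the search index rebuilt after the file is saved.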
Managing OracleTextSearch
This section covers the following topics:
Determining Fields to Optimize
Consider the following when determining the fields to optimize:
- 
    Do you want an exact match in a query? 
- 
    Do you want that match to work faster in a search? 
- 
    Do you want to sort search results by field? 
By default, the OracleTextSearch feature optimizes the Content ID and Document Title metadata fields.
A maximum of 32 fields can be defined as Optimized Fields with the OracleTextSearch feature. This total includes date fields, integer fields, standard Content Server fields such as dInDate and dOutDate, and fields selected to be optimized. All Optimized Fields are SDATA fields, which by default include dDocName, dDocTitle, dDocType, and dSecurityGroup.
The display of integer fields is dynamic and depends on the Content Server configuration.
Assigning/Editing Optimized Fields
You can select metadata Non-Optimized Fields and assign them to be Optimized Fields for search purposes, or edit Optimized Fields and make them Non-Optimized.
To assign or edit Optimized fields:
- 
    Choose Administration, then Desktop Client Apps. 
- 
    Select Configuration Manager, then the Information Fields tab, then Advanced Search Design. For more information on Configuration Manager, see Exporting Auxiliary Metadata Sets in Managing Oracle WebCenter Content. 
- 
    To make a metadata field Optimized, click Edit Fields. In the Advanced Options for “metadata_field” window, select Is Optimized. 
- 
    To edit an Optimized Field and make it Non-Optimized, click Edit Fields. In the Advanced Options for “metadata_field” window, deselect Is Optimized. 
- 
    When you have completed moving fields, use Index Fast Rebuild in Repository Manager to update the search collection to use the new and modified fields. 
Note:
The Fast Rebuild does not function if a search collection rebuild is in progress.
Performing a Fast Rebuild
The Fast Rebuild feature allows the search engine to add new information to the search collection without requiring a full collection rebuild. A Fast Rebuild is required in the following cases:
- 
    Adding or removing information fields 
- 
    Changing any Optimized Field 
- 
    Changing an information field to be an Optimized Field 
To perform a fast rebuild:
- 
    Choose Administration, then Desktop Client Apps. 
- 
    Choose Repository Manager, then select the Indexer tab. 
- 
    In the Collection Rebuild Cycle part of the Repository Manager application Indexer tab, click Start. The Indexer Rebuild window opens with a warning that rebuilding the search index is a time-consuming process. If you do not want to start a rebuild now, click Cancel; otherwise, continue with this procedure. 
- 
    In the Indexer Rebuild window, click OK. A Fast Rebuild of the search collection is performed. 
Note:
A Fast Rebuild is not performed if a rebuild of the search collection is in progress.
Note:
The Fast Rebuild process does not create indexer counter values for Full Text, Meta Only, and Delete. To obtain indexer count statistics, you must perform a full collection rebuild.
Modifying the Fields Displayed on Search Results
The OracleTextSearch feature provides default menu options on the Search Results page (set by the Oracle Database configuration script):
DrillDownFields=dDocType, dSecurityGroup
Administrators can add one more option from the list of Optimized Fields to further customize the search results. Edit the configuration to add the option to the list of DrillDownFields. (This function does not support multi-value option lists.)
A Fast Rebuild must be performed after making any change in the DrillDownFields setting.
Searching with OracleTextSearch
Performing a search with OracleTextSearch is generally the same as with other search engines. The only visible change in the Search: Expanded Form is the removal of Substring as a search operator option. The default search operator is CONTAINS. Substring-based queries still work.
See Searching with Oracle Text Search in Using Oracle WebCenter Content.
The following table describes the default search operators.
| Operator | Description | Example | 
|---|---|---|
| CONTAINS | Finds content items with the specified whole word or phrase in the metadata field. This is available only for OracleTextSearch, or for Oracle Database and Microsoft SQL Server database with the optional DBSearchContainsOpSupport component enabled. | When form is entered in the Title field, the search returns items with the word form in their title, but does not return items with the word performance or reform. | 
| MATCHES | Finds items with the exact specified value in the metadata field. | When address change form is entered in the Title field, the search returns items with the exact title of address change form. A query that uses the MATCHES operator on a non-optimized field behaves the same as a query that uses the CONTAINS operator. For example, if the xDepartment field is not optimized, then the query xDepartment MATCHES 'Marketing' behaves like xDepartment CONTAINS 'Marketing', returning hits on documents that have an xDepartment value of 'Marketing Services' or 'Product Marketing'. | 
| HAS WORD PREFIX | Finds all content items with the specified word at the beginning of the metadata field. No wildcard character is placed before or after the specified value. | When form is entered in the Title field, the search returns all items with the word form at the beginning of their title, but does not return an item whose title begins with the word performance or reform. | 
Note: You cannot use wildcards (? and *) in place of special characters when the CONTAINS and HAS WORD PREFIX operators are used. For example, if a dDocTitle value is Webcenter_Content, you cannot find it by searching for Webcenter?Content or Webcenter*Content with the CONTAINS or HAS WORD PREFIX operator.
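The operator descriptions in the table can be modeled with a few small functions. This is a simplified sketch for reasoning about expected results, not how Oracle Text evaluates queries: the word splitting, the case-insensitive comparison, and the `optimized` flag are all assumptions made for illustration.

```python
import re


def words(value: str) -> list[str]:
    """Split a value into lowercase words (assumed tokenization)."""
    return re.findall(r"\w+", value.lower())


def contains(value: str, term: str) -> bool:
    """CONTAINS: the whole word or phrase appears somewhere in the field."""
    v, t = words(value), words(term)
    return bool(t) and any(v[i:i + len(t)] == t for i in range(len(v) - len(t) + 1))


def matches(value: str, term: str, optimized: bool = True) -> bool:
    """MATCHES: exact value, or CONTAINS behavior on a non-optimized field."""
    if not optimized:
        return contains(value, term)
    return value.lower() == term.lower()


def has_word_prefix(value: str, term: str) -> bool:
    """HAS WORD PREFIX: the word or phrase appears at the beginning of the field."""
    t = words(term)
    return bool(t) and words(value)[:len(t)] == t
```

For example, `matches("Marketing Services", "Marketing")` is false for an optimized field, while passing `optimized=False` makes the same call behave like CONTAINS and return true.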
Using Metadata Wildcards
The following wildcards can be used in metadata search fields, even when using the Quick Search field.
- 
    An asterisk (*) indicates zero or many characters. For example:
    - 
        form* matches form and formula
    - 
        *orm matches form and reform
    - 
        *form* matches form, formula, reform, and performance
- 
    A question mark (?) indicates one character. For example:
    - 
        form? matches forms and form1, but not form or formal
    - 
        ??form matches reform but not perform
Note:
If you want to search for an asterisk (*) or a question mark (?) without treating it as a wildcard, put quotation marks around your search term; for example: "here*". The question mark wildcard (?) does not work for the Security Group metadata field in OracleTextSearch. Also, metadata field values that contain an underscore (_) do not work with the question mark wildcard (?).
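The wildcard rules above follow the same conventions as shell-style patterns, so they can be explored with Python's standard fnmatch module. This is only an approximation for checking expectations: fnmatch also gives special meaning to square brackets, which the text above does not describe, and the sketch does not model quoted terms or the Security Group and underscore exceptions.

```python
from fnmatch import fnmatchcase


def wildcard_match(pattern: str, value: str) -> bool:
    """Return True if value matches a pattern using * and ? wildcards."""
    # Lowercasing both sides is an assumption that mirrors case-insensitive search.
    return fnmatchcase(value.lower(), pattern.lower())


examples = {
    "form*": ["form", "formula"],
    "*orm": ["form", "reform"],
    "*form*": ["form", "formula", "reform", "performance"],
    "form?": ["forms", "form1"],
    "??form": ["reform"],
}

# Every listed value matches its pattern, as in the examples above.
all_match = all(wildcard_match(p, v) for p, vs in examples.items() for v in vs)
```

The negative cases behave as described too: `wildcard_match("form?", "formal")` and `wildcard_match("??form", "perform")` are both false.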
Using Internet-Style Search Syntax
Search techniques common to the popular Internet search engines are supported in Content Server. For example, entering new product in the Quick Search field will search for new <AND> product, while entering new, product will search for new <OR> product.
To enable this style of search, set the variable DoMetaInternetSearch=True. To disable this style of search, set the variable DoMetaInternetSearch=False. This is the default. For more information, see DoMetaInternetSearch in Configuration Reference for Oracle WebCenter Content.
The following table lists how Content Server interprets common characters.
| Character | Interpreted As | 
|---|---|
| Space ( ) | AND | 
| Comma (,) | OR | 
| Minus (-) | NOT | 
| Phrases enclosed in double-quotes ("any phrase") | Exact match of entered phrase | 
The following table lists examples of how Content Server interprets Internet-style syntax in a full-text search.
| Query | Interpreted As | 
|---|---|
| new product | new <AND> product | 
| (new, product) images | (new <OR> product) <AND> images | 
| new product -images | (new <AND> product) <AND> <NOT> images | 
| "new product", "new images" | "new product" <OR> "new images" | 
The following table lists examples of how Content Server interprets Internet-style syntax when searching title metadata using the substring operator.
| Query | Interpreted As | 
|---|---|
| new product | dDocTitle <substring> 'new' <AND> dDocTitle <substring> 'product' | 
| new, product | dDocTitle <substring> 'new' <OR> dDocTitle <substring> 'product' | 
| new -product | dDocTitle <substring> 'new' <AND> <NOT> 'product' | 
| "new product" | dDocTitle <substring> 'new product' | 
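The character rules in the tables above can be sketched as a small translator for the full-text form of a query. This is an illustrative approximation, not Content Server's parser: the tokenizing regular expression and the function name are assumptions, and the sketch does not add the grouping parentheses that the table shows for new product -images.

```python
import re

# Quoted phrases, single punctuation tokens, or bare words (which may start with '-').
TOKEN = re.compile(r'"[^"]*"|[(),]|[^\s(),"]+')


def interpret(query: str) -> str:
    """Translate Internet-style syntax into <AND>/<OR>/<NOT> form."""
    out: list[str] = []
    pending_or = False
    for tok in TOKEN.findall(query):
        if tok == ",":
            pending_or = True        # a comma joins the next term with OR
            continue
        if tok == ")":
            out.append(tok)
            continue
        # tok is "(", a quoted phrase, a word, or -word: join it to what precedes it.
        if out and out[-1] != "(":
            out.append("<OR>" if pending_or else "<AND>")
        pending_or = False
        if tok.startswith("-") and len(tok) > 1:
            out.extend(["<NOT>", tok[1:]])   # a leading minus negates the term
        else:
            out.append(tok)
    return " ".join(out).replace("( ", "(").replace(" )", ")")
```

For instance, `interpret("(new, product) images")` yields `(new <OR> product) <AND> images`, matching the second row of the full-text table.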
Adjusting the Score on OracleTextSearch Results
When you use OracleTextSearch with Oracle Text as the search engine in WebCenter Content, the results of a search by Score are sorted based on the relevancy in documents. In theory, the more relevant the search term is to a document, the higher ranked Score it should receive. In practice, it’s not entirely clear how the relevancy Score ranks the importance of some documents over others based on the search term. When a word appears a certain number of times within a document, the Score reaches a maximum at 100 and the top results can be difficult to discern from one another.
For example, if you searched for the term “vacation” in a set of documents, out of seven results, six of them might have a Score of “100” which means they are basically ranked the same. Having many documents ranked the same doesn’t make the sort by Score very meaningful.
Besides sorting by relevance, you can also tell Oracle Text to sort by occurrence. Sorting by occurrence can provide a much more predictable result in how documents are ranked, and in many cases it can provide a more meaningful sorting of results than relevance.
To tell Oracle Text to sort by occurrence you must make a small component change to the SearchOperatorMap resource. By default, the query used for full-text searching looks like the following code:
<td>(ORACLETEXTSEARCH)fullText</td>
<td>DEFINESCORE((%V), RELEVANCE * .1)</td>
<td>text</td> 
Override this resource and change it to use OCCURRENCE instead of RELEVANCE. This change forces the resource to use occurrence (also note the change in scale from .1 to .01).
<td>(ORACLETEXTSEARCH)fullText</td>
<td>DEFINESCORE((%V), OCCURRENCE * .01)</td>
<td>text</td> 
If you run the same search and sort options as mentioned in the earlier example, the results come out differently and each of the seven documents has a unique Score. This provides a clearer understanding of how the items rank. Generally, if the search term appears three times more often in one document than in another, it has a better chance of being a document you are interested in examining.
Note:
The occurrence ranking also has a maximum count of 100, so if a search term occurs in the document more than that count, the Score result stays at 100.
For your site, relevance ranking may be more useful than occurrence ranking. However, this option provides an alternate method that might work better for your results.
Customizing Search Results with OracleTextSearch
When users run a search using the Search: Expanded Form, the Search Results page displays an additional menu bar with options that enable users to selectively view search results. The options represent categories used to filter the search results. The options can be context-sensitive, so if only one content item is returned for an option, then it shows only the one result in the menu itself, as shown in Figure 1. The default set of options includes Content Type, Security Group, and Account.
Note:
Two default menu options on the OracleTextSearch menu for Search Results can be replaced by customized menu options: Security Group and Document Type.

If more than one content item is found for an option, an arrow is displayed next to the option name. When you move your cursor over the option name, a menu displays the list of the categories found in the search results for that option and the number of content items for each of the categories. You can click any category name on the menu to change the Search Results page to list only those items that match the category.
Figure 2 shows a list of categories under Security Group and the number of items found in each category.

| Element | Description | 
|---|---|
| Filter by Category | Displays the categories used to filter the search results, for example: Content Type, Security Group, Account. | 
| Content Type | (Default) Lists the types and the number of each type of content items in the search results. Clicking one of the content type names changes the Search Results to show only those items that match the content type. | 
| Security Group | (Default) Lists the security groups and number of content items assigned to each group in the search results. Security groups include: Administration, Public, and Secure. Clicking one of the security group names changes the Search Results to show only those items that match the security group. | 
| Account | (Default) Lists the account types and number of items assigned to each account in the search results. Clicking one of the account types changes the Search Results to show only those content items that match the account. | 
About Batch Load File Records
A batch load file is made up of file records, which are sets of name/value pairs that specify the action to perform, or the metadata for individual content items, or both.
Note:
Field names and parameters are case sensitive. They must appear in the batch load file exactly as they appear in the following sections. For example, dDocName is not the same as ddocname, dDocname, or DDOCNAME.
- 
    Each file record ends with an <<EOD>> (end of data) marker.
- 
    A pound sign (#) followed by a space at the beginning of a line indicates a comment. The comment character must be followed by a space. For example: # primaryFile=test.txt works properly, but #primaryFile=test.txt will cause errors.
- 
    The following is an example of a file record:
    # This is a comment
    Action=insert
    dDocName=Sample1
    dDocType=Document
    dDocTitle=Batch Load record insert example
    dDocAuthor=sysadmin
    dSecurityGroup=Public
    primaryFile=links.doc
    dInDate=8/15/2001
    <<EOD>>
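When batch load files are generated by a script, the record rules above are easy to enforce in one place. The following is a minimal sketch; the function name is hypothetical, and it assumes one name=value pair per line with the record terminated by the <<EOD>> marker.

```python
def format_record(fields: dict[str, str], comment: str = "") -> str:
    """Format one batch load file record from name/value pairs."""
    lines = []
    if comment:
        # The '#' comment character must be followed by a space.
        lines.append(f"# {comment}")
    # Field names are case sensitive, so they are written exactly as given.
    lines.extend(f"{name}={value}" for name, value in fields.items())
    lines.append("<<EOD>>")  # end-of-data marker closes every record
    return "\n".join(lines) + "\n"


record = format_record(
    {
        "Action": "insert",
        "dDocName": "Sample1",
        "dDocType": "Document",
        "dSecurityGroup": "Public",
        "primaryFile": "links.doc",
    },
    comment="This is a comment",
)
```

Several records can be concatenated into one file because each one ends with its own <<EOD>> line.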
Configuring Full-Text Database Search Index
To set up and use full-text database searching and indexing for SQL Server and other databases:
- 
    Install WebCenter Content with the Content Server instance and configure it to work with the database. 
- 
    Add the following entry to the DomainHomeName\ucm\cs\config\config.cfg file and save the file:
    SearchIndexerEngineName=DATABASE.FULLTEXT
- 
    Restart the Content Server instance. For instructions, see Restarting Content Server or Inbound Refinery Using Fusion Middleware Control. 
- 
    Rebuild the search index using the Repository Manager. See Starting the Repository Manager in Managing Oracle WebCenter Content. 
Note:
If you have difficulty rebuilding the full-text database search index after importing the OCS schema, the message Unable to create Oracle text collection 'IdcText1' might be displayed. If this occurs, log in as the Content Server database administrator and drop the tables IdcText1 and IdcText2.
See Recovering Oracle WebCenter Content in Administering Oracle Fusion Middleware.
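The config.cfg change in the steps above is a single name=value entry. As a rough sketch, the following helper sets such an entry in the text of a configuration file, replacing an existing value or appending a new line. The helper name set_config_entry and the SomeOtherEntry line are hypothetical; only SearchIndexerEngineName and its values come from this chapter. The Content Server instance must still be restarted and the index rebuilt afterward, as described above.

```python
def set_config_entry(config_text: str, name: str, value: str) -> str:
    """Return config text with name=value set, replacing or appending the entry."""
    lines = config_text.splitlines()
    entry = f"{name}={value}"
    replaced = False
    for i, line in enumerate(lines):
        key, sep, _ = line.partition("=")
        # Compare the exact name so commented or similarly named entries are left alone.
        if sep and key.strip() == name:
            lines[i] = entry
            replaced = True
    if not replaced:
        lines.append(entry)
    return "\n".join(lines) + "\n"


# Hypothetical existing file content, used only for illustration.
before = "SomeOtherEntry=1\nSearchIndexerEngineName=DATABASE.METADATA\n"
print(set_config_entry(before, "SearchIndexerEngineName", "DATABASE.FULLTEXT"))
```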
Managing Elasticsearch
Let’s learn about managing Elasticsearch with WebCenter Content. WebCenter Content communicates with Elasticsearch through REST APIs.
WebCenter Content supports a variety of search indexer engines, including DATABASE.METADATA, DATABASE.FULLTEXT, and ORACLETEXTSEARCH. Of these, ORACLETEXTSEARCH provides rich searching capability, including full-text searches with relevancy ranking, complex query structures, and improved performance compared to DATABASE.FULLTEXT. However, in a large enterprise setup where content items run into the millions and the ingestion rate is high, customers find rebuilding the ORACLETEXTSEARCH index to be time-consuming.
The REST APIs that WebCenter Content uses are provided by Elasticsearch. The WebCenter Content APIs and services exposed to users remain the same. While the APIs and user interfaces remain mostly untouched with Elasticsearch, rebuild time is reduced significantly. Users also experience improved, near real-time search response.
This section covers the following topics:
- Elasticsearch Features and Benefits
- Configuring Elasticsearch
- Migrating Existing Search Indexes to Elasticsearch Server
Elasticsearch Features and Benefits
Elasticsearch has features such as fast rebuild, full rebuild, reindex, sorting, facets, search operators, and searching.
This section covers the following topics:
- How the Rebuild Feature Works in Elasticsearch?
- Fast Rebuild
- Full Rebuild
- Elasticsearch ReIndex
- Sorting
- Facets
- Search Operators and Searching
- Stemming
- Snippets
- Highlighting
How the Rebuild Feature Works in Elasticsearch?
Elasticsearch provides a new Rebuild option, Elasticsearch Reindex.
OracleTextSearch in WebCenter Content lets you perform Fast Rebuild or Full Rebuild (With extraction). So, now users can choose from Fast Rebuild, Full Rebuild (With extraction), and Elasticsearch Reindex (Full Rebuild from Elasticsearch).
With Elasticsearch, the Indexer Rebuild dialog has two check boxes: Use fast rebuild and Full rebuild with content extraction. You can access this dialog box through Repository Manager by selecting Indexer, then Collection Rebuild Cycle, and then Start.
Fast Rebuild
The Fast Rebuild feature allows the search engine to add new information to the search collection without requiring a full collection rebuild.
A Fast Rebuild is required when adding or removing searchable fields. You can open the Collection Rebuild Cycle window and select the Use fast rebuild checkbox and click OK to do the fast rebuild.
Full Rebuild
The Full Rebuild option rebuilds the search index.
It extracts content and pushes it to the new index in the Elasticsearch server using the metadata. This is a time-consuming task, so use it with extreme caution.
You can open the Collection Rebuild Cycle window and select the Full rebuild with content extraction check box and click OK to do the full rebuild.
Elasticsearch ReIndex
The Elasticsearch ReIndex option uses the Elasticsearch API to reindex an existing collection to a new collection.
For reindexing, it reuses already extracted content and metadata available in the active collection. Since this option doesn’t need to extract content, it’s a faster alternative to Full Rebuild.
To perform the Elasticsearch ReIndex, open the Collection Rebuild Cycle window, leave both options unselected, and click OK.
There is an alternate option to do indexing. With this option, you can perform Elasticsearch ReIndex instead of Full rebuild with extraction. To invoke Elasticsearch ReIndex, select Administration, then Admin Actions, then Collection Rebuild Cycle (section), and then Start. In the current version, Indexer Counters are not implemented for Elasticsearch ReIndex. Also, note that the Cancel and Suspend buttons might not work.
Sorting
Elasticsearch can accept any existing searchable field as SortField, so search results can be sorted by any searchable field.
You don’t have to rebuild if you make a field sortable or not-sortable. Changing sortability of a field is required only for sorting results on the user interface. Even if you don’t make a field sortable from Configuration Manager, if the field is passed as SortField, Elasticsearch sorts the search results by that field.
Facets
With WebCenter Content Elasticsearch, the default number of drilldown values is 50.
It is configurable via MaxElasticSearchDrillDownValues in configuration or can be passed in the binder. MaxElasticSearchDrillDownValues can be any positive integer.
Search Operators and Searching
The Search user interface now includes more search operators. The default search operators are: Contains, Matches, Has Word Prefix, Starts, Ends, Substring, and Not Matches.
Searching
- All search features supported with OracleTextSearch are supported with Elasticsearch as well.
- Elasticsearch does not have Optimized and Zone fields.
- With Elasticsearch, metadata field names are expected to be case-sensitive during search, but the QueryText is case-insensitive.
- Queries using the MATCHES operator look for a case-insensitive exact match of the query text on all searchable fields.
- Elasticsearch does not throw any error if a non-existing field or metadata is searched for. Instead, it shows zero results.
- With Elasticsearch, WebCenter Content gives valid results without ignoring any special characters.
- In the search performed from WebCenter Content user interface, WebCenter Content trims the trailing spaces and then the trimmed value is used as query text. In WebCenter Content user interface, spaces at the end and/or at the start of the query text lead to different results compared to OracleTextSearch. In case of RIDC, Elasticsearch returns search results considering trailing spaces also.
- Text within HTML tags such as <script>..</script>, <style>..</style>, and <!-- --> is not tokenized and hence is not searchable.
- WebCenter Content does not allow searching on non-existent or non-searchable fields. It would throw an error message “<fieldname> is not a searchable field”.
Searching Stop Words
The stop words are commonly used words that are excluded from searches to help index and parse web pages faster. For the stop words, Elasticsearch does not create an index entry.
- 
    This list is derived from the OTS stop words. “Mr”,”Mrs”,”Ms”,”a”,”all”,”almost”,”also”,”although”,”an”,”and”,”any”,”are”,”as”,”at”,”be”,”because”,”been”,”both”,”but”,”by”,”can”,”could”,”d”,”did”,”do”,”does”,”either”,”for”,”from”,”had”,”has”,”have”,”having”,”he”,”her”,”here”,”hers”,”him”,”his”,”how”,”however”,”i”,”if”,”in”,”into”,”is”,”it”,”its”,”just”,”ll”,”me”,”might”,”my”,”no”,”non”,”nor”,”not”,”of”,”on”,”one”,”only”,”onto”,”or”,”our”,”ours”,”s”,”shall”,”she”,”should”,”since”,”so”,”some”,”still”,”such”,”t”,”than”,”that”,”the”,”their”,”them”,”then”,”there”,”therefore”,”these”,”they”,”this”,”those”,”though”,”through”,”thus”,”to”,”too”,”until”,”ve”,”very”,”was”,”we”,”were”,”what”,”when”,”where”,”whether”,”which”,”while”,”who”,”whose”,”why”,”will”,”with”,”would”,”yet”,”you”,”your”,”yours”. 
- 
    When you search with a stop word, Elasticsearch treats the query as if it contained an empty string instead of that word.
- 
    The stop words are applicable only on search queries that are Full-Text Search, Quick Search, Contains, or Has Word Prefix.
- 
    A query (Full-Text Search, Quick Search, and Contains) composed of a stop word or a phrase composed of only stop words returns all results, as if it were an empty search. For example, a query on the word this returns all hits because this is defined as a stop word.
- 
    A query (Has Word Prefix) composed of a stop word or a phrase composed of only stop words returns no results. For example, a query on the word this returns no hits because this is defined as a stop word.
- 
    You can query on phrases that contain stop words as well as non-stop words. In such cases, the phrase is searched as if the stop word in the phrase does not exist. For example, a query on the phrase this title returns hits as if you searched only for the word title, because this is a stop word.
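The stop word rules above can be illustrated with a short sketch. It models only the behavior described in this section and is not how Elasticsearch analyzes text internally. The function names are hypothetical, STOP_WORDS holds just a small subset of the list above, and comparing terms in lowercase is an assumption.

```python
from typing import List, Optional

# Small subset of the stop word list above, for illustration only.
STOP_WORDS = {"this", "the", "a", "an", "and", "of", "is"}


def strip_stop_words(query: str) -> List[str]:
    """Return the query terms that remain once stop words are dropped."""
    # Assumption: stop words are matched case-insensitively.
    return [term for term in query.split() if term.lower() not in STOP_WORDS]


def effective_query(query: str, operator: str) -> Optional[str]:
    """Model the documented outcome of a text query that contains stop words.

    Returns the remaining terms, "" for an empty search (all results),
    or None when the query returns no results.
    """
    terms = strip_stop_words(query)
    if terms:
        return " ".join(terms)  # searched as if the stop words were absent
    if operator == "Has Word Prefix":
        return None  # only stop words with Has Word Prefix: no results
    return ""  # only stop words otherwise: behaves like an empty search


print(effective_query("this title", "Contains"))
```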
Stemming
Stemming is applicable only on text queries: Contains, Has Word Prefix, Full Text Search, and QuickSearch.
Stemmed words differ between OracleTextSearch and Elasticsearch because internally the search engines use different dictionaries. For example, in OracleTextSearch, a search query for the word “find” returns found, finds, finding and for the word “make”, the query returns make, made, makes, making. In Elasticsearch, the search result for “find” shows find, finds, finding and for “make” the result shows make, makes, making. “Found” and “made” are not shown in Elasticsearch results, but they are shown in OracleTextSearch results.
Snippets
You can enable the Snippets feature with Elasticsearch by setting the following configuration entry in the config.cfg file: ElasticSearchDisableSearchSnippet=false.
Keep in mind that this feature can affect search query performance. The look-and-feel of snippets displayed with Elasticsearch is different from that of OracleTextSearch snippets. With Elasticsearch, one complete sentence is equal to one snippet.
In an Elasticsearch result, if a document is returned only because of a metadata match and not because of a match in its extracted content, only that metadata value is shown as the snippet.
Highlighting
Elasticsearch highlights the search keywords but does not give pointers to the previous and next match.
OracleTextSearch highlights the search keywords along with pointers to next and previous match.
Elasticsearch highlighting returns the extracted content of a document only when there is a match in the extracted content. Highlighting shows metadata of the document only if there is a match with that particular metadata value.
If the match is limited to metadata of the document, only the matched metadata fields are listed but not the extracted content.
Configuring Elasticsearch
In this section, you’ll learn how you can configure Elasticsearch for WebCenter Content. Before configuring Elasticsearch for WebCenter Content, you’ll need to secure nodes of the cluster, secure Elasticsearch, and start the Base node first and then other nodes.
To configure Elasticsearch for WebCenter Content, follow these steps:
Note: Indices in Elasticsearch are stored as files on disk. Elasticsearch requires a large amount of free disk space to work. For more information, contact Oracle support.
- 
    Download and unzip 7.6 or newer 7.x versions of Elasticsearch from https://www.elastic.co/downloads/past-releases#elasticsearch. 
- 
    Navigate to <IdcHomeDir>/components/ElasticSearch/scripts.
    Note: WebCenter Content provides a script, SecureES.sh or SecureES.cmd, that automates the steps to secure the nodes (one or more) of an Elasticsearch cluster. It is assumed that Elasticsearch is installed on all the nodes of the cluster. The cluster can also be a single-node cluster. A multi-node cluster should have at least 3 master-eligible nodes.
- 
    Run the script on all the nodes of the cluster. Each node must be secured before it is started. Start the base node first and then the other nodes.
To download an Elasticsearch client JAR, follow these steps:
- Go to https://repo1.maven.org/maven2/org/elasticsearch/client/elasticsearch-rest-client/ and browse for the relevant version.
- Download the required version of the JAR file to <IdcHomeDir>/components/ElasticSearch/lib/.
This section covers the following topics:
- Updating ESnode.properties
- Using SecureES.sh on Unix
- Using SecureES.cmd on Windows
- Securing Elasticsearch
- Securing Other Nodes of Cluster
- Start Elasticsearch Cluster
- Configuring Elasticsearch for WebCenter Content
- Monitoring Elasticsearch Cluster Health
- Configuring Index Settings
Updating ESnode.properties
The ESnode.properties file needs to be updated before setting up all the Elasticsearch nodes that would be secured as part of the initial cluster setup.
Update configuration for all the nodes that are going to be part of the setup before securing them. The ESnode.properties file should be present in the same folder where script file is residing. Follow these steps:
- 
    Configure individual nodes: Configure all the nodes that are planned for the initial cluster setup. Provide one set of entries for each node (node1, node2, node3, ……, node{n}) being created as part of the setup:
        node{n}_ES_HOME
        node{n}_node_name
        node{n}_http_port
    Where {n} is the nth node in the setup. For example:
        ##Node1 (BASE NODE)
        node1_ES_HOME=/ESuser/elasticsearch-7.6.0_1
        node1_node_name=nodeA
        node1_http_port=9201
        ##Node2
        node2_ES_HOME=/ESuser/elasticsearch-7.6.0_2
        node2_node_name=nodeB
        node2_http_port=9202
- 
    Common configuration for all nodes:
    - BASE_ES_HOME: This should be the same as node1_ES_HOME, or a location where config/{certificate_name} and config/elasticsearch.keystore are accessible to all nodes. For example, BASE_ES_HOME=/ESuser/elasticsearch-7.6.0_1.
    - cluster_name: Name of the cluster. For example, cluster_name=wcc-elasticsearch.
    - certificate_name: Name of the certificate (the extension must be .p12) with which the cluster will be secured. For example, certificate_name=elastic-certificates.p12.
    - wcc_es_admin_user: User with which WebCenter Content will communicate with Elasticsearch. For example, wcc_es_admin_user=wccesadmin.
    - cluster_initial_master_nodes: All node names that are part of the initial cluster setup. For example, cluster_initial_master_nodes=["nodeA","nodeB","nodeC",…,"node{N}"].
    - discovery_seed_hosts: All hostnames where these nodes are going to be configured. This is mandatory only if the Elasticsearch cluster is horizontal. For example, discovery_seed_hosts=["host1.example.com","host2.example.com","host3.example.com",…,"host{n}.example.com"]
    - WINDOWS_CURL_HOME: Required on Windows, and only for the base node (node1). For example, if curl is at C:\curl-7.72.0_5-win64-mingw\bin\curl.exe, then WINDOWS_CURL_HOME=C:\curl-7.72.0_5-win64-mingw.
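Before running the securing script, it can help to check that every node in ESnode.properties has its three entries and that node names are unique. The following is a minimal sketch of such a check. It is not part of the SecureES scripts; the function names are hypothetical, and treating lines that start with # as comments is an assumption based on the ##Node1 lines in the example.

```python
import re
from typing import Dict, List

NODE_KEYS = ("ES_HOME", "node_name", "http_port")


def parse_properties(text: str) -> Dict[str, str]:
    """Parse name=value lines, skipping blank lines and '#' comment lines."""
    props: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            props[name.strip()] = value.strip()
    return props


def check_nodes(props: Dict[str, str]) -> List[str]:
    """Return a list of problems found in the node{n}_* entries."""
    numbers = set()
    for key in props:
        match = re.match(r"node(\d+)_", key)
        if match:
            numbers.add(int(match.group(1)))
    problems: List[str] = []
    names: List[str] = []
    for n in sorted(numbers):
        for key in NODE_KEYS:
            if f"node{n}_{key}" not in props:
                problems.append(f"node{n}_{key} is missing")
        port = props.get(f"node{n}_http_port", "")
        if port and not port.isdigit():
            problems.append(f"node{n}_http_port is not a number")
        name = props.get(f"node{n}_node_name")
        if name:
            names.append(name)
    if len(names) != len(set(names)):
        problems.append("node names are not unique")
    return problems


sample = """##Node1 (BASE NODE)
node1_ES_HOME=/ESuser/elasticsearch-7.6.0_1
node1_node_name=nodeA
node1_http_port=9201
##Node2
node2_ES_HOME=/ESuser/elasticsearch-7.6.0_2
node2_node_name=nodeB
node2_http_port=9202
"""
print(check_nodes(parse_properties(sample)))
```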
 
Using SecureES.sh on Unix
The script automates the steps to secure Elasticsearch cluster nodes on Unix.
Usage:
For help:
./SecureES.sh -h or --help
To run script:
./SecureES.sh -n <nodenumber> 
For example, if you have 3 nodes to secure, you must run the script on the first node and then on the other nodes.
./SecureES.sh -n 1 
./SecureES.sh -n 2 
./SecureES.sh -n 3 
Using SecureES.cmd on Windows
The script automates the steps to secure Elasticsearch cluster nodes on Windows.
Usage:
To run the script:
SecureES.cmd -n <nodenumber> 
For example, if you have 3 nodes to secure, you must run the script on the first node and then on the other nodes.
SecureES.cmd -n 1   
SecureES.cmd -n 2
SecureES.cmd -n 3
Securing Elasticsearch
Follow these steps to secure the first (base) node:
- 
    Navigate to <ELASTIC_COMPONENT_DIR>/scripts.
- 
    Run the script. For Windows, run SecureES.cmd -n <nodenumber> and for Unix, run ./SecureES.sh -n <nodenumber>.
- 
    You will be asked to enter the name of the certificate. If you don’t enter a name, the default name elastic-certificates.p12 is used. The certificate should have the extension p12. Give a password for the certificate.
- 
    Add the certificate password to the keystore. If an Elasticsearch keystore is not present, you will be asked to create one. Press y to create the keystore and proceed. Note that choosing N here will not secure the node. You will be asked to enter the password 4 times. Enter the certificate password used above each time.
- 
    Set up the password for the reserved user, elastic. Enter a password for the user elastic. This will be used in a later step to create a user to communicate with WebCenter Content.
- 
    Create a user to communicate with WebCenter Content. You will be asked to enter a user name and password. Enter the name or press ENTER to use the default name wccesadmin. Enter the password that was set for the user elastic.
- 
    Once the setup is done, you will see the setup complete message. 
- 
    Do not start the node now. 
Securing Other Nodes of Cluster
You need to run the script to secure the nodes of a cluster.
Follow these steps to secure other nodes of cluster:
- 
    Navigate to <ELASTIC_COMPONENT_DIR>/scripts.
- 
    Run the script. The cluster name should be the same for all the nodes. The node names should be unique.
    For Unix: ./SecureES.sh -n 2
    For Windows: SecureES.cmd -n 2
- 
    Once the setup is done, you will see the setup complete message. 
- 
    Do not start the node now. 
Start Elasticsearch Cluster
After securing or configuring all the nodes of the cluster, you can start all the nodes.
Go to <ES_HOME>/bin of each node and run:
./elasticsearch
Start the base node (node1) first and then start other nodes.
After the nodes are started, you can access each node as the wccesadmin user at:
https://<hostname>:<nodeport>
Configuring Elasticsearch for WebCenter Content
Before you configure Elasticsearch for WebCenter Content, you need to do the mandatory initial configuration settings along with enabling the Elasticsearch search indexer.
To configure Elasticsearch for WebCenter Content, follow these steps:
- 
    Start the WebCenter Content managed server. 
- 
    Select Administration, then Elasticsearch, and then Elasticsearch Configuration.
- 
    In the Elasticsearch Configuration page, enter the values for the following fields as shown in the figure below:
    - Elasticsearch Nodes to connect - comma-separated list of Elasticsearch nodes of a cluster
    - Username - user name to connect to Elasticsearch
    - Password - user password
    - Certificate Path - absolute path of the certificate with which the cluster is secured
    - Password - certificate password
 
 
Monitoring Elasticsearch Cluster Health
For WebCenter Content to function properly, it is important to have a good Elasticsearch cluster health.
WebCenter Content monitors Elasticsearch health at an interval of 1 hour. If the Elasticsearch health status is Red or the connection is down, an alert is added and health is then checked every minute until the status turns Green or Yellow. Once the status turns Green or Yellow, the health alert is removed automatically and monitoring continues every hour thereafter.
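The monitoring schedule described above amounts to a simple rule: hourly checks while healthy, and per-minute checks while an alert is active. The sketch below expresses that rule only. The function names and the "down" status label for a lost connection are hypothetical, and the code does not reflect how WebCenter Content implements the monitor.

```python
HEALTHY_INTERVAL_SECONDS = 60 * 60  # checked every hour while Green or Yellow
ALERT_INTERVAL_SECONDS = 60         # checked every minute while an alert is active


def needs_alert(status: str) -> bool:
    """True when health is Red or the connection is down."""
    # "down" is a hypothetical label for a failed connection.
    return status.lower() in {"red", "down"}


def next_check_interval(status: str) -> int:
    """Seconds to wait before the next health check for the given status."""
    if needs_alert(status):
        return ALERT_INTERVAL_SECONDS
    return HEALTHY_INTERVAL_SECONDS


for status in ("Green", "Yellow", "Red", "down"):
    print(status, needs_alert(status), next_check_interval(status))
```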
The figure below shows the Elasticsearch connection being down temporarily.

Configuring Index Settings
You can configure shards and replicas for different indexes as per the required data.
This feature lets you customize the shard and replica counts for each Elasticsearch index. By design, each index in Elasticsearch is mapped to a security group in WebCenter Content. The indexes are created during:
- server startup
- addition of a new security group to the system
- collection rebuild or reindex
- migration from other search engines to Elasticsearch
Shards and replicas are allotted to the indexes when they are created in the system, based on the user configuration. Any updates to these settings take effect only after the next Full Rebuild or Reindex cycle. The shard and replica counts are limited as follows:
Shards count: It should be an integer value ranging from 5 to 300. The default value is 5.
Replicas count: It should be either 1 or 2. The default value is 1.
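The limits above can be captured in a small validation helper. This is a sketch of the stated ranges only; the function name validate_index_settings is hypothetical, and the ElasticSearch Index Settings page may report invalid values differently.

```python
MIN_SHARDS, MAX_SHARDS, DEFAULT_SHARDS = 5, 300, 5
ALLOWED_REPLICAS, DEFAULT_REPLICAS = (1, 2), 1


def validate_index_settings(shards: int = DEFAULT_SHARDS,
                            replicas: int = DEFAULT_REPLICAS) -> None:
    """Raise ValueError if the counts fall outside the documented limits."""
    for name, value in (("shards", shards), ("replicas", replicas)):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name} must be an integer, got {value!r}")
    if not MIN_SHARDS <= shards <= MAX_SHARDS:
        raise ValueError(f"shards must be between {MIN_SHARDS} and {MAX_SHARDS}")
    if replicas not in ALLOWED_REPLICAS:
        raise ValueError("replicas must be either 1 or 2")


validate_index_settings()                        # defaults: 5 shards, 1 replica
validate_index_settings(shards=12, replicas=2)   # within the documented limits
```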
If a connection with Elasticsearch is not established and no indexes are created yet, an additional optional alert appears along with the existing Elasticsearch alerts.

When you click this alert message, you are redirected to the ElasticSearch Index Settings page, where you can customize shards and replicas for each security group (index) existing in the system.
Indexes with these customized settings will be created when successful connection with Elasticsearch is established. In case of migration to Elasticsearch from other search engines, migration needs to be successful for these indexes to get created with the customized settings.
For already configured Elasticsearch instances, the indexes are created with the default index settings.
To configure index settings:
- 
    Select Administration, then ElasticSearch, and then ElasticSearch Index Settings. 
- 
    In the Configure Index Settings page, you (admin) can configure indexes with the desired shard and replica counts. The updated shard and replica settings will be reflected after:
    - the next Full Rebuild or Reindex cycle
    - establishing a successful connection in a fresh instance
    - migration, if you are switching over from a different search engine
    You cannot update specific indexes. Once the Update button is clicked, all the records are updated.
  
- 
    To view all the active indexes and their shard and replica settings retrieved from the Elasticsearch server, select the Active Index Settings tab.  
Adding New Security Group
If a new security group is added after successful connection to the Elasticsearch server from WebCenter Content, its corresponding index will be created in Elasticsearch with default shard (5) and replica (1) counts.
If you want to customize its settings, you can do it from the ElasticSearch Index Settings page, but they will be reflected only after next rebuild or reindex cycle.
Migrating Existing Search Indexes to Elasticsearch Server
When you migrate from the active search index to the Elastic server, the active index is changed to es1.
Note: During the migration of 5 million records from OTS to Elasticsearch, for every text field, you need to create 4 types of mappings for various search operations in Elasticsearch. Elasticsearch considers these mappings as different fields. For example, a text field dDocTitle will have the mappings dDocTitle, dDocTitle.normalize, dDocTitle.keyword, and dDocTitle.stem, and they are counted as 4 fields, not one. So, if you have 250 text fields, Elasticsearch counts them as 250*4 = 1000 fields. For metadata other than text fields, there is only one mapping. After deleting unwanted metadata fields, you will be able to perform the migration activity.
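The field arithmetic in the note can be sketched as follows, using the four mapping names given for dDocTitle. The function names are hypothetical. Elasticsearch's index.mapping.total_fields.limit setting defaults to 1000, which may be the limit the note has in mind; that connection is an assumption, not something this chapter states.

```python
from typing import List

# Mapping suffixes listed in the note for a text field such as dDocTitle.
TEXT_SUBFIELDS = ("", ".normalize", ".keyword", ".stem")


def mapped_field_names(field: str, is_text: bool) -> List[str]:
    """Names Elasticsearch counts for one metadata field."""
    if not is_text:
        return [field]  # non-text metadata has a single mapping
    return [field + suffix for suffix in TEXT_SUBFIELDS]


def total_mapped_fields(text_fields: int, other_fields: int) -> int:
    """Total number of fields as Elasticsearch counts them."""
    return text_fields * len(TEXT_SUBFIELDS) + other_fields


print(mapped_field_names("dDocTitle", is_text=True))
print(total_mapped_fields(text_fields=250, other_fields=0))
```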
If an existing WebCenter Content instance is configured to use the ORACLETEXTSEARCH (OTS) search engine, then the active index ots1/ots2 will be used to fetch the already extracted content. A successful migration activity will change the active search index to the Elastic server, es1.
Select Administration and then Configuration for <hostname with port>. The page that is displayed shows ots1/ots2 as the active index, as shown below:

To migrate, select Administration and then ElasticSearch. The ElasticSearch Migration page is displayed. Select the appropriate search engine from the Search Engine to Migrate drop-down menu as shown below:

Migration Batch Size determines the number of documents batched together to push to the Elasticsearch server. We need to carefully choose the batch size, as in case of the full-text search engines like ORACLETEXTSEARCH, the batch will also include the text-extracted content of the documents along with its metadata.
Migrate Metadata Only indicates whether we need to push the text-extracted content to the Elasticsearch server. In case of the full-text search engines like ORACLETEXTSEARCH, this should be always set to False. It means the text-extracted content is also pushed to the Elasticsearch server.
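To illustrate what Migration Batch Size controls, the sketch below groups a stream of documents into fixed-size batches, which is the general technique the setting describes. The batches function is hypothetical and says nothing about how WebCenter Content builds or sends its batches. With full-text engines, each document in a batch also carries its extracted text, so a smaller batch size keeps each push smaller.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch


# Hypothetical document IDs, used only for illustration.
for group in batches(["doc1", "doc2", "doc3", "doc4", "doc5"], batch_size=2):
    print(group)
```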
Upon starting a migration activity, a table of all recent migration jobs and their status details is listed as shown below:

You can pause or resume an ongoing migration activity and can retry the latest failed migration activity, if any. The details of a completed migration activity are shown below:

A successful migration activity will switch active index to es1 as shown below:

Note: A successful migration activity will remove the migration alert banner.
Managing OpenSearch
Let’s learn about managing OpenSearch with WebCenter Content.
Oracle Cloud Infrastructure (OCI) Search Service with OpenSearch is an insight engine offered as an Oracle-managed service. Without any downtime, Oracle automates patching, updating, upgrading, backing up, and resizing the service. You can store, search, and analyze large volumes of data quickly and see results in near real-time.
WebCenter Content communicates with OpenSearch through REST APIs. WebCenter Content APIs or services exposed to the users remain the same.
This section covers the following topics:
- OpenSearch Features and Benefits
- Configuring OpenSearch
- Migrating Existing Search Indexes to OpenSearch
OpenSearch Features and Benefits
OpenSearch has features such as fast rebuild, full rebuild, reindex, sorting, facets, search operators, and searching.
This section covers the following topics:
- How the Rebuild Feature Works in OpenSearch?
- Fast Rebuild
- Full Rebuild
- OpenSearch ReIndex
- Sorting
- Facets
- Search Operators and Searching
- Stemming
- Snippets
- Highlighting
How the Rebuild Feature Works in OpenSearch?
OpenSearch provides a new Rebuild option, OpenSearch Reindex.
OpenSearch in WebCenter Content lets you perform Fast Rebuild or Full Rebuild (With extraction). So, now users can choose from Fast Rebuild, Full Rebuild (With extraction), and OpenSearch Reindex (Full Rebuild from OpenSearch).
With OpenSearch, the Indexer Rebuild dialog has two check boxes: Use fast rebuild and Full rebuild with content extraction. You can access this dialog box through Repository Manager by selecting Indexer, then Collection Rebuild Cycle, and then Start.
Fast Rebuild
The Fast Rebuild feature allows the search engine to add new information to the search collection without requiring a full collection rebuild.
A Fast Rebuild is required when adding or removing searchable fields. You can open the Collection Rebuild Cycle window and select the Use fast rebuild checkbox and click OK to do the fast rebuild.
Full Rebuild
The Full Rebuild option rebuilds the search index.
It extracts content and pushes it to the new index in the OpenSearch server using the metadata. This is a time-consuming task, so use it with extreme caution.
You can open the Collection Rebuild Cycle window and select the Full rebuild with content extraction check box and click OK to do the full rebuild.
OpenSearch ReIndex
The OpenSearch ReIndex option uses the OpenSearch API to reindex an existing collection to a new collection.
For reindexing, it reuses already extracted content and metadata available in the active collection. Since this option doesn’t need to extract content, it’s a faster alternative to Full Rebuild.
To perform the OpenSearch ReIndex, open the Collection Rebuild Cycle window, leave both options unselected, and click OK.
There is an alternate option to do indexing. With this option, you can perform OpenSearch ReIndex instead of Full rebuild with extraction. To invoke OpenSearch ReIndex, select Administration, then Admin Actions, then Collection Rebuild Cycle (section), and then Start. In the current version, Indexer Counters are not implemented for OpenSearch ReIndex. Also, note that the Cancel and Suspend buttons might not work.
Sorting
OpenSearch can accept any existing searchable field as SortField, so search results can be sorted by any searchable field.
You don’t have to rebuild if you make a field sortable or not-sortable. Changing sortability of a field is required only for sorting results on the user interface. Even if you don’t make a field sortable from Configuration Manager, if the field is passed as SortField, OpenSearch sorts the search results by that field.
Facets
With WebCenter Content OpenSearch, the default number of drilldown values is 50.
It is configurable via MaxOpenSearchDrillDownValues in configuration or can be passed in the binder. MaxOpenSearchDrillDownValues can be any positive integer.
Search Operators and Searching
The Search user interface now includes more search operators. The default search operators are: Contains, Matches, Has Word Prefix, Starts, Ends, Substring, and Not Matches.
Searching
- All search features supported with OracleTextSearch are supported with OpenSearch as well.
- OpenSearch does not have Optimized and Zone fields.
- With OpenSearch, metadata field names are expected to be case-sensitive during search, but the QueryText is case-insensitive.
- Queries using the MATCHES operator look for a case-insensitive exact match of the query text on all searchable fields.
- OpenSearch does not throw any error if a non-existing field or metadata is searched for. Instead, it shows zero results.
- With OpenSearch, WebCenter Content gives valid results without ignoring any special characters.
- In the search performed from WebCenter Content user interface, WebCenter Content trims the trailing spaces and then the trimmed value is used as query text. In WebCenter Content user interface, spaces at the end and/or at the start of the query text lead to different results compared to OracleTextSearch. In case of RIDC, OpenSearch returns search results considering trailing spaces also.
- Text within HTML tags such as <script>..</script>, <style>..</style>, and <!-- --> is not tokenized and hence is not searchable.
- WebCenter Content does not allow searching on non-existent or non-searchable fields. It throws the error message “<fieldname> is not a searchable field”.
Searching Stop Words
The stop words are commonly used words that are excluded from searches to help index and parse web pages faster. For the stop words, OpenSearch does not create an index entry.
- 
    This list is derived from the OTS stop words. “Mr”,”Mrs”,”Ms”,”a”,”all”,”almost”,”also”,”although”,”an”,”and”,”any”,”are”,”as”,”at”,”be”,”because”,”been”,”both”,”but”,”by”,”can”,”could”,”d”,”did”,”do”,”does”,”either”,”for”,”from”,”had”,”has”,”have”,”having”,”he”,”her”,”here”,”hers”,”him”,”his”,”how”,”however”,”i”,”if”,”in”,”into”,”is”,”it”,”its”,”just”,”ll”,”me”,”might”,”my”,”no”,”non”,”nor”,”not”,”of”,”on”,”one”,”only”,”onto”,”or”,”our”,”ours”,”s”,”shall”,”she”,”should”,”since”,”so”,”some”,”still”,”such”,”t”,”than”,”that”,”the”,”their”,”them”,”then”,”there”,”therefore”,”these”,”they”,”this”,”those”,”though”,”through”,”thus”,”to”,”too”,”until”,”ve”,”very”,”was”,”we”,”were”,”what”,”when”,”where”,”whether”,”which”,”while”,”who”,”whose”,”why”,”will”,”with”,”would”,”yet”,”you”,”your”,”yours”. 
- 
    When you search with a stop word, OpenSearch treats the query as if it contained an empty string instead of that word.
- 
    The stop words are applicable only on search queries that are Full-Text Search, Quick Search, Contains, or Has Word Prefix.
- 
    A query (Full-Text Search, Quick Search, and Contains) composed of a stop word or a phrase composed of only stop words returns all results, as if it were an empty search. For example, a query on the word this returns all hits because this is defined as a stop word.
- 
    A query (Has Word Prefix) composed of a stop word or a phrase composed of only stop words returns no results. For example, a query on the word this returns no hits because this is defined as a stop word.
- 
    You can query on phrases that contain stop words as well as non-stop words. In such cases, the phrase is searched as if the stop word in the phrase does not exist. For example, a query on the phrase this title returns hits as if you searched only for the word title, because this is a stop word.
Stemming
Stemming applies only to text queries: Contains, Has Word Prefix, Full-Text Search, and Quick Search.
Stemming results differ between OracleTextSearch and OpenSearch because the two search engines use different dictionaries internally. For example, in OracleTextSearch, a search query for the word “find” returns found, finds, finding, and a query for the word “make” returns make, made, makes, making. In OpenSearch, the search result for “find” shows find, finds, finding, and for “make” the result shows make, makes, making. “Found” and “made” are not shown in OpenSearch results, but they are shown in OracleTextSearch results.
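One way to picture why irregular forms such as found and made can drop out of a result set is a stemmer that only strips suffixes. The toy stemmer below is an assumption made purely for illustration; it is not the algorithm or dictionary used by OpenSearch or OracleTextSearch.

```python
def toy_stem(word):
    """Strip one common English suffix; purely illustrative, not a real stemmer."""
    w = word.lower()
    for suffix in ("ing", "es", "s", "e"):
        # Keep at least three characters so short words are left alone.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def matches(query, candidates):
    """Return the candidates that share a toy stem with the query word."""
    target = toy_stem(query)
    return [c for c in candidates if toy_stem(c) == target]


print(matches("find", ["find", "finds", "finding", "found"]))
print(matches("make", ["make", "makes", "making", "made"]))
```

In this toy model the regular forms reduce to a common stem, while found and made do not, which mirrors the OpenSearch results described above.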
Snippets
You can enable the Snippets feature with OpenSearch by setting the following configuration entry in the config.cfg file: OpenSearchDisableSearchSnippet=false.
Keep in mind that this feature can affect search query performance. Snippets displayed with OpenSearch differ in look and feel from those displayed with OracleTextSearch. With OpenSearch, one complete sentence is equal to one snippet.
In an OpenSearch result, if a document is returned only because of a metadata match, and not because of a match in its extracted content, only that metadata value is shown as the snippet.
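To check whether the snippet entry is present in a copy of config.cfg, a small parser is enough. The sketch below assumes that config.cfg holds one `key=value` entry per line and that lines starting with `#` are comments; the function names are hypothetical, and the default behavior when the entry is absent is not covered here.

```python
def parse_config(text):
    """Parse key=value lines, skipping blanks and '#' comments (assumed syntax)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def snippets_explicitly_enabled(settings):
    """True only when OpenSearchDisableSearchSnippet is present and set to false."""
    value = settings.get("OpenSearchDisableSearchSnippet")
    return value is not None and value.lower() == "false"


sample = (
    "SearchIndexerEngineName=OPENSEARCH\n"
    "#OpenSearchDisableSearchSnippet=true\n"
    "OpenSearchDisableSearchSnippet=false\n"
)
print(snippets_explicitly_enabled(parse_config(sample)))
```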
Highlighting
OpenSearch highlights the search keywords but does not give pointers to the previous and next match.
OracleTextSearch highlights the search keywords along with pointers to next and previous match.
OpenSearch highlighting returns the extracted content of a document only when there is a match in the extracted content. It shows a metadata value of the document only when that particular value matches.
If the match is limited to the metadata of the document, only the matched metadata fields are listed, not the extracted content.
Configuring OpenSearch
In this section, you’ll learn how to configure OpenSearch for WebCenter Content, monitor cluster health, and configure index settings.
WebCenter Content connects to an existing OCI OpenSearch cluster.
This section covers the following topics:
- Configuring OpenSearch for WebCenter Content with OCI
- Configuring OpenSearch for WebCenter Content
- Monitoring OpenSearch Cluster Health
- Configuring Index Settings
Configuring OpenSearch for WebCenter Content with OCI
To configure OpenSearch for WebCenter Content with OCI, follow these steps:
- 
    On the WebCenter Content instance, open a shell as the user that owns the WebCenter Content domain files and directories (typically the oracle user).
- 
    Change the directory to <WCC domain path>.
- 
    To get the OpenSearch certificate, run the following command in a shell on the WebCenter Content instance:
    openssl s_client -showcerts -connect <OpenSearch private IP>:9200 </dev/null | sed -n -e '/-.BEGIN/,/-.END/ p' > cert.pem
- 
    To test the connection from the WebCenter Content instance to the OpenSearch cluster, run the following command:
    /usr/bin/curl -u <username>:<password> https://<OpenSearch private IP>:9200 --insecure
    This is merely a simple test to see whether the WebCenter Content instance can reach the OpenSearch cluster. If successful, it returns output similar to the following:
    [oracle@wcctestinstance ~]$ /usr/bin/curl -u <username>:<password> https://<OpenSearch private IP>:9200
    {
      "name" : "opensearch-master-0",
      "cluster_name" : "amaaaaaal6hvfiqauqzbmvklzsowhydlrvpdfa544kitmgdymnugepq5nkwq",
      "cluster_uuid" : "EtrnIgjXQmmuK4gBdf02xg",
      "version" : {
        "distribution" : "opensearch",
        "number" : "2.11.0",
        "build_type" : "tar",
        "build_hash" : "unknown",
        "build_date" : "2024-05-28T05:20:26.940869407Z",
        "build_snapshot" : false,
        "lucene_version" : "9.7.0",
        "minimum_wire_compatibility_version" : "7.10.0",
        "minimum_index_compatibility_version" : "7.0.0"
      },
      "tagline" : "The OpenSearch Project: https://opensearch.org/"
    }
- 
    In the shell, change the directory to <WCC domain path>/ucm/cs/config. If this is a clustered WebCenter Content instance, the config.cfg file will be located under the file share used by WebCenter Content.
- 
    Edit the config.cfg file. Add the following entry:
    SearchIndexerEngineName=OPENSEARCH
    If SearchIndexerEngineName is set to OracleTextSearch or DATABASE.METADATA, either delete or comment out those lines.
- 
    Save and exit the file. 
- 
    Restart the WebCenter Content managed server(s). 
- 
    Open the WebCenter Content page. 
- 
    Select Administration, then OpenSearch, and then OpenSearch Configuration. 
- 
    In the OpenSearch Configuration page, enter the values for the fields as explained in Configuring OpenSearch for WebCenter Content. Click the Update button. 
When WebCenter Content connects to OpenSearch, it shows one of the following statuses:
- Green: OpenSearch is configured with three master and data nodes.
- Yellow: OpenSearch is configured as a single-node cluster. The status is Yellow because a single node cannot distribute its replica shards to another node. This status can be ignored; it does not affect indexing or searches.
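The status can also be read directly from the cluster through the OpenSearch `_cluster/health` endpoint. The sketch below only builds the authenticated request and interprets a sample response; the host, credentials, and function names are placeholders chosen for illustration, and actually sending the request would additionally need a TLS context that trusts the cluster certificate.

```python
import base64
import json
import urllib.request


def build_health_request(host, username, password, port=9200):
    """Build (but do not send) a Basic Auth GET request for the cluster health endpoint."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    url = f"https://{host}:{port}/_cluster/health"
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def describe_health(body):
    """Summarize the JSON body of a _cluster/health response."""
    health = json.loads(body)
    status = health["status"]
    if status == "yellow" and health.get("number_of_nodes") == 1:
        return "yellow: single node, replica shards cannot be assigned"
    return status


# Placeholder values; replace them with the real private IP and credentials.
request = build_health_request("10.0.0.5", "admin", "secret")
print(request.full_url)

# Hand-written sample response for a single-node cluster.
sample = '{"cluster_name": "demo", "status": "yellow", "number_of_nodes": 1, "unassigned_shards": 5}'
print(describe_health(sample))
```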
The initial OpenSearch configuration does not require a collection rebuild: once the parameters in the OpenSearch Configuration page are completed and WebCenter Content is connected to OpenSearch, no rebuild is needed.
As part of the configuration, the OpenSearch indices (based on WebCenter Content security groups) are created. Items can be checked in and searched for. Items that were checked in earlier are also searchable.
Note: If you create a new metadata field, or migrate fields from another WebCenter Content instance using CMU, run the Fast Rebuild immediately after the creation or CMU import.
Until the Fast Rebuild is run:
- Do not check in new content with the new field value populated.
- Do not archive import content that has the field value populated.
The Fast Rebuild takes a very long time to complete if it has to index values for fields that were not already in the index. For more details, see ElasticSearch Fast Rebuild Takes a Long Time to Complete.
Configuring OpenSearch for WebCenter Content
Before you configure OpenSearch for WebCenter Content, you must complete the mandatory initial configuration settings and enable the OpenSearch search indexer.
The initial configuration settings are shown in the figure below:

If you did not complete the above step, stop the WebCenter Content managed server and set the following parameter in the config.cfg file:
SearchIndexerEngineName=OPENSEARCH
Now, start the WebCenter Content managed server.
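The config.cfg change can be pictured as a small text transformation. The following is a minimal sketch that works on the file contents as a string; it assumes one `key=value` entry per line and `#` as the comment marker, and the function name is hypothetical. Back up the real file before applying anything like this.

```python
def set_search_engine(config_text, engine="OPENSEARCH"):
    """Comment out existing SearchIndexerEngineName lines and append the new entry."""
    key = "SearchIndexerEngineName"
    out = []
    for line in config_text.splitlines():
        if line.strip().startswith(key + "="):
            out.append("#" + line)  # '#' as the comment marker is an assumption
        else:
            out.append(line)
    out.append(f"{key}={engine}")
    return "\n".join(out) + "\n"


# IDC_Name is used here only as an unrelated example entry.
before = "IDC_Name=wcc1\nSearchIndexerEngineName=OracleTextSearch\n"
print(set_search_engine(before))
```

Lines that are already commented out are left untouched, so running the function twice does not stack comment markers on old entries.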
To configure OpenSearch for WebCenter Content, follow these steps:
- 
    Start the WebCenter Content managed server. 
- 
    Select Administration, then OpenSearch, and then OpenSearch Configuration. 
- 
    In the OpenSearch Configuration page, enter the values for the following fields as shown in the figure below:
    - OpenSearch Cluster - comma-separated list of OpenSearch nodes of a cluster
    - OpenSearch Certificate Type to connect - certificate type to connect to OpenSearch
    - Root Certificate Path - absolute path of the root certificate
    - Authorization - method to communicate with OpenSearch
    Note: You need to use Basic Auth as the authorization method if you are using OpenSearch 2.x.
  
Monitoring OpenSearch Cluster Health
For WebCenter Content to function properly, the OpenSearch cluster must be healthy.
This feature monitors OpenSearch health at an interval of one hour. If the health status is Red or the connection is down, an alert is added and the health is checked every minute until the status turns Green or Yellow. Once the status turns Green or Yellow, the health alert is removed automatically and monitoring continues every hour thereafter.
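The monitoring schedule described above can be summarized as a function from the observed status to the alert state and the delay before the next check. This is a sketch of the documented behavior only; the function name and the use of `None` for a lost connection are assumptions.

```python
HOURLY = 60 * 60   # seconds between checks while the cluster is healthy
EVERY_MINUTE = 60  # seconds between checks while an alert is active


def next_check(status):
    """Return (alert_active, seconds_until_next_check) for a health status.

    `status` is "green", "yellow", "red", or None when the connection is down.
    """
    healthy = status in ("green", "yellow")
    return (not healthy, HOURLY if healthy else EVERY_MINUTE)


for observed in ["green", "red", None, "yellow"]:
    print(observed, next_check(observed))
```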
Configuring Index Settings
You can configure shards and replicas for different indexes according to the data they need to hold.
This feature lets you customize the shard and replica counts for each OpenSearch index. By design, each index in OpenSearch is mapped to a security group in WebCenter Content. The indexes are created during:
- server startup
- addition of a new security group to the system
- collection rebuild or reindex
- migration from other search engines to OpenSearch
Shards and replicas are allotted to the indexes when they are created in the system, based on the user configuration. Any updates to these settings take effect only after the next Full Rebuild or Reindex cycle. The shard and replica counts must stay within the following limits.
Shards count: It should be an integer value ranging from 5 to 300. The default value is 5.
Replicas count: It should be either 1 or 2. The default value is 1.
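The limits above are easy to check before submitting values. The helper below is a hypothetical sketch that encodes only the ranges and defaults stated in this section.

```python
DEFAULT_SHARDS = 5
DEFAULT_REPLICAS = 1


def validate_index_settings(shards=DEFAULT_SHARDS, replicas=DEFAULT_REPLICAS):
    """Return a list of problems; an empty list means the values are acceptable."""
    problems = []
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(shards, bool) or not isinstance(shards, int) or not 5 <= shards <= 300:
        problems.append("shards must be an integer from 5 to 300")
    if isinstance(replicas, bool) or replicas not in (1, 2):
        problems.append("replicas must be 1 or 2")
    return problems


print(validate_index_settings())
print(validate_index_settings(shards=3, replicas=4))
```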
If the connection with OpenSearch has not been established and no indexes have been created yet, an additional optional alert appears along with the existing OpenSearch alerts.
When you click the alert message, you are redirected to the OpenSearch Index Settings page, where you can customize shards and replicas for each security group (index) existing in the system.
Indexes with these customized settings are created when a successful connection with OpenSearch is established. For a migration to OpenSearch from another search engine, the migration must succeed for these indexes to be created with the customized settings.
For already configured OpenSearch instances, the indexes are created with the default index settings.
To configure index settings:
- 
    Select Administration, then OpenSearch, and then OpenSearch Index Settings. 
- 
    In the Configure Index Settings page, you (as an administrator) can configure indexes with the desired shard and replica counts. The updated shard and replica settings take effect after:
    - the next Full Rebuild or Reindex cycle
    - a successful connection is established in a fresh instance
    - migration, if you are switching over from a different search engine
    You cannot update specific indexes. When you click the Update button, all the records are updated.
  
- 
    To view all the active indexes and their shard and replica settings retrieved from the OpenSearch server, select the Active Index Settings tab.  
Adding New Security Group
If a new security group is added after WebCenter Content has successfully connected to the OpenSearch server, its corresponding index is created in OpenSearch with the default shard (5) and replica (1) counts.
If you want to customize these settings, you can do so from the OpenSearch Index Settings page, but the changes take effect only after the next rebuild or reindex cycle.
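For orientation, the OpenSearch create-index API accepts shard and replica counts through the `number_of_shards` and `number_of_replicas` index settings. The sketch below builds such a request body with the defaults mentioned above; how WebCenter Content names its indexes and issues the request is not described here, so the function name is hypothetical.

```python
import json


def index_settings_body(shards=5, replicas=1):
    """Build a create-index request body with the given shard and replica counts."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }


print(json.dumps(index_settings_body(), indent=2))
```

In OpenSearch, `number_of_shards` is fixed when an index is created, which is consistent with customized values taking effect only after a rebuild or reindex.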
Migrating Existing Search Indexes to OpenSearch
If the WebCenter Content server was previously configured with another search engine (such as OTS, FULLTEXT, or Elasticsearch) and the search engine has now been changed to OpenSearch, the content should be migrated.
To migrate, select Administration, then OpenSearch, and then OpenSearch Migration. The figure below shows the migration from Elasticsearch to OpenSearch. While migrating from Elasticsearch to OpenSearch, only the METADATA option is available in the Search Engine to Migrate drop-down menu.

Migration Batch Size determines the number of documents that are grouped into one batch and pushed to the OpenSearch server. Choose the batch size carefully, because each batch includes the text-extracted content of the documents along with their metadata.
Migrate Metadata Only controls whether only the metadata is pushed to the OpenSearch server, without the text-extracted content. For full-text search engines such as OpenSearch, always set this to False, so that the text-extracted content is also pushed to the OpenSearch server.
When you start a migration activity, a table lists all recent migration jobs and their status details.
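To make the effect of the two options concrete, the sketch below splits documents into batches and builds a body in the newline-delimited format of the OpenSearch `_bulk` API. It is an illustration under assumptions, not the migration code used by WebCenter Content: the index name, the field names (`id`, `title`, `content`), and the function names are hypothetical.

```python
import json


def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def bulk_payload(docs, index, metadata_only=False):
    """Build an NDJSON body in the OpenSearch _bulk format for one batch."""
    lines = []
    for doc in docs:
        source = {k: v for k, v in doc.items() if k != "id"}
        if metadata_only:
            source.pop("content", None)  # drop the text-extracted content
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["id"]}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


# Hypothetical documents: metadata plus text-extracted content.
docs = [{"id": str(n), "title": f"Doc {n}", "content": "extracted text " * 3} for n in range(5)]

for batch in batches(docs, 2):
    payload = bulk_payload(batch, index="public")
    print(len(batch), "documents,", len(payload), "characters")
```

Because the extracted text travels with each document, the payload grows with both the batch size and the document size, which is why the batch size needs care.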