10 Configuring Elasticsearch in WebCenter Portal
Configure Elasticsearch to index and search objects in WebCenter Portal.
If you have upgraded from a prior release, WebCenter Portal may be configured to use Oracle SES, described in Configuring Search with Oracle SES in WebCenter Portal. Oracle recommends that you use Elasticsearch in WebCenter Portal, as described in this chapter.
Permissions:
To perform the tasks in this chapter, you must be granted the WebLogic Server Admin role through the Oracle WebLogic Server Administration Console and the Administrator role through WebCenter Portal Administration.
For more information about roles and permissions, see Understanding Administrative Operations, Roles, and Tools.
Understanding Search with Elasticsearch
Elasticsearch is a highly scalable search engine. It lets you store, search, and analyze large volumes of data quickly, and it provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.
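As a rough illustration of the HTTP interface and JSON documents, the following Python sketch builds (but does not send) a full-text search request. The host, port, index name `portal_docs`, and field name `title` are hypothetical placeholders, not values used by WebCenter Portal. Only the `_search` endpoint and the `match` query form come from the standard Elasticsearch REST API.

```python
import json
import urllib.request

# Hypothetical values for illustration only; substitute your own server and index.
ES_URL = "http://localhost:9200"
INDEX_NAME = "portal_docs"


def build_search_request(text, field="title"):
    """Build (but do not send) a full-text match query against the index."""
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url="%s/%s/_search" % (ES_URL, INDEX_NAME),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request("quarterly report")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it to a running server. WebCenter Portal issues its own queries, so a request like this is useful mainly for checking connectivity or inspecting an index by hand.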
Advantages of Elasticsearch
- Elasticsearch provides full-text search capabilities because it is built on Lucene.
- Elasticsearch is document-oriented. It stores data as structured JSON documents and indexes all fields by default, which improves search performance.
- Elasticsearch is API driven; actions can be performed using a simple RESTful API.
- Elasticsearch retrieves search results fast because it searches an index instead of searching the text directly.
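The last point, searching an index rather than the text itself, can be sketched with a toy inverted index. This is a simplified Python illustration of the general idea, not how Lucene or Elasticsearch is implemented. The function names and sample documents are made up for the example.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents containing every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Sample data invented for the example.
documents = {
    1: "Portal page metadata",
    2: "Wiki page about search",
    3: "Blog post about portal search",
}
index = build_inverted_index(documents)
print(sorted(search(index, "portal search")))
```

A lookup touches only the entries for the query terms, so its cost depends on the number of matching documents rather than on the total amount of text stored.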
You can configure Elasticsearch to search the following resources in WebCenter Portal:
- Documents, including wikis and blogs
- Portals, page metadata, lists, and people resources
- Announcements and Discussions (available only for portals upgraded from prior releases)
Configuration Roadmap for Elasticsearch in WebCenter Portal
Table 10-1 Roadmap - Setting Up Elasticsearch in WebCenter Portal
Actor | Task |
---|---|
Administrator | |
Administrator | |
Administrator | |
Administrator | |
Administrator | Customizing Search Settings in WebCenter Portal Administration |
Administrator | (Optional) Configuring Search Custom Attributes for Elasticsearch |
Administrator | (Optional) Modifying Elasticsearch Global Attributes |
Prerequisites for Configuring Elasticsearch
Ensure that the following requirements are met:
- Oracle WebCenter Portal is installed.
- (Optional) If you choose to use WebCenter Content for search, WebCenter Content is configured and all required components are enabled. See Managing Connections to Oracle WebCenter Content Server.
Creating a Crawl Admin User in WebCenter Portal
You can designate an existing user as crawl admin or create a crawl admin user (for example, mycrawladmin) in WebCenter Portal and in your back-end identity management server to search using Elasticsearch. You must create a crawl admin user only once.
Note:
See your identity management system documentation for information on creating users.
The following example uses Oracle Directory Services Manager to create the mycrawladmin user:
Modifying the Default Connection Settings for Document Content Crawl Plugin in Elasticsearch Server
After installing Elasticsearch, you can modify the default connection settings for the document content crawl plugin using the configuration file.
You can specify the following attributes in the configuration file:
- es.wcc.connection.timeout is the connection timeout interval, in seconds. This is the amount of time the Elasticsearch server waits to establish a connection to the WebCenter Content server. The default value is 30 seconds.
- es.wcc.read.timeout is the read timeout interval, in seconds. Once the Elasticsearch server is connected to the WebCenter Content server, this attribute specifies the amount of time allowed for the WebCenter Content server to respond to a given request. The default value is 30 seconds.
- es.wcc.max.connection.attempts is the maximum number of connection attempts to access the WebCenter Content server. The default value is 3.
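The following Python sketch shows one way these three attributes might interact. It rests on several assumptions: this chapter does not name the configuration file or its syntax, so the `key=value` format is a guess; `load_settings` and `connect_with_retries` are hypothetical helpers; and the retry loop is only a plausible reading of "maximum number of connection attempts", not the plugin's documented behavior.

```python
# Defaults as documented above; timeouts are in seconds.
DEFAULTS = {
    "es.wcc.connection.timeout": 30,
    "es.wcc.read.timeout": 30,
    "es.wcc.max.connection.attempts": 3,
}


def load_settings(text):
    """Parse key=value lines (assumed format) and fall back to the defaults."""
    settings = dict(DEFAULTS)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key in settings:
            settings[key] = int(value.strip())
    return settings


def connect_with_retries(connect, settings):
    """Call connect(timeout) until it succeeds or the attempt limit is reached."""
    attempts = settings["es.wcc.max.connection.attempts"]
    timeout = settings["es.wcc.connection.timeout"]
    last_error = None
    for _ in range(attempts):
        try:
            return connect(timeout)
        except ConnectionError as error:
            last_error = error
    raise ConnectionError("gave up after %d attempts" % attempts) from last_error


# Override one value; the other two keep their defaults.
settings = load_settings("es.wcc.connection.timeout = 60\n")
print(settings)
```

Here `connect` is any callable that accepts a timeout and raises `ConnectionError` on failure. With the defaults, a server that never answers is tried three times, each for up to 30 seconds, before the error is reported.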
Configuring WebCenter Content for Search
This topic describes how to configure WebCenter Content for search.
Note:
The following topics are applicable only if WebCenter Content is configured.
Creating a Crawl User in WebCenter Content
This procedure describes how to create a new crawl user in WebCenter Content.
If you want users with the admin role to crawl, then use an admin user account as the crawl user. If you want non-admin users to crawl, then create a new crawl user.
- Log on to WebCenter Content as an Administrator.
- To create a role sescrawlerrole, do the following:
- To create a user sescrawler, and assign the sescrawlerrole role to the user, do the following:
- On the WebCenter Content home page, expand Administration, then Admin Server. Select General Configuration and append the sceCrawlerRole=sescrawlerrole entry in the Additional Configuration Variables section.
- Restart WebCenter Content.
Configuring the SESCrawlerExport Component
Before you begin, verify that the SESCrawlerExport component is enabled. If not, enable the component (see Enabling the WebCenterConfigure Component) and restart the WebCenter Content server.
To configure the SESCrawlerExport component for admin and non-admin users:
Configuring WebCenter Portal for Search
To configure WebCenter Portal for search, configure the connection between WebCenter Portal and Elasticsearch, grant the crawl application role to the crawl admin user, and then configure the WebCenter Content crawl user in Elasticsearch.
Note:
Only one search connection can exist. Before running the createSearchConnection WLST command, delete any existing search connection.
Synchronizing Users in WebCenter Portal
Before performing a full portal crawl, Oracle recommends that you run the LDAP synchronization WLST command to ensure that all users are available in WebCenter Portal.
Configuring Search Crawlers
You can configure the following types of crawlers to index WebCenter Portal resources:
- Portal Crawler: This uses the Portal crawl source to crawl certain objects, such as lists, page metadata, portals, and profiles.
- Documents Crawler: This uses the Documents crawl source to crawl documents, including wikis and blogs.
- Discussions Crawler: This uses the Discussions crawl source to crawl discussion forums and announcements. This option is available only for portals upgraded from prior releases that include Discussions.
The following topics describe how to create different crawl sources using the Scheduler UI in WebCenter Portal Administration:
Creating a Portal Crawl Source
Creating a Documents Crawl Source
Taking a Snapshot of the Content
The snapshot generates a configFile.xml file at the location specified by the SESCrawlerExport component FeedLoc parameter. XML feeds are created in the subdirectory with the source name; for example, wikis. Performing a snapshot can take some time, depending on the number of items stored on the Content Server instance and the number of sources you are generating.
Note:
It is important to take a snapshot before the first crawl or any subsequent full crawl of the source.
Modifying Elasticsearch Global Attributes
WebCenter Portal uses Elasticsearch to index and search objects. The wcESConnectionTimeoutPeriod and wcESReadTimeoutPeriod attributes configure the interaction between WebCenter Portal and Elasticsearch. The wcESDocumentsCrawlerThreads attribute configures the number of threads used to process the crawling of documents.
The following are the attributes:
- wcESConnectionTimeoutPeriod is the connection timeout interval, in seconds. This is the amount of time WebCenter Portal waits to establish a connection to the Elasticsearch server. The default value is 30 seconds.
- wcESReadTimeoutPeriod is the read timeout interval, in seconds. Once WebCenter Portal is connected to the Elasticsearch server, this attribute specifies the amount of time allowed for the Elasticsearch server to respond to a given request. The default value is 30 seconds.
- wcESDocumentsCrawlerThreads is the number of threads in the fixed-size thread pool that handles document crawl tasks. Each thread in the pool crawls documents. If no thread is available for a crawl task, the task waits in a queue until another task completes. The default value is 10.
You can modify the default values of these attributes on the Attributes page in WebCenter Portal Administration. After you modify a value, you must restart the WebCenter Portal server for the change to take effect.
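The fixed-size thread pool behavior described for wcESDocumentsCrawlerThreads can be pictured with a small Python analogy. This is a sketch of the general pattern only, not WebCenter Portal code. `crawl_document` is a hypothetical stand-in for the real crawl work, and the task count of 25 is arbitrary.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

CRAWLER_THREADS = 10  # mirrors the documented default of wcESDocumentsCrawlerThreads


def crawl_document(doc_id):
    """Stand-in for crawling one document; returns the worker thread name."""
    time.sleep(0.01)
    return doc_id, threading.current_thread().name


# 25 tasks share at most 10 threads; tasks that find no free thread
# wait in the executor's internal queue until a worker becomes available.
with ThreadPoolExecutor(max_workers=CRAWLER_THREADS) as pool:
    results = list(pool.map(crawl_document, range(25)))

worker_names = {name for _, name in results}
print(len(results), "documents crawled by", len(worker_names), "threads")
```

Raising the pool size lets more documents be crawled in parallel at the cost of more concurrent load on the servers involved, which is the trade-off to weigh when changing the attribute.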
Configuring Search Custom Attributes for Elasticsearch
When you search in WebCenter Portal, only certain predefined attributes appear in the search results. To show additional attributes, use the Search Settings page in portal administration, where the Custom Attributes section lets you select which custom search attributes appear in search results and the order in which they appear. The list on the Search Settings page is driven by the search-service-attributes.xml file, which lists all attributes that are crawled for each service. This metadata also defines the types in the Elasticsearch index. You can add a new custom attribute or modify an existing one in the search-service-attributes.xml file.
The following procedure describes how to add a new search custom attribute, using the Document service as an example.
Scheduling a Crawl
You can schedule an incremental search crawl or manually start a full crawl. The topics in this section describe how to schedule a crawl and how to start, enable, or disable a crawl.
Scheduling an Incremental Crawl
Enabling and Disabling a Scheduled Crawl
Customizing Search Settings in WebCenter Portal Administration
You can customize Result Types and Filtering, Search Scope, Facets, and Custom Attributes on the Search Settings page in WebCenter Portal Administration. Portal managers can reset only the search scope for the portals that they manage.
To customize search settings for Elasticsearch: