25 Configuring the Lucene Search Engine

Lucene is a third-party search engine that is integrated with WebCenter Sites. Lucene powers the search feature in WebCenter Sites' Contributor interface and enables the Public Site Search API to support search capabilities on websites. Before you can search for assets in the Contributor interface, you must set up and configure the Lucene search engine. This chapter shows you how to set up and maintain Lucene on your system.

25.1 Overview

When you install WebCenter Sites, both the WebCenter Sites database search feature and Lucene are available. Lucene is enabled and configured out of the box: the Lucene engine is set up as WebCenter Sites is installed, so that content contributors, website visitors, and third-party applications can search for assets.

This chapter explains how to use the Lucene search engine, how to make additional assets searchable, and how to pause or disable the search engine.

25.1.1 Indexing for Search Functions

Contributor searches are run against a search index that is powered by Lucene, not WebCenter Sites' database. A search index is built by an automated process called indexing, which collects, parses, and stores asset data in a format that can be quickly retrieved during a search query.

Search results are returned based solely on the data that is available in the index at the time the search is performed. The more assets you include in the index, the longer it takes to build and to search.

You select the types of assets to index on the search configuration forms. Assets of the selected types are indexed and are therefore searchable. Asset types that you omit from indexing are not indexed and are therefore not searchable.

Once the index is built, the Lucene search engine runs an event every thirty seconds that checks for changes made to assets of the types selected for indexing. If changes were made (such as an asset being created, edited, or deleted), Lucene updates the index automatically. By default, index data is stored in the <cs_shared_dir>/lucene directory (where <cs_shared_dir> is the Sites shared file system directory).
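The periodic check described above can be pictured with a small sketch. This is not WebCenter Sites code: the ChangePoller class, its recordChange and poll methods, and the in-memory queue and map are hypothetical names invented for illustration. Only the thirty-second interval is taken from the text.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a periodic "check for changes, then update the index" event.
class ChangePoller {
    enum Kind { CREATE, EDIT, DELETE }

    // One recorded change to an asset of an indexed type.
    static final class Change {
        final Kind kind;
        final String assetId;
        final String text;

        Change(Kind kind, String assetId, String text) {
            this.kind = kind;
            this.assetId = assetId;
            this.text = text;
        }
    }

    private final Queue<Change> pending = new ConcurrentLinkedQueue<>();
    private final Map<String, String> index = new ConcurrentHashMap<>();

    // Called whenever an asset is created, edited, or deleted.
    void recordChange(Kind kind, String assetId, String text) {
        pending.add(new Change(kind, assetId, text));
    }

    // One polling pass: apply every change recorded since the previous pass.
    // Returns the number of changes applied.
    int poll() {
        int applied = 0;
        Change change;
        while ((change = pending.poll()) != null) {
            if (change.kind == Kind.DELETE) {
                index.remove(change.assetId);
            } else {
                index.put(change.assetId, change.text);
            }
            applied++;
        }
        return applied;
    }

    boolean isIndexed(String assetId) {
        return index.containsKey(assetId);
    }

    // Runs poll() every thirty seconds, the interval given in the text.
    ScheduledExecutorService start() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::poll, 30, 30, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

In this sketch, an asset saved between two passes is not searchable until the next pass runs, which mirrors the statement that search results reflect only the data available in the index at the time of the search.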

25.1.2 WebCenter Sites Search Functions

WebCenter Sites includes three search functions: Global search, Asset Type search, and the most specific of the three, Configure Attributes for Asset Type Index. The last option is a refinement of the Asset Type Index option: it lets you specify which attributes are searchable for the asset types that are enabled for indexing.

You can enable all of the searches on your system. The searches differ in how they store the indexed user-defined attributes.

Global Search indexes system-defined attributes individually, allowing users to search by specific attributes. All of an asset's user-defined attribute values are stored together in one table cell; attribute names are omitted. This means, when Global search is configured, users are restricted to searching across all user-defined attributes per asset type.

For example, suppose you have an article asset. You could search for the string "Jane Doe." However, you could not limit your search to just one specific user-defined attribute, because all of the user-defined attribute data are stored together in a single cell that does not differentiate one attribute from another.

Asset Type Search indexes each attribute value, for both system-defined and user-defined attributes, in its own individual cell, by attribute name. Asset type searches are used for the Public Site Search API, which enables search capabilities on the website. For more information on public site search, see the Oracle Fusion Middleware WebCenter Sites Developer's Guide.

Using the example above of an article asset, with Asset Type search enabled you could, as with Global search, look for the string "Jane Doe" in the system-defined attribute Name. You could also search for the string in the user-defined attribute Byline, or in any other attribute for that asset type. Since user-defined attributes are now stored under their attribute names, you can search by those specific attributes.

Asset type search, therefore, has a level of specificity to its search results that Global search does not. In addition to the obvious benefits of this targeted search capability, an asset type search can return search results more quickly than a Global search. By limiting searches to only the relevant attributes, Asset Type search can eliminate the need for a search to run across unnecessary index data.

Configure Attributes for Asset Type Index enables you to limit a search to only the attributes that you specify for the asset types that are enabled for Asset Type search. Using the example above of an article asset, if you have configured indexing for specific attributes, such as Headline and Byline, users can search for the string "Jane Doe" in only those attributes.
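The three levels of granularity can be sketched in a few lines of Java. This is a toy model only, not the way the search engine stores index data: the GranularityDemo class and its method names are hypothetical, and the simple substring match stands in for a real search.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model of the three levels of search granularity (hypothetical, for illustration).
class GranularityDemo {
    // Global index: all user-defined attribute values are merged; attribute names are dropped.
    static String globalEntry(Map<String, String> userAttrs) {
        return String.join(" ", userAttrs.values());
    }

    // Asset type index: each attribute value is kept under its own attribute name.
    static Map<String, String> assetTypeEntry(Map<String, String> userAttrs) {
        return new LinkedHashMap<>(userAttrs);
    }

    // Configured attributes: only the attributes selected for indexing are kept.
    static Map<String, String> configuredEntry(Map<String, String> userAttrs, List<String> enabled) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (String name : enabled) {
            if (userAttrs.containsKey(name)) {
                kept.put(name, userAttrs.get(name));
            }
        }
        return kept;
    }

    // Case-insensitive substring match, used here in place of a real search.
    static boolean matches(String fieldValue, String term) {
        return fieldValue != null
            && fieldValue.toLowerCase(Locale.ROOT).contains(term.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Map<String, String> article = new LinkedHashMap<>();
        article.put("Headline", "City council approves budget");
        article.put("Byline", "Jane Doe");
        article.put("Body", "The council met on Tuesday.");

        // Global: the string is found, but not which attribute held it.
        System.out.println(matches(globalEntry(article), "Jane Doe"));
        // Asset type: the search can target the Byline attribute specifically.
        System.out.println(matches(assetTypeEntry(article).get("Byline"), "Jane Doe"));
        // Configured: Body was not selected, so it is not in the index entry.
        System.out.println(configuredEntry(article, Arrays.asList("Headline", "Byline")).containsKey("Body"));
    }
}
```

The Headline and Byline attribute names follow the article example in the text; the Body attribute and the sample values are invented for the demonstration.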

The tables illustrate the differences in Lucene-based searches and their levels of granularity. Each table represents an index for the same article type asset, but for a different type of search function. Across the tables, only the system-defined attribute data is stored in the same way.

Note:

The following tables illustrate the granularity of global searches, asset type searches, and attribute-specific searches. They are not meant to indicate how the search engine actually stores indexed data.

Figure 25-1 Search Function Tables and Differences

Besides the way they index data, the searches enable different functions, as summarized below:

| Global Index | Asset Type Index |
|---|---|
| Enables searches across all selected asset types. | Enables searches per asset type (multiple asset types can be enabled). |
| Creates one index for all selected asset types. | Creates one index per asset type. |
| Supports searching by system-defined attributes. | Supports searching by system-defined and custom attributes. |
| System-defined attributes can be filtered (by using Configuring Attributes for Asset Type Index). | System-defined and custom defined attributes can be filtered (by using Configuring Attributes for Asset Type Index). |
| Supports public searches on live site. | Supports public searches on live site. |

For more information about the Public Site Search API, see the Oracle Fusion Middleware WebCenter Sites Developer's Guide.

25.2 Setting Up Search Indices

The steps for setting up Lucene are as follows:

  1. Enabling the Lucene Search Engine. Before you can configure Lucene, you must enable it on your system.

  2. Adding Asset Types to the Search Index. This indicates to Lucene which assets it should index. You can add asset types to the global index and the asset type index.

    Once you have selected which assets to index and enabled binary file indexing (if desired), you can start the indexing process.

    During indexing, Lucene examines the contents of assets of the selected asset types (and the binary files the assets reference, if applicable) and creates entries for those assets in the index. A global index creates one index for all the selected asset types, while an asset type index creates one index per asset type. Once the global index has been created and users conduct a search, assets will be returned by the search feature in WebCenter Sites' Contributor interface or on the live site. Once the asset type index is created, and users conduct a search, items will be returned by the search feature on the live site.

  3. Configuring Attributes for Asset Type Index. Once you have created an index, you can select specific attributes to search on. Once an index for these specified attributes is created and populated, and you conduct a search, the specified attributes will be returned by the search feature on the live site.

  4. Enabling Indexing of Binary Files. If one or more asset types which you added to the index are set up to reference binary files, you can configure Lucene to convert the contents of those files to text when indexing assets that reference them.

25.2.1 Enabling the Lucene Search Engine

This section shows you how to enable the Lucene engine.

To start the Lucene Engine

  1. In the Admin tab, expand Search and double-click Start/Stop Search Engine Indices.

  2. Click Start Search Engine.

    Figure 25-2 Enable Indexing Form

Lucene is now enabled to start indexing selected data. The time it takes to index data varies with the number of assets being indexed and the speed of your system.

Once the Lucene Search Engine is started, it continues to run until it is disabled. While indexing is running, changes to selected asset types are detected and the index is updated, and the status of the asset type is listed as Enabled. If you wish to remove search capability, you must delete the index data in addition to stopping indexing. See Section 25.4.3, "Deleting Index Data."

25.2.2 Adding Asset Types to the Search Index

This section shows you how to add asset types to the Global Search Index and the Asset Type index. Once each initial index has been created, Lucene checks for changes every 30 seconds. By default, index data is stored in the <cs_shared_dir>/lucene directory (where <cs_shared_dir> is the Sites shared file system directory). Once the data is added, it is maintained until indexing is stopped entirely or paused for a selected asset type. Assets of the selected types will not be returned by the search feature in WebCenter Sites' Contributor interface or on the live site until Lucene has indexed them.

To add new asset types to the search index

  1. Enable the Lucene engine:

    1. In the Admin tab, expand Search and double-click Start/Stop Search Engine Indices.

    2. Click Start Search Engine.

  2. Add asset types to the global search index:

    1. Double-click Configure Global Search.

    2. In the For index: drop-down list, select Add. WebCenter Sites displays a list of asset types that are not currently being indexed.

      Figure 25-3 Configure Global Search Form

    3. In the Asset Types list, select the asset types you want to index.

    4. Click OK.

    5. In the confirmation pop-up dialog, click OK. The asset type status changes to Enabled and indexing is enabled for the selected asset type. The index is created for that asset type as soon as the first asset of that type is created.

  3. Add asset types to the asset type search index.

    1. Double-click Configure Asset Type Search.

    2. In the For index: drop-down list, select Add. WebCenter Sites displays a list of asset types that are not currently being indexed.

      Figure 25-4 Configure Asset Type Search Form

    3. In the list, select the asset types you want to index.

    4. Click OK. In the confirmation pop-up dialog that appears, click OK.

      The asset type status changes to Enabled and indexing is enabled for the selected asset type. The index is created for that asset type as soon as the first asset of that type is created.

  4. Enable binary file indexing, if desired (click Start Binary Indexing). For more information on binary file indexing, see Section 25.2.4, "Binary File Indexing."

    Note:

    Global search indexing creates one index for all asset types. Asset type index creates one index for each individual asset type.
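The difference between the two index layouts can be sketched as follows. The IndexLayoutDemo class is hypothetical, and naming each asset type index after its asset type is an assumption made only for this sketch. The index name "Global" is the one used by the sample query code in Section 25.5.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: maps each index name to the asset types it contains.
class IndexLayoutDemo {
    // Global indexing: every selected asset type feeds one shared index.
    static Map<String, List<String>> globalLayout(List<String> assetTypes) {
        Map<String, List<String>> layout = new LinkedHashMap<>();
        layout.put("Global", new ArrayList<>(assetTypes));
        return layout;
    }

    // Asset type indexing: each selected asset type gets an index of its own.
    // Naming the index after the asset type is an assumption of this sketch.
    static Map<String, List<String>> assetTypeLayout(List<String> assetTypes) {
        Map<String, List<String>> layout = new LinkedHashMap<>();
        for (String type : assetTypes) {
            List<String> only = new ArrayList<>();
            only.add(type);
            layout.put(type, only);
        }
        return layout;
    }

    public static void main(String[] args) {
        // Sample asset type names, invented for the demonstration.
        List<String> selected = Arrays.asList("Article", "Page", "Media");
        System.out.println(globalLayout(selected).keySet());
        System.out.println(assetTypeLayout(selected).keySet());
    }
}
```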

25.2.3 Configuring Attributes for Asset Type Index

You can configure indexing on specific attributes for specific asset types. An asset type must be enabled for indexing before you can select specific attributes for it. Once Lucene is enabled, a search on the live site returns the assets whose indexed attribute data matches the search terms.

To configure attributes for a selected asset type

  1. If you have not already done so, add the asset type you want to configure to the asset type index. For instructions, see Section 25.2.2, "Adding Asset Types to the Search Index."

  2. In the Admin tab, expand Search and double-click Configure Attributes for Asset Type Index.

  3. In the Asset Type: drop-down list, select the asset type you want to configure. WebCenter Sites displays a list of attributes for the selected asset type.

    Figure 25-5 Configure Assets for Asset Type Index Form

  4. The following information is also displayed:

    • Enabled: Indicates if the specified attribute is enabled for indexing for that specific asset type. To disable an attribute, deselect its checkbox.

    • Type: Indicates the type of data the attribute stores. For example, "Numeric" can indicate that price information is stored. You can also have "Text" and "DateTime" data.

    • Tokenized: Select True if you want this data to be broken into individual words (tokens) before it is indexed. Data that is not tokenized is indexed as a single word and may not be interpreted by the reader.

    • Stored: Indicates if the original entire text of the specified attribute has been added to the index. Select True to store the entire text to the index.

      Note:

      At least one attribute must be enabled for each listed asset type.

  5. Once you have made your changes to the selected asset types, click Save.
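The effect of the Tokenized setting, where data that is not tokenized is indexed as a single word, can be illustrated with a short sketch. The TokenizeDemo class is hypothetical, and lowercasing and splitting on non-alphanumeric characters is a deliberate simplification. It is not the analysis that Lucene actually performs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of tokenized versus untokenized indexing of one attribute value.
class TokenizeDemo {
    // Returns the terms under which the value would be findable.
    static List<String> terms(String value, boolean tokenized) {
        if (!tokenized) {
            // Untokenized: the whole value is kept as one term.
            return Collections.singletonList(value);
        }
        // Tokenized (simplified): lowercase, then split on anything that is not a letter or digit.
        List<String> out = new ArrayList<>();
        for (String token : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(terms("Jane Doe", true));   // prints [jane, doe]
        System.out.println(terms("Jane Doe", false));  // prints [Jane Doe]
    }
}
```

In this simplified model, a tokenized Byline can be found by searching for either word, whereas an untokenized one matches only the complete value. Untokenized indexing tends to suit identifiers and codes that should match exactly.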

25.2.4 Binary File Indexing

Binary files are files of a type other than plain text, such as Word and PDF documents. You may choose not to enable binary file indexing if your assets do not reference binary files, or if the files they reference contain content that cannot be indexed, such as images and videos.

25.2.4.1 Enabling Indexing of Binary Files

If one or more asset types which you added to the indexing queue are set up to reference binary files stored in the WebCenter Sites file system, you can configure Lucene to convert the contents of those files to text when indexing the assets that reference them. (By default, Lucene is set up to ignore all binary files referenced by assets being indexed.)

To enable binary file indexing

  1. If you have not already done so, enable the Lucene engine:

    1. In the Admin tab, expand Search and double-click Start/Stop Search Engine Indices.

    2. Click Start Search Engine.

    The button name changes to Stop Search Engine.

  2. In the Admin tab, expand Search.

  3. Do one of the following:

    • To enable binary file indexing for Global search, double-click Configure Global Search.

    • To enable binary file indexing for Asset Type search, double-click Configure Asset Type Search.

  4. Click Start Binary Indexing.

    Lucene will now convert to text all binary files that are referenced by the assets it indexes.

25.2.4.2 Disabling Indexing of Binary Files

If you decide that you no longer want Lucene to convert the contents of binary files referenced by assets it indexes, you can disable this feature to improve performance.

To disable binary file indexing

  1. If you have not already done so, enable the Lucene engine:

    1. In the Admin tab, expand Search and double-click Start/Stop Search Engine Indices.

    2. Click Start Search Engine.

    The button name changes to Stop Search Engine.

  2. In the Admin tab, expand Search.

  3. Do one of the following:

    • To disable binary file indexing for Global search, double-click Configure Global Search.

    • To disable binary file indexing for Asset Type search, double-click Configure Asset Type Search.

  4. Click End Binary Indexing.

    Lucene will now ignore all binary files referenced by the assets it indexes.

25.3 Disabling the Lucene Search Engine

You can stop the Lucene engine if you want to improve performance. Once the engine is stopped, you will no longer be able to add assets to or delete assets from the index, pause indexing, or re-index assets.

To stop indexing

  1. In the Admin tab, expand Search and double-click Start/Stop Search Engine Indices.

  2. Click Stop Search Engine.

    The button name changes to Start Search Engine.

    Indexing with the Lucene engine is now disabled. The index data is preserved; a search in the Contributor interface or on the public site returns the assets that were in the index when it was last updated. If you wish to delete the index data (and therefore remove search capability), see Section 25.4.3, "Deleting Index Data."

    Note:

    If you stop Global search indexing, the search index used for the Contributor interface search function and the public site is no longer updated. Therefore, search results may be out of date.

    • To remove search capability from Contributor, you will also need to delete all assets from the Global index. See Section 25.4.3, "Deleting Index Data" for instructions.

    • If you add or remove assets while indexing is stopped, you need to rebuild the index to create an accurate search index when you restart indexing. See Section 25.4.2, "Re-indexing" for information on rebuilding indexes.

25.4 Maintaining Search Indexes

Once you have set up Lucene, you may need to perform tasks such as temporarily suspending indexing in order to perform bulk operations on assets, re-indexing, deleting index data, or writing code to specifically query the search engine.

25.4.1 Pausing and Resuming Indexing

Pausing and stopping indexing are similar functions. Pausing suspends indexing for selected asset types only, whereas stopping the search engine stops indexing for all asset types.

When you add and delete large numbers of assets, you can speed up the process by temporarily pausing indexing on the assets of the type you are adding or deleting. To reflect these changes in your search index, you will then need to index all assets of the type that you added or deleted, using the re-indexing function.

25.4.1.1 Pausing Global and Asset Type Indexing

When indexing is enabled, every asset that is added to or updated in the WebCenter Sites database is indexed after it is saved. Saving a large number of assets proceeds faster if you pause indexing for assets of that type. For example, you can pause indexing when performing a bulk import of assets into the WebCenter Sites database. You can then resume indexing and re-index all assets of that type after the assets are added to the database, indexing all the new (and existing) assets at one time.

When you pause indexing for an asset type, Lucene does the following:

  • Stops indexing assets of the selected type.

  • Preserves the index data for assets of the selected type.

When indexing is paused, searches continue to return results against the existing index. However, changes to the database made after indexing is paused are not indexed. Therefore, search results will not reflect changes made to the database after indexing was paused.
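The behavior of a paused index can be modeled with a small sketch. The PausableIndex class and its methods are hypothetical and keep everything in memory. The sketch only shows why search results are out of date while indexing is paused and why a re-index is needed afterward.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of pausing, resuming, and re-indexing one asset type.
class PausableIndex {
    private final Map<String, String> database = new HashMap<>(); // asset ID -> asset type
    private final Map<String, String> index = new HashMap<>();    // asset ID -> asset type
    private final Set<String> pausedTypes = new HashSet<>();

    void pause(String assetType) {
        pausedTypes.add(assetType);
    }

    void resume(String assetType) {
        pausedTypes.remove(assetType);
    }

    // Saving always updates the database; the index is updated only if the type is not paused.
    void save(String assetId, String assetType) {
        database.put(assetId, assetType);
        if (!pausedTypes.contains(assetType)) {
            index.put(assetId, assetType);
        }
    }

    // Rebuilds the index entries for one asset type from the database.
    void reindex(String assetType) {
        index.values().removeIf(assetType::equals);
        for (Map.Entry<String, String> entry : database.entrySet()) {
            if (entry.getValue().equals(assetType)) {
                index.put(entry.getKey(), assetType);
            }
        }
    }

    // Searches run against the index only, never against the database.
    boolean search(String assetId) {
        return index.containsKey(assetId);
    }
}
```

A bulk import in this model would call pause, save the assets, and then call resume followed by reindex. Until the re-index runs, assets saved during the pause exist in the database but are not returned by search.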

To pause indexing

  1. If you have not already done so, enable the Lucene engine:

    1. In the Admin tab, expand Search and double-click Start/Stop Search Engine Indices.

    2. Click Start Search Engine.

    The button name changes to Stop Search Engine.

  2. In the Admin tab, expand Search.

  3. Do one of the following:

    • To pause indexing for Global search, double-click Configure Global Search.

    • To pause indexing for Asset Type search, double-click Configure Asset Type Search.

  4. In the For index: drop-down list, select Pause. WebCenter Sites displays the list of asset types for which you can pause indexing.

    Figure 25-6 Configure Global Search Form

  5. In the Asset Types list, select the asset types for which you want to pause indexing.

    Note:

    If no asset types are displayed when you select Pause from the drop-down list, stop here. Either indexing is already paused for all asset types, or no asset types have yet been selected for indexing.

  6. Click the OK button next to the drop-down list of operation selections.

  7. In the confirmation pop-up dialog that appears, click OK.

    The Lucene Search Engine pauses indexing on assets of the selected types and preserves their index data. The status of the asset type changes to Paused.

    Changes to the database after indexing was paused are not indexed; therefore, search results will not reflect changes made to the database after indexing pauses.

25.4.1.2 Resuming Global and Asset Type Indexing

After pausing or disabling indexing, you will need to re-index to ensure that all your asset data is in the index. Follow the steps below to restart indexing.

To resume indexing

  1. Restart indexing on the paused asset types. For instructions, see Section 25.2.2, "Adding Asset Types to the Search Index."

  2. If you added assets to the database while indexing was paused, you must re-index to ensure that new data is included in the index. Proceed to Section 25.4.2, "Re-indexing."

  3. If you deleted assets while indexing was paused, the regular indexing process will detect which assets were deleted and remove that data from the index. However, if a large number of assets were deleted, it may be faster to delete the entire index for assets of the type you deleted and then re-index.

25.4.2 Re-indexing

While indexing is paused or stopped, WebCenter Sites does not track the additional assets added to the database. Therefore, to search those assets, all assets of the type for which indexing was paused must be re-indexed.

The time it takes to re-index assets varies with the number of assets being indexed and your system configuration. Updated search results for assets of the selected types will be returned only after the Lucene search engine has indexed them.

To re-index assets

  1. If you have not already done so, enable the Lucene engine:

    1. In the Admin tab, expand Search and double-click Start/Stop Search Engine Indices.

    2. Click Start Search Engine.

  2. In the Admin tab, expand Search.

  3. Do one of the following:

    • To re-index assets for Global search, double-click Configure Global Search.

    • To re-index assets for Asset Type search, double-click Configure Asset Type Search.

  4. In the For index: drop-down list, select Re-index.

    WebCenter Sites displays the asset types currently selected for indexing.

    Note:

    If no asset types are displayed when you select Re-index from the drop-down list, stop here. No asset types are in the indexing queue or indexing has been paused for all asset types in the queue.

    Figure 25-7 Configure Global Search Form

  5. In the list, select the asset types whose index data you want to build (or rebuild).

  6. Click OK.

  7. In the confirmation pop-up dialog that appears, click OK.

    Indexing begins.

    The status of the selected asset types changes to Enabled.

    Updated search results for assets of the selected types will be returned only after the Lucene search engine has indexed them.

25.4.3 Deleting Index Data

If you no longer need to perform searches on assets of a particular type, search results will be returned more quickly if the unnecessary data is removed from the index.

You may also wish to delete index data if you stopped indexing and then deleted a large number of assets. In this case, it may be faster to delete the relevant index data and then re-index the remaining assets than to let the regular indexing process detect which assets have been deleted. More information on pausing and restarting indexing can be found in Section 25.4.1, "Pausing and Resuming Indexing."

When you delete index data, WebCenter Sites does the following:

  • Pauses indexing on assets of the selected asset types.

  • Deletes the index data for assets of the selected asset types.

After you perform the steps below, index data is no longer available for assets of the selected types. Search results will no longer return data from assets of the selected types.

To delete data from the index

  1. In the Admin tab, expand Search.

  2. Do one of the following:

    • To delete assets from Global search, double-click Configure Global Search.

    • To delete assets from Asset Type search, double-click Configure Asset Type Search.

  3. In the For index: drop-down list, select Delete.

  4. WebCenter Sites displays the asset types currently being indexed.

    Figure 25-8 Configure Global Search Form

    Note:

    If no asset types are displayed when you select Delete from the drop-down list, stop here. No assets are being indexed.

  5. In the list, select the asset types whose index data you want to delete.

  6. Click OK.

  7. In the confirmation dialog box that appears, click OK.

    WebCenter Sites pauses indexing on assets of the selected types and deletes their index data.

    In Configure Global Search, the status of the asset types changes to Paused. This status indicates that no new assets will be added to the existing index.

    In Configure Asset Type Search, the status of the asset types changes to Disabled. This status indicates that the asset type is no longer eligible for indexing.

    The assets will no longer be returned by the search feature in WebCenter Sites' Contributor interface or the public site.

    To make the assets searchable again, you must add the asset types back to indexing. For instructions, see Section 25.2.2, "Adding Asset Types to the Search Index."

25.5 Writing Code that Queries the Search Index

The following sample code illustrates how to query the Lucene search engine index. It assumes that the user wants to search a particular site for assets of a particular type, where the site is passed in as the variable currentSite and the asset type is passed in as assetType. The code runs the query against the Global index. The Lucene search engine returns all assets of the specified type that belong to the specified site, or at most maxResults of them if the total is greater than maxResults.

// Requires java.util.Collections and the WebCenter Sites search API classes used below.
// props, assetType, and maxResults are supplied by the calling code.

// Look up the search engine that backs the Global index.
ICS ics = Factory.newCS();
IndexSourceConfig srcConfig = new IndexSourceConfigImpl(ics);
SearchEngineConfig engConfig = new SearchEngineConfigImpl(ics);
IndexSourceMetadata sourceMd = srcConfig.getConfiguration("Global");
String engineName = sourceMd.getSearchEngineName();
SearchEngine eng = engConfig.getEngine(engineName);

// Match assets whose site ID field contains the current site, or equals "0".
String currentSite = (String) props.get(SearchIndexFields.Global.SITEID);
QueryExpression siteExpr = new QueryExpressionImpl(
   SearchIndexFields.Global.SITEID, Operation.CONTAINS, currentSite);
siteExpr = siteExpr.or(SearchIndexFields.Global.SITEID, Operation.EQUALS, "0");

// Restrict the results to the requested asset type.
QueryExpression typeQ = new QueryExpressionImpl(
   SearchIndexFields.Global.ASSET_TYPE, Operation.EQUALS, assetType);

// Combine both conditions, cap the number of results, and run the search.
QueryExpression qe = typeQ.and(siteExpr);
qe.setMaxResults(maxResults);
SearchResult<ResultRow> res =
   eng.search(Collections.singletonList("Global"), qe);
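
The way the sample above combines its conditions can be reproduced with plain Java predicates, which may help when reasoning about which assets a query will return. The following sketch does not use the WebCenter Sites API: the QueryLogicDemo class and the map keys "siteid" and "assettype" are hypothetical, and a substring test is only an approximation of the CONTAINS operation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stand-in for the query logic: assetType AND (site CONTAINS x OR site EQUALS "0").
class QueryLogicDemo {
    static List<Map<String, String>> search(List<Map<String, String>> rows,
                                            String assetType,
                                            String currentSite,
                                            int maxResults) {
        // Site condition: the row belongs to the current site, or its site ID is "0".
        Predicate<Map<String, String>> siteExpr =
            row -> row.getOrDefault("siteid", "").contains(currentSite);
        siteExpr = siteExpr.or(row -> "0".equals(row.get("siteid")));

        // Type condition: the row is of the requested asset type.
        Predicate<Map<String, String>> typeQ = row -> assetType.equals(row.get("assettype"));

        // Both conditions must hold.
        Predicate<Map<String, String>> qe = typeQ.and(siteExpr);

        // Collect matches, stopping once maxResults rows have been found.
        List<Map<String, String>> results = new ArrayList<>();
        for (Map<String, String> row : rows) {
            if (results.size() >= maxResults) {
                break;
            }
            if (qe.test(row)) {
                results.add(row);
            }
        }
        return results;
    }
}
```

Because the type condition is joined to the whole site expression with and, a row of another asset type is excluded even when its site ID is "0".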