Understanding Real-Time Indexing

PeopleSoft Search Framework relies on data indexed in the search engine for search and analytics, so indexing that data in real time is recommended to keep search results and analytics current. Real-time indexing eliminates stale data in the search engine indexes and prevents discrepancies between the data in the PeopleSoft database and the indexed data in the search engine.

Any search definition with the source type of query or connected query can be configured for real-time indexing. Refer to the PeopleSoft Application Fundamentals documentation <for your product line> for a list of delivered search definitions for which real-time indexing can be configured.

Set-based processing refers to processing groups, or sets of rows, at one time rather than processing each row individually.

The real-time indexing process ensures that data is updated on the search server as soon as an application transaction is saved. Depending on the volume of data waiting in the real-time indexing queue, the update may appear in real time or near real time.

Search Framework implements set-based processing in real-time indexing. The following diagram illustrates the process flow of real-time indexing.

Real-time indexing process flow

Real-time indexing uses database triggers as the starting point for communication with the search server. When data is inserted, updated, or deleted, the database trigger associated with the application record inserts a row into the real-time indexing staging table. The staging table acts as an interim holding area that stores the keys of the transaction.
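The trigger-to-staging-table pattern described above can be sketched as follows. This is an illustration only: SQLite stands in for the PeopleSoft database, and the table names (app_record, rti_staging), column names, trigger names, and the search definition name DEMO_SEARCH_DEF are all hypothetical. The actual structure of the real-time indexing staging table is not described here.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical application
    record and a hypothetical staging table fed by triggers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE app_record (emplid TEXT PRIMARY KEY, name TEXT);

        -- Hypothetical staging table: stores only the transaction keys.
        CREATE TABLE rti_staging (
            seq        INTEGER PRIMARY KEY AUTOINCREMENT,
            search_def TEXT NOT NULL,
            action     TEXT NOT NULL,   -- I = insert, U = update, D = delete
            key_value  TEXT NOT NULL
        );

        CREATE TRIGGER app_record_ai AFTER INSERT ON app_record
        BEGIN
            INSERT INTO rti_staging (search_def, action, key_value)
            VALUES ('DEMO_SEARCH_DEF', 'I', NEW.emplid);
        END;

        CREATE TRIGGER app_record_au AFTER UPDATE ON app_record
        BEGIN
            INSERT INTO rti_staging (search_def, action, key_value)
            VALUES ('DEMO_SEARCH_DEF', 'U', NEW.emplid);
        END;

        CREATE TRIGGER app_record_ad AFTER DELETE ON app_record
        BEGIN
            INSERT INTO rti_staging (search_def, action, key_value)
            VALUES ('DEMO_SEARCH_DEF', 'D', OLD.emplid);
        END;
        """
    )
    return conn


def staged_rows(conn):
    """Return staged (action, key) pairs in the order they were captured."""
    cursor = conn.execute(
        "SELECT action, key_value FROM rti_staging ORDER BY seq"
    )
    return cursor.fetchall()
```

In this sketch, every insert, update, or delete against app_record leaves one row of keys in rti_staging, which a separate process can later read without touching the application table.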

Note: When batch processing is enabled for a search definition, updated data is not pushed to the real-time indexing staging table for the search definition. For more information, see Turning Off Real-Time Indexing During Batch Processing.

A dedicated Process Scheduler process for real-time indexing polls the staging table at regular intervals for data to process. You can configure the number of processes in the Process Scheduler configuration properties file. By default, only one process is initiated; you can scale up by increasing the maximum number of instances. However, the current design supports real-time indexing on only one domain. Enabling real-time indexing on more than one domain causes the processes in each domain to pick up the same dataset, which results in duplicate processing.
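As a rough illustration of interval polling, the following minimal sketch checks for pending work and sleeps when there is none. It is not the Process Scheduler implementation; run_poller, fetch_pending, process_set, and the default interval are hypothetical names and values chosen for the example.

```python
import time


def run_poller(fetch_pending, process_set, interval_seconds=5.0,
               max_polls=None, sleep=time.sleep):
    """Poll for staged work; process it if present, otherwise wait.

    fetch_pending and process_set are caller-supplied (hypothetical)
    callables. max_polls bounds the loop so the sketch can terminate.
    Returns the number of staged items processed.
    """
    polls = 0
    processed = 0
    while max_polls is None or polls < max_polls:
        pending = fetch_pending()
        if pending:
            process_set(pending)
            processed += len(pending)
        else:
            # Nothing staged: wait one interval before polling again.
            sleep(interval_seconds)
        polls += 1
    return processed
```

The sleep function is passed in so that the loop can be exercised without real delays. Note that two such pollers reading the same staging data without coordination would both see the same pending rows, which mirrors the duplicate processing described above.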

Real-time indexing processes the transactions stored in the staging table as a set. The set size is configurable on the Search Options page, and you can choose a value that suits the resources available on the Process Scheduler server. Depending on the data available at any point in time, a set can contain as few as one transaction or as many as the maximum specified in the Real Time Indexing Set Size property.
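The effect of the set size can be sketched with a simple in-memory queue. This is a minimal illustration, assuming the staged transactions are held oldest first; take_set and staging_queue are hypothetical names, and set_size stands in for the Real Time Indexing Set Size property.

```python
from collections import deque


def take_set(staging_queue, set_size):
    """Remove and return up to set_size staged transactions, oldest first.

    The returned set holds fewer items when less data is available,
    and is empty when nothing is staged.
    """
    if set_size < 1:
        raise ValueError("set_size must be at least 1")
    batch = []
    while staging_queue and len(batch) < set_size:
        batch.append(staging_queue.popleft())
    return batch
```

With five staged transactions and a set size of three, the first call returns three transactions and the second returns the remaining two.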

Real-time indexing further divides each set by search definition and begins the data retrieval process for each search definition. After the data is collected and formatted as a JSON structure, it is transferred to the search server using Direct Transfer.
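The per-search-definition step can be sketched as a grouping pass followed by JSON formatting. This is an assumption-laden illustration: the transaction dictionary keys (search_def, action, key), the payload layout, and the fetch_document callback are all hypothetical, and the actual JSON structure sent through Direct Transfer is not specified here.

```python
import json
from collections import defaultdict


def group_by_search_definition(transaction_set):
    """Split one set of staged transactions into per-search-definition lists."""
    grouped = defaultdict(list)
    for txn in transaction_set:
        grouped[txn["search_def"]].append(txn)
    return dict(grouped)


def build_payload(search_def, transactions, fetch_document):
    """Format the retrieved data for one search definition as a JSON string.

    fetch_document is a caller-supplied (hypothetical) function that
    retrieves the current field values for a transaction key.
    """
    documents = []
    for txn in transactions:
        if txn["action"] == "D":
            # Deleted rows have no data left to retrieve; send only the key.
            documents.append({"key": txn["key"], "deleted": True})
        else:
            documents.append(
                {"key": txn["key"], "fields": fetch_document(txn["key"])}
            )
    return json.dumps(
        {"search_definition": search_def, "documents": documents},
        sort_keys=True,
    )
```

Grouping first means that data retrieval runs once per search definition in the set rather than once per transaction.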

Note: When an application batch program inserts or updates a large volume of data, you can expect a delay in updating the search server because the volume is much larger than that of online transactions. Therefore, PeopleSoft recommends that you use the real-time indexing batch switch for search definitions whose batch programs index large volumes of data.

For most search definitions enabled for real-time indexing, you may not need to schedule incremental indexing. However, indexes that contain data with effective-date changes may still require incremental indexing schedules for the data to synchronize. A row becoming effective on its effective date does not fire a trigger, so such search definitions should be indexed using the existing indexing methods. While real-time indexing keeps data synchronized based on transaction updates, incremental indexing brings the indexed data (from prior indexing schedules) up to the current date.

Note: You should periodically run the real-time indexing AE maintenance program (PTRTI_TRUNC) so that the real-time indexing staging tables are defragmented after large-volume batch updates, which improves performance. For a description of the AE maintenance program, see Using an Application Engine Maintenance Program.