Using DirectTransfer

DirectTransfer is a new PeopleSoft technology that transmits search indexing data directly from PeopleSoft batch servers to Elasticsearch. Sending the data directly avoids the overhead of passing it through the Integration Broker Gateway and web server.

Note: The DirectTransfer technology is specific to Elasticsearch; it cannot be used with SES.

PeopleSoft Search Framework uses DirectTransfer to send search data (search documents with attachments and search documents without attachments) directly to Elasticsearch, bypassing the Integration Gateway. This feature is called Full Direct Transfer, and it is the default option in PeopleSoft Search Framework. You can choose not to use Full Direct Transfer. In that case, search documents without attachments are transmitted through the Integration Gateway, but search documents with attachments are still transmitted directly to the Elasticsearch search engine.

DirectTransfer technology supports search definitions that are based on Connected Query and Query.

Image: DirectTransfer architecture

The following graphic depicts PeopleSoft downloading search documents from the database and the attachment repository, and then pushing the encoded data to Elasticsearch using DirectTransfer. Elasticsearch uses the Mapper Attachments plug-in to parse attachment contents for indexing.


To use DirectTransfer, you need to:

  • Complete the configurations required for DirectTransfer.

  • Review the system considerations.

  • Review error handling.

Configuration

DirectTransfer requires the following configurations:

  • Specify the number of attachment handlers on the Search Options page.

  • Specify the maximum attachment error count on the Search Options page.

  • Specify whether you want to use the Full Direct Transfer option.

Specifying the Number of Attachment Handlers Setting

You use the Search Options page to enter the number of attachment handlers.

The Number of Attachment Handlers setting determines how many parallel data transfers run from PeopleSoft to Elasticsearch. The default value is 20, which means that a maximum of 20 handlers are created during indexing. This number should be less than the bulk thread queue size on Elasticsearch (the default bulk thread queue size is 50).

For search definitions where the average attachment size is 100 KB and the bulk thread queue size on Elasticsearch is 50 (the default value), setting Attachment Handlers to 20 is optimal in most cases, subject to system considerations.

When indexing search definitions that contain large attachments (for example, 10 MB or larger), you can reduce the value of Attachment Handlers to 10.

The optimal value of Attachment Handlers depends on the data volume, the data pattern, and system considerations on PeopleSoft and Elasticsearch.

Note: The Number of Attachment Handlers setting applies to search documents with attachments and, when Full Direct Transfer is enabled, also to search documents without attachments.
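The relationship between the handler count and the bulk thread queue size can be expressed as a simple check. The following Python sketch is illustrative only: the function and constant names are hypothetical, and the default values are the ones quoted in this section.

```python
# Defaults quoted in this section; the names are hypothetical.
DEFAULT_ATTACHMENT_HANDLERS = 20
DEFAULT_BULK_QUEUE_SIZE = 50


def handlers_fit_queue(handlers: int,
                       bulk_queue_size: int = DEFAULT_BULK_QUEUE_SIZE) -> bool:
    """Return True when the handler count is positive and less than the
    Elasticsearch bulk thread queue size."""
    return 0 < handlers < bulk_queue_size


print(handlers_fit_queue(DEFAULT_ATTACHMENT_HANDLERS))  # True
print(handlers_fit_queue(50))                           # False
```

A value that fails this check is not necessarily rejected by PeopleSoft; the check only mirrors the recommendation given above.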

Specifying the Max Attachment Error Count Setting

You use the Search Options page to enter a value for the Max Attachment Error Count setting.

The Max Attachment Error Count setting determines the error tolerance of the indexing program. It specifies the maximum number of failed transactions permitted during indexing. If the number of errors exceeds the specified value during indexing, the indexing process exits after it finishes sending the data that is already available in memory.

See Error Handling.
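The error tolerance behavior can be pictured with a small simulation. The following Python sketch is not the PTSF_GENFEED implementation. It assumes that data is loaded into memory one batch at a time, and every name in it is hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def index_batches(batches: Iterable[List[str]],
                  send: Callable[[str], bool],
                  max_error_count: int) -> Tuple[int, List[str]]:
    """Send documents batch by batch and stop once the error count
    exceeds max_error_count.

    Each batch models data that is already in memory, so the current
    batch is always sent completely before the limit is checked.
    Returns the number of batches sent and the documents that failed.
    """
    failed: List[str] = []
    batches_sent = 0
    for batch in batches:
        for doc in batch:
            if not send(doc):
                failed.append(doc)  # would be stored in an error table
        batches_sent += 1
        if len(failed) > max_error_count:
            break  # tolerance exceeded: do not load further data
    return batches_sent, failed


# Usage: documents whose names start with "bad" fail to send.
batches = [["a", "bad1"], ["bad2", "b"], ["c"]]
sent, failed = index_batches(batches, lambda d: not d.startswith("bad"), 1)
print(sent, failed)  # 2 ['bad1', 'bad2']
```

In this run the second batch is sent in full even though the limit is exceeded partway through it, and the third batch is never loaded.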

Specifying the Full Direct Transfer Setting

You use the Search Options page to select the Full Direct Transfer setting. The default value is Y.

If you set the Full Direct Transfer setting to Y (yes), then search documents with or without attachments are directly transmitted to the Elasticsearch search engine.

If you set the Full Direct Transfer setting to N (no), then search documents without attachments are transmitted through the Integration Gateway while the search documents with attachments are directly transmitted to the Elasticsearch search engine.
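The routing rule described by the two settings can be summarized in a few lines. This Python sketch only restates the rule above; the function name and return values are hypothetical.

```python
def transfer_route(has_attachment: bool,
                   full_direct_transfer: bool = True) -> str:
    """Return the path a search document takes to Elasticsearch.

    Documents with attachments always go directly. Documents without
    attachments go directly only when Full Direct Transfer is Y.
    """
    if full_direct_transfer or has_attachment:
        return "direct"
    return "integration_gateway"


for fdt in (True, False):
    for attachment in (True, False):
        print(fdt, attachment, transfer_route(attachment, fdt))
```

Only one of the four combinations, a document without attachments while Full Direct Transfer is N, uses the Integration Gateway.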

System Considerations

Review the following system considerations for DirectTransfer:

  • Memory usage on PeopleSoft

  • Memory usage on Elasticsearch

Memory Usage on PeopleSoft

The amount of memory that DirectTransfer uses for runtime data storage depends on the segment size and the number of attachment handlers. For example, with the default segment size of 10 MB and 20 attachment handlers, DirectTransfer uses an additional 200 MB on the PeopleSoft Batch Server while sending data.
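The figure in the example is the product of the two settings. The following Python sketch reproduces that arithmetic; the function name is hypothetical, and the calculation covers only the runtime data storage described above.

```python
def directtransfer_buffer_mb(segment_size_mb: int = 10,
                             attachment_handlers: int = 20) -> int:
    """Estimate the additional Batch Server memory, in MB, as
    segment size multiplied by the number of attachment handlers."""
    return segment_size_mb * attachment_handlers


print(directtransfer_buffer_mb())        # 200
print(directtransfer_buffer_mb(10, 10))  # 100
```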

Memory Usage on Elasticsearch

Because DirectTransfer sends data to Elasticsearch, review the following main memory considerations on Elasticsearch. This memory is allocated in the Elasticsearch JVM and is configured using the ES_HEAP_SIZE environment variable, which you set during the installation of Elasticsearch.

  • The incoming data is stored in bulk thread queues on Elasticsearch during the ingestion process. The amount of memory used for this purpose is based on the bulk thread queue size.

  • The number of parallel ingestions that Elasticsearch can perform depends on the number of cores on the system. The number of parallel bulk ingestions is equal to the number of cores.

  • During ingestion of documents that contain attachments, Elasticsearch uses a large amount of memory for document parsing. For example, parsing a large attachment of 10 MB requires around 100 MB of memory.
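The last two points can be combined into a rough order-of-magnitude estimate. The following Python sketch rests on assumptions that are not stated in this documentation: that parsing memory scales linearly at about 10 times the attachment size (extrapolated from the single 10 MB example), and that each parallel ingestion parses one attachment at a time. The names are hypothetical.

```python
# Assumption: extrapolated from "10 MB attachment, around 100 MB to parse".
PARSE_MEMORY_FACTOR = 10


def peak_parse_memory_mb(avg_attachment_mb: float,
                         cores: int,
                         factor: float = PARSE_MEMORY_FACTOR) -> float:
    """Estimate the memory used for attachment parsing when every core
    is parsing one average-sized attachment at the same time."""
    return avg_attachment_mb * factor * cores


print(peak_parse_memory_mb(10, 4))  # 400
```

Treat the result as a starting point for sizing the JVM heap, not as a measured requirement. Memory held in the bulk thread queues comes on top of it.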

Examples

  1. To index a search definition where the average attachment size is 100 KB, the Elasticsearch server is a 2-core system, and the memory available to the Elasticsearch JVM is at least 8 GB, an Attachment Handlers value of 20 should suit most circumstances.

  2. To index a search definition where the average attachment size is 1 MB, the Elasticsearch server is a 4-core system, and the memory available to the Elasticsearch JVM is at least 16 GB, an Attachment Handlers value of 10 should suit most circumstances.

  3. To index a search definition where the average attachment size is 100 MB, the Elasticsearch server is a 4-core system, and the memory available to the Elasticsearch JVM is at least 16 GB, an Attachment Handlers value of 5 should suit most circumstances. In this scenario, also increase the http.max_content_length setting in the elasticsearch.yml configuration file. The default value is 100 MB, but for large attachments you can set a higher value, for example, http.max_content_length: 512mb
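The need to raise http.max_content_length in the third example can be checked with a rough calculation. The following Python sketch assumes that attachments are Base64 encoded before transfer (this documentation says only that the data is encoded), ignores the size of the rest of the request, and treats size units as powers of 1024. All function names are hypothetical.

```python
import re

_UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}


def parse_size(value: str) -> int:
    """Convert a size string such as '512mb' into bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*(b|kb|mb|gb)\s*", value.lower())
    if match is None:
        raise ValueError(f"unrecognized size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def base64_length(raw_bytes: int) -> int:
    """Length of the padded Base64 encoding of raw_bytes bytes."""
    return 4 * ((raw_bytes + 2) // 3)


def fits_http_limit(attachment_bytes: int,
                    max_content_length: str = "100mb") -> bool:
    """Return True when the encoded attachment alone fits the limit."""
    return base64_length(attachment_bytes) <= parse_size(max_content_length)


attachment = 100 * 1024 ** 2  # a 100 MB attachment
print(fits_http_limit(attachment))           # False
print(fits_http_limit(attachment, "512mb"))  # True
```

Under these assumptions, a 100 MB attachment grows by about one third when encoded, so it no longer fits within the default limit.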

Error Handling

To log error messages and to handle errors in DirectTransfer, PeopleSoft Search Framework provides the following:

  • On the Search Options page, Search Framework provides the Enable Attachment Trace setting, which enables detailed logging of each task in the trace file. The trace logs messages for search documents with attachments and for search documents without attachments. For DirectTransfer, set this property to Y if you want to troubleshoot errors associated with the transfer of search data from PeopleSoft to Elasticsearch.

    For more information, see Managing General Search Options.

  • On the Search Options page, the Max Attachment Error Count setting is used to determine the error tolerance of the indexing program. For more information on this setting, see Configuration.

  • On the Build Search Index page, in the Previous schedule details section, an Error link is displayed if an error occurs while indexing (PeopleTools, Search Framework, Search Admin Activity Guide, under Administration, Schedule Search Index).

    • Full Direct Transfer is enabled: During indexing, some data transactions may fail, either because of temporary system conditions (for example, Elasticsearch is performing internal maintenance operations) or because of data issues. In such cases, DirectTransfer stores the data in an error table and proceeds with further indexing.

      After the indexing process is complete, any errors can be viewed on the Build Search Index page (Previous schedule details section).

    • Full Direct Transfer is disabled: If an error occurs during the indexing of search documents with attachments (which are transmitted directly to Elasticsearch), an Error link is displayed on the Build Search Index page.

      If an error occurs in IB transactions during the indexing of search documents without attachments (which are transmitted through the Integration Gateway), an Error link may not be displayed on the Build Search Index page. Check in the Process Monitor whether the PTSF_GENFEED Application Engine program shows a status of No Success. If it does, administrators must review the IB monitor logs for details on the error.

  • Based on the type of error, an administrator can use the Rerun option to resubmit the transactions if the issue was caused by temporary environment problems, review and correct the data at the source, or determine whether the data that is causing the error can be omitted from indexing. If the data needs to be corrected at the source, run the indexing process again so that the corrected data is indexed.

Click the Error link to display the Attachment Transfer Exception Details page.

Image: Attachment Transfer Exception Details page

This example illustrates the fields and controls on the Attachment Transfer Exception Details page. You can find definitions for the fields and controls later on this page.


Field or Control: Definition

Process Instance: The process instance ID of the PTSF_GENFEED Application Engine program that is causing the exception.

Request: The request sent by the PTSF_GENFEED Application Engine program to Elasticsearch.

Response: The response received by the PTSF_GENFEED Application Engine program from Elasticsearch.

Rerun: Use the Rerun button to resubmit the selected error transactions to Elasticsearch. The progress of the transaction can be monitored using the Process Monitor page on PIA. When a transaction is successful, it is deleted from the error pages.

If the maximum exceptions limit has not been reached, selecting the Rerun button has the following effect:

  • The transaction is directly submitted to Elasticsearch through the PTSF_GENFEED Application Engine program, but without executing the Connected Query program.

  • The request messages stored in error tables will be sent directly to Elasticsearch.

If the maximum exceptions limit has been reached, the Rerun button resubmits the PTSF_GENFEED Application Engine program.

If the transaction is still not successful after the PTSF_GENFEED Application Engine program is rerun, the same entry is kept in the error tables. The exception information in the error log is updated if the exception differs from the previous attempt.
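The Rerun behavior described above can be modeled as follows. This Python sketch is a simplified illustration, not the PeopleSoft implementation. The ErrorEntry class, the callback parameters, and the convention that a resend returns None on success are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ErrorEntry:
    request: str          # stored request message
    last_exception: str   # exception recorded on the previous attempt


def rerun(entries: List[ErrorEntry],
          limit_reached: bool,
          resend: Callable[[str], Optional[str]],
          resubmit_program: Callable[[], None]) -> List[ErrorEntry]:
    """Return the entries that remain in the error table after a rerun."""
    if limit_reached:
        # Limit reached: the whole indexing program is submitted again.
        resubmit_program()
        return entries
    remaining: List[ErrorEntry] = []
    for entry in entries:
        # Limit not reached: the stored request is sent as-is, without
        # running the query again. resend returns None on success.
        error = resend(entry.request)
        if error is None:
            continue  # successful transactions are deleted
        if error != entry.last_exception:
            entry.last_exception = error  # record the new exception
        remaining.append(entry)
    return remaining


# Usage with a fake transport: r1 succeeds, r2 fails differently.
entries = [ErrorEntry("r1", "timeout"), ErrorEntry("r2", "timeout")]
outcome = {"r1": None, "r2": "mapping error"}
left = rerun(entries, False, outcome.get, lambda: None)
print([(e.request, e.last_exception) for e in left])
# [('r2', 'mapping error')]
```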

For DirectTransfer to work with SSL-enabled Elasticsearch, the Certification Authority (CA) certificate must be available in the Digital Certificate repository of PeopleSoft. When the DirectTransfer process runs, it obtains the root certificates from the repository, creates a CA certificate bundle, and uses the bundle for secure communication between the Process Scheduler server and Elasticsearch.

To make the CA certificate available in the Digital Certificate repository of PeopleSoft, add the CA certificate of Elasticsearch to PeopleTools > Security > Security Objects > Digital Certificates.

Image: Digital Certificates page

This example illustrates the Digital Certificates page.


When the process starts, the certificate bundle in PEM format is available in the Process Scheduler server folder.

Image: Certificate bundle in PEM format

This example illustrates the certificate bundle in PEM format.

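When troubleshooting SSL, it can help to confirm that the PEM bundle contains the expected CA certificates and that a client trusting only that bundle can be configured. The following Python sketch uses the standard ssl module. It is a diagnostic aid written under assumptions, not part of DirectTransfer: the function names are hypothetical, and the bundle path must be supplied by the administrator.

```python
import ssl
from typing import List, Optional

BEGIN = "-----BEGIN CERTIFICATE-----"
END = "-----END CERTIFICATE-----"


def split_pem_bundle(pem_text: str) -> List[str]:
    """Split the text of a PEM bundle into individual certificate blocks."""
    certs: List[str] = []
    current: Optional[List[str]] = None
    for line in pem_text.splitlines():
        line = line.strip()
        if line == BEGIN:
            current = [line]          # start of a new certificate
        elif current is not None:
            current.append(line)
            if line == END:
                certs.append("\n".join(current))
                current = None        # wait for the next BEGIN marker
    return certs


def context_from_bundle(bundle_path: str) -> ssl.SSLContext:
    """Build a client TLS context that trusts only the CAs in the file."""
    return ssl.create_default_context(cafile=bundle_path)


# Usage with placeholder content in place of real certificate data.
sample = "\n".join([BEGIN, "AAAA", END, "", BEGIN, "BBBB", END])
print(len(split_pem_bundle(sample)))  # 2
```

The context returned by context_from_bundle can be passed to an HTTPS client to test whether the Elasticsearch certificate chain validates against the bundle.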