How can I chunk high-volume XML data?

HCM Extracts diagnostics enable you to split high-volume XML data into multiple smaller chunks, reducing the time and memory needed to process that data.

Use the XML data chunking feature of Business Intelligence Publisher (BIP) when running reports for large volumes of data causes an issue that blocks a business-critical process.

The XML data chunking process involves the following stages:

  1. Schedule the Extracts data chunking process
  2. Create a BIP report based on the predefined data model to generate an output
  3. Schedule a BIP job to deliver the output remotely
  4. (Optional) Automate the Extracts SFTP delivery option to deliver the data chunks

Schedule the Extracts data chunking process

  1. Click Navigator > Tools > Scheduled Processes.
  2. On the Scheduled Processes Overview page, click Schedule New Process.
  3. Leave the type as Job, search for and select the Extracts Process Diagnostics Report process, and click OK.
  4. In the Process Details dialog box, under the Basic Options section:
    • Select Data Chunking as the scope of the chunking process.
    • As the scope value, select the instance name of the flow from which you want to chunk the data.
    • Enter a numeric value for the required chunk size (the number of objects in one chunk).
  5. Define the schedule, output, and notifications for the process, as you would for any scheduled process.
  6. Set any other required options and click Submit.
  7. Download the report and log, and check the number of chunks generated. The number of chunks depends on the chunk size and the total number of objects: Total Chunks Generated = Total Objects / Chunk Size. A sketch for estimating this value up front follows this procedure.
    Use the following query to find the total number of chunks generated:
    SELECT COUNT(*)
    FROM   hry_ext_parse_dtls
    WHERE  transformation_id = (SELECT MAX(transformation_id)
                                FROM   hry_ext_transformation
                                WHERE  processing_type = 'CHUNKING'
                                AND    instance_name   = :flow_instance_name)
    The data tables used for data chunking are:
    • hry_ext_transformation
    • hry_ext_parse_dtls
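
You can also estimate the expected chunk count up front. The following is a minimal sketch, assuming a partial final chunk still counts as one chunk; :total_objects and :chunk_size are hypothetical bind variables that you supply, not values seeded by the process.

    -- Hedged sketch: estimate the expected number of chunks before the run.
    -- Assumes a partial final chunk is rounded up to a full chunk (CEIL);
    -- :total_objects and :chunk_size are hypothetical bind variables.
    SELECT CEIL(:total_objects / :chunk_size) AS expected_chunks
    FROM   dual

For 100,000 objects and a chunk size of 25,000, this returns 4, matching the example later in this topic.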

Use BIP to generate and deliver data outputs

  1. Click Navigator > Tools > Reports and Analytics.
  2. On the Reports and Analytics page, click Browse Catalog.
  3. On the BIP Home page, copy the report associated with the extract definition, and change the data model from globalReportsDataModel to the DataChunkingDm data model available in the shared directory (Shared Folders/Human Capital Management/Extracts/Data Models/).
  4. Enter a name and description for the report and save it in the Custom directory (Shared Folders/Custom/...).
  5. Click OK to open the report pane.
  6. Click View Report.
  7. Click Action > Schedule to schedule a report job.
  8. In the General tab, enter values for the Flow Instance Name and Chunk Number parameters.
  9. Run the report for each generated chunk, starting from chunk 1. To confirm how many chunks were generated, see the query sketch after this procedure.

    For example, if you have 100,000 objects to process and use a chunk size of 25,000 objects, 4 chunks are generated. To generate a report for all objects, run the report for chunks 1 through 4.

  10. In the Output tab, select FTP as the destination type and click Add Destination.
  11. Select the FTP server from the list of servers.
  12. Enter the remote directory folder path to deliver the data chunks.
  13. Enter a file name for the generated file.
  14. Click Submit to schedule the report job.
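
Before scheduling the report job for each chunk, you can confirm how many chunk numbers you need to cover. This is a minimal sketch that reuses the diagnostic query from the previous procedure; it assumes hry_ext_parse_dtls holds one row per generated chunk (which is what the earlier count query implies) and, because the column layout isn't documented here, it simply returns all columns.

    -- Hedged sketch: list the per-chunk rows recorded for the most recent
    -- chunking run of a flow instance, so you know which Chunk Number
    -- values to schedule the report for.
    SELECT *
    FROM   hry_ext_parse_dtls
    WHERE  transformation_id = (SELECT MAX(transformation_id)
                                FROM   hry_ext_transformation
                                WHERE  processing_type = 'CHUNKING'
                                AND    instance_name   = :flow_instance_name)

The number of rows returned should match the Total Chunks Generated value from the diagnostics report.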

(Optional) Automate the delivery option to deliver data chunks

  1. Open the extract definition and click the Deliver tab.
  2. Add a new delivery option with the delivery type FTP.
  3. The delivery option keeps the same report and template names; enter the remote directory path and the SFTP server name.
  4. Delete the Extract Delivery Mode additional details created for the newly added delivery option.
  5. Validate and save the extract definition.
  6. Run the data chunking process by following the steps above, and provide the delivery option name to trigger automatic delivery to the SFTP server.