XML Data Chunking

The HCM Extracts diagnostics let you split high-volume XML data into multiple smaller chunks, which reduces the time and memory needed to process large volumes of data. Use this XML data chunking feature of Business Intelligence Publisher (BIP) when an unexpected issue disables a business-critical process while you run reports on large volumes of data.

The XML data chunking process involves the following stages:

  1. Scheduling the Extracts data chunking process
  2. Using BIP to:
    1. Create a report based on the seeded data model to generate an output
    2. Schedule a job to deliver the output remotely
  3. Automating the SFTP delivery option to deliver the data chunks (Optional)

1. Scheduling the Extracts data chunking process

To schedule a Data Chunking process to split XML data into smaller chunks:

  1. Click Navigator > Tools > Scheduled Processes.
  2. On the Scheduled Processes Overview page, click Schedule New Process.
  3. Leave the type as Job, search for and select the Extracts Process Diagnostics Report process, and click OK.
  4. In the Process Details dialog box, under the Basic Options section:
    • Select Data Chunking as the scope for the chunking process.
    • As the scope value, select the name of the instance whose data you want to chunk.
    • Enter a numeric value for the chunk size (the number of objects in one chunk).
    • Screen capture of the Extracts Process Diagnostics Report process for XML data chunking tab interface with the basic option parameters highlighted
  5. Define the schedule, output, and notifications for the process, as you would for any scheduled process.
  6. Set any other required options and click Submit.
  7. Download the report and log, and check the number of chunks generated. In general, the number of chunks depends on the chunk size and the total number of objects (Total Chunks Generated = Total Objects / Chunk Size).
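
The relationship in step 7 can be sketched in a few lines of Python. This is a minimal illustration, not part of the product: the function name total_chunks is hypothetical, and rounding up when the total isn't an exact multiple of the chunk size is an assumption, so confirm the actual count in the downloaded report and log.

```python
def total_chunks(total_objects: int, chunk_size: int) -> int:
    """Estimate how many chunks a run produces for a given chunk size."""
    if total_objects < 0:
        raise ValueError("total_objects must not be negative")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    # Assumption: a final, partially filled chunk still counts as one chunk,
    # so the division is rounded up (integer arithmetic avoids float error).
    return (total_objects + chunk_size - 1) // chunk_size


# 100,000 objects with a chunk size of 25,000 objects
print(total_chunks(100_000, 25_000))  # 4
```

A smaller chunk size lowers the memory needed per report run but increases the number of report jobs you have to schedule.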

2. Using BIP to generate and deliver data outputs

To generate the BIP data output and deliver it:

  1. Click Navigator > Tools > Reports and Analytics.
  2. On the Reports and Analytics page, click Browse Catalog.
  3. On the BIP Home page, copy the report associated with the extract definition, and change its data model from globalReportsDataModel to the DataChunkingDm data model available in the shared directory (Shared Folders/Human Capital Management/Extracts/Data Models/).
  4. Enter the name and description of the report and save it in the Custom directory. (Shared Folders/Custom/...)
  5. Click OK to open the report pane.
  6. Click View Report.
  7. Click Action > Schedule to schedule a report job.
  8. In the General tab, enter the values for Flow Instance Name and Chunk Number parameters.
  9. Run the report once for each generated chunk, starting from chunk 1. For example, if you have 100,000 objects to process and you use a chunk size of 25,000 objects, 4 chunks are generated. To generate a report for all objects, run the report for chunks 1 through 4, until all 4 chunks are processed.
    • Screen capture of the General tab of the Schedule Report Job interface with its parameters highlighted
  10. In the Output tab, select FTP as the destination type and click Add Destination.
  11. Select the FTP Server from the servers list.
  12. Enter the remote directory folder path to deliver the data chunks.
  13. Enter a file name for the generated file.
    • Screen capture of the Output tab of the Schedule Report Job interface with its output destination options highlighted
  14. Click Submit to schedule the report job. 
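
Because the report job is scheduled once per chunk, it can help to list the parameter values for each run before you start. The following is only a planning sketch: it doesn't call BIP, and the names ChunkJob, plan_chunk_jobs, the file-name pattern, and the sample flow instance name are all hypothetical. Only the Flow Instance Name and Chunk Number parameters come from the steps above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkJob:
    """Values to enter for one scheduled report job (one chunk)."""
    flow_instance_name: str  # "Flow Instance Name" parameter on the General tab
    chunk_number: int        # "Chunk Number" parameter on the General tab
    file_name: str           # file name entered on the Output tab (hypothetical pattern)


def plan_chunk_jobs(flow_instance_name: str, total_chunks: int,
                    file_prefix: str = "extract_chunk") -> list[ChunkJob]:
    """Build one job entry per chunk, numbered from 1 to total_chunks."""
    if total_chunks < 1:
        raise ValueError("total_chunks must be at least 1")
    return [
        ChunkJob(flow_instance_name, number, f"{file_prefix}_{number}.xml")
        for number in range(1, total_chunks + 1)
    ]


# Example: 4 chunks for a hypothetical flow instance name
for job in plan_chunk_jobs("MY_FLOW_INSTANCE", 4):
    print(job.chunk_number, job.file_name)
```

Giving each chunk its own file name keeps a later delivery from overwriting an earlier one in the remote directory.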

3. Automating the SFTP delivery option to deliver data chunks (Optional)

You can create a delivery option placeholder in the extract definition that needs data chunking. To add a delivery option to send the data chunks:

  1. Open the extract definition and click the Deliver tab.
  2. Add a new delivery option with the delivery type FTP.
  3. Enter the remote directory path and the SFTP server name. The delivery option keeps the same report and template names.
    • Screen capture of the FTP Delivery Option basic and advanced information options highlighted
  4. Delete the Extract Delivery Mode additional details created for the newly added delivery option.
    • Screen capture of the Extract Delivery Mode option Additional Details highlighted
  5. Validate and save the extract definition.
  6. Follow the steps in the earlier sections to run the data chunking process, and provide the delivery option name to trigger automatic delivery to the SFTP server.


Steps to Enable

You don't need to do anything to enable this feature.

Tips And Considerations

  • The following data tables are used for data chunking:
    • hry_ext_transformation
    • hry_ext_parse_dtls
  • Use the following query to find the total number of chunks generated:

select count(*)
  from hry_ext_parse_dtls
 where transformation_id = (
         select max(transformation_id)
           from hry_ext_transformation
          where processing_type = 'CHUNKING'
            and instance_name = :flow_instance_name
       )
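
To see how the query behaves, you can try it against stand-in tables. The sketch below uses Python's built-in sqlite3 module with an in-memory database. The two tables here are simplified stand-ins that contain only the columns named in the query, the sample rows are invented for illustration, and the helper names count_chunks and build_demo_db are hypothetical. It assumes one row in hry_ext_parse_dtls per generated chunk, which is what the query implies.

```python
import sqlite3

# Same query as above; :flow_instance_name is bound as a named parameter.
CHUNK_COUNT_SQL = """
select count(*)
  from hry_ext_parse_dtls
 where transformation_id = (
         select max(transformation_id)
           from hry_ext_transformation
          where processing_type = 'CHUNKING'
            and instance_name = :flow_instance_name
       )
"""


def count_chunks(conn: sqlite3.Connection, flow_instance_name: str) -> int:
    """Count the chunks of the latest chunking run for a flow instance."""
    row = conn.execute(
        CHUNK_COUNT_SQL, {"flow_instance_name": flow_instance_name}
    ).fetchone()
    return row[0]


def build_demo_db() -> sqlite3.Connection:
    """Create simplified stand-in tables with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table hry_ext_transformation (
            transformation_id integer,
            processing_type   text,
            instance_name     text
        );
        create table hry_ext_parse_dtls (transformation_id integer);
        -- DEMO_FLOW was chunked twice; only the latest run (id 2) is counted.
        insert into hry_ext_transformation values
            (1, 'CHUNKING', 'DEMO_FLOW'),
            (2, 'CHUNKING', 'DEMO_FLOW'),
            (3, 'CHUNKING', 'OTHER_FLOW');
        insert into hry_ext_parse_dtls values (1), (1), (2), (2), (2), (2), (3);
    """)
    return conn


conn = build_demo_db()
print(count_chunks(conn, "DEMO_FLOW"))  # 4
```

The max(transformation_id) subquery restricts the count to the most recent chunking run for the flow instance, so earlier runs of the same instance don't inflate the result.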

Key Resources

For more information see: