A.11.1 Data Slicing Utility Script

The Data Slicing Utility is a SQL script to perform data slicing (slicing the data into different chunks or data units) according to the user input (FIC_MIS_DATE). It helps faster turn-around time for individual batches as the load is moderately low.

FIC_MIS_DATE is the execution identifier for Entity Resolution, and it is easy to distribute records into different FIC_MIS_DATE values.

You can perform the data slicing for a high volume of data, which takes a long time and more resource based on your database performance.

Note:

This utility is used for slicing the data in the following input tables of the out-of-the-box rules for Entity Resolution:
  • STG_PARTY_MASTER_PRE
  • STG_PARTY_DETAILS_PRE
  • STG_PARTY_EMAIL_MAP_PRE
  • STG_PARTY_PHONE_MAP_PRE
  • STG_CUSTOMER_IDENTIFCTN_DOC_PRE
  • STG_PARTY_ADDRESS_MAP_PRE
  • STG_ADDRESS_MASTER_PRE
The utility distributes the data into logical units based on the criteria (user input), resulting in multiple data chunks.
  • It accepts comma-separated FIC_MIS_DATE as user input.

    For example. 20150101,20150102,20150103

  • It distributes the records across the FIC_MIS_DATE equally. The last slice should contain additional records if there are any.

    Note:

    It is recommended that you must split the data into slices of a maximum of 10 million records.
    Here is a scenario of data slicing:
    • Input data volume: 50 million
    • Size of slice on which job has to execute: 10 million
    • Total number of slices: 5 (different comma-separated FIC_MIS_DATE)
After the utility completes the distribution, you can perform the ER batch execution as follows:
  1. Provide the chunk as Day 0 load corresponding to the respective FIC_MIS_DATE.
  2. Provide subsequent chunks such as Day 1, Day 2, etc. These chunks are treated as delta loads (delta having only new records).
To execute the utility script, perform the following:
  1. Obtain the script from path <COMPLIANCE_STUDIO_INSTALLATION_PATH>/ficdb/ Utilities/DataSlicingUtility/DataSlicingUtility.sql
  2. Log in to the ER Schema. The schema (input tables of Entity Resolution) is available.
  3. Copy the script to the machine where you need to execute the script.
  4. Run the following command in SQL prompt:
    @<Fully Qualified path of Utility Script>/DataSlicingUtility.sql
  5. Enter the values according to the following prompt:
    Enter value for fic_mis_date:

    You need to enter comma-separated FIC_MIS_DATE in YYYYMMDD format.

    For example, 20150101,20150102,20150103

  6. Press Enter.
    • On successful execution, the utility scripts exits with a success message "FIC_MIS_DATEs have applied for all <list of fic_mis_dates> slices"
      For example,
      SQL> @<path of the script>/DataSlicingUtility.sql
      Enter value for fic_mis_date:
      20150107,20150108,20150109,20150110,20150115
      old 24: FIC_MIS_DATES:='&FIC_MIS_DATE';
      new 24:
      FIC_MIS_DATES:='20150107,20150108,20150109,20150110,20150115';
      PL/SQL procedure successfully completed.
    • On failure, displays the appropriate error message.
  7. You can validate the results of successful execution:
    • For each input table, check the count of records against FIC_MIS_DATE.
      Run the following commands to check the count in each input table. Perform the same for all input tables:
      SELECT DISTINCT FIC_MIS_DATE, COUNT(*) FROM <INPUT TABLE NAME> GROUP
      BY FIC_MIS_DATE;
      For example,
      SELECT DISTINCT FIC_MIS_DATE, COUNT(*) FROM STG_PARTY_MASTER_PRE
      GROUP BY FIC_MIS_DATE;
    • Ensure that complete information for a particular party is included in the same slice. For example, for any V_PARTY_ID, there should be the same FIC_MIS_DATE tagged in each input table.