A.11.1 Data Slicing Utility Script
The Data Slicing Utility is a SQL script to perform data slicing (slicing the data into different chunks or data units) according to the user input (FIC_MIS_DATE). It helps faster turn-around time for individual batches as the load is moderately low.
FIC_MIS_DATE is the execution identifier for Entity Resolution, and it is easy to distribute records into different FIC_MIS_DATE values.
You can perform the data slicing for a high volume of data, which takes a
long time and more resource based on your database performance.
Note:
This utility is used for slicing the data in the following input tables of the out-of-the-box rules for Entity Resolution:- STG_PARTY_MASTER_PRE
- STG_PARTY_DETAILS_PRE
- STG_PARTY_EMAIL_MAP_PRE
- STG_PARTY_PHONE_MAP_PRE
- STG_CUSTOMER_IDENTIFCTN_DOC_PRE
- STG_PARTY_ADDRESS_MAP_PRE
- STG_ADDRESS_MASTER_PRE
The utility distributes the data into logical units based on the criteria
(user input), resulting in multiple data chunks.
- It accepts comma-separated FIC_MIS_DATE as user input.
For example. 20150101,20150102,20150103
- It distributes the records across the FIC_MIS_DATE equally.
The last slice should contain additional records if there are any.
Note:
It is recommended that you must split the data into slices of a maximum of 10 million records.Here is a scenario of data slicing:- Input data volume: 50 million
- Size of slice on which job has to execute: 10 million
- Total number of slices: 5 (different comma-separated FIC_MIS_DATE)
After the utility completes the distribution, you can perform the ER batch
execution as follows:
- Provide the chunk as Day 0 load corresponding to the respective FIC_MIS_DATE.
- Provide subsequent chunks such as Day 1, Day 2, etc. These chunks are treated as delta loads (delta having only new records).
To execute the utility script, perform the following:
- Obtain the script from path
<COMPLIANCE_STUDIO_INSTALLATION_PATH>/ficdb/ Utilities/DataSlicingUtility/DataSlicingUtility.sql
- Log in to the ER Schema. The schema (input tables of Entity Resolution) is available.
- Copy the script to the machine where you need to execute the script.
- Run the following command in SQL
prompt:
@<Fully Qualified path of Utility Script>/DataSlicingUtility.sql
- Enter the values according to the following
prompt:
Enter value for fic_mis_date:
You need to enter comma-separated FIC_MIS_DATE in YYYYMMDD format.
For example, 20150101,20150102,20150103
- Press Enter.
- On successful execution, the utility scripts exits with a
success message "FIC_MIS_DATEs have applied for all <list of
fic_mis_dates> slices"
For example,
SQL> @<path of the script>/DataSlicingUtility.sql Enter value for fic_mis_date: 20150107,20150108,20150109,20150110,20150115 old 24: FIC_MIS_DATES:='&FIC_MIS_DATE'; new 24: FIC_MIS_DATES:='20150107,20150108,20150109,20150110,20150115'; PL/SQL procedure successfully completed.
- On failure, displays the appropriate error message.
- On successful execution, the utility scripts exits with a
success message "FIC_MIS_DATEs have applied for all <list of
fic_mis_dates> slices"
- You can validate the results of successful execution:
- For each input table, check the count of records against
FIC_MIS_DATE.
Run the following commands to check the count in each input table. Perform the same for all input tables:
SELECT DISTINCT FIC_MIS_DATE, COUNT(*) FROM <INPUT TABLE NAME> GROUP BY FIC_MIS_DATE;
For example,SELECT DISTINCT FIC_MIS_DATE, COUNT(*) FROM STG_PARTY_MASTER_PRE GROUP BY FIC_MIS_DATE;
- Ensure that complete information for a particular party is included in the same slice. For example, for any V_PARTY_ID, there should be the same FIC_MIS_DATE tagged in each input table.
- For each input table, check the count of records against
FIC_MIS_DATE.