Executing Data Quality Group

You can execute a defined DQ Group, along with its mapped Rules and validation checks, from the Data Quality Group Summary window. This in turn creates a Batch in the Operations module. You can also create and execute a DQ Group from the Batch Execution window of the Operations module. After a Data Quality Group is executed, you can view the execution details in the View Data Quality Group Summary Log.

Note:

Ensure that the Allow Correction on DI Source checkbox is selected in the System Configuration > Configuration > Others tab if you want to perform the Data Quality check and correction simultaneously through the DCDQ Framework.
The results of Data Quality Rule executions are stored in the DQ_RESULT_DETL_MASTER table of the respective METADOM Schema. During OFSAAI installation, ensure that the Oracle database tablespace in which this table resides is configured with AUTOEXTEND ON. Otherwise, DQ Rule executions might fail because of insufficient storage space (ORA-01653 - Unable to extend tablespace by 1024). To avoid this error, ensure that sufficient storage is allocated to the tablespace. For a single DQ check on a row of data, DQ_RESULT_DETL_MASTER stores the result in 1 row. Thus, 2 checks on a row are stored in 2 rows, and so on.
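The one-row-per-check rule above gives a quick way to size the tablespace. The following is a minimal sketch, not part of OFSAA; the function name and the workload figures are hypothetical, and it treats the product of rows and checks as an upper bound on what one execution writes.

```python
def estimated_result_rows(rows_checked: int, checks_per_row: int) -> int:
    """Estimate rows written to DQ_RESULT_DETL_MASTER for one execution.

    Assumption: each check on each row produces one result row, so the
    product is an upper bound on the rows added by a single execution.
    """
    if rows_checked < 0 or checks_per_row < 0:
        raise ValueError("counts must be non-negative")
    return rows_checked * checks_per_row


# Hypothetical workload: 2,000,000 rows with 5 checks applied to each row.
print(estimated_result_rows(2_000_000, 5))  # 10000000
```

Multiplying this estimate by the expected number of executions gives a rough idea of how quickly the table can grow.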
DQ Rules in a DQ Group can be run in parallel. Two parameters, DQ_ENABLE_PARALLEL_EXEC and DQ_MAX_NO_OF_EXEC_THREADS, are available in the CONFIGURATION table. If DQ_ENABLE_PARALLEL_EXEC is set to 'Y', the DQ rules within the group are executed in parallel. DQ_MAX_NO_OF_EXEC_THREADS specifies the number of rules that can run simultaneously.
If DQ_ENABLE_PARALLEL_EXEC is set to 'N' or is not present, the rules within the group are executed sequentially.
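The effect of the two parameters can be illustrated with a small sketch. This is not the OFSAA implementation; it only models the behavior described above with a thread pool. The run_rules function, the dictionary standing in for the CONFIGURATION table, and the fallback to one thread when DQ_MAX_NO_OF_EXEC_THREADS is absent are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_rules(rules: List[Callable[[], str]], config: Dict[str, str]) -> List[str]:
    """Run rule callables sequentially or in parallel, depending on config."""
    # A missing parameter or 'N' means sequential execution.
    if config.get("DQ_ENABLE_PARALLEL_EXEC", "N") != "Y":
        return [rule() for rule in rules]
    # Assumption: use a single worker if the thread count is not configured.
    max_threads = max(1, int(config.get("DQ_MAX_NO_OF_EXEC_THREADS", "1")))
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        futures = [pool.submit(rule) for rule in rules]
        # Collect results in submission order, whatever the completion order.
        return [future.result() for future in futures]


# Hypothetical rules that simply report completion.
rules = [lambda name=name: f"{name}: done" for name in ("RULE_A", "RULE_B", "RULE_C")]
config = {"DQ_ENABLE_PARALLEL_EXEC": "Y", "DQ_MAX_NO_OF_EXEC_THREADS": "2"}
results = run_rules(rules, config)
print(results)
```

With these settings, at most two of the three rules run at the same time; with an empty configuration, the same call runs them one after another.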

Note:

The 'Fail if threshold breaches' flag is not considered for parallel execution.
To execute a DQ Group in the Data Quality Group Summary window:
  1. From the Data Quality Group Summary window, select the checkbox adjacent to the required DQ Group Name.
  2. Click the Run button on the Data Quality Groups toolbar. The Run button is disabled if you have selected multiple checkboxes.
    The Group Execution window is displayed.

    Figure 7-48 Group Execution window

  3. In the Batch details section, do the following:
    • Select the MIS Date using the Calendar. MIS Date is mandatory and refers to the date by which the data for the execution is filtered. If the specified MIS Date is not present in the target table, execution completes with the message “No Records found” in the View Log window.

    Note:

    If the table has an As_Of_Date column, the execution looks for As_Of_Date values matching the specified MIS Date. The DQ Batch ID is auto-populated and is not editable.
    • Specify the Threshold (%) limit as a numeric value. This refers to the maximum percentage of records that can be rejected in a job. If the percentage of failed records exceeds the Rejection Threshold, the job fails. If the field is left blank, the default value is set to 100%.
    • Specify the Additional Parameters as filtering criteria for execution in the pattern Key#Data type#Value; Key#Data type#Value; and so on.
    Here, the data type of the value must be “V” for Varchar/Char, “D” for Date in “MM/DD/YYYY” format, or “N” for numeric data. For example, to filter by region code, creation date, and account balance, specify the Additional Parameters value as $REGION_CODE#V#US;$CREATION_DATE#D#07/06/1983;$ACCOUNT_BAL#N#10000.50. You can mouse over the question mark icon for more information.

    Note:

    If the Additional Parameters are not specified, the default value is taken as NULL. Except for the standard placeholders $MISDATE and $RUNSKEY, all additional parameters for DQ execution must be enclosed in single quotes. For example, STG_EMPLOYEE.EMP_CODE = '$EMPCODE'.
    • Select Yes or No from the Fail if Threshold Breaches drop-down list. If Yes is selected, execution of the task fails if the threshold value is breached. If No is selected, execution of the task continues.
    • To execute DQ rules on Spark, specify ‘EXECUTION_VENUE=Spark’ in the Optional Parameters field. Before execution, you must have registered a cluster from the DMT Configurations > Register Cluster window with the following details:
      • Name - Enter the name of the Hive information domain.
      • Description - Enter a description for the cluster.
      • Livy Service URL - Enter the Livy Service URL used to connect to Spark from OFSAA.
  4. Click Execute.
    A confirmation message is displayed and the DQ Group is scheduled for execution.
    After the DQ Group is executed, you can view the details of the execution along with the log information in the View Log window.
    For more information, see Viewing Data Quality Group Summary Log.
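The Additional Parameters pattern and the threshold behavior described in step 3 can be sketched in a few lines. This is an illustration only, not how OFSAA processes these fields internally. The function names are hypothetical, and the handling of a trailing semicolon and of an empty target (zero records) is an assumption.

```python
from datetime import date, datetime
from decimal import Decimal
from typing import Dict, Optional, Union

ParamValue = Union[str, date, Decimal]


def parse_additional_parameters(text: str) -> Dict[str, ParamValue]:
    """Parse 'Key#Data type#Value;...' into a dictionary of typed values."""
    params: Dict[str, ParamValue] = {}
    for entry in text.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # assumption: ignore a trailing ';' or empty input
        key, data_type, value = entry.split("#", 2)
        if data_type == "V":    # Varchar/Char
            params[key] = value
        elif data_type == "D":  # Date in MM/DD/YYYY format
            params[key] = datetime.strptime(value, "%m/%d/%Y").date()
        elif data_type == "N":  # numeric data
            params[key] = Decimal(value)
        else:
            raise ValueError(f"unknown data type {data_type!r} for {key}")
    return params


def job_fails(failed: int, total: int, threshold_pct: Optional[float],
              fail_if_breached: bool) -> bool:
    """Return True if the job should fail under the threshold settings."""
    limit = 100.0 if threshold_pct is None else threshold_pct  # blank means 100%
    if total == 0:
        return False  # assumption: nothing was checked, so nothing is breached
    breached = failed / total * 100 > limit
    return breached and fail_if_breached


params = parse_additional_parameters(
    "$REGION_CODE#V#US;$CREATION_DATE#D#07/06/1983;$ACCOUNT_BAL#N#10000.50"
)
print(params["$REGION_CODE"], params["$CREATION_DATE"], params["$ACCOUNT_BAL"])
print(job_fails(failed=30, total=100, threshold_pct=25.0, fail_if_breached=True))
```

In this sketch, 30 failed records out of 100 breach a 25% threshold, so the job fails only when Fail if Threshold Breaches is Yes. With the threshold left blank, the 100% default can never be exceeded.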