Oracle® Health Sciences Cohort Explorer Administrator's Guide Release 1.0 E24438-01
This chapter discusses interface logic and execution plans.
In a typical ETL process, the data warehouse is first deployed with a full load; subsequently, newly created, changed, or deleted records are applied to the data warehouse incrementally (incremental load).
There are two packages:
Full_Load_Execution_Plan for full load
Master_Execution_Plan for incremental load
The Full Load execution plan is designed to load data mart tables in a particular sequence, with certain assumptions about the volume of data expected in different tables. The sequence is created using the following strategy:
Based on referential integrity among the different CDM tables, create a load order that maximizes parallel execution and minimizes the total time needed to load the CDM data mart.
If the volume of data is extremely large, parallel execution may result in resource contention, which can dramatically degrade the overall performance of the system. Oracle recommends that you formulate a load strategy based on the volume of data. The strategy should minimize execution time by maximizing parallel execution while taking resource constraints into account.
To execute the initial load, you must run the following two scripts as the OHSCE user:
cohort_drop_indexes.sql – Drops indexes; run this script before the initial data load.
cohort_create_indexes.sql – Recreates indexes; run this script after the initial data load. It prompts for the index tablespace parameter.
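A typical SQL*Plus session for these steps might look as follows (the connect string is a placeholder; substitute your own credentials and database):

```
-- Connect as the OHSCE user (connection details are placeholders)
sqlplus ohsce@<database>

SQL> @cohort_drop_indexes.sql
-- ... perform the initial data load ...
SQL> @cohort_create_indexes.sql
-- The second script prompts for the index tablespace parameter.
```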
The Full Load execution plan runs ETLs in the following order:
Load user dimension.
Load all other dimensions in parallel and then load W_EHA_RESEARCH_PATIENT_F.
Load unspecified records in all _D tables.
Load all bridge tables in the following order:
LOAD_PATIENT_HISTORY_PATIENT_BRIDGE, LOAD_ETHNICITY_PATIENT_BRIDGE, and LOAD_SPECIMEN_PATIENT_BRIDGE in parallel.
LOAD_CONSENT_PATIENT_BRIDGE, LOAD_PROCEDURE_PATIENT_BRIDGE, and LOAD_RACE_PATIENT_BRIDGE in parallel.
LOAD_DIAGNOSTIC_TEST_PATIENT_BRIDGE
LOAD_DX_PATIENT_BRIDGE
LOAD_SUBADMN_PATIENT_BRIDGE
FULL_LOAD_ALL_DHL_TABLES. This package executes %_DHL related ETLs in parallel.
Load unspecified records for three bridge tables.
Load protocol related tables.
To execute a full load, perform the following steps:
Navigate to Execution Plans > Initial Setup > Full_Load_Execution_Plan > Packages > Full_Load_Execution_Plan > Scenarios.
Right-click FULL_LOAD_EXECUTION_PLAN.
Select Execute.
The last successful ETL load time for a given interface is used to filter new and changed records in HDWF. If auxiliary or child tables (tables that are joined to the base source table and whose data is populated into data mart tables) are used as the source, the UPDATE_DT of these tables is considered as well. The following code snippet is used in the filter condition:
UPDATE_DT > TO_DATE('#LAST_EXTRACT_DATE','MM/DD/YYYY HH24:MI:SS')
where LAST_EXTRACT_DATE is the ODI variable name.
The ODI variable LAST_EXTRACT_DATE is used to retrieve the value of LAST_MAX_DT from the table, and this variable is referenced in the UPDATE_DT logic in the incremental load interface. The following code snippet is used for this variable:
SELECT CASE WHEN COUNT(*) > 0 THEN TO_CHAR(MIN(LAST_MAX_DATE),'MM/DD/YYYY HH24:MI:SS') ELSE '01/01/1900 00:00:00' END
FROM QUALIFY (C_LOAD_DATES)
WHERE PACKAGE_NAME = 'GET_SCEN_NAME ()'
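The combined effect of the variable and the filter condition can be sketched in Python. This is an illustrative sketch only, not the ODI implementation: C_LOAD_DATES is modeled as an in-memory list of dictionaries, and the column names follow the snippets above.

```python
from datetime import datetime

# Default used by LAST_EXTRACT_DATE when no prior load is recorded.
DEFAULT_EXTRACT_DATE = datetime(1900, 1, 1)

def last_extract_date(c_load_dates, package_name):
    """Mimic the LAST_EXTRACT_DATE variable: the minimum LAST_MAX_DATE
    recorded for the package, or 01/01/1900 if no rows exist."""
    dates = [r["LAST_MAX_DATE"] for r in c_load_dates
             if r["PACKAGE_NAME"] == package_name]
    return min(dates) if dates else DEFAULT_EXTRACT_DATE

def incremental_filter(source_rows, cutoff):
    """Mimic the interface filter: keep only rows whose UPDATE_DT is
    newer than the last extract date."""
    return [r for r in source_rows if r["UPDATE_DT"] > cutoff]

# Example: one prior load recorded for the package.
control = [{"PACKAGE_NAME": "LOAD_CONSENT_DIMENSION",
            "LAST_MAX_DATE": datetime(2011, 5, 10, 17, 2)}]
cutoff = last_extract_date(control, "LOAD_CONSENT_DIMENSION")

rows = [{"ID": 1, "UPDATE_DT": datetime(2011, 5, 9)},
        {"ID": 2, "UPDATE_DT": datetime(2011, 5, 11)}]
changed = incremental_filter(rows, cutoff)  # only the row with ID 2 qualifies
```

For a package with no rows in C_LOAD_DATES, the cutoff falls back to 01/01/1900, so every source record passes the filter, which is the full-load behavior.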
The Incremental Load execution plan runs ETLs in the following order:
Load user dimension.
Load all other dimensions in parallel and then load W_EHA_RESEARCH_PATIENT_F.
Load unspecified records in all _D tables.
Load all bridge tables in parallel.
Load unspecified records for three bridge tables.
Load protocol related tables.
To execute an incremental load, perform the following steps:
Navigate to Execution Plans > Master Execution Plan > Packages > Master_Execution_Plan > Scenarios.
Right-click MASTER_EXECUTION_PLAN.
Select Execute.
In an ODI package associated with a data mart table, the variable IS_INCREMENTAL determines whether to run the incremental load or the initial load, as shown in Figure 3-1. The logic for this variable returns Y for an incremental load and N for a full load.
Once the load is completed successfully, a record is inserted into the C_LOAD_DATES job control table. The following is an example of such a record:
Table 3-1 Example of an Inserted Record
PACKAGE_NAME | TARGET_TABLE_NAME | ETL_PROC_WID | LAST_MAX_DT | INSERT_DT
---|---|---|---|---
LOAD_CONSENT_DIMENSION | W_EHA_CONSENT_D | 14600 | 5/10/2011 17:02 | 5/10/2011 17:03
where,
PACKAGE_NAME is the package for loading the W_EHA_CONSENT_D table.
TARGET_TABLE_NAME is the target table name.
ETL_PROC_WID is the unique identifier created using the odiRef function.
LAST_MAX_DT is the last_extract_date (current system timestamp minus 1 second) of the source schema.
INSERT_DT is the date on which the record is inserted into this table.
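The way such a record is assembled can be illustrated with a short Python sketch (not the actual ODI implementation; the field values follow Table 3-1, and build_load_record is a hypothetical helper):

```python
from datetime import datetime, timedelta

def build_load_record(package_name, target_table, etl_proc_wid, now):
    """Assemble a C_LOAD_DATES-style job control record. LAST_MAX_DT is
    the current source-system timestamp minus one second; INSERT_DT is
    the time the record is inserted."""
    return {
        "PACKAGE_NAME": package_name,
        "TARGET_TABLE_NAME": target_table,
        "ETL_PROC_WID": etl_proc_wid,
        "LAST_MAX_DT": now - timedelta(seconds=1),
        "INSERT_DT": now,
    }

# Example matching the record in Table 3-1.
record = build_load_record("LOAD_CONSENT_DIMENSION", "W_EHA_CONSENT_D",
                           14600, datetime(2011, 5, 10, 17, 3, 0))
```

The one-second offset on LAST_MAX_DT ensures the next incremental run does not miss records committed in the same second as the extract.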
The following code snippet is used for the IS_INCREMENTAL variable:
SELECT CASE WHEN COUNT(*) > 0 THEN 'Y' ELSE 'N' END
FROM QUALIFY (C_LOAD_DATES)
WHERE PACKAGE_NAME = 'GET_SCEN_NAME ()'
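The decision logic amounts to a simple existence check, which can be sketched in Python (an illustrative model only; C_LOAD_DATES is again an in-memory list):

```python
def is_incremental(c_load_dates, package_name):
    """Mimic the IS_INCREMENTAL variable: return 'Y' if at least one
    successful load is recorded for the package in C_LOAD_DATES,
    otherwise 'N' (triggering the full-load path)."""
    count = sum(1 for r in c_load_dates
                if r["PACKAGE_NAME"] == package_name)
    return "Y" if count > 0 else "N"

control = [{"PACKAGE_NAME": "LOAD_CONSENT_DIMENSION"}]
is_incremental(control, "LOAD_CONSENT_DIMENSION")   # 'Y'
is_incremental(control, "LOAD_DIAGNOSIS_DIMENSION") # 'N'
```

Because the check is per package, a newly added dimension package runs its full-load path even when other packages have already completed successful loads.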
You can also execute an individual ETL. For example, to execute the ETL for the Diagnosis dimension, perform the following steps:
Navigate to Mappings > Dimensions > Load_Diagnosis_Dimension > Packages > Load_Diagnosis_Dimension > Scenarios.
Right-click LOAD_DIAGNOSIS_DIMENSION.
Select Execute.
To schedule an execution plan, you schedule its corresponding scenario. Perform the following steps to schedule a scenario:
Open a scenario.
Right-click Scheduling.
Select New Scheduling and provide different scheduling options as needed.
Click Save on the File main menu. The execution plan is initiated according to the schedule.
For more information on scheduling an ETL execution plan, refer to Oracle® Fusion Middleware Getting Started with Oracle Data Integrator 11g Release 1 (11.1.1).
Figure 3-4 Scheduling an ETL Execution Plan