Oracle® Health Sciences Cohort Explorer Administrator's Guide Release 1.0 E24438-01
This chapter discusses interface logic and execution plans.
In a typical ETL process, the data warehouse is first deployed with a full load; subsequently, newly created, changed, or deleted records are applied to the data warehouse incrementally (incremental load).
There are two packages:
Full_Load_Execution_Plan for full load
Master_Execution_Plan for incremental load
The Full Load execution plan is designed to load data mart tables in a particular sequence, with certain assumptions about the volume of data expected in different tables. The sequence is created using the following strategy:
Based on referential integrity among the different CDM tables, create a load order that maximizes parallel execution and minimizes the total time needed to load the CDM data mart.
If the volume of data is extremely large, parallel execution may result in resource contention, which can dramatically degrade the overall performance of the system. Oracle recommends that you formulate a load strategy based on the volume of data. The strategy should minimize execution time by maximizing parallel execution while taking resource constraints into account.
To execute the initial load, you must run the following two scripts as the OHSCE user:
cohort_drop_indexes.sql – Drops indexes; run this script before the initial data load.
cohort_create_indexes.sql – Recreates indexes; run this script after the initial data load. It prompts for the index tablespace parameter.
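A typical SQL*Plus session for these steps might look as follows (the connect string is a placeholder; substitute your own credentials and database):

```
-- Connect as the OHSCE user (connection details are placeholders)
sqlplus ohsce@<database>

SQL> @cohort_drop_indexes.sql
-- ... perform the initial data load ...
SQL> @cohort_create_indexes.sql
-- The second script prompts for the index tablespace parameter.
```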
The Full Load execution plan runs ETLs in the following order:
Load user dimension.
Load all other dimensions in parallel and then load W_EHA_RESEARCH_PATIENT_F.
Load unspecified records in all _D tables.
Load all bridge tables in the following order:
LOAD_PATIENT_HISTORY_PATIENT_BRIDGE, LOAD_ETHNICITY_PATIENT_BRIDGE, and LOAD_SPECIMEN_PATIENT_BRIDGE in parallel.
LOAD_CONSENT_PATIENT_BRIDGE, LOAD_PROCEDURE_PATIENT_BRIDGE, and LOAD_RACE_PATIENT_BRIDGE in parallel.
LOAD_DIAGNOSTIC_TEST_PATIENT_BRIDGE
LOAD_DX_PATIENT_BRIDGE
LOAD_SUBADMN_PATIENT_BRIDGE
FULL_LOAD_ALL_DHL_TABLES. This package executes %_DHL related ETLs in parallel.
Load unspecified records for three bridge tables.
Load protocol related tables.
To execute a full load, perform the following steps:
Navigate to Execution Plans > Initial Setup > Full_Load_Execution_Plan > Packages > Full_Load_Execution_Plan > Scenarios.
Right-click FULL_LOAD_EXECUTION_PLAN.
Select Execute.
The last successful ETL load time for a given interface is used to filter new and changed records in HDWF. If auxiliary or child tables (tables that are joined to the base source table and whose data is populated into data mart tables) are used as the source, the UPDATE_DT of these tables is considered as well. The following code snippet is used in the filter condition:
UPDATE_DT > TO_DATE('#LAST_EXTRACT_DATE','MM/DD/YYYY HH24:MI:SS')
where LAST_EXTRACT_DATE is the ODI variable name.
The ODI variable LAST_EXTRACT_DATE is used to retrieve the value of LAST_MAX_DT from the table, and this variable is referenced in the UPDATE_DT logic in the incremental load interface. The following code snippet is used for this variable:
SELECT CASE WHEN COUNT(*) > 0 THEN TO_CHAR(MIN(LAST_MAX_DATE),'MM/DD/YYYY HH24:MI:SS') ELSE '01/01/1900 00:00:00' END
FROM QUALIFY (C_LOAD_DATES)
WHERE PACKAGE_NAME = 'GET_SCEN_NAME ()'
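The combined effect of the variable and the filter condition can be sketched in Python. This is an illustrative sketch only, not the ODI implementation: C_LOAD_DATES is modeled as an in-memory list of dictionaries, and the column names follow the snippets above.

```python
from datetime import datetime

# Default used by LAST_EXTRACT_DATE when no prior load is recorded.
DEFAULT_EXTRACT_DATE = datetime(1900, 1, 1)

def last_extract_date(c_load_dates, package_name):
    """Mimic the LAST_EXTRACT_DATE variable: the minimum LAST_MAX_DATE
    recorded for the package, or 01/01/1900 if no rows exist."""
    dates = [r["LAST_MAX_DATE"] for r in c_load_dates
             if r["PACKAGE_NAME"] == package_name]
    return min(dates) if dates else DEFAULT_EXTRACT_DATE

def incremental_filter(source_rows, cutoff):
    """Mimic the interface filter: keep only rows whose UPDATE_DT is
    newer than the last extract date."""
    return [r for r in source_rows if r["UPDATE_DT"] > cutoff]

# Example: one prior load recorded for the package.
control = [{"PACKAGE_NAME": "LOAD_CONSENT_DIMENSION",
            "LAST_MAX_DATE": datetime(2011, 5, 10, 17, 2)}]
cutoff = last_extract_date(control, "LOAD_CONSENT_DIMENSION")

rows = [{"ID": 1, "UPDATE_DT": datetime(2011, 5, 9)},
        {"ID": 2, "UPDATE_DT": datetime(2011, 5, 11)}]
changed = incremental_filter(rows, cutoff)  # only the row with ID 2 qualifies
```

For a package with no rows in C_LOAD_DATES, the cutoff falls back to 01/01/1900, so every source record passes the filter, which is the full-load behavior.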
The Incremental Load execution plan runs ETLs in the following order:
Load user dimension.
Load all other dimensions in parallel and then load W_EHA_RESEARCH_PATIENT_F.
Load unspecified records in all _D tables.
Load all bridge tables in parallel.
Load unspecified records for three bridge tables.
Load protocol related tables.
To execute an incremental load, perform the following steps:
Navigate to Execution Plans > Master Execution Plan > Packages > Master_Execution_Plan > Scenarios.
Right-click MASTER_EXECUTION_PLAN.
Select Execute.
In an ODI package associated with a data mart table, the variable IS_INCREMENTAL determines whether to run the incremental load or the initial load, as shown in Figure 3-1. The logic for this variable returns Y for an incremental load and N for a full load.
Once the load is completed successfully, a record is inserted into the C_LOAD_DATES job control table. The following is an example of such a record:
Table 3-1 Example of an Inserted Record
PACKAGE_NAME | TARGET_TABLE_NAME | ETL_PROC_WID | LAST_MAX_DT | INSERT_DT
---|---|---|---|---
LOAD_CONSENT_DIMENSION | W_EHA_CONSENT_D | 14600 | 5/10/2011 17:02 | 5/10/2011 17:03
where,
PACKAGE_NAME is the package for loading the W_EHA_CONSENT_D table.
TARGET_TABLE_NAME is the target table name.
ETL_PROC_WID is the unique identifier created using the odiRef function.
LAST_MAX_DT is the last_extract_date (current system timestamp minus 1 second) of the source schema.
INSERT_DT is the date on which the record is inserted into this table.
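The way such a record is assembled can be illustrated with a short Python sketch (not the actual ODI implementation; the field values follow Table 3-1, and build_load_record is a hypothetical helper):

```python
from datetime import datetime, timedelta

def build_load_record(package_name, target_table, etl_proc_wid, now):
    """Assemble a C_LOAD_DATES-style job control record. LAST_MAX_DT is
    the current source-system timestamp minus one second; INSERT_DT is
    the time the record is inserted."""
    return {
        "PACKAGE_NAME": package_name,
        "TARGET_TABLE_NAME": target_table,
        "ETL_PROC_WID": etl_proc_wid,
        "LAST_MAX_DT": now - timedelta(seconds=1),
        "INSERT_DT": now,
    }

# Example matching the record in Table 3-1.
record = build_load_record("LOAD_CONSENT_DIMENSION", "W_EHA_CONSENT_D",
                           14600, datetime(2011, 5, 10, 17, 3, 0))
```

The one-second offset on LAST_MAX_DT ensures the next incremental run does not miss records committed in the same second as the extract.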
The following code snippet is used for the IS_INCREMENTAL variable:
SELECT CASE WHEN COUNT(*) > 0 THEN 'Y' ELSE 'N' END
FROM QUALIFY (C_LOAD_DATES)
WHERE PACKAGE_NAME = 'GET_SCEN_NAME ()'
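The decision logic amounts to a simple existence check, which can be sketched in Python (an illustrative model only; C_LOAD_DATES is again an in-memory list):

```python
def is_incremental(c_load_dates, package_name):
    """Mimic the IS_INCREMENTAL variable: return 'Y' if at least one
    successful load is recorded for the package in C_LOAD_DATES,
    otherwise 'N' (triggering the full-load path)."""
    count = sum(1 for r in c_load_dates
                if r["PACKAGE_NAME"] == package_name)
    return "Y" if count > 0 else "N"

control = [{"PACKAGE_NAME": "LOAD_CONSENT_DIMENSION"}]
is_incremental(control, "LOAD_CONSENT_DIMENSION")   # 'Y'
is_incremental(control, "LOAD_DIAGNOSIS_DIMENSION") # 'N'
```

Because the check is per package, a newly added dimension package runs its full-load path even when other packages have already completed successful loads.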
You can also execute an individual ETL. For example, to execute the ETL for the Diagnosis dimension, perform the following steps:
Navigate to Mappings > Dimensions > Load_Diagnosis_Dimension > Packages > Load_Diagnosis_Dimension > Scenarios.
Right-click LOAD_DIAGNOSIS_DIMENSION.
Select Execute.
To schedule an execution plan, you schedule its corresponding scenario. Perform the following steps to schedule a scenario:
Open a scenario.
Right-click Scheduling.
Select New Scheduling and provide different scheduling options as needed.
Click Save on the File main menu. The execution plan is initiated according to the schedule.
For more information on scheduling an ETL execution plan, refer to Oracle® Fusion Middleware Getting Started with Oracle Data Integrator 11g Release 1 (11.1.1).
Figure 3-4 Scheduling an ETL Execution Plan