To load data from the source systems to the data warehouse, OCDA uses Extract, Transform, and Load (ETL) programs that:
Identify and read desired data from different data source systems,
Clean and format data uniformly, and
Write it to the target data warehouse.
In OCDA, Oracle Clinical and Oracle's Siebel Clinical are the source systems for which Oracle provides predefined ETL.
Figure 5-1 displays the ETL process delivered with OCDA.
Set up as a recurring job in DAC, the Extraction, Transformation, and Load process (ETL) is designed to periodically capture targeted metrics (dimension and fact data) from multiple clinical trial databases, transform and organize them for efficient query, and populate the star-schema tables.
While the OCDA data model supports data extraction from multiple sources, OCDA only includes source-dependent extract (SDE) mappings for the Oracle Clinical and Siebel Clinical databases. However, you can also define SDE mappings from additional external sources that write to the appropriate staging tables. Note that you are responsible for resolving any duplicate records that may be created as a consequence. For more information about how to add a new data source to OCDA, refer to Adding a New Data Source.
The SDE programs map the transactional data to staging tables, in which the data must conform to a standardized format, effectively merging the data from multiple, disparate database sources. This is the architectural feature that accommodates external database sourcing.
The staged data is transformed by the source-independent loads (SILs) into star-schema tables, where the data is organized for efficient query by the Oracle BI Server.
There is one SDE mapping for each target table, which extracts data from the source system and loads it to the staging tables. SDEs have the following features:
Incremental submission mode: OCDA-supplied ETL uses timestamps and journal tables in the source transactional system to optimize periodic loads (a minimal sketch of this pattern follows this list).
Bulk and normal load: Bulk load uses block transfers to expedite loading of large data volumes. It is intended for use during initial data warehouse population, and is faster when the data volume is sufficiently large. However, if a bulk load is interrupted (for example, by exhausted disk space or a power failure), it cannot be resumed from the point of interruption; you must restart the load from the beginning.
Normal load writes one record at a time. It is intended for updates to the data warehouse after the initial population is complete. Normal load is faster when the data volume is sufficiently small, and an interrupted normal load can be restarted.
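The following is a minimal sketch of the incremental pattern described above, written as it might appear in an SDE source qualifier. S_CL_SUBJ_LS is a Siebel Clinical table named later in this chapter, but the LAST_UPD filter column and the $$LAST_EXTRACT_DATE mapping parameter are common Informatica conventions used here for illustration, not OCDA's exact implementation:

  -- Hedged sketch: restrict a periodic extract to rows changed since the
  -- previous run. Column and parameter names are illustrative.
  SELECT row_id,   -- source primary key
         last_upd  -- source modification timestamp
    FROM s_cl_subj_ls
   WHERE last_upd > TO_DATE('$$LAST_EXTRACT_DATE', 'MM/DD/YYYY HH24:MI:SS');
  -- $$LAST_EXTRACT_DATE is expanded by Informatica before execution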
Set the Bulk or Normal load option at the Workflow session level in Informatica. Perform the following steps to set the load option:
Navigate to Session in a workflow and edit the task properties.
Navigate to the Mappings subtab and select Bulk or Normal under Target load type.
Save the workflow.
There is one SIL mapping for each target table. The SIL extracts the normalized data from the staging table and inserts it into the data warehouse star-schema target table. SILs have the following attributes:
Concerning changes to dimension values over time, OCDA overwrites old values with new ones. This strategy is termed the Type 1 Slowly Changing Dimension (SCD) approach; a sketch of the pattern follows this list.
OCDA's data model includes aggregate tables and a number of indexes, designed to minimize query time.
By default, bulk load is disabled for all SILs.
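The Type 1 approach described above amounts to an overwrite-on-match upsert. The following MERGE is a minimal sketch of that pattern only; W_STUDY_D, W_STUDY_DS, and STUDY_NAME are hypothetical placeholder names, not the actual OCDA star-schema objects:

  -- Hedged sketch of Type 1 SCD handling: matched rows are overwritten,
  -- so no history of prior dimension values is kept.
  MERGE INTO w_study_d tgt
  USING w_study_ds stg
     ON (tgt.integration_id = stg.integration_id
         AND tgt.datasource_num_id = stg.datasource_num_id)
  WHEN MATCHED THEN
    UPDATE SET tgt.study_name = stg.study_name  -- overwrite the old value
  WHEN NOT MATCHED THEN
    INSERT (integration_id, datasource_num_id, study_name)
    VALUES (stg.integration_id, stg.datasource_num_id, stg.study_name);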
The results of each ETL execution are logged by Informatica. The logs hold information about errors encountered during execution.
Informatica provides the following four error tables:
PMERR_DATA
PMERR_MSG
PMERR_SESS
PMERR_TRANS
During ETL execution, records which fail to be inserted in the target table (for example, some records violate a constraint) are placed in the Informatica PowerCenter error tables. You can review which records did not make it into the data warehouse, and decide on appropriate action with respect to them.
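For example, a review of rejected records might start with a query like the following against the Informatica repository schema. The column names (SESS_INST_ID, ERROR_TIMESTAMP, ERROR_MSG) are typical of PMERR_MSG but should be verified against your PowerCenter repository version:

  -- Hedged sketch: list recent row errors recorded by Informatica.
  SELECT sess_inst_id,    -- session instance that raised the error
         error_timestamp, -- when the error occurred
         error_msg        -- the error text
    FROM pmerr_msg
   ORDER BY error_timestamp DESC;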
Adding Data Source Information
As you read data from different database instances, you need to specify the source of the data. OCDA provides the W_RXI_DATASOURCE_S table (in the RXI schema), which stores information about all data sources from which data is extracted for OCDA. The following are some of the columns in this table (an example query follows the list):
ROW_WID - A unique ID for each record in the table.
DATASOURCE_NUM_ID - The ID for the database. Must be coordinated with the value given to the database when ETL is run.
DATASOURCE_NAME - A meaningful name of the database.
DATASOURCE_TYPE - Application system that manages the database.
DESC_TEXT - Optional text describing the purpose of the database.
INTEGRATION_ID - Set this to the same value as DATASOURCE_NUM_ID.
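For example, the registered data sources can be reviewed with a query along the following lines; the RXI schema qualifier is an assumption based on the description above and may differ in your installation:

  -- Hedged sketch: list the data sources registered for OCDA.
  SELECT row_wid, datasource_num_id, datasource_name,
         datasource_type, desc_text, integration_id
    FROM rxi.w_rxi_datasource_s
   ORDER BY datasource_num_id;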
See Also:
Oracle Clinical Development Analytics Electronic Technical Reference Manual, for more information about the W_RXI_DATASOURCE_S table.
Adding a New Data Source, for more information about how to add a new data source to OCDA.
Handling Deletions in Siebel Clinical
OCDA provides an optional feature to manage hard deletion of records in Siebel Clinical. You create triggers in the source system to handle deletion of records. To do this:
Navigate to the temporary staging location where the OCDA installer copies the installation files.
Connect to the Siebel Clinical data source and run the ocda_sc_del_trigger.sql script delivered with OCDA. This script creates the RXI_DELETE_LOG_S table and triggers on the tables provided as input (a sketch of such a trigger appears after this procedure). The following are the tables in Siebel Clinical for which OCDA supports creating triggers:
S_CL_PTCL_LS
S_PROD_INT
S_CL_SUBJ_LS
S_CONTACT
S_CL_PGM_LS
S_PTCL_SITE_LS
S_EVT_ACT
Provide a comma-separated list of the table names for which the triggers need to be created as the script's input. For example: S_CL_PTCL_LS, S_PROD_INT, S_CL_SUBJ_LS. The table names that you provide can only be a subset of the tables listed above.
Note that when the user deletes a record in the table, the primary key of the deleted record is inserted in the RXI_DELETE_LOG_S table on the Siebel source system.
Modify the value of the DELETE_FLOW submission parameter to Y, on the Source System Parameters tab within the Design view.
Execute the ETLs as listed in the Executing the ETL Execution Plans section.
The Siebel Clinical related SDE mappings read the above instance of the RXI_DELETE_LOG_S table.
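For orientation, the following is a hand-written sketch of the general shape such a generated trigger might take. The RXI_DELETE_LOG_S column names (TABLE_NAME, ROW_ID, DELETE_DT) are assumptions for illustration only; the ocda_sc_del_trigger.sql script remains the authoritative definition:

  -- Hedged sketch of a deletion-logging trigger; the real triggers are
  -- generated by ocda_sc_del_trigger.sql. Column names are illustrative.
  CREATE OR REPLACE TRIGGER s_cl_subj_ls_del_t
  AFTER DELETE ON s_cl_subj_ls
  FOR EACH ROW
  BEGIN
    INSERT INTO rxi_delete_log_s (table_name, row_id, delete_dt)
    VALUES ('S_CL_SUBJ_LS', :OLD.row_id, SYSDATE); -- log the deleted primary key
  END;
  /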
Note:
Records that are deleted in the source system are soft deleted in the data warehouse.

Adding a New Data Source

OCDA provides predefined source-dependent extract (SDE) mappings for Oracle Clinical and Siebel Clinical. OCDA pre-creates entries for one instance of each of these application databases in the W_RXI_DATASOURCE_S table. If you want to add another database (whether for one of these applications or for another application), perform the following tasks:
Create new SDE programs to load the tables from the source system to the staging area. For more information about creating a new SDE program, refer to Creating an ETL Execution Plan.
Insert a record into the W_RXI_DATASOURCE_S table, assigning the source a unique DATASOURCE_NUM_ID. Set this value to a number greater than 100.
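A sketch of such an insert follows, assuming a second Oracle Clinical instance registered as source number 101. The ROW_WID value, the literal formats, and the exact set of mandatory columns are assumptions; verify them against the Oracle Clinical Development Analytics Electronic Technical Reference Manual:

  -- Hedged sketch: register a custom data source.
  INSERT INTO rxi.w_rxi_datasource_s
    (row_wid, datasource_num_id, datasource_name,
     datasource_type, desc_text, integration_id)
  VALUES
    (101,                -- unique ID for this record (assumed manual assignment)
     101,                -- must be greater than 100 for custom sources
     'OC_PROD_2',        -- meaningful name (illustrative)
     'Oracle Clinical',  -- application system that manages the database
     'Second Oracle Clinical production instance',
     '101');             -- same value as DATASOURCE_NUM_ID
  COMMIT;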
Important:
When calling an SDE mapping to read from a particular database instance, ensure that you pass in the value of DATASOURCE_NUM_ID that corresponds to that database. Also pass TENANT_ID (normally 0) to the SDE.
If you write a new SDE, ensure that it sets the value of DATASOURCE_NUM_ID in the staging table to which it writes.
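For example, a custom SDE might stamp each staged row as follows. W_STUDY_DS and the source names are placeholders, not the actual OCDA staging schema:

  -- Hedged sketch: a custom SDE writing source identification into staging.
  INSERT INTO w_study_ds
    (integration_id, study_name, datasource_num_id, tenant_id)
  SELECT study_id,  -- source primary key (illustrative)
         title,     -- source attribute (illustrative)
         101,       -- DATASOURCE_NUM_ID registered in W_RXI_DATASOURCE_S
         0          -- TENANT_ID, normally 0
    FROM my_source_study_table;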
In DAC, navigate to the Setup view, then select the Physical Data Sources tab.
Enter values in the following fields:
Name — Logical name for the database connection.
Type — Select Source for a source database connection.
Connection Type — Type of database.
Dependency Priority — Number used to generate dependencies when designing execution plans.
Data Source Number — Unique number assigned to the data source category so that the data can be identified in the data warehouse. Enter the same value as you have given in the W_RXI_DATASOURCE_S table.
Figure 5-2 displays the OCDA hierarchy:
Following is the OCDA hierarchy:
CONTAINER (CDA_Warehouse) - A single container that holds all objects used for OCDA.
EXECUTION PLAN (CDA – Complete Warehouse) - A data transformation plan defined on subject areas that need to be transformed at given frequencies. An execution plan is defined based on business requirements for when the data warehouse needs to be loaded. OCDA ships a single execution plan that loads the complete warehouse.
SUBJECT AREAS - A logical grouping of tables related to a particular subject or application context. It also includes the tasks that are associated with the tables, as well as the tasks required to load the tables. Subject areas are assigned to execution plans, which can be scheduled for full or incremental loads.
TASK GROUPS - This is a group of tasks that should be run in a given order.
TASKS - A unit of work for loading one or more tables. A task comprises the following: source and target tables, phase, execution type, truncate properties, and commands for full or incremental loads. Each task corresponds to a single Informatica workflow.
Executing the ETL Execution Plans

To load data from the sources to their target tables in the data warehouse, run the Execution Plan packaged with OCDA. Perform the following tasks in DAC:
Navigate to the Execute view.
Select CDA – Complete Warehouse execution plan.
Set the parameter values under the Parameter tab.
Build the execution plan.
Click Run.
Note:
The prune_days parameter is used to determine the extraction end date for the incremental load. By default, the value of this parameter is 1, which indicates that the source extraction end date is one day before the ETL program run date. For example, if the ETL program run date is 28 July, 2010, the source extraction end date is 27 July, 2010.

If Oracle Clinical is your only data source:
Navigate to the Execution Plan tab.
Click CDA - Oracle Clinical Warehouse Execution Plan.
Set the required parameters.
Build the Execution Plan.
Click Run.
If Siebel Clinical is your only data source:
Navigate to the Execution Plan tab.
Click CDA - Siebel Clinical Warehouse Execution Plan.
Set the required parameters.
Build the Execution Plan.
Click Run.
Note:
Execution of the ETL (specifically the OCDA_ETL_RUN_S_POP program) populates W_ETL_RUN_S.LOAD_DT with the timestamp for the execution of the ETL. This ETL execution timestamp is used in the calculation of OCDA measures concerning the amount of time that currently open discrepancies have been open.
While the timestamp is captured in CURRENT_ETL_LOAD_DT, it is only available for the calculation of discrepancy intervals through the OBIEE dynamic repository variable CURRENT_DAY. CURRENT_DAY is refreshed from LOAD_DT at a fixed interval (by default 5 minutes), starting each time the Oracle BI Server is started. Between the time that the ETL is run and the time that CURRENT_DAY is refreshed, calculations of the intervals that currently open discrepancies have been open will be inaccurate.
There are two remedies: (i) restart the Oracle BI Server after every execution of the ETL. This will cause CURRENT_DAY to be refreshed to the correct value. (ii) If this is inconvenient, you can modify the intervals between refreshes of the value of CURRENT_DAY. For more information on how to modify the refresh interval for CURRENT_DAY, refer to Maintaining the Oracle Clinical Development Analytics Repository.
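For reference, the query behind such a repository variable refresh is, in outline, a one-line read of the ETL run timestamp. This sketch assumes CURRENT_DAY is initialized from W_ETL_RUN_S.LOAD_DT as described above; it is not the verbatim OCDA repository definition:

  -- Hedged sketch of an OBIEE initialization block query for CURRENT_DAY.
  SELECT MAX(load_dt) FROM w_etl_run_s;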
Tip:
You can schedule the jobs to execute at regular intervals. For more information on scheduling jobs, refer to Scheduling an ETL Execution Plan.

When you customize an ETL Execution Plan, it is your responsibility to maintain version control over changes to ETL mappings.
Oracle recommends that you carefully track the changes you make to Oracle-supplied ETL so that you can re-apply these changes in subsequent releases.
Creating an ETL Execution Plan

Though OCDA includes ETL Execution Plans for extracting data from Oracle Clinical and Siebel Clinical to the OCDA data warehouse, you may want to create your own ETL to extract data from other data sources.
Note:
The value of DATASOURCE_NUM_ID is set to 1 for Oracle Clinical and 2 for Siebel Clinical. If you want to add your own data sources, set this value to a number greater than 100.

See Also:
Informatica PowerCenter Online Help
To add one or more tables or columns, along with the associated ETL Execution Plans to populate data into these tables, perform the following tasks:
Create the new source and target table metadata inside Informatica.
Work in Informatica PowerCenter and create the ETL components (transformation or workflow) for this program.
Create the required workflow for this mapping.
Connect to DAC and create a new task for this new mapping.
Synchronize the task.
Add the task to the subject area.
Build the Execution Plan (CDA - Complete Warehouse).
Modifying an ETL Execution Plan

You may also want to modify an existing ETL to meet your reporting requirements.
See Also:
Informatica PowerCenter Online Help
To modify an ETL without any changes to the associated tables or columns, perform the following tasks:
Identify the Execution Plan that needs to be modified in the Informatica repository.
Open and modify the ETLs (transformation and/or workflow).
Test and save the changes in the repository.
Connect to DAC and navigate to the corresponding task.
Right-click the task and synchronize it.
Navigate to the execution plan and execute ETL to verify the changes.
Note:
The ETL Execution Plans that extract data for the warehouse fact tables assume that the dimensions to which each fact is related are up-to-date at the time the fact ETL Execution Plans are executed. This assumption is the basis for certain fact calculations that would provide erroneous results if the assumption were not true. For example, in the Received CRFs fact, the value of the pCRF entry complete measure depends on whether or not the study requires second pass entry. But that piece of information (whether second pass entry is required) is obtained from an attribute of the Study dimension. So, if the second-pass requirement for a study changes, and the change is not applied to the Study dimension, the Received CRF fact attributes will contain incorrect values.

As shipped, OCDA ETL workflows ensure this interlock by executing the ETL for related dimensions immediately before running the ETL for a fact. This is standard warehouse management practice, but it is especially important given the interdependence of the dimensions and the facts. The need to execute dimension ETL immediately before the corresponding fact ETL, and the danger of not doing so, is emphasized here because it is possible (though discouraged) to modify these shipped workflows.
To modify one or more tables or columns without any changes to the associated ETL programs (typically to widen a column):
Change the table properties as needed.
Save the mapping and refresh the workflow.
Connect to DAC, navigate to the corresponding task, and refresh it.
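For the typical case named above (widening a column), the corresponding change on the data warehouse side is a simple ALTER statement; the table and column names below are illustrative, not actual OCDA objects:

  -- Hedged sketch: widen a warehouse column to match the updated mapping.
  ALTER TABLE w_rxi_study_d MODIFY (study_title VARCHAR2(200));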
Note:
If the changes to the tables or columns are not compatible with the table that is installed in the data warehouse schema, you will get a warning while making the change. For example, if you are reducing the length of a number column from 15 to 10, the change is not compatible with the data existing in the table.

Scheduling an ETL Execution Plan

When you submit an Execution Plan for execution in DAC, you can schedule it to execute at regular intervals. To schedule an Execution Plan, perform the following tasks:
Navigate to the Scheduler tab within the Execute view.
Create a new schedule plan.
Enter the required details and click Save.