1 ETL Overview

This chapter provides an overview of the fundamental concepts related to Oracle BI Applications ETL processes.

This chapter contains the following topics:

Before You Begin Running ETL Processes
About ETL Architecture
About ETL Phases
About Load Plans
About the ODI Repository
About Changed Data Capture
About Knowledge Modules
About Reverse Knowledge Modules
About ETL Roles

Before You Begin Running ETL Processes

Before you begin running Oracle BI Applications ETL processes, you must have completed the installation and setup of Oracle BI Applications, as documented in Oracle Fusion Middleware Installation Guide for Oracle Business Intelligence Applications.

You must also have run the domains load plan, which loads source-specific data into Oracle BI Applications Configuration Manager tables. This enables Configuration Manager to display the appropriate source-specific values as choices in drop-down lists for setup objects. For instructions on running the domains load plan, see Oracle Fusion Middleware Installation Guide for Oracle Business Intelligence Applications.

About ETL Architecture

Figure 1-1 illustrates the ETL architecture. Typically, the extract-load-transform process has two main steps: The first step is the extract and stage load step, and the second step is the load transform step. The extract and stage load step is generated from a combination of the main interface and the nested temporary interface. The load transform step is generated as a result of the integration knowledge module (IKM). In this example, step 1 issues a SQL statement on the source that joins the GL_SET_OF_BOOKS table with the HR_ORGANIZATION_INFORMATION table. The join is executed on the source database, and the resulting data is staged. Then, a second join occurs at the load transform stage between the W_DOMAIN_G table and the temporary stage table, which results in the loading of the stage table W_INV_ORG_DS.

Note that Oracle Database is the only database type supported for the Oracle BI Applications repository schemas and the Business Analytics Warehouse.

Figure 1-1 ETL Architecture

This diagram is described in the surrounding text.

Figure 1-2 shows the general load plan pattern. There are four main stages: The SDE (source dependent extract) tasks extract data from the source dimension and fact tables and load the the data into universal dimension and fact staging tables. The SIL tasks are common and load data from the universal staging tables into the warehouse staging tables. Figure 1-2 depicts a dependency between the dimension table and the fact table. Therefore, the SIL DIM must be executed before the SIL FACT, to resolve the dimension key. The SIL DIM has a database sequence that generates the key, and then the SIL FACT looks up that key when loading the fact staging table.

Figure 1-2 Load Plan Pattern

This graphic is described in the surrounding text.

About ETL Phases

Oracle BI Applications ETL processes include the following phases:

SDE. SDE stands for Source Dependent Extract. In the first phase, SDE tasks extract data from the source system and stage it in staging tables. SDE tasks are source specific.
SIL. SIL stands for Source Independent Load. Load tasks transform and port the data from staging tables to base fact or dimension tables. SIL tasks are source independent.
PLP. PLP stands Post Load Process. PLP tasks are only executed after the dimension and fact tables are populated. A typical usage of a PLP task is to transform data from a base fact table and load it into an aggregate table. PLP tasks are source independent.

About the ODI Repository

The ODI Repository for Oracle BI Applications comprises two repositories:

Master Repository. Topology of resources, security, version management. A master repository is usually associated with multiple work repositories, but work repositories are always attached to a single Master repository
Work Repository. Contains data models and projects. This is the development and execution repository.

The master and work repositories must reside in the same database schema, and the database type must be Oracle. Both the master and work repositories are set up during the Oracle BI Applications installation process.

Note that the default ODI repository ID is 500. This ID is part of the internal ID for every object that is created within the ODI repository. Having a repository ID greater than or equal to 500 is critical to ensure that the objects you create do not overlap with any current or future Oracle BI Applications objects. If you change the default ODI repository ID, make sure the new value is greater than 500.

About Load Plans

A load plan is an executable object that comprises and organizes the child objects (referred to as steps) that carry out the ETL process. A load plan is made up of a sequence of several types of steps. Each step can contain several child steps. Depending on the step type, the steps can be executed conditionally, in parallel or sequentially.

You define a load plan in Oracle BI Applications Configuration Manager by selecting a data source and one or more fact groups. This selection determines which steps need to be performed during the ETL process. Each fact group belongs to a specific functional area or areas that are associated with one or more offerings, which, in turn, are related to a data server. A transactional data source is associated with one or more data servers.

After you define the load plan, you then generate it to build it in the ODI repository. You then execute the load plan to perform the ETL process.

For more information about working with Oracle BI Applications load plans, see Chapter 2, "Managing Load Plans." For information about the topic of load plans in the context of Oracle Data Integrator, see Oracle Fusion Middleware Developer's Guide for Oracle Data Integrator.

About Changed Data Capture

Oracle BI Applications has two ETL modes for loading data into the Oracle Business Analytics Warehouse: full and incremental.

During a full load, Oracle BI Applications extracts:

All records from tables that are sources for dimension tables.
Records created after an "Initial Extract Date" from tables that are sources for fact tables. The Initial Extract Date defines a cut-off so that not all records are loaded into the data warehouse. You set the Initial Extract Date value for each data source in the Oracle BI Applications Configuration Manager.

An ETL process can extract a record from a single table or from multiple tables. When a record is the result of joining multiple tables, one of these tables is identified as the base table, which defines the granularity of the record. When extracting fact records, Oracle BI Applications only compares the "Created Date" of the base table to the Initial Extract Date.

During an incremental load, Oracle BI Applications extracts records that have changed or were created after a "Last Extract Date." This is done by comparing the Last Extract Date value to a "Last Updated Date" (or LUD) type column in the source table. If the source table does not have such a column, Oracle BI Applications extracts all records from that table. The Last Extract Date is a value that is calculated based on the last time data was extracted from that table less a "Prune Days" value. The Prune Days parameter is used to extend the window of the ETL extract beyond the last time the ETL actually ran. This is to ensure records that may have somehow been missed in a prior ETL are picked up in the next ETL. Records can be missed in an ETL process when a record is being updated while the ETL process is running and was not committed until after the ETL completed.

You set the Prune Days parameter value in Oracle BI Applications Configuration Manager. Setting a small value means the ETL will extract fewer records, thus improving performance; however, this increases the chances that records are not detected. Setting a large number is useful if ETL runs are infrequent, but this increases the number of records that are extracted and updated in the data warehouse. Therefore, you should not set the Prune Days value to a very large number. A large Prune Days number can also be used to trigger re-extracting records that were previously processed but have not changed. The value for Prune Days should never be set to 0.

As stated above, an ETL process can extract a record from a single table but more commonly extracts records that are the result of joining multiple tables. When extracting from multiple tables, one table is identified as the base table, which defines the granularity of the record. When there is a change in the base table, an extract is triggered for the record. However, there can be cases where a change occurs in a non-base table but not in the base table itself. If a change occurs in a non-base table and this should trigger an extract of the record, these tables are referred to as "auxiliary" tables. Thus, when determining if a record should be extracted, Oracle BI Applications compares not only the LUD column in the base table but the LUD columns in all auxiliary tables. If the LUD column changed in any of these tables, the record is extracted. If changes can occur in a table that is not relevant to triggering an extract, this table's LUD column is not compared in the incremental filtering logic.

About Knowledge Modules

Knowledge Modules (KMs) implement different tasks within the Oracle Business Analytics Warehouse system. The different types of KMs are:

Reverse-engineering (RKM). Used for reading the table and other object metadata from source databases and to import tables, columns, and indexes into a model. For more information about RKMs, see "About Reverse Knowledge Modules".
Loading (LKM). Used for efficient extraction of data from source databases for loading into a staging area (database-specific bulk unload utilities can be used where available).
Integration (IKM). Used to load data into a target with different strategies, for example, slowly changing dimensions and insert/update strategies.
Check (CKM). Used to validate and cleanse data.
Journalizing (JKM). Used to record the new and changed data within either a single table or view or a consistent set of tables or views.
Service (SKM). Exposes data in the form of Web services.

For a detailed description of the KMs available with Oracle BI Applications, see Appendix B, "Knowledge Modules Reference."

About Reverse Knowledge Modules

Oracle BI Applications uses the ODI reverse engineering process to populate the repository with metadata from the source system. RKMs retrieve metadata from data storage and load it into the repository. For example, RKMs detects the description of tables, columns, data types, constraints, and comments from a database to load the repository. RKMs support various technologies, such as databases, XML files, and various types of flat files. You can also use RKMs to retrieve non-standard metadata from databases or applications, such as Oracle E-Business Suite, Siebel CRM, PeopleSoft, and so on.

The RKM role is to perform customized reverse engineering for a model. The RKM handles connecting to the application or metadata provider then transforming and writing the resulting metadata into the ODI repository. The metadata is written into temporary tables, and then the RKM calls the ODI API to read from these tables and write to the ODI metadata tables of the Work repository in incremental update mode.

Note that the Oracle BI Applications ODI repository contains the relevant source data models. Therefore, you would need to run an RKM only if you have customized tables in the source system and want to import these changes to the ODI repository. For more information about customizing tables and tasks, see Oracle Fusion Middleware Administrator's Guide for Oracle Business Intelligence Applications.

About Multi-Source Environments

Oracle BI Applications supports the loading of data into the Oracle Business Analytics Warehouse from multiple source systems as long as the source systems are different. Oracle BI Applications does not support multiple instances of the same source system. For example, your environment could have a PeopleSoft 9.0 source and a Oracle EBS 11.5.0 source but not two instances of the Oracle EBS 11.5.10 source.

Keep in mind that multi-source ETL processes are different from single-source ETL processes in that you could potentially have both sources being used to load the same fact groups and dimensions.

Multi-Source Example

Consider a scenario in which you have a PeopleSoft 9.0 source and an Oracle EBS 11.5.10 source, both loading the target fact table ASTXACT_FG. Loading this target table includes three serial steps:

Initialize Target Table (ASTXACT_FG).
Load Target Table (ASTXACT_FG).
Finalize Target Table (ASTXACT_FG).

Figure 1-3 shows these load plan steps in ODI Studio.

Figure 1-3 Multi-Source Load Plan

Data from both sources, PeopleSoft 9.0 and Oracle EBS 11.5.10, is being loaded into the target table (ASTXACT_FG) for the first time, which means a full load will occur; and, generally, when a full load occurs, the target table is truncated. In this case, the load plans can run in parallel because the data sources are not dependent on each other; however, the load from the second source should not truncate the target table if it already contains data from the first source. This issue is resolved as follows: The Initialize Target Table step truncates the target table. The Load Target Table step, in turn, has two child parallel steps. Each of these loads the target from each source system. Finally, the Finalize Target Table step creates any indexes and analyzes the table. Thus, the generated load plan ensures that the table is truncated only at the required time (in this case, only once before any source system loads it).

Note that you can have separate load plans for each source, but load plans should not run in parallel.

Caution:

Oracle BI Applications Configuration Manager and ODI will not stop load plans from running in parallel; however, it is recommended that you do not do so because of the following reasons:

All load plans truncate target tables upon the initial load. If you run load plans in parallel, one load plan can truncate the data loaded by the preceding load plan.
The mappings from SILOS onwards are common and not based on the source system. If load plans are run in parallel, you can have a situation in which only partial data is loaded from the second source due to the data that was loaded from the first source. To resolve this issue, you need to make sure that the first load plan completes successfully before the second load plan runs.

In other words, when loading a fact table, the fact table could be connecting with multiple dimension and lookup tables to get the final data. When load plans are running parallel, some of the dimension, lookup, and staging tables could also have data from a second source. This could lead to some lookups and dimensions not returning appropriate value for the second source, since they have not yet been completely loaded

About ETL Roles

Oracle BI Applications has two duty roles for ETL operations:

Load Plan Operator Duty
Load Plan Administrator Duty

Oracle BI Applications has the following additional duty roles:

BI Applications Administrator Duty
BI Applications Functional Developer Duty
BI Applications Implementation Manager Duty

Access to Configuration Manager and Functional Setup Manager is controlled through these duty roles.

The security administrator must grant the appropriate duty roles to a user based on the user's job responsibilities. For information on the Configuration Manager and Functional Setup Manager screens that each duty role has access to, see Oracle Fusion Middleware Security Guide for Oracle Business Intelligence Applications.

The BI Applications administrator, load plan operator, and load plan administrator users will require appropriate access to ODI. In addition to these users being created in the LDAP system, these users must also be created in the ODI Repository, and they must be granted the Supervisor profile or an appropriate ODI profile. The BI Applications administrator must be granted the Supervisor role in ODI. Work with your security administrator to obtain the appropriate duty roles.

For more information about managing security in ODI, see Oracle Fusion Middleware Developer's Guide for Oracle Data Integrator security guide.