4 ETL Implementation and Customization

This chapter discusses the ETL (extraction, transformation and loading) programs you use to populate an Oracle Utilities Data Model warehouse. It includes the following topics:

The Role of ETL in Oracle Utilities Data Model

Describes source-ETL and Intra-ETL for Oracle Utilities Data Model.

As illustrated in Figure 4-1, you use two types of ETL (extraction, transformation and loading) to populate the staging layer, the foundation layer, and the access layer.

  • Source-ETL. ETL that populates the staging layer (if any) and the foundation layer (that is, the base, reference, and lookup tables) with data from the operational system is known as source ETL.

    Oracle Utilities Data Model does not include source-ETL scripts. Unless you are using an application adapter for Oracle Utilities Data Model, you must create source-ETL yourself using your understanding of your operational and other source systems and your customized Oracle Utilities Data Model.

  • Intra-ETL. ETL that populates the access layer (that is, the derived tables, aggregate tables, materialized views, OLAP cubes, and data mining models) using the data in the foundation layer is known as intra-ETL.

    Oracle Utilities Data Model does include intra-ETL. You can modify the default intra-ETL to populate a customized access layer from a customized foundation layer.

ETL for the Foundation Layer of an Oracle Utilities Data Model Warehouse

ETL that populates the foundation layer of an Oracle Utilities Data Model warehouse (that is, the base, reference, and lookup tables) with data from an operational system is known as source-ETL.

You can populate the foundation layer of an Oracle Utilities Data Model warehouse in the following ways:

  • Write your own source-ETL scripts using Oracle Data Integrator or another ETL tool and then use those scripts to populate the foundation layer.

Writing Your Own Source-ETL

When using Oracle Utilities Data Model, you must write your own source-ETL scripts using Oracle Data Integrator or another ETL or mapping tool.

The following topics provide general information about writing source-ETL:

Source-ETL Design Considerations

Keep specific design features in mind when you design and write source-ETL for Oracle Utilities Data Model.

  • You can populate the calendar data by using the calendar population scripts provided with Oracle Utilities Data Model.

  • Populate the tables in the following order:

    1. Lookup tables

    2. Reference tables

    3. Base tables

  • Analyze the tables in one category before loading the tables in the next category (for example, analyze the lookup tables before loading the reference tables). Additionally, you must analyze all of the tables loaded by the source-ETL process before executing the intra-ETL processes (see the statistics-gathering sketch after this list).

    See:

    The topic about analyzing tables, indexes, and clusters in Oracle Database Administrator's Guide.
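
For example, you might gather statistics with DBMS_STATS after each category of tables is loaded. The following is a minimal sketch only; the table shown (DWR_DEMAND_RESPN_PROG, a reference table mentioned later in this chapter) is just an example, and in practice you would repeat or loop over every table loaded by your source-ETL.

BEGIN
  -- Gather statistics on one loaded reference table; repeat (or loop)
  -- for every lookup, reference, and base table loaded by source-ETL.
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'OUDM_SYS',
    tabname          => 'DWR_DEMAND_RESPN_PROG',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    cascade          => TRUE);
END;
/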

ETL Architecture for Oracle Utilities Data Model Source-ETL

ETL (or ELT, that is, Extract, Load, and Transform) first extracts data from the original sources, assures the quality of the data, cleans the data, and makes the data consistent across the original sources. ETL then populates the physical objects with the "clean" data so that query tools, report writers, dashboards, and so on can access the data.

The fundamental services upon which data acquisition is constructed are as follows:

  • Data sourcing

  • Data movement

  • Data transformation

  • Data loading

From a logical architecture perspective, there are many different ways to configure these building blocks for delivering data acquisition services. The major architectural styles available that cover a range of options to be targeted within a data warehousing architecture include:

  • Batch Extract, Transform, and Load and Batch Extract, Load, Transform, Load

    Batch Extract, Transform and Load (ETL) and Batch Extract, Load, Transform, Load (ELTL) are the traditional architectures in a data warehouse implementation. The difference between them is whether the transformation is performed inside or outside the database.

  • Batch Hybrid Extract, Transform, Load, Transform, Load

    Batch Hybrid Extract, Transform, Load, Transform, Load (ETLTL) is a hybrid strategy. This strategy provides the most flexibility to remove hand-coded approaches to transformation design, apply a metadata-driven approach, and still leverage the data processing capabilities of the enterprise warehouse. In this targeted design, the transformation processing is first performed outside the warehouse as a pre-processing step before loading the staging tables, and then further transformation processing is performed within the data warehouse before the final load into the target tables.

  • Real-time Extract, Transform, Load

    Real-time Extract, Transform, Load (rETL) is appropriate when service levels for data freshness demand more up-to-date information in the data warehousing environment. In this approach, the OLTP system must actively publish events of interest so that the rETL processes can extract them from a message bus (queue) on a timely basis. A message-based paradigm is used with publish and subscribe message bus structures or point-to-point messaging with reliable queues. In such cases, the staging area can be used as a real-time Operational Data Store, at least for the source concerned, and aggregation could run directly from the Operational Data Store (operational system) to the Access layer, or to the presentation layer in specific cases.

When designing source-ETL for Oracle Utilities Data Model, use the architecture that best meets your business needs.

Creating a Source to Target Mapping Document for the Source-ETL

Before you begin building your extract systems, create a logical data interface document that maps the relationship between original source fields and target destination fields in the tables. This document ties the very beginning of the ETL system to the very end.

Columns in the data mapping document are sometimes combined. For example, the source database, table name, and column name could be combined into a single target column. The information within the concatenated column would be delimited with a period. Regardless of the format, the content of the logical data mapping document has been proven to be the critical element required to sufficiently plan ETL processes.

Designing a Plan for Rectifying Source-ETL Data Quality Problems

Data cleaning consists of all the steps required to clean and validate the data feeding a table and to apply known business rules to make the data consistent. The perspectives of the cleaning and conforming steps are less about the upside potential of the data and more about containment and control.

There are several potential data quality issues, related to each other, that the staging area needs to handle:

  • Data Validity: Is the data content and type usable and as expected (and does it match its profile, if data profiling is used)?

  • Data Accuracy: Are values correct, for example addresses, and correct with respect to some "true" standard (or one defined as such)?

  • Data Completeness: Is all the required data there? What do you do when data is missing? What represents the minimum set of required data?

  • Data Consistency: Is the data consistent between the various sources, and what rules do you apply when there are inconsistencies?

  • Data Latency: A sub-part of data consistency, but treated separately because of its importance: when does data arrive, over which period, and which data can be combined and which cannot?

  • Data Reasoning: This applies more at the reporting level but can also be applied at the staging level: Does the data I see make sense from a business perspective? Can I really combine the data as an end user would expect?

As a consequence, a multi-layer staging is generally required or expected.

If there are data quality problems, then build a plan, in agreement with IT and business users, for how to rectify these problems.

Answer the following questions:

  • Is data missing?

  • Is the data wrong or inconsistent?

  • Should the problem be fixed in the source systems?


Set up the following processes and programs:

  • Set up a data quality measurement process.

  • Set up the data quality reporting and action program and people responsibility.

Designing Source-ETL Workflow and Jobs Control

All data movement among ETL processes is composed of jobs. An ETL workflow executes these jobs in the proper sequence and with the necessary dependencies. General ETL tools, such as Oracle Warehouse Builder, support this kind of workflow, job design, and execution control.

Tips for designing ETL jobs and workflow:

  • Use a common structure across all jobs (source system to transformer to target data warehouse).

  • Have a one-to-one mapping from source to target.

  • Define one job per source table.

  • Apply a generic job structure and template jobs to allow for rapid development and consistency.

  • Use an optimized job design to leverage Oracle load performance based on data volumes.

  • Design parameterized jobs to allow for greater control over job performance and behavior.

  • Maximize parallel execution of jobs.

Designing Source-ETL Exception Handling

As a general principle, all ETL logs status and errors into a table. You monitor execution status using an ETL tool or by querying this log table directly.

Your ETL tool or your developed mapping scripts generate status and error handling tables.

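For illustration only, the following is a minimal sketch of such a status and error log table; the table and column names are hypothetical and are not part of Oracle Utilities Data Model.

CREATE TABLE src_etl_load_log (
  job_name       VARCHAR2(100),
  target_table   VARCHAR2(128),
  status         VARCHAR2(30),    -- for example RUNNING, COMPLETED-SUCCESS, COMPLETED-ERROR
  rows_processed NUMBER,
  error_detail   VARCHAR2(4000),
  load_start_dt  DATE,
  load_end_dt    DATE
);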

Writing Source-ETL that Loads Efficiently

Whether you are developing mapping scripts and loading into a staging layer or directly into the foundation layer, the goal is to get the data into the warehouse in the most expedient manner. To achieve good performance during the load, you must begin by focusing on where the data to be loaded resides and how you load it into the database.

For example, you should not use a serial database link or a single JDBC connection to move large volumes of data. The most common and preferred mechanism for loading large volumes of data is loading from flat files.

The following topics discuss best practices for ensuring your source-ETL loads efficiently:

Using a Staging Area for Flat Files

The area where flat files are stored before being loaded into the staging layer of a data warehouse system is commonly known as a staging area.

The overall speed of your load is determined by:

  • How quickly the raw data can be read from staging area.

  • How quickly the raw data can be processed and inserted into the database.

Recommendations: Using a Staging Area

Stage the raw data across as many physical disks as possible to ensure that reading it is not a bottleneck during the load.

Also, if you are using the Exadata Database Machine, the best place to stage the data is in an Oracle Database File System (DBFS) stored on the Exadata storage cells. DBFS creates a mountable cluster file system that you can use to access files stored in the database. Create the DBFS in a separate database on the Database Machine. This allows the DBFS to be managed and maintained separately from the data warehouse.

Mount the file system using the DIRECT_IO option to avoid thrashing the system page cache while moving the raw data files in and out of the file system.

Preparing Raw Data Files for Source-ETL

In order to parallelize the data load Oracle Database must be able to logically break up the raw data files into chunks, known as granules. To ensure balanced parallel processing, the number of granules is typically much higher than the number of parallel server processes. At any given point in time, a parallel server process is allocated one granule to work on. After a parallel server process completes working on its granule, another granule is allocated until all of the granules are processed and the data is loaded.

Recommendations: Preparing Raw Data Files for Source-ETL

Follow these recommendations:

  • Delimit each row using a known character, such as a new line or a semicolon. This ensures that Oracle can look inside the raw data file and determine where each row of data begins and ends in order to create multiple granules within a single file.

  • If a file is not position-able and seek-able (for example, the file is compressed or is a zip file), then the file cannot be broken up into granules and the whole file is treated as a single granule. In this case, only one parallel server process can work on the entire file. To parallelize the loading of compressed data files, use multiple compressed data files. The number of compressed data files used determines the maximum parallel degree used by the load.

  • When loading multiple data files (compressed or uncompressed):

    • Use a single external table, if at all possible

    • Make the files similar in size

    • Make the size of the files a multiple of 10 MB

  • If you must have files of different sizes, list the files from largest to smallest.

  • By default, Oracle assumes that the flat file has the same character set as the database. If this is not the case, specify the character set of the flat file in the external table definition to ensure the proper character set conversions can take place.

Source-ETL Data Loading Options

Oracle offers several Source-ETL data loading options.

Oracle offers the following data loading options:

  • External table or SQL*Loader

  • Oracle Data Pump (import and export)

  • Change Data Capture and Trickle feed mechanisms (such as Oracle GoldenGate)

  • Oracle Database Gateways to open systems and mainframes

  • Generic Connectivity (ODBC and JDBC)

The approach that you take depends on the source and format of the data you receive.

Recommendations: Loading Flat Files

If you are loading from files into Oracle, you have two options: SQL*Loader or external tables.

Using external tables offers the following advantages:

  • Allows transparent parallelization inside the database. You can avoid staging data and apply transformations directly on the file data using arbitrary SQL or PL/SQL constructs when accessing external tables. SQL*Loader requires you to load the data as-is into the database first.

  • Parallelizing loads with external tables enables more efficient space management compared to SQL*Loader, where each individual parallel loader is an independent database session with its own transaction. For highly partitioned tables, this could potentially lead to a lot of wasted space.

You can create an external table using the standard CREATE TABLE statement. However, to load from flat files the statement must include information about where the flat files reside outside the database. The most common approach when loading data from an external table is to issue a CREATE TABLE AS SELECT (CTAS) statement or an INSERT AS SELECT (IAS) statement into an existing table.
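
For illustration, the following is a minimal sketch of an external table over delimited flat files followed by a CTAS load; the directory object, file names, columns, and target table are hypothetical and must be adapted to your own staging area and customized model.

-- Hypothetical directory object pointing at the flat-file staging area.
CREATE DIRECTORY etl_stage_dir AS '/u01/etl/stage';

-- External table over delimited flat files (columns are illustrative).
CREATE TABLE ext_mtr_rdng_stage (
  mtr_rdng_dt VARCHAR2(10),
  mtr_id      NUMBER,
  rdng_val    NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY etl_stage_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
    MISSING FIELD VALUES ARE NULL
  )
  LOCATION ('mtr_rdng_1.dat', 'mtr_rdng_2.dat')
)
REJECT LIMIT UNLIMITED;

-- CTAS from the external table; a CTAS statement always uses direct path load,
-- and transformations can be applied directly in the SELECT.
CREATE TABLE stg_mtr_rdng AS
SELECT TO_DATE(mtr_rdng_dt, 'YYYY-MM-DD') AS mtr_rdng_dt,
       mtr_id,
       rdng_val
  FROM ext_mtr_rdng_stage;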

Parallel Direct Path Load Source-ETL

A direct path load parses the input data according to the description given in the external table definition, converts the data for each input field to its corresponding Oracle data type, then builds a column array structure for the data. These column array structures are used to format Oracle data blocks and build index keys.

The newly formatted database blocks are then written directly to the database, bypassing the standard SQL processing engine and the database buffer cache.

The key to good load performance is to use direct path loads wherever possible:

  • A CREATE TABLE AS SELECT (CTAS) statement always uses direct path load.

  • A simple INSERT AS SELECT (IAS) statement does not use direct path load. In order to achieve direct path load with an IAS statement you must add the APPEND hint to the command.

Direct path loads can also run in parallel. To set the parallel degree for a direct path load, either:

  • Add the PARALLEL hint to the CTAS statement or an IAS statement.

  • Set the PARALLEL clause on both the external table and the table into which the data is loaded.

    After the parallel degree is set:

    • A CTAS statement automatically performs a direct path load in parallel.

    • An IAS statement does not automatically perform a direct path load in parallel. In order to enable an IAS statement to perform direct path load in parallel, you must alter the session to enable parallel DML by executing the following statement:

      alter session enable parallel DML;
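
Putting these pieces together, the following is a minimal sketch of a parallel direct path load with an IAS statement; the table names reuse the illustrative external table and target table from the earlier sketch and are not part of Oracle Utilities Data Model.

-- Set the parallel degree on both the external table and the target table
-- (assumes the target table stg_mtr_rdng already exists).
ALTER TABLE ext_mtr_rdng_stage PARALLEL 8;
ALTER TABLE stg_mtr_rdng PARALLEL 8;

-- Enable parallel DML for the session, then load with APPEND for direct path.
ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ APPEND */ INTO stg_mtr_rdng
SELECT TO_DATE(mtr_rdng_dt, 'YYYY-MM-DD'), mtr_id, rdng_val
  FROM ext_mtr_rdng_stage;

COMMIT;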
      

Partition Exchange Load for Oracle Utilities Data Model Source-ETL

A benefit of partitioning is the ability to load data quickly and easily with minimal impact on the business users by using the EXCHANGE PARTITION command.

The EXCHANGE PARTITION command enables swapping the data in a nonpartitioned table into a particular partition in your partitioned table. The EXCHANGE PARTITION command does not physically move data, instead it updates the data dictionary to exchange a pointer from the partition to the table and vice versa.

Because there is no physical movement of data, an exchange does not generate redo and undo. In other words, an exchange is a sub-second operation and far less likely to impact performance than any traditional data-movement approaches such as INSERT.

Recommendations: Partitioning Tables

Partition the larger tables and fact tables in the Oracle Utilities Data Model warehouse.

Example 4-1 Using Exchange Partition Statement with a Partitioned Table

Assume that there is a large table called Sales, which is range partitioned by day. At the end of each business day, data from the online sales system is loaded into the Sales table in the warehouse.

The following steps ensure the daily data gets loaded into the correct partition with minimal impact to the business users of the data warehouse and optimal speed:

  1. Create an external table for the flat file data coming from the online system.

  2. Using a CTAS statement, create a nonpartitioned table called tmp_sales that has the same column structure as the Sales table.

  3. Build any indexes that are on the Sales table on the tmp_sales table.

  4. Issue the EXCHANGE PARTITION command.

    Alter table Sales exchange partition p2 with
        table tmp_sales including indexes without validation;
    
  5. Gather optimizer statistics on the newly exchanged partition using incremental statistics (a sketch follows this example).

The EXCHANGE PARTITION command in this example swaps the definitions of the named partition and the tmp_sales table, so the data instantaneously exists in the right place in the partitioned table. Moreover, with the inclusion of the INCLUDING INDEXES and WITHOUT VALIDATION clauses, Oracle swaps index definitions and does not check whether the data actually belongs in the partition, so the exchange is very quick.

Note:

The assumption being made in this example is that the data integrity was verified at data extraction time. If you are unsure about the data integrity, omit the WITHOUT VALIDATION clause so that the database checks the validity of the data.
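
A minimal sketch of step 5, gathering statistics with incremental statistics enabled, follows; the schema owner is assumed to be the current user, and the table name comes from the example.

BEGIN
  -- One-time setting: enable incremental statistics on the partitioned table.
  DBMS_STATS.SET_TABLE_PREFS(USER, 'SALES', 'INCREMENTAL', 'TRUE');

  -- Gather statistics; only partitions with changed data are rescanned.
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname     => USER,
    tabname     => 'SALES',
    granularity => 'AUTO');
END;
/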

Customizing Intra-ETL for Oracle Utilities Data Model

The Oracle Utilities Data Model uses a workflow implemented with PL/SQL packages to execute the intra-ETL process.

The workflow consists of three major components:

  1. Executing Derived Intra-ETL Programs

    1. Independent Derived intra-ETL programs - Level 0

    2. First level dependent Derived intra-ETL programs - Level 1

  2. Refreshing Aggregate Materialized Views

  3. Refreshing OLAP Cubes

Figure 4-2 illustrates the Oracle Utilities Data Model intra-ETL workflow.

Figure 4-2 Oracle Utilities Data Model Intra-ETL Workflow

Description of Figure 4-2 follows
Description of "Figure 4-2 Oracle Utilities Data Model Intra-ETL Workflow"

Executing Derived Intra-ETL Programs

The first component of the workflow to execute the intra-ETL process is the Derived intra-ETL programs. This component has two subcomponents to deal with the dependency among Derived intra-ETL programs:

  • The first subcomponent, Independent Derived intra-ETL programs, has Derived intra-ETL programs that get data from foundation layer tables, that is, base, lookup, and reference tables.

  • The second subcomponent has Derived intra-ETL programs that depend on the first subcomponent, Independent Derived intra-ETL programs. The second subcomponent intra-ETL programs get data from foundation layer tables (that is, base, lookup, and reference tables) and also from derived tables that have intra-ETL programs in the first subcomponent.

The intra-ETL programs in the two subcomponents are implemented using PL/SQL packages. All intra-ETL packages insert data for the ETL period specified in the DWC_ETL_PARAMETER table for the "OUDM-INTRA-ETL" process. Modify the ETL period of the process to match your data load requirements. If you load data for an ETL period for which data is already loaded, the intra-ETL program first truncates the existing partitions for that ETL period and then loads the data into the target derived table.
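
For example, you can check the current ETL period for the process before running the workflow (a minimal sketch; run it from a session that can access the oudm_sys control tables):

SQL> SELECT process_name, from_date_etl, to_date_etl
     FROM   oudm_sys.dwc_etl_parameter
     WHERE  process_name = 'OUDM-INTRA-ETL';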

Refreshing Aggregate Materialized Views

This is the second component of the workflow. This component depends on the first component, Executing Derived intra-ETL programs. The execution of this component happens only when the execution of the first component completes successfully.

Refreshing OLAP Cubes

This is the third component of the workflow. This component depends on the second component, Refreshing Aggregate materialized views, which in turn depends on the first component, Executing Derived intra-ETL programs. The execution of this component happens only when the execution of the second component completes successfully. This component refreshes data in OLAP cubes and dimensions based on the parameters given in DWC_OLAP_ETL_PARAMETER table.

Executing Intra-ETL Workflow

Oracle Utilities Data Model intra-ETL workflow is implemented using a PL/SQL package, PKG_INTRA_ETL_PROCESS.

Each component of the intra-ETL workflow, and each of its subcomponents, has its own procedure. All these procedures are private to the package. The package has only one public procedure, which invokes all the private procedures as depicted in the workflow figure. Before executing the workflow, ensure that you set all ETL parameters in the DWC_ETL_PARAMETER and DWC_OLAP_ETL_PARAMETER tables. Invoking the PKG_INTRA_ETL_PROCESS.RUN procedure starts the workflow execution.

Performing an Initial Load of an Oracle Utilities Data Model Warehouse

Performing an initial load of an Oracle Utilities Data Model warehouse is a multistep process: loading the foundation layer and then loading the access layer.

Performing an initial load of an Oracle Utilities Data Model warehouse is a multistep process:

  1. Load the foundation layer of the Oracle Utilities Data Model warehouse (that is, the reference, lookup, and base tables).
  2. Load the access layer of the Oracle Utilities Data Model warehouse (that is, the derived and aggregate tables, materialized views, OLAP cubes, and data mining models).

Performing an Initial Load of the Foundation Layer

Perform an initial load of the foundation layer using source-ETL that you create.

Performing an Initial Load of the Access Layer

To perform an initial load of the access layer of the Oracle Utilities Data Model warehouse (that is, the derived and aggregate tables, materialized views, and OLAP cubes), take the following steps:

  1. Update the parameters in the DWC_ETL_PARAMETER control table in the oudm_sys schema for the different processes so that the ETL can use this information (that is, the beginning and end date of the ETL period) when loading the derived and aggregate tables and views.

    For an initial load of an Oracle Utilities Data Model warehouse, specify the values shown in the following tables:

    For OUDM-INTRA-ETL process:

    Table 4-1 Initial Load Values for DWC_ETL_PARAMETER Table

    PROCESS_NAME: 'OUDM-INTRA-ETL'

    FROM_DATE_ETL: The beginning date of the ETL period.

    TO_DATE_ETL: The ending date of the ETL period.

  2. Update the Oracle Utilities Data Model OLAP ETL parameters in the DWC_OLAP_ETL_PARAMETER control table in the oudm_sys schema to specify the build method and other build characteristics so that the ETL can use this information when loading the OLAP cube data.

    For an initial load of the analytic workspace, specify values following the guidelines in Table 4-2.

    Table 4-2 Values of OLAP ETL Parameters in the DWC_OLAP_ETL_PARAMETER Table for Initial Load

    PROCESS_NAME: 'OUDM-OLAP-ETL'

    BUILD_METHOD: C, which specifies a complete refresh that clears all dimension values before loading.

    CUBENAME: One of the following values that specifies the cubes you want to build:

    • ALL specifies a build of the cubes in the Oracle Utilities Data Model analytic workspace.

    • cubename[[|cubename]...] specifies one or more cubes to build.

    MAXJOBQUEUES: A decimal value that specifies the number of parallel processes to allocate to this job (the default value is 4). The value that you specify varies depending on the setting of the JOB_QUEUE_PROCESSES database initialization parameter.

    CALC_FCST: One of the following values, depending on whether you want to calculate forecast cubes:

    • Y specifies calculate forecast cubes.

    • N specifies do not calculate forecast cubes.

    NO_FCST_YRS: If the value for the CALC_FCST column is Y, specify a decimal value for the number of years of forecast data you want to calculate; otherwise, specify NULL.

    FCST_MTHD: If the value for the CALC_FCST column is Y, then specify AUTO; otherwise, specify NULL.

    FCST_ST_YR: If the value for the CALC_FCST column is Y, then specify a value of the form 'BY YYYY', the start business year of the historical period; otherwise, specify NULL.

    FCST_END_YR: If the value for the CALC_FCST column is Y, then specify a value of the form 'BY YYYY', the end business year of the historical period; otherwise, specify NULL.

    OTHER1: Specify NULL.

    OTHER2: Specify NULL.

    (A sample UPDATE statement based on these values appears after this procedure.)

  3. Execute the intra-ETL.
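
As an illustration of step 2, the following is a hedged example of setting the Table 4-2 values for an initial load with a complete build of all cubes and no forecast calculation; the exact row you update and any schema qualification may differ in your installation.

SQL> UPDATE DWC_OLAP_ETL_PARAMETER
        SET build_method = 'C',
            cubename     = 'ALL',
            maxjobqueues = 4,
            calc_fcst    = 'N',
            no_fcst_yrs  = NULL,
            fcst_mthd    = NULL,
            fcst_st_yr   = NULL,
            fcst_end_yr  = NULL,
            other1       = NULL,
            other2       = NULL
      WHERE process_name = 'OUDM-OLAP-ETL';

SQL> commit;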

Executing the Default Oracle Utilities Data Model Intra-ETL

The intra-ETL workflow is implemented using the PL/SQL package PKG_INTRA_ETL_PROCESS. This package has a public procedure, Run, and also has private procedures for executing derived intra-ETL programs, refreshing aggregate materialized views, and refreshing OLAP cubes.

The public procedure, Run, invokes all private procedures.

Before executing the intra-ETL workflow, update the ETL parameters in the DWC_ETL_PARAMETER and DWC_OLAP_ETL_PARAMETER tables. It is suggested that you use the oudm_user user to update the ETL parameter tables and execute the intra-ETL workflow. The Oracle Utilities Data Model installation creates oudm_user and grants it the roles and privileges required for updating the ETL parameter tables and executing the intra-ETL workflow.

sqlplus oudm_user/oudm_user@SID

Update ETL parameter tables:

SQL> UPDATE DWC_ETL_PARAMETER
        SET from_date_etl = <the beginning date of the ETL period>,
            to_date_etl   = <the ending date of the ETL period>
      WHERE process_name = 'OUDM-INTRA-ETL';

SQL> commit;
 
SQL> UPDATE DWC_OLAP_ETL_PARAMETER
        SET build_method = <>,
            cubename     = <>,
            ...
            fcst_st_yr   = <>,
            fcst_end_yr  = <>;

SQL> commit;

Run the following command to execute intra-ETL workflow:

SQL> BEGIN 
OUDM_SYS.PKG_INTRA_ETL_PROCESS.Run; 
END; 
/

The status of each activity is tracked in the DWC_INTRA_ETL_ACTIVITY table. The status of each cube data load is tracked in the DWC_OLAP_ACTIVITY table. The status of the entire intra-ETL workflow process is tracked in the DWC_INTRA_ETL_PROCESS table.
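
For example, after the run you can confirm the outcome by querying these control tables (a minimal sketch; the full column list of DWC_INTRA_ETL_PROCESS is not reproduced here, so SELECT * is used for it):

SQL> SELECT * FROM DWC_INTRA_ETL_PROCESS;

SQL> SELECT activity_name, activity_status
     FROM   DWC_INTRA_ETL_ACTIVITY
     WHERE  process_key = (SELECT MAX(process_key) FROM DWC_INTRA_ETL_ACTIVITY);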

Refreshing the Data in an Oracle Utilities Data Model Warehouse

After the initial load, you must load new data into the Oracle Utilities Data Model data warehouse regularly so that it can serve its purpose of facilitating business analysis.

To load new data into Oracle Utilities Data Model warehouse, you extract the data from one or more operational systems and copy that data into the warehouse. The challenge in data warehouse environments is to integrate, rearrange and consolidate large volumes of data over many systems, thereby providing a new unified information base for business intelligence.

The successive loads and transformations must be scheduled and processed in a specific order that is determined by your business needs. Depending on the success or failure of the operation or parts of it, the result must be tracked and subsequent, alternative processes might be started.

You can do a full incremental load of the Oracle Utilities Data Model warehouse, or you can refresh the data sequentially.

Refreshing the Foundation Layer of Oracle Utilities Data Model Warehouse

You can refresh the foundation layer of an Oracle Utilities Data Model warehouse (that is, the reference, lookup, and base tables) in the following ways:

  • You can refresh the foundation layer using source-ETL scripts that you wrote using Oracle Warehouse Builder or another ETL tool. For more information on creating source-ETL, see "Writing Your Own Source-ETL".

Refreshing the Access Layer of an Oracle Utilities Data Model Warehouse

Refreshing the access layer of an Oracle Utilities Data Model is a multi-step process. You can do a full incremental load of the access layer all at one time, or you can refresh the data sequentially, as follows:

  1. Refreshing Oracle Utilities Data Model Derived Tables
  2. Refreshing Oracle Utilities Data Model Aggregate Materialized Views
  3. Refreshing Oracle Utilities Data Model OLAP Cubes

Refreshing Oracle Utilities Data Model Derived Tables

Refreshing the relational tables in an Oracle Utilities Data Model is a multi-step process:

  1. Refresh the foundation layer of the Oracle Utilities Data Model warehouse (that is, the reference, lookup, and base tables) with operational system data by executing the source-ETL that you have written.

  2. Update the parameters of the DWC_ETL_PARAMETER control table for the 'OUDM-INTRA-ETL' process. For an incremental load of an Oracle Utilities Data Model warehouse, specify the values shown in the following table (that is, the beginning and end date of the ETL period).

    Table 4-3 DWC_ETL_PARAMETER Table Parameter Descriptions

    FROM_DATE_ETL: The beginning date of the ETL period.

    TO_DATE_ETL: The ending date of the ETL period.

  3. Create a session by connecting as the oudm_user user through SQL*Plus. Then, start an intra-ETL process. Make sure the previous process ended with 'COMPLETED-SUCCESS' status before starting a new process:

    sqlplus oudm_user/oudm_user@SID
     
    SQL> DECLARE
      l_process_type  OUDM_SYS.DWC_INTRA_ETL_PROCESS.PROCESS_TYPE%TYPE;
      l_error_text    OUDM_SYS.DWC_MESSAGE.MESSAGE_TEXT%TYPE;
      l_process_no    NUMBER;
    BEGIN
      l_process_no := OUDM_SYS.PKG_INTRA_ETL_UTIL.Start_Process(l_process_type,l_error_text);
    END;
    /
    
  4. Refresh Oracle Utilities Data Model derived tables by executing the following commands:

SQL> DECLARE
  p_process_no    NUMBER;
  l_status        VARCHAR2(20);
BEGIN
  l_status := OUDM_SYS.PKG_DWD_ACCT_ARRER_MO.Load('DWD_ACCT_ARRER_MO',p_process_no);
  l_status := OUDM_SYS.PKG_DWD_ACCT_BAL_MO.Load('DWD_ACCT_BAL_MO',p_process_no);
  l_status := OUDM_SYS.PKG_DWD_ACCT_DEBT_DAY.Load('DWD_ACCT_DEBT_DAY',p_process_no);
  l_status := OUDM_SYS.PKG_DWD_ACCT_PMT_MTD_STAT_HST.Load('DWD_ACCT_PYMT_MTHD_STAT_HIST',p_process_no);
  l_status := OUDM_SYS.PKG_DWD_ACCT_PYMT_DAY.Load('DWD_ACCT_PYMT_DAY',p_process_no);
  l_status := OUDM_SYS.PKG_DWD_END_DVC_EVT_CUST_DAY.Load('DWD_END_DVC_EVT_CUST_DAY',p_process_no);
  l_status := OUDM_SYS.PKG_DWD_END_DVC_EVT_DVC_DAY.Load('DWD_END_DVC_EVT_DVC_DAY',p_process_no);
  l_status := OUDM_SYS.PKG_DWD_MTR_RDNG_DAY.Load('DWD_MTR_RDNG_DAY',p_process_no);
  l_status := OUDM_SYS.PKG_DWD_MTR_RDNG_HR.Load('DWD_MTR_RDNG_HR',p_process_no);
  l_status := OUDM_SYS.PKG_DWD_OUTG_DAY.Load('DWD_OUTG_DAY',p_process_no);
  l_status := OUDM_SYS.PKG_DWD_OUTG_USG_PNT.Load('DWD_OUTG_USG_PNT',p_process_no);
END;
/
 
SQL> DECLARE
  p_process_no    NUMBER;
  l_status        VARCHAR2(20);
BEGIN
  l_status := OUDM_SYS.PKG_DR_PROG_LD_RDCTN_RGN_DAY.Load('DWD_DR_PROG_LD_RDCTN_RGN_DAY',p_process_no);
  l_status := OUDM_SYS.PKG_DWD_RLBLTY_IND_CITY_MO.Load('DWD_RLBLTY_IND_CITY_MO',p_process_no);
  l_status := OUDM_SYS.PKG_DWD_RLBLTY_IND_FEDR_MO.Load('DWD_RLBLTY_IND_FEDR_MO',p_process_no);
END;
/

Refreshing Oracle Utilities Data Model Aggregate Materialized Views

Refreshing the Aggregate Materialized Views in an Oracle Utilities Data Model is a multi-step process:

  1. Refresh the foundation layer of the Oracle Utilities Data Model warehouse (that is, the reference, lookup, and base tables) with operational system data by executing the source-ETL that you have written.
  2. Refresh Oracle Utilities Data Model derived tables.
  3. Create a session by connecting as the oudm_user user through SQL*Plus. An intra-ETL process must be in 'RUNNING' status at this point:

    sqlplus oudm_user/oudm_user@SID

  4. Refresh Oracle Utilities Data Model aggregate materialized views by executing the following commands:

SQL> DECLARE
  p_process_no    NUMBER;
  l_status        VARCHAR2(20);
BEGIN
  l_status := OUDM_SYS.PKG_INTRA_ETL_UTIL.Refresh_MV('DWA_END_DVC_EVT_DVC_MO',p_process_no);
  l_status := OUDM_SYS.PKG_INTRA_ETL_UTIL.Refresh_MV('DWA_END_DVC_EVT_CUST_MO',p_process_no);
  l_status := OUDM_SYS.PKG_INTRA_ETL_UTIL.Refresh_MV('DWA_MTR_RDNG_MO',p_process_no);
  l_status := OUDM_SYS.PKG_INTRA_ETL_UTIL.Refresh_MV('DWA_MTR_RDNG_TOU_MO',p_process_no);
  l_status := OUDM_SYS.PKG_INTRA_ETL_UTIL.Refresh_MV('DWA_OUTG_MO',p_process_no);
  l_status := OUDM_SYS.PKG_INTRA_ETL_UTIL.Refresh_MV('DWA_MTR_RDNG_MO_ACCT',p_process_no);
  l_status := OUDM_SYS.PKG_INTRA_ETL_UTIL.Refresh_MV('DWA_MTR_RDNG_MO_CUST',p_process_no);
  l_status := OUDM_SYS.PKG_INTRA_ETL_UTIL.Refresh_MV('DWA_MTR_RDNG_MO_UP',p_process_no);
END;
/

Refreshing Oracle Utilities Data Model OLAP Cubes

On a scheduled basis you must update the OLAP cube data with the relational data that has been added to the Oracle Utilities Data Model data warehouse since the initial load of the OLAP cubes.

Refreshing the OLAP Cubes in an Oracle Utilities Data Model is a multi-step process:

  1. Refresh the foundation layer of the Oracle Utilities Data Model warehouse (that is, the reference, lookup, and base tables) with operational system data by executing the source-ETL that you have written.
  2. Refresh Oracle Utilities Data Model derived tables.
  3. Refresh Oracle Utilities Data Model aggregate materialized views.
  4. Update the parameters of the DWC_OLAP_ETL_PARAMETER control table.

    See Also:

    For more information on DWC_OLAP_ETL_PARAMETER control table, see Oracle Utilities Data Model Reference.

  5. Create a session by connecting as the oudm_user user through SQL*Plus. The intra-ETL process created in "Refreshing Oracle Utilities Data Model Derived Tables" must be in 'RUNNING' status:

    sqlplus oudm_user/oudm_user@SID
    
  6. Refresh Oracle Utilities Data Model OLAP cubes by executing the following commands:
    SQL> DECLARE
      l_build_methd  OUDM_SYS.DWC_OLAP_ETL_PARAMETER.BUILD_METHOD%TYPE;
      l_cube_nm      OUDM_SYS.DWC_OLAP_ETL_PARAMETER.CUBENAME%TYPE;
      l_maxjobques   OUDM_SYS.DWC_OLAP_ETL_PARAMETER.MAXJOBQUEUES%TYPE;
      l_calc_fcst    OUDM_SYS.DWC_OLAP_ETL_PARAMETER.CALC_FCST%TYPE;
      l_no_fcst_yrs  OUDM_SYS.DWC_OLAP_ETL_PARAMETER.NO_FCST_YRS%TYPE;
      l_fcst_mthd    OUDM_SYS.DWC_OLAP_ETL_PARAMETER.FCST_MTHD%TYPE;
      l_fcst_st_yr   OUDM_SYS.DWC_OLAP_ETL_PARAMETER.FCST_ST_YR%TYPE;
      l_fcst_end_yr  OUDM_SYS.DWC_OLAP_ETL_PARAMETER.FCST_END_YR%TYPE;
      l_status       VARCHAR2(20);
    BEGIN
      /* Fetch the values of the OLAP ETL parameters used in this procedure */
      SELECT build_method,
             cubename,
             maxjobqueues,
             calc_fcst,
             no_fcst_yrs,
             fcst_mthd,
             fcst_st_yr,
             fcst_end_yr
        INTO l_build_methd,
             l_cube_nm,
             l_maxjobques,
             l_calc_fcst,
             l_no_fcst_yrs,
             l_fcst_mthd,
             l_fcst_st_yr,
             l_fcst_end_yr
        FROM OUDM_SYS.DWC_OLAP_ETL_PARAMETER;

      l_status := OUDM_SYS.PKG_OUDM_OLAP_ETL_AW_LOAD.olap_etl_aw_build(
                    l_build_methd, l_cube_nm, l_maxjobques, l_calc_fcst,
                    l_no_fcst_yrs, l_fcst_mthd, l_fcst_st_yr, l_fcst_end_yr,
                    null, null);
    END;
    /
    
  7. If you need to refresh only the Oracle Utilities Data Model OLAP cubes, you can do so by running step 6 alone. Before doing so, make sure that an intra-ETL process is already running. If no intra-ETL process is running, start one:
    sqlplus oudm_user/oudm_user@SID
     
    SQL> DECLARE
      l_process_type  OUDM_SYS.DWC_INTRA_ETL_PROCESS.PROCESS_TYPE%TYPE;
      l_error_text    OUDM_SYS.DWC_MESSAGE.MESSAGE_TEXT%TYPE;
      l_process_no    NUMBER;
    BEGIN
      l_process_no := OUDM_SYS.PKG_INTRA_ETL_UTIL.Start_Process(l_process_type,l_error_text);
    END;
    /
    

Refreshing Oracle Utilities Data Model Data Mining Models

Refreshing of data mining models is not integrated into the intra-ETL workflow. There is only one data mining model, Customer Savings and Customer Profile by DR Program.

For each DR program, a mining model is trained. If you want to rebuild the mining models for all DR programs, perform the following steps:

  1. Make sure an intra-ETL process is running. If not, start an intra-ETL process:
    sqlplus oudm_user/oudm_user@SID
     
    SQL> DECLARE
      l_process_type  OUDM_SYS.DWC_INTRA_ETL_PROCESS.PROCESS_TYPE%TYPE;
      l_error_text    OUDM_SYS.DWC_MESSAGE.MESSAGE_TEXT%TYPE;
      l_process_no    NUMBER;
    BEGIN
      l_process_no := OUDM_SYS.PKG_INTRA_ETL_UTIL.Start_Process(l_process_type,l_error_text);
    END;
    /
    
  2. Build (or rebuild, if already built) the mining models by executing the following commands:
    sqlplus oudm_user/oudm_user@SID
     
    DECLARE
      -- Indicates whether a model already built for a DR program can be rebuilt
      l_model_refresh_ind CHAR(1) := 'Y';
      l_model_exist_ind   NUMBER;
    BEGIN
      FOR rec IN (SELECT DEMAND_RESPN_PROG_KEY
                    FROM OUDM_SYS.DWR_DEMAND_RESPN_PROG
                   WHERE END_DT < OUDM_SYS.PKG_INTRA_ETL_UTIL.Get_End_Date)
      LOOP
        -- Check whether a model already exists for this DR program.
        SELECT COUNT(*)
          INTO l_model_exist_ind
          FROM OUDM_SYS.USER_MINING_MODELS
         WHERE MODEL_NAME = 'OUDM_PROFILE_KMEANS_' || rec.DEMAND_RESPN_PROG_KEY;

        IF l_model_exist_ind = 1 THEN
          IF l_model_refresh_ind = 'Y' THEN
            OUDM_SYS.PKG_DWD_CUST_DR_PROG_PROFILE.loaddata(rec.DEMAND_RESPN_PROG_KEY);
            OUDM_SYS.PKG_MINING_ETL.crt_cust_sgmnt_src_view(rec.DEMAND_RESPN_PROG_KEY);
            OUDM_SYS.PKG_OUDM_MINING.crt_frst_step_cust_sgmnt_model(rec.DEMAND_RESPN_PROG_KEY);
          ELSE
            DBMS_OUTPUT.PUT_LINE('NOTE: Chose not to refresh the existing model :: '
              || 'OUDM_PROFILE_KMEANS_' || rec.DEMAND_RESPN_PROG_KEY
              || ' :: set l_model_refresh_ind to ''Y'' to refresh existing models');
          END IF;
        ELSE
          OUDM_SYS.PKG_DWD_CUST_DR_PROG_PROFILE.loaddata(rec.DEMAND_RESPN_PROG_KEY);
          OUDM_SYS.PKG_MINING_ETL.crt_cust_sgmnt_src_view(rec.DEMAND_RESPN_PROG_KEY);
          OUDM_SYS.PKG_OUDM_MINING.crt_frst_step_cust_sgmnt_model(rec.DEMAND_RESPN_PROG_KEY);
        END IF;
      END LOOP;
    END;
    /
    

Managing Errors During Oracle Utilities Data Model Intra-ETL Execution

Describes the steps to identify and manage errors during intra-ETL execution.

Monitoring the Execution of the Intra-ETL Process

Three oudm_sys schema control tables, DWC_INTRA_ETL_PROCESS, DWC_INTRA_ETL_ACTIVITY, and DWC_OLAP_ACTIVITY, monitor the execution of the intra-ETL process.

You can access these three tables from the oudm_user user.

Each normal run (as opposed to an error-recovery run) of a separate intra-ETL execution performs the following steps:

  1. Inserts a record into the DWC_INTRA_ETL_PROCESS table with a monotonically increasing system generated unique process key, SYSDATE as process start time, RUNNING as the process status, and an input date range in the FROM_DATE_ETL and TO_DATE_ETL columns.
  2. Invokes each of the individual intra-ETL programs in the appropriate order of dependency. Before the invocation of each program, the procedure inserts a record into the intra-ETL Activity detail table, DWC_INTRA_ETL_ACTIVITY, with values for:

    ACTIVITY_KEY, a system generated unique activity key.

    PROCESS_KEY, the process key value corresponding to the intra-ETL process.

    ACTIVITY_NAME, an individual program name.

    ACTIVITY_DESC, a suitable activity description.

    ACTIVITY_START_TIME, the value of SYSDATE.

    ACTIVITY_STATUS, the value of RUNNING.

  3. Updates the corresponding record in the DWC_INTRA_ETL_ACTIVITY table for the activity end time and activity status after the completion of each individual ETL program (either successfully or with errors). For successful completion of the activity, the procedure updates the status as 'COMPLETED-SUCCESS'. When an error occurs, the procedure updates the activity status as 'COMPLETED-ERROR', and also updates the corresponding error detail in the ERROR_DTL column.
  4. Updates the record corresponding to the process in the DWC_INTRA_ETL_PROCESS table with the process end time and status after the completion of all individual intra-ETL programs. When all the individual programs succeed, the procedure updates the status to 'COMPLETED-SUCCESS'; otherwise, it updates the status to 'COMPLETED-ERROR'.
  5. For OLAP cube loading, inserts a record into the DWC_OLAP_ACTIVITY table for each cube with CUBENAME as the cube name, a status of 'RUNNING', and LOAD_START_DT as SYSDATE. The procedure updates the record upon completion of cube loading: it sets the STATUS column to 'COMPLETED-SUCCESS' if the cube loading is successful, or to 'COMPLETED-ERROR' otherwise, and sets the LOAD_END_DT column to SYSDATE. For cubes in 'COMPLETED-ERROR' status, it also updates the ERROR_DTL column with error details.

You can monitor the execution state of the intra-ETL, including current process progress and the time taken by individual programs or by the complete process, by viewing the contents of the DWC_INTRA_ETL_PROCESS, DWC_INTRA_ETL_ACTIVITY, and DWC_OLAP_ACTIVITY tables. In the DWC_INTRA_ETL_ACTIVITY table you can see the records of currently running activities. You can monitor both during and after the execution of the intra-ETL procedure.
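
For example, a minimal sketch of identifying failed activities and cube loads, and their error details, using the column names described above:

SQL> SELECT activity_name, activity_status, error_dtl
     FROM   DWC_INTRA_ETL_ACTIVITY
     WHERE  activity_status = 'COMPLETED-ERROR';

SQL> SELECT cubename, status, error_dtl
     FROM   DWC_OLAP_ACTIVITY
     WHERE  status = 'COMPLETED-ERROR';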

Recovering an Intra ETL Process

Describes the process to recover an intra-ETL process.

To recover an intra-ETL process:

  1. Identify the errors by looking at the corresponding error details that are tracked against the individual programs in the DWC_INTRA_ETL_ACTIVITY table.
  2. Identify errors of OLAP cubes loading for individual cubes in DWC_OLAP_ACTIVITY table.
  3. Correct the causes of the errors.
  4. Re-invoke the intra-ETL process.

The intra-ETL workflow process identifies whether a run is a normal run or a recovery run by referring to the DWC_INTRA_ETL_ACTIVITY table. During a recovery run, the intra-ETL workflow executes only the necessary programs. For example, if a derived-population program produced an error in the previous run, the recovery run executes only those derived-population programs that produced errors. After their successful completion, the run refreshes the aggregate materialized views in the appropriate order.

In this way, the intra-ETL error recovery is almost transparent, without involving the data warehouse or ETL administrator. The administrator must only correct the causes of the errors and re-invoke the intra-ETL process. The intra-ETL process identifies and executes the programs that generated errors.