22 Understanding Performance and Advanced ETL Concepts

Use this chapter as a guide for creating ETL logic that meets your performance expectations.

This chapter includes the following topics:

Best Practices for Designing PL/SQL Mappings

This section addresses PL/SQL mapping design.

Warehouse Builder generates code for PL/SQL mappings that meet the following criteria:

  • The output code of each operator satisfies the input code requirement of its next downstream operator.

  • If the mapping contains an operator that generates only PL/SQL output, all downstream dataflow operators must also be implemented by PL/SQL. You can use SQL operators in such a mapping only after loading the PL/SQL output to a target.

As you design a mapping, you can evaluate its validity by taking note of the input and output code types for each operator in the mapping.

For example, you can see that the mapping in Figure 22-1 is invalid because the Match-Merge operator MM generates PL/SQL output but the subsequent Join operator accepts SQL input only.

Figure 22-1 Mapping Violates Input Requirement for Join Operator


To achieve the desired results for the mapping, consider either joining the source tables before performing the Match-Merge, or loading the results of the Match-Merge to a staging table before performing the join.

Figure 22-2 displays a mapping in which source tables are joined before the Match-Merge. Figure 22-3 displays a mapping in which the results from the Match-Merge are loaded into a staging table before performing the join.

Figure 22-2 Valid Mapping Design with Sources Joined Before Match-Merge


Figure 22-3 Valid Mapping Design with Staging Table

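The staging-table design in Figure 22-3 can be sketched in SQL as follows. All table and column names here are hypothetical and serve only to illustrate the pattern:

```sql
-- First mapping: the Match-Merge output (PL/SQL) is loaded row by row
-- into a staging table.
INSERT INTO mm_stage (cust_id, cust_name)
VALUES (:merged_id, :merged_name);

-- Second mapping: the staging table can now feed a SQL join downstream.
SELECT s.cust_id, s.cust_name, o.order_total
FROM   mm_stage s
JOIN   orders o ON o.cust_id = s.cust_id;
```

Loading to the staging table materializes the PL/SQL output as ordinary rows, which is why SQL operators become valid again downstream.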

Table 22-1 and Table 22-2 list the implementation types for each Warehouse Builder operator. These tables also indicate whether or not PL/SQL code includes the operation associated with the operator in the cursor. This information is relevant in determining which operating modes are valid for a given mapping design. It also determines what auditing details are available during error handling.

Table 22-1 Source-Target Operators Implementation in PL/SQL Mappings

Source Operators: Tables, Dimensions, Cubes, Views, External Tables
  Implementation types: SQL
  Valid in set based mode: Yes
  Valid in row based mode: Yes
  Valid in row based (target only) mode: Yes. Part of cursor.

Target Operators: Tables, Dimensions, Cubes, Views
  Implementation types: SQL, PL/SQL
  Valid in set based mode: Yes, except when loading = UPDATE and database is not 10g or higher.
  Valid in row based mode: Yes
  Valid in row based (target only) mode: Yes. Not part of cursor.

Flat File as source
  Implementation types: For PL/SQL, create an external table.
  Valid in set based mode: Yes
  Valid in row based mode: Yes
  Valid in row based (target only) mode: Yes. Part of the cursor.

Flat File as target
  Implementation types: SQL
  Valid in set based mode: Yes, except when loading = DELETE, or loading = UPDATE and database is not 10g or higher.
  Valid in row based mode: Yes
  Valid in row based (target only) mode: Yes. Not part of cursor.

Sequence as source
  Implementation types: SQL
  Valid in set based mode: Yes
  Valid in row based mode: Yes
  Valid in row based (target only) mode: Yes. Part of cursor.


Table 22-2 Data Flow Operator Implementation in PL/SQL Mappings

Aggregator
  Implementation types: SQL
  Valid in set based mode: Yes
  Valid in row based mode: Yes, only if part of the cursor.
  Valid in row based (target only) mode: Yes, only if part of the cursor.

Constant Operator
  Implementation types: PL/SQL, SQL
  Valid in set based mode: Yes
  Valid in row based mode: Yes
  Valid in row based (target only) mode: Yes

Data Generator
  Implementation types: SQL*Loader only
  Valid in set based mode: N/A
  Valid in row based mode: N/A
  Valid in row based (target only) mode: N/A

Deduplicator
  Implementation types: SQL
  Valid in set based mode: Yes
  Valid in row based mode: Yes, only if part of the cursor.
  Valid in row based (target only) mode: Yes, only if part of the cursor.

Expression
  Implementation types: SQL, PL/SQL
  Valid in set based mode: Yes
  Valid in row based mode: Yes
  Valid in row based (target only) mode: Yes

Filter
  Implementation types: SQL, PL/SQL
  Valid in set based mode: Yes
  Valid in row based mode: Yes
  Valid in row based (target only) mode: Yes

Joiner
  Implementation types: SQL
  Valid in set based mode: Yes
  Valid in row based mode: Yes, only if part of the cursor.
  Valid in row based (target only) mode: Yes, only if part of the cursor.

Key Lookup
  Implementation types: SQL
  Valid in set based mode: Yes
  Valid in row based mode: Yes, only if part of the cursor.
  Valid in row based (target only) mode: Yes, only if part of the cursor.

Mapping Input Parameter
  Implementation types: SQL, PL/SQL
  Valid in set based mode: Yes
  Valid in row based mode: Yes
  Valid in row based (target only) mode: Yes

Mapping Output Parameter
  Implementation types: SQL, PL/SQL
  Valid in set based mode: Yes
  Valid in row based mode: Yes
  Valid in row based (target only) mode: Yes

Match-Merge
  Implementation types: SQL input; PL/SQL output (PL/SQL input from XREF group only)
  Valid in set based mode: No
  Valid in row based mode: Yes
  Valid in row based (target only) mode: Yes. Not part of cursor.

Name and Address
  Implementation types: PL/SQL
  Valid in set based mode: No
  Valid in row based mode: Yes
  Valid in row based (target only) mode: Yes. Not part of cursor.

Pivot
  Implementation types: SQL, PL/SQL
  Valid in set based mode: Yes
  Valid in row based mode: Yes
  Valid in row based (target only) mode: Yes

Post-Mapping Process
  Implementation types: Irrelevant
  Valid in set based mode: Yes, independent of dataflow.
  Valid in row based mode: Yes
  Valid in row based (target only) mode: Yes

Pre-Mapping Process
  Implementation types: Irrelevant
  Valid in set based mode: Yes, independent of dataflow.
  Valid in row based mode: Yes
  Valid in row based (target only) mode: Yes

Set
  Implementation types: SQL
  Valid in set based mode: Yes
  Valid in row based mode: Yes, only if part of the cursor.
  Valid in row based (target only) mode: Yes, only if part of the cursor.

Sorter
  Implementation types: SQL
  Valid in set based mode: Yes
  Valid in row based mode: Yes, only if part of the cursor.
  Valid in row based (target only) mode: Yes, as part of the cursor.

Splitter
  Implementation types: SQL, PL/SQL
  Valid in set based mode: Yes
  Valid in row based mode: Yes
  Valid in row based (target only) mode: Yes

Table Function
  Implementation types: SQL or PL/SQL input; SQL output only
  Valid in set based mode: Yes
  Valid in row based mode: Yes
  Valid in row based (target only) mode: Yes

Transformation as a procedure
  Implementation types: PL/SQL
  Valid in set based mode: No
  Valid in row based mode: Yes
  Valid in row based (target only) mode: Yes. Not part of cursor.

Transformation as a function that does not perform DML
  Implementation types: SQL, PL/SQL
  Valid in set based mode: Yes
  Valid in row based mode: Yes
  Valid in row based (target only) mode: Yes, included in the cursor.


Set Based Versus Row Based Operating Modes

For mappings with a PL/SQL implementation, select one of the following operating modes:

The default operating mode you select depends upon the performance you expect, the amount of auditing data you require, and how you design the mapping. Mappings have at least one and as many as three valid operating modes, excluding the options for failing over to row based modes. During code generation, Warehouse Builder generates code for the specified default operating mode as well as for the unselected modes. Therefore, at runtime, you can run the mapping in the default operating mode or in any other valid operating mode.

The types of operators in the mapping may limit the operating modes you can select. As a general rule, mappings run in set based mode can include any of the operators except for Match-Merge, Name and Address, and Transformations used as procedures. Although you can include any of the operators in row based and row based (target only) modes, there are important restrictions on how you use SQL based operators such as the Aggregator, Joiner, and Key Lookup. To use SQL based operators in either of the row based modes, ensure that the operation associated with the operator can be included in the cursor.

These general rules are explained in the following sections.

Set Based

In set based mode, Warehouse Builder generates a single SQL statement that processes all data and performs all operations. Although processing data as a set improves performance, the auditing information available is limited. Runtime auditing is limited to reporting of the execution error only. With set based mode, you cannot identify the rows that contain errors.

Figure 22-4 shows a simple mapping and the associated logic Warehouse Builder uses to generate code for the mapping when run in set based operating mode. TAB1, FLTR, and TAB2 are processed as a set using SQL.

Figure 22-4 Simple Mapping Run in Set Based Mode

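For the mapping in Figure 22-4, the generated set based code is conceptually a single statement along these lines. The tables TAB1 and TAB2 and the filter condition are hypothetical:

```sql
-- One SQL statement processes all rows: source, filter, and target load.
INSERT INTO tab2 (col1, col2)
SELECT col1, col2
FROM   tab1
WHERE  col2 > 0;   -- the FLTR condition
```

Because everything happens inside one statement, the database can optimize the whole flow, but a failure anywhere rolls up into a single execution error.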

To correctly design a mapping for the set based mode, avoid operators that require row by row processing such as Match-Merge and Name and Address operators. If you include an operator in the dataflow that cannot be performed in SQL, Warehouse Builder does not generate set based code and displays an error when you execute the package in set based mode.

For target operators in a mapping, the loading types INSERT/UPDATE and UPDATE/INSERT are always valid for set based mode. Warehouse Builder supports UPDATE loading in set based mode only when the Oracle Database is 10g or higher. Warehouse Builder also supports the loading type DELETE in set based mode. For a complete listing of how Warehouse Builder handles operators in set based mappings, see Table 22-2.
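As an illustration, set based UPDATE loading corresponds to a searched update of the target, sketched here with hypothetical table and column names:

```sql
-- Update target rows from matching source rows in one set based statement.
UPDATE tab2 t
SET    t.col2 = (SELECT s.col2 FROM tab1 s WHERE s.col1 = t.col1)
WHERE  EXISTS  (SELECT 1 FROM tab1 s WHERE s.col1 = t.col1);
```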

Row Based

In row based mode, Warehouse Builder generates statements that process data row by row. The select statement is in a SQL cursor. All subsequent statements are PL/SQL. You can access full runtime auditing information for all operators performed in PL/SQL and only limited information for operations performed in the cursor.

Figure 22-5 shows a simple mapping and the associated logic Warehouse Builder uses to generate code for the mapping when run in row based operating mode. TAB1 is included in the cursor and processed as a set using SQL. FLTR and TAB2 are processed row by row using PL/SQL.

Figure 22-5 Simple Mapping Run in Row Based Mode

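The row based logic for Figure 22-5 can be sketched in PL/SQL as follows; the table and column names are hypothetical:

```sql
-- The select on TAB1 is the SQL cursor; the filter and the target load
-- run row by row in PL/SQL, so each row can be audited individually.
BEGIN
  FOR rec IN (SELECT col1, col2 FROM tab1) LOOP
    IF rec.col2 > 0 THEN                            -- FLTR in PL/SQL
      INSERT INTO tab2 (col1, col2) VALUES (rec.col1, rec.col2);
    END IF;
  END LOOP;
END;
```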

If the mapping includes any SQL based operators that cannot be performed in PL/SQL, Warehouse Builder attempts to generate code with those operations in the cursor. To generate valid row based code, design your mapping such that if you include any of the following SQL based operators, Warehouse Builder can include the operations in the cursor:

  • Aggregation

  • Deduplicator

  • Join

  • Key Lookup

  • Sequence

  • Set

  • Sorter

For one of the preceding operators to be included in the cursor, do not directly precede it with an operator that generates PL/SQL code. In other words, you cannot run the mapping in row based mode if it contains a Transformation implemented as a procedure, a Flat File used as a source, or a Match-Merge or Name and Address operator directly followed by any of the seven SQL based operators. For the design to be valid, include a staging table between the PL/SQL generating operator and the SQL based operator.

Row Based (Target Only)

In row based (target only) mode, Warehouse Builder generates a cursor select statement and attempts to include as many operations as possible in the cursor. For each target, Warehouse Builder inserts each row into the target separately. You can access full runtime auditing information for all operators performed in PL/SQL and only limited information for operations performed in the cursor. Use this mode when you expect fast set based operations to extract and transform the data but need extended auditing for loading the data, which is where errors are likely to occur.

Figure 22-6 shows a simple mapping and the associated logic Warehouse Builder uses to generate code for the mapping when run in row based (target only) operating mode. TAB1 and FLTR are included in the cursor and processed as a set using SQL. TAB2 is processed row by row.

Figure 22-6 Simple Mapping Run in Row Based (Target Only) Mode

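The row based (target only) logic for Figure 22-6 can be sketched as follows; the table and column names are hypothetical:

```sql
-- TAB1 and the filter are both inside the cursor; only the insert into
-- the target runs row by row, where load errors are most likely.
BEGIN
  FOR rec IN (SELECT col1, col2 FROM tab1 WHERE col2 > 0) LOOP
    INSERT INTO tab2 (col1, col2) VALUES (rec.col1, rec.col2);
  END LOOP;
END;
```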

Row based (target only) places the same restrictions on SQL based operators as the row based operating mode. Additionally, for mappings with multiple targets, Warehouse Builder generates code with a cursor for each target.

About Committing Data in Warehouse Builder

There are two major approaches to committing data in Warehouse Builder. You can commit or roll back data based on the mapping design. To do this, use one of the commit control methods described in "Committing Data Based on Mapping Design".

Alternatively, for PL/SQL mappings, you can commit or roll back data independently of the mapping design. Use a process flow to commit the data or establish your own method as described in "Committing Data Independently of Mapping Design".

Committing Data Based on Mapping Design

By default, Warehouse Builder loads and then automatically commits data based on the mapping design. For PL/SQL mappings you can override the default setting and control when and how Warehouse Builder commits data. You have the following options for committing data in mappings:

Automatic: This is the default setting and is valid for all mapping types. Warehouse Builder loads and then automatically commits data based on the mapping design. If the mapping has multiple targets, Warehouse Builder commits and rolls back each target separately and independently of other targets. Use the automatic commit when the consequences of multiple targets being loaded unequally are not great or are irrelevant.

Automatic Correlated: Automatic correlated commit is a specialized type of automatic commit that applies to PL/SQL mappings with multiple targets only. Warehouse Builder considers all targets collectively and commits or rolls back data uniformly across all targets. Use the correlated commit when it is important to ensure that every row in the source impacts all affected targets uniformly. For more information about correlated commit, see "Committing Data from a Single Source to Multiple Targets".

Manual: Select manual commit control for PL/SQL mappings when you want to interject complex business logic, perform validations, or run other mappings before committing data. For examples, see "Embedding Commit Logic into the Mapping" and "Committing Data Independently of Mapping Design".

Committing Data from a Single Source to Multiple Targets

If you want to populate multiple targets based on a common source, you may also want to ensure that every row from the source impacts all affected targets uniformly.

Figure 22-7 shows a PL/SQL mapping that illustrates this case. The target tables all depend upon the source table. If a row from SOURCE causes changes in multiple targets, for instance TARGET_1 and TARGET_2, then Warehouse Builder should commit the appropriate data to both affected targets at the same time. If this relationship is not maintained when you run the mapping again, then the data can become inaccurate and possibly unusable.

Figure 22-7 Mapping with Multiple Targets Dependent on One Source


If the number of rows from the source table is relatively small, maintaining the three targets may not be difficult. Manually maintaining targets dependent on a common source, however, becomes more tedious as you increase the number of rows from the source, or as you design more complex mappings with more targets and transformations.

To ensure that every row in the source properly impacts every target, configure the mapping to use the correlated commit strategy.

Using the Automatic Correlated Commit Strategy

In set based mode, correlated commit may impact the size of your rollback segments. Space for rollback segments may be a concern when you merge data (insert/update or update/insert).

Correlated commit operates transparently with PL/SQL bulk processing code.

The correlated commit strategy is not available for mappings run in any mode that are configured for Partition Exchange Loading or include an Advanced Queue, Match-Merge, or Table Function operator.

Automatic Commit versus Automatic Correlated Commit

The combination of the commit strategy and operating mode determines mapping behavior. Table 22-3 shows the valid combinations you can select.

Table 22-3 Valid Commit Strategies for Operating Modes

Operating Mode            Automatic Correlated Commit   Automatic Commit

Set based                 Valid                         Valid
Row based                 Valid                         Valid
Row based (target only)   Not Applicable                Valid


Correlated commit is not applicable for row based (target only). By definition, this operating mode places the cursor as close to the target as possible. In most cases, this results in only one target for each select statement and negates the purpose of committing data to multiple targets. If you design a mapping with the row based (target only) and correlated commit combination, Warehouse Builder runs the mapping but does not perform the correlated commit.

To understand the effects each operating mode and commit strategy combination has on a mapping, consider the mapping from Figure 22-7. Assume the data from the source table equates to 1,000 new rows. When the mapping runs successfully, Warehouse Builder loads 1,000 rows to each of the targets. If the mapping fails to load the 100th new row to Target_2, you can expect the following results, ignoring the influence from other configuration settings such as Commit Frequency and Number of Maximum Errors:

  • Set based/Correlated Commit: A single error anywhere in the mapping triggers the rollback of all data. When Warehouse Builder encounters the error inserting into Target_2, it reports an error for the table and does not load the row. Warehouse Builder rolls back all the rows inserted into Target_1 and does not attempt to load rows to Target_3. No rows are added to any of the target tables. For error details, Warehouse Builder reports only that it encountered an error loading to Target_2.

  • Row based/Correlated Commit: Beginning with the first row, Warehouse Builder evaluates each row separately and loads it to all three targets. Loading continues in this way until Warehouse Builder encounters an error loading row 100 to Target_2. Warehouse Builder reports the error and does not load the row. It rolls back row 100 previously inserted into Target_1 and does not attempt to load row 100 to Target_3. Next, Warehouse Builder continues loading the remaining rows, resuming with loading row 101 to Target_1. Assuming Warehouse Builder encounters no other errors, the mapping completes with 999 new rows inserted into each target. The source rows are accurately represented in the targets.

  • Set based/Automatic Commit: When Warehouse Builder encounters the error inserting into Target_2, it does not load any rows and reports an error for the table. It does, however, continue to insert rows into Target_3 and does not roll back the rows from Target_1. Assuming Warehouse Builder encounters no other errors, the mapping completes with one error message for Target_2, no rows inserted into Target_2, and 1,000 rows inserted into Target_1 and Target_3. The source rows are not accurately represented in the targets.

  • Row based/Automatic Commit: Beginning with the first row, Warehouse Builder evaluates each row separately for loading into the targets. Loading continues in this way until Warehouse Builder encounters an error loading row 100 to Target_2 and reports the error. Warehouse Builder does not roll back row 100 from Target_1, does insert it into Target_3, and continues to load the remaining rows. Assuming Warehouse Builder encounters no other errors, the mapping completes with 999 rows inserted into Target_2 and 1,000 rows inserted into each of the other targets. The source rows are not accurately represented in the targets.

Embedding Commit Logic into the Mapping

For PL/SQL mappings only, you can embed commit logic into the mapping design by adding a pre or post mapping operator with SQL statements to commit and roll back data. When you run the mapping, Warehouse Builder commits or rolls back data based solely on the SQL statements you provide in the pre or post mapping operator.

Use these instructions to implement a business rule that is tedious or impossible to design given existing Warehouse Builder mapping operators. For example, you may want to verify the existence of a single row in a target. Write the required logic in SQL and introduce that logic to the mapping through a pre or post mapping operator.

To include commit logic in the mapping design:

  1. Design the mapping to include a pre or post mapping operator. Use one of these operators to introduce commit and rollback SQL statements.

  2. Configure the mapping with Commit Control set to Manual.

    In the Project Explorer, right-click the mapping and select Configure. Under Code Generation Options, set Commit Control to Manual.

    To understand the implications of selecting to commit data manually, refer to "About Manual Commit Control".

  3. Deploy the mapping.

  4. Run the mapping.

    Warehouse Builder executes the mapping but does not commit data until processing the commit logic you wrote in the Pre-Mapping Process or Post-Mapping Process operator.
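As a sketch, a Post-Mapping Process operator might carry logic like the following to verify the existence of a single row in a target before committing. The table name, column, and bind variable are hypothetical:

```sql
DECLARE
  row_count NUMBER;
BEGIN
  -- Business rule: exactly one control row must exist in the target.
  SELECT COUNT(*) INTO row_count
  FROM   target_table
  WHERE  batch_id = :batch_id;

  IF row_count = 1 THEN
    COMMIT;      -- rule satisfied: make the load permanent
  ELSE
    ROLLBACK;    -- rule violated: discard the load
  END IF;
END;
```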

Committing Data Independently of Mapping Design

You may want to commit data independently of the mapping design for any of the following reasons:

  • Running Multiple Mappings Before Committing Data: You may want to run multiple mappings without committing data until successfully running and validating all mappings. This can be the case when you have separate mappings for loading dimensions and cubes.

  • Maintaining targets more efficiently: If incorrect data is loaded and committed to a very large target, it can be difficult and time consuming to repair the damage. To avoid this, first check the data and then decide whether to issue a commit or rollback command.

The first step to achieve these goals is to configure the mapping with commit control set to Manual.

About Manual Commit Control

Manual commit control enables you to specify when Warehouse Builder commits data regardless of the mapping design. Manual commit control does not affect auditing statistics. This means that you can view the number of rows inserted and other auditing information before issuing the commit or rollback command.

When using manual commit, be aware that this option may have performance implications. Mappings that you intend to run in parallel may be executed serially if the design requires a target to be read after being loaded. This occurs when moving data from a remote source or loading to two targets bound to the same table.

When you enable manual commit control, Warehouse Builder runs the mapping with Partition Exchange Loading (PEL) switched off.

Running Multiple Mappings Before Committing Data

This section provides two sets of instructions for committing data independent of the mapping design. The first set describes how to run mappings and then commit data in a SQL*Plus session. Use these instructions to test and debug your strategy of running multiple mappings and then committing the data. Then, use the second set of instructions to automate the strategy.

Both sets of instructions rely upon the use of the main procedure generated for each PL/SQL mapping.

Main Procedure

The main procedure is a procedure that exposes the logic for starting mappings in Warehouse Builder. You can employ this procedure in PL/SQL scripts or use it in interactive SQL*Plus sessions.

When you use the main procedure, you must specify one required parameter, p_status. You can optionally specify other parameters relevant to the execution of the mapping, as described in Table 22-4. Warehouse Builder uses the default setting for any optional parameter that you do not specify.

Table 22-4 Parameters for the Main Procedure

p_status
    Required. Writes the status of the mapping upon completion. It operates in conjunction with the predefined variable called status. The status variable is defined such that OK indicates the mapping completed without errors, OK_WITH_WARNINGS indicates the mapping completed with user errors, and FAILURE indicates the mapping encountered a fatal error.

p_operating_mode
    Optional. Passes in the default operating mode, such as SET_BASED.

p_bulk_size
    Optional. Passes in the bulk size.

p_audit_level
    Optional. Passes in the default audit level, such as COMPLETE.

p_max_no_of_errors
    Optional. Passes in the permitted maximum number of errors.

p_commit_frequency
    Optional. Passes in the commit frequency.


Committing Data at Runtime

For PL/SQL mappings alone, you can run mappings and issue commit and rollback commands from a SQL*Plus session. Based on your knowledge of SQL*Plus and the main procedure, you can manually run and validate multiple mappings before committing data.

To commit data manually at runtime:

  1. Design the PL/SQL mappings. For instance, create one mapping to load dimensions and a separate mapping to load cubes.

    These instructions are not valid for SQL*Loader and ABAP mappings.

  2. Configure both mappings with the Commit Control property set to Manual.

    In the Project Explorer, right-click the mapping and select Configure. Under the Code Generation Options, set the Commit Control property to Manual.

  3. Generate each mapping.

  4. From a SQL*Plus session, issue the following command to execute the first mapping called map1 in this example:

    var status VARCHAR2(30);
    execute map1.main(:status);
    

    The first line declares the predefined status variable described in Table 22-4. In the second line, p_status is set to the status variable. When map1 completes, SQL*Plus displays the mapping status such as OK.

  5. Execute the second mapping, in this example, the cubes mapping called map2.

    You can run the second mapping in the same way you ran the first. Or, you can supply additional parameters listed in Table 22-4 to dictate how map2 runs, as in this example:

    execute map2.main (p_status => :status, -
                       p_operating_mode => 'SET_BASED', -
                       p_audit_level => 'COMPLETE');
    
  6. Verify the results from the execution of the two mappings and issue either the commit or rollback command.

  7. Automate your commit strategy as described in "Committing Mappings through the Process Flow Editor".

Committing Mappings through the Process Flow Editor

For PL/SQL mappings alone, you can commit or roll back mappings together. Based on your knowledge of the Sqlplus activity, the main procedure, and writing PL/SQL scripts, you can use process flows to automate logic that commits data after all mappings complete successfully or rolls back the data if any mapping fails.

To commit multiple mappings through a process flow:

  1. Design the PL/SQL mappings.

    These instructions are not valid for SQL*Loader and ABAP mappings.

  2. Ensure each mapping is deployed to the same schema.

    All mappings must have their locations pointing to the same schema. You can achieve this by designing the mappings under the same target module. Or, for multiple target modules, ensure that the locations point to the same schema.

  3. Configure each mapping with the Commit Control property set to Manual.

    In the Project Explorer, right-click the mapping and select Configure. Under Code Generation Options, set the Commit Control property to Manual.

  4. Design a process flow using a sqlplus activity instead of multiple mapping activities.

    In typical process flows, you add a mapping activity for each mapping and the process flow executes an implicit commit after each mapping activity. However, in this design, do not add mapping activities. Instead, add a single sqlplus activity.

  5. Write a PL/SQL script that uses the main procedure to execute each mapping. The following script demonstrates how to run the next mapping only if the initial mapping succeeds.

    declare
        status VARCHAR2(30);
    begin
        map1.main(status);
        if status != 'OK' then
           rollback;
        else
           map2.main(status);
           if status != 'OK' then
               rollback;
           else
               commit;
           end if;
        end if;
    end;
    
  6. Paste your PL/SQL script into the sqlplus activity.

    In the editor explorer, select SCRIPT under the sqlplus activity and then double-click Value in the object inspector.

    Figure 22-8 displays the Explorer panel and the Object Inspector panel with SCRIPT selected.

    Figure 22-8 Specifying a Script in the Sqlplus Activity


  7. Optionally apply a schedule to the process flow as described in "Process for Defining and Using Schedules".

  8. Deploy the mappings, process flow, and schedule if you defined one.

Ensuring Referential Integrity in PL/SQL Mappings

When you design mappings with multiple targets, you may want to ensure that Warehouse Builder loads the targets in a specific order. This is the case when a column in one target derives its data from another target.

To ensure referential integrity in PL/SQL mappings:

  1. Design a PL/SQL mapping with multiple targets.

  2. (Optional) Define a parent/child relationship between two of the targets by specifying a foreign key.

    A foreign key in the child table must refer to a primary key in the parent table. If the parent does not have a column defined as a primary key, you must add a column and define it as the primary key. For an example of how to do this, see "Using Conventional Loading to Ensure Referential Integrity in SQL*Loader Mappings".

  3. In the mapping properties, view the Target Load Order property by clicking the Ellipsis button to the right of this property.

    If you defined a foreign key relationship in the previous step, Warehouse Builder calculates a default loading order that loads parent targets before children. If you did not define a foreign key, use the Target Load Order dialog box to define the loading order.

    For more information, see "Target Load Order".

  4. Ensure that the Use Target Load Ordering configuration property is set to its default value of true.
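The parent/child relationship in step 2 can be sketched in SQL; the table and column names are hypothetical:

```sql
-- The parent target must expose a primary key; the child target's
-- foreign key references it, so the parent is loaded first.
ALTER TABLE parent_t ADD CONSTRAINT parent_pk PRIMARY KEY (parent_id);
ALTER TABLE child_t  ADD CONSTRAINT child_fk
  FOREIGN KEY (parent_id) REFERENCES parent_t (parent_id);
```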

Best Practices for Designing SQL*Loader Mappings

This section includes the following topics:

Using Conventional Loading to Ensure Referential Integrity in SQL*Loader Mappings

If you are extracting data from a multiple-record-type file with a master-detail structure and mapping to tables, add a Mapping Sequence operator to the mapping to retain the relationship between the master and detail records through a surrogate primary key or foreign key relationship. A master-detail file structure is one where a master record is followed by its detail records. In Example 22-1, records beginning with "E" are master records with Employee information and records beginning with "P" are detail records with Payroll information for the corresponding employee.

Example 22-1 A Multiple-Record-Type Flat File with a Master-Detail Structure

E 003715 4 153 09061987 014000000 "IRENE HIRSH" 1 08500
P 01152000 01162000 00101 000500000 000700000
P 02152000 02162000 00102 000300000 000800000
E 003941 2 165 03111959 016700000 "ANNE FAHEY" 1 09900 
P 03152000 03162000 00107 000300000 001000000
E 001939 2 265 09281988 021300000 "EMILY WELLMET" 1 07700
P 01152000 01162000 00108 000300000 001000000
P 02152000 02162000 00109 000300000 001000000

In Example 22-1, the relationship between the master and detail records is inherent only in the physical record order: payroll records correspond to the employee record they follow. However, if this is the only means of relating detail records to their masters, this relationship is lost when Warehouse Builder loads each record into its target table.

Maintaining Relationships Between Master and Detail Records

You can maintain the relationship between master and detail records if both types of records share a common field. If Example 22-1 contains a field Employee ID in both Employee and Payroll records, you can use it as the primary key for the Employee table and as the foreign key in the Payroll table, thus associating Payroll records to the correct Employee record.

However, if your file does not have a common field that can be used to join master and detail records, you must add a sequence column to both the master and detail targets (see Table 22-5 and Table 22-6) to maintain the relationship between the master and detail records. Use the Mapping Sequence operator to generate this additional value.
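What the Mapping Sequence operator effectively supplies can be sketched with a database sequence. The sequence, table, and column names are hypothetical:

```sql
CREATE SEQUENCE master_detail_seq START WITH 1 INCREMENT BY 1;

-- Each master (employee) row takes the next sequence value...
INSERT INTO employee_t (emp_data, e11)
VALUES ('employee record', master_detail_seq.NEXTVAL);

-- ...and each of its detail (payroll) rows reuses the current value,
-- preserving the master-detail relationship as a key.
INSERT INTO payroll_t (pay_data, p7)
VALUES ('payroll record', master_detail_seq.CURRVAL);
```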

Table 22-5 represents the target table containing the master records from the file in Example 22-1. The target table for the master records in this case contains employee information. Columns E1-E10 contain data extracted from the flat file. Column E11 is the additional column added to store the master sequence number. Notice that the number increments by one for each employee.

Table 22-5 Target Table Containing Master Records

E1  E2      E3  E4   E5        E6         E7      E8        E9  E10    E11

E   003715  4   153  09061987  014000000  "IRENE  HIRSH"    1   08500  1
E   003941  2   165  03111959  016700000  "ANNE   FAHEY"    1   09900  2
E   001939  2   265  09281988  021300000  "EMILY  WELLMET"  1   07700  3


Table 22-6 represents the target table containing the detail records from the file in Example 22-1. The target table for the detail records in this case contains payroll information, with one or more payroll records for each employee. Columns P1-P6 contain data extracted from the flat file. Column P7 is the additional column added to store the detail sequence number. Notice that the number for each payroll record matches the corresponding employee record in Table 22-5.

Table 22-6 Target Table Containing Detail Records

P1  P2        P3        P4     P5         P6         P7

P   01152000  01162000  00101  000500000  000700000  1

P   02152000  02162000  00102  000300000  000800000  1

P   03152000  03162000  00107  000300000  001000000  2

P   01152000  01162000  00108  000300000  001000000  3

P   02152000  02162000  00109  000300000  001000000  3


Extracting and Loading Master-Detail Records

This section contains instructions on creating a mapping that extracts records from a master-detail flat file and loads those records into two different tables. One target table stores master records and the other target table stores detail records from the flat file. The Mapping Sequence operator is used to maintain the master-detail relationship between the two tables.

Note:

These instructions are for conventional path loading. For instructions on using direct path loading for master-detail records, see "Using Direct Path Loading to Ensure Referential Integrity in SQL*Loader Mappings".

The following procedure outlines the general steps for building such a mapping.

To extract from a master-detail flat file and maintain master-detail relationships, use the following steps:

  1. Import and sample the flat file source that consists of master and detail records.

    When naming the record types as you sample the file, assign descriptive names to the master and detail records. This makes it easier to identify those records in the future.

    Figure 22-9 shows the Flat File Sample Wizard for a multiple-record-type flat file containing department and employee information. The master record type (for employee records) is called EmployeeMaster, while the detail record type (for payroll information) is called PayrollDetail.

    Figure 22-9 Naming Flat File Master and Detail Record Types

    This illustration is described in the surrounding text.
    Description of "Figure 22-9 Naming Flat File Master and Detail Record Types "

  2. Drop a Flat File operator onto the Mapping Editor canvas and specify the master-detail file from which you want to extract data.

  3. Drop a Sequence operator onto the mapping canvas.

  4. Drop a Table operator for the master records onto the mapping canvas.

    You can either select an existing workspace table that you created earlier or create a new unbound table operator with no attributes. You can then map or copy all required fields from the master record of the file operator to the master table operator (creating columns) and perform an outbound synchronize to define the table later.

    The table must contain all the columns required for the master fields you want to load plus an additional numeric column for loading sequence values.

  5. Drop a Table operator for the detail records onto the mapping canvas.

    You can either select an existing workspace table that you created earlier or create a new unbound table operator with no attributes. You can then map or copy all required fields from the detail record of the file operator to the detail table operator (creating columns) and perform an outbound synchronize to define the table later.

    The table must contain all the columns required for the detail fields you want to load plus an additional numeric column for loading sequence values.

  6. Map all of the necessary flat file master fields to the master table and detail fields to the detail table.

    Figure 22-10 displays the mapping of the fields.

  7. Map the Sequence NEXTVAL attribute to the additional sequence column in the master table.

    Figure 22-10 displays the mapping from the NEXTVAL attribute of the Sequence operator to the master table.

  8. Map the Sequence CURRVAL attribute to the additional sequence column in the detail table.

    Figure 22-10 shows a completed mapping with the flat file master fields mapped to the master target table, the detail fields mapped to the detail target table, and the NEXTVAL and CURRVAL attributes from the Mapping Sequence mapped to the master and detail target tables, respectively.

    Figure 22-10 Completed Mapping from Master-Detail Flat File to Two Target Tables

    This illustration is described in the surrounding text.
    Description of "Figure 22-10 Completed Mapping from Master-Detail Flat File to Two Target Tables"

  9. Configure the mapping that loads the source data into the target tables with the following parameters:

    Direct Mode: Not selected

    Errors Allowed: 0

    Row: 1

    Trailing Nullcols: True (for all tables)
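The configuration above produces a conventional path load. A hand-written sketch (not actual generated output) of how such a control file can apply the database sequence follows; master records take the sequence's NEXTVAL and detail records take CURRVAL, so each detail row receives the sequence value of the master row that precedes it. All object names here are illustrative assumptions:

```sql
-- Sketch of a conventional path control file using a database sequence.
-- Conventional path inserts rows in record order, which is what lets
-- CURRVAL on the detail rows match the preceding master row's NEXTVAL.
LOAD DATA
INFILE 'employees.dat'
INTO TABLE employee_master
  APPEND
  WHEN (1) = 'E'
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
  TRAILING NULLCOLS
  (rec_type POSITION (1) CHAR,
   emp_id   CHAR,
   ename    CHAR,
   seq_col  EXPRESSION "employee_seq.NEXTVAL")
INTO TABLE payroll_detail
  APPEND
  WHEN (1) = 'P'
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
  TRAILING NULLCOLS
  (rec_type  POSITION (1) CHAR,
   pay_begin CHAR,
   pay_end   CHAR,
   seq_col   EXPRESSION "employee_seq.CURRVAL")
```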

Error Handling Suggestions

This section contains error handling recommendations for files with varying numbers of errors.

If your data file almost never contains errors:

  1. Create a mapping with a Sequence operator (see "Sequence Operator").

  2. Configure the mapping with the following parameters:

    Direct Mode: Not selected

    Row: 1

    Errors Allowed: 0

  3. Generate the code and run an SQL*Loader script.

    If the data file has errors, then the loading stops when the first error occurs.

  4. Fix the data file and run the control file again with the following configuration values:

    CONTINUE_LOAD=TRUE

    SKIP=number of records already loaded

If your data file is likely to contain a moderate number of errors:

  1. Create a primary key (PK) for the master record based on the seq_nextval column.

  2. Create a foreign key (FK) for the detail record based on the seq_currval column which references the master table PK.

    In this case, master records with errors will be rejected with all their detail records. You can recover these records by following these steps.

  3. Delete all failed detail records that have no master records.

  4. Fix the errors in the bad file and reload only those records.

  5. If there are very few errors, you may choose to load the remaining records and manually update the table with correct sequence numbers.

  6. In the log file, you can identify the records that failed because they violated an integrity constraint. The following is an example of log file records with errors:

    Record 9: Rejected - Error on table "MASTER_T", column "C3". 
    ORA-01722: invalid number 
    Record 10: Rejected - Error on table "DETAIL1_T". 
    ORA-02291: integrity constraint (SCOTT.FK_SEQ) violated - parent key not found
    Record 11: Rejected - Error on table "DETAIL1_T". 
    ORA-02291: integrity constraint (SCOTT.FK_SEQ) violated - parent key not found 
    Record 21: Rejected - Error on table "DETAIL2_T". 
    ORA-02291: integrity constraint (SCOTT.FK_SEQ) violated - parent key not found
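The sequence-based keys described in steps 1 and 2 can be sketched as follows. The table and column names are illustrative assumptions based on the log file above:

```sql
-- Illustrative only: primary key on the master table's sequence column
-- and a foreign key on the detail table's sequence column, so that
-- detail rows whose master row was rejected are rejected as well.
ALTER TABLE master_t ADD CONSTRAINT master_t_pk
  PRIMARY KEY (seq_col);

ALTER TABLE detail1_t ADD CONSTRAINT fk_seq
  FOREIGN KEY (seq_col) REFERENCES master_t (seq_col);
```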
    

If your data file always contains many errors:

  1. Load all records without using the Sequence operator.

    Load the records into independent tables. You can load the data in Direct Mode, with the following parameters that increase loading speed:

    ROW>1

    ERRORS ALLOWED=MAX

  2. Correct all rejected records.

  3. Reload the file again with a Sequence operator (see "Sequence Operator").

Subsequent Operations

After the initial loading of the master and detail tables, you can use the loaded sequence values to further transform, update, or merge master table data with detail table data. For example, if your master records have a column that acts as a unique identifier, such as an Employee ID, you may prefer to use it as the key that joins master and detail rows instead of the sequence field you added for that purpose. In that case, update the detail tables to use this unique column, and then drop the sequence column you created for the initial load. Operators such as the Aggregator, Filter, or Match and Merge operator can help you with these subsequent transformations.

Using Direct Path Loading to Ensure Referential Integrity in SQL*Loader Mappings

If you are using a master-detail flat file where the master record has a unique field (or if the concatenation of several fields can result in a unique identifier), you can use Direct Path Load as an option for faster loading.

For direct path loading, the record number (RECNUM) of each record is stored in the master and detail tables. A post-load procedure uses the RECNUM to update each detail row with the unique identifier of the corresponding master row.

The following procedure outlines the general steps for building such a mapping.

To extract from a master-detail flat file using direct path load to maintain master-detail relationships:

  1. Import and sample a flat file source that consists of master and detail records.

    When naming the record types as you sample the file, assign descriptive names to the master and detail records, as shown in Figure 22-9. This will make it easier to identify those records in the future.

  2. Create a mapping that you will use to load data from the flat file source.

  3. Drop a Flat File operator onto the mapping canvas and specify the master-detail file from which you want to extract data.

  4. Drop a Data Generator and a Constant operator onto the mapping canvas.

  5. Drop a Table operator for the master records onto the mapping canvas.

    You can either select an existing workspace table that you created earlier, or create a new unbound table operator with no attributes and perform an outbound synchronize to define the table later.

    The table must contain all the columns required for the master fields you plan to load plus an additional numeric column for loading the RECNUM value.

  6. Drop a Table operator for the detail records onto the mapping canvas.

    You can either select an existing workspace table that you created earlier, or create a new unbound table operator with no attributes and perform an outbound synchronize to define the table later.

    The table must contain all the columns required for the detail fields you plan to load plus an additional numeric column for loading a RECNUM value, and a column that will be updated with the unique identifier of the corresponding master table row.

  7. Map all of the necessary flat file master fields to the master table and detail fields to the detail table, as shown in Figure 22-12.

  8. Map the Data Generator operator's RECNUM attribute to the RECNUM columns in the master and detail tables, as shown in Figure 22-12.

  9. Add a constant attribute in the Constant operator.

    If the master row unique identifier column is of a CHAR data type, make the constant attribute a CHAR type with the expression '*'.

    If the master row unique identifier column is a number, make the constant attribute a NUMBER with the expression '0'.

    Figure 22-11 shows the expression property of the constant attribute set to '0'. This constant marks all data rows as "just loaded".

    Figure 22-11 Constant Operator Properties

    This illustration is described in the surrounding text.
    Description of "Figure 22-11 Constant Operator Properties"

  10. Map the constant attribute from the Constant operator to the detail table column that will later store the unique identifier for the corresponding master table record.

    Figure 22-12 shows a completed mapping with the flat file's master fields mapped to the master target table, the detail fields mapped to the detail target table, the RECNUM attributes from the Data Generator operator mapped to the master and detail target tables, respectively, and the constant attribute mapped to the detail target table.

    Figure 22-12 Completed Mapping from Master-Detail Flat File with a Direct Path Load

    This illustration is described in the surrounding text.
    Description of "Figure 22-12 Completed Mapping from Master-Detail Flat File with a Direct Path Load"

  11. Configure the mapping with the following parameters:

    Direct Mode: True

    Errors Allowed: 0

    Trailing Nullcols: True (for each table)

  12. After you validate the mapping and generate the SQL*Loader script, create a post-update PL/SQL procedure and add it to the Warehouse Builder library.

  13. Run the SQL*Loader script.

  14. Execute an UPDATE SQL statement by running a PL/SQL post-update procedure or manually executing a script.

The following is an example of the generated SQL*Loader control file script:

OPTIONS ( DIRECT=TRUE,PARALLEL=FALSE, ERRORS=0, BINDSIZE=50000, ROWS=200, READSIZE=65536)
LOAD DATA
CHARACTERSET WE8MSWIN1252
  INFILE 'g:\FFAS\DMR2.dat'
  READBUFFERS 4
  INTO TABLE "MASTER_TABLE"
  APPEND
  REENABLE DISABLED_CONSTRAINTS
       WHEN 
        "REC_TYPE"='E'
  FIELDS
    TERMINATED BY ','
                OPTIONALLY ENCLOSED BY '"'
                TRAILING NULLCOLS
    
                (
                "REC_TYPE" POSITION (1) CHAR ,
                "EMP_ID" CHAR ,
                "ENAME" CHAR ,
                "REC_NUM" RECNUM
                )
  
INTO TABLE "DETAIL_TABLE"
                APPEND
                REENABLE DISABLED_CONSTRAINTS
                WHEN 
                "REC_TYPE"='P'
                FIELDS
                TERMINATED BY ','
                OPTIONALLY ENCLOSED BY '"'
                TRAILING NULLCOLS
        (
                "REC_TYPE" POSITION (1) CHAR ,
                "C1" CHAR ,
                "C2" CHAR ,
                "C3" CHAR ,
                "EMP_ID" CONSTANT '*',
         "REC_NUM" RECNUM
                )

The following is an example of the post-update PL/SQL procedure:

  -- Update each detail row's master-unique column with the identifier of
  -- the master row that has the largest RECNUM smaller than the detail
  -- row's RECNUM, that is, the master record it followed in the file.
  create or replace procedure wb_md_post_update( 
     master_table varchar2 
    ,master_recnum_column varchar2 
    ,master_unique_column varchar2 
    ,detail_table varchar2 
    ,detail_recnum_column varchar2 
    ,detail_masterunique_column varchar2 
    ,detail_just_load_condition varchar2) 
  IS 
     v_SqlStmt VARCHAR2(1000); 
  BEGIN 
     v_SqlStmt := 'UPDATE '||detail_table||' l '|| 
                  ' SET l.'||detail_masterunique_column||' = (select i.'||master_unique_column|| 
                  ' from '||master_table||' i '|| 
                  ' WHERE i.'||master_recnum_column||' IN '|| 
                  ' (select max(ii.'||master_recnum_column||') '|| 
                  ' from '||master_table||' ii '|| 
                  ' WHERE ii.'||master_recnum_column||' < l.'||detail_recnum_column||') '|| 
                  ' ) '|| 
                  ' WHERE l.'||detail_masterunique_column||' = '||''''||detail_just_load_condition||''''; 
     dbms_output.put_line(v_sqlStmt); 
     EXECUTE IMMEDIATE  v_SqlStmt; 
  END; 
  /
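A hypothetical invocation of this procedure, using the object names from the control file example above, might look like this. It replaces the '*' placeholder loaded into the detail table's EMP_ID column with the EMP_ID of the nearest preceding master row:

```sql
-- Hypothetical call; the argument values assume the MASTER_TABLE and
-- DETAIL_TABLE structures shown in the control file example.
BEGIN
  wb_md_post_update(
    master_table               => 'MASTER_TABLE',
    master_recnum_column       => 'REC_NUM',
    master_unique_column       => 'EMP_ID',
    detail_table               => 'DETAIL_TABLE',
    detail_recnum_column       => 'REC_NUM',
    detail_masterunique_column => 'EMP_ID',
    detail_just_load_condition => '*');
END;
/
```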

Improved Performance through Partition Exchange Loading

Data partitioning can improve performance when loading or purging data in a target system. This practice is known as Partition Exchange Loading (PEL).

PEL is recommended when loading a relatively small amount of data into a target containing a much larger volume of historical data. The target can be a table, a dimension, or a cube in a data warehouse.

This section includes the following topics:

About Partition Exchange Loading

By manipulating partitions in your target system, you can use Partition Exchange Loading (PEL) to instantly add or delete data. When a table is exchanged with an empty partition, new data is added.

You can use PEL to load new data by exchanging it into a target table as a partition. The table that holds the new data assumes the identity of a partition from the target table, and that partition assumes the identity of the source table. This exchange is a DDL operation with no actual data movement.

Figure 22-13 illustrates an example of PEL. Data from a source table Source is inserted into a target table consisting of four partitions (Target_P1, Target_P2, Target_P3, and Target_P4). If the new data must be loaded into Target_P3, the partition exchange operation exchanges only the names on the data objects without moving the actual data. After the exchange, the table formerly labeled Source becomes Target_P3, and the former Target_P3 becomes Source. The target table still contains four partitions: Target_P1, Target_P2, Target_P3, and Target_P4. The partition exchange operation, available since Oracle9i, completes the loading process without data movement.
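The exchange itself is a data-dictionary operation. A hand-written equivalent of the swap that PEL performs might look like the following, where the object names are illustrative:

```sql
-- Swap the staged data into the target as partition TARGET_P3.
-- INCLUDING INDEXES also swaps local index segments; WITHOUT VALIDATION
-- skips checking that rows belong in the partition, for speed.
ALTER TABLE target
  EXCHANGE PARTITION target_p3 WITH TABLE source
  INCLUDING INDEXES WITHOUT VALIDATION;
```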

Figure 22-13 Overview of Partition Exchange Loading

This illustration is described in the surrounding text.
Description of "Figure 22-13 Overview of Partition Exchange Loading"

Configuring a Mapping for PEL

To configure a mapping for partition exchange loading, complete the following steps:

  1. In the Project Explorer, right-click a mapping and select Configure.

    Warehouse Builder displays the Configuration Properties window.

  2. By default, PEL is disabled for all mappings. Select PEL Enabled to use Partition Exchange Loading.

  3. Use Data Collection Frequency to specify the amount of new data to be collected for each run of the mapping. Set this parameter to specify whether you want the data collected by Year, Quarter, Month, Day, Hour, or Minute. This setting determines the number of partitions.

  4. Select Direct if you want Warehouse Builder to swap the source table directly into the target table as a partition, without creating a temporary table. If you do not select this parameter, Warehouse Builder creates a temporary table to stage the collected data before performing the partition exchange. For more information, see "Direct and Indirect PEL".

  5. If you select Replace Data, Warehouse Builder replaces the existing data in the target partition with the newly collected data. If you do not select it, Warehouse Builder preserves the existing data in the target partition, and the new data is inserted into the non-empty partition. This parameter affects the local partition and can be used to remove or swap a partition out of a target table. At the table level, you can set the Truncate/Insert properties.

Direct and Indirect PEL

When you use Warehouse Builder to load a target by exchanging partitions, you can load the target indirectly or directly.

  • Indirect PEL: By default, Warehouse Builder creates and maintains a temporary table that stages the source data before initiating the partition exchange process. For example, use Indirect PEL when the mapping includes a remote source or a join of multiple sources.

  • Direct PEL: You design the source for the mapping to match the target structure. For example, use Direct PEL in a mapping to instantaneously publish fact tables that you loaded in a previously executed mapping.

Using Indirect PEL

If you design a mapping using PEL and it includes remote sources or a join of multiple sources, Warehouse Builder must perform source processing and stage the data before partition exchange can proceed. Therefore, configure such mappings with Direct PEL set to False. Warehouse Builder transparently creates and maintains a temporary table that stores the results from source processing. After performing the PEL, Warehouse Builder drops the table.

Figure 22-14 shows a mapping that joins two sources and performs an aggregation. If all new data loaded into the ORDER_SUMMARY table is always loaded into same partition, then you can use Indirect PEL on this mapping to improve load performance. In this case, Warehouse Builder transparently creates a temporary table after the Aggregator and before ORDER_SUMMARY.

Figure 22-14 Mapping with Multiple Sources

This illustration is described in the surrounding text.
Description of "Figure 22-14 Mapping with Multiple Sources"

Warehouse Builder creates the temporary table using the same structure as the target table, with the same columns, indexes, and constraints. For the fastest performance, Warehouse Builder loads the temporary table using a parallel direct-path INSERT. After the INSERT, Warehouse Builder indexes and constrains the temporary table in parallel.

Example: Using Direct PEL to Publish Fact Tables

Use Direct PEL when the source table is local and the data is of good quality. You must design the mapping such that the source and target are in the same database and have exactly the same structure. The source and target must have the same indexes and constraints, the same number of columns, and the same column types and lengths.

For example, assume that you have the same mapping from Figure 22-14 but would like greater control on when data is loaded into the target. Depending on the amount of data, it could take hours to load and you would not know precisely when the target table would be updated.

To instantly load data to a target using Direct PEL:

  1. Design one mapping to join source data, if necessary, transform data, ensure data validity, and load it to a staging table. Do not configure this mapping to use PEL.

    Design the staging table to exactly match the structure of the final target that you will load in a separate mapping. For example, the staging table in Figure 22-14 is ORDER_SUMMARY and should be of the same structure as the final target, ORDER_CUBE in Figure 22-15.

  2. Create a second mapping that loads data from the staging table to the final target such as shown in Figure 22-15. Configure this mapping to use Direct PEL.

    Figure 22-15 Publish_Sales_Summary Mapping

    This illustration is described in the surrounding text.
    Description of "Figure 22-15 Publish_Sales_Summary Mapping"

  3. Use either the Warehouse Builder Process Flow Editor or Oracle Workflow to start the second mapping after the completion of the first.

Using PEL Effectively

You can use PEL effectively for scalable loading performance if the following conditions are true:

  • Table partitioning and tablespace: The target table must be range partitioned by a single DATE column. All partitions must be created in the same tablespace, and all tables involved in the exchange must be created in the same tablespace.

  • Existing historical data: The target table must already contain a large volume of historical data. A typical use for PEL is a clickstream application in which the target collects data every day from an OLTP database or Web log files. New data is transformed and loaded into the target, which already contains the historical data.

  • New data: All new data must be loaded into the same partition in the target table. For example, if the target table is partitioned by day, then the daily data should be loaded into one partition.

  • Loading Frequency: The loading frequency should be equal to or less than the data collection frequency.

  • No global indexes: There must be no global indexes on the target table.

Configuring Targets in a Mapping

To configure targets in a mapping for PEL:

Step 1: Create All Partitions

Warehouse Builder does not automatically create partitions during runtime. Before you can use PEL, you must create all partitions as described in "Using Partitions".

For example, if you select Month as the frequency of new data collection, you must create all the required partitions for each month of new data. Use the Data Object Editor to create partitions for a table, dimension, or cube.

To use PEL, all partition names must follow a naming convention. For example, for a partition that will hold data for May 2002, the partition name must be in the format Y2002_Q2_M05.

For PEL to recognize a partition, its name must fit one of the following formats:

Ydddd

Ydddd_Qd

Ydddd_Qd_Mdd

Ydddd_Qd_Mdd_Ddd

Ydddd_Qd_Mdd_Ddd_Hdd

Ydddd_Qd_Mdd_Ddd_Hdd_Mdd

where each d represents a decimal digit. All letters must be uppercase; lowercase names are not recognized.

If you correctly name each partition, Warehouse Builder automatically computes the Value Less Than property for each partition. Otherwise, you must manually configure Value Less Than for each partition for Warehouse Builder to generate a DDL statement. The following is an example of a DDL statement generated by Warehouse Builder:

. . .
PARTITION A_PARTITION_NAME 
      VALUES LESS THAN (TO_DATE('01-06-2002','DD-MM-YYYY')),
. . .
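Putting the naming convention and the Value Less Than property together, an illustrative DDL sketch for a monthly-partitioned target follows. The table and the AMOUNT column are assumptions; ORDER_SUMMARY, ORDER_DATE, and ITEM_ID are the names used later in this section:

```sql
-- Sketch: a target range partitioned by one DATE column, with monthly
-- partitions named in the Ydddd_Qd_Mdd format that PEL recognizes.
CREATE TABLE order_summary (
  order_date DATE,
  item_id    NUMBER,
  amount     NUMBER
)
PARTITION BY RANGE (order_date) (
  PARTITION Y2002_Q2_M04
    VALUES LESS THAN (TO_DATE('01-05-2002','DD-MM-YYYY')),
  PARTITION Y2002_Q2_M05
    VALUES LESS THAN (TO_DATE('01-06-2002','DD-MM-YYYY'))
);
```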

Step 2: Create All Indexes Using the LOCAL Option

Add an index (ORDER_SUMMARY_PK_IDX) to the ORDER_SUMMARY table. This index has two columns, ORDER_DATE and ITEM_ID. Set the following on the Indexes tab of the Data Object Editor:

  • Select UNIQUE in the Type column.

  • Select LOCAL in the Scope column.

Now Warehouse Builder can generate a DDL statement for a unique local index on table ORDER_SUMMARY.

Using local indexes provides the most important PEL performance benefit. A local index is partitioned in the same way as its table, so when the temporary table is swapped into the target table using PEL, the identities of the index segments are swapped as well.

If an index is created as a local index, the Oracle server requires the partition key column to be the leading column of the index. In the preceding example, the partition key is ORDER_DATE, and it is the leading column of the index ORDER_SUMMARY_PK_IDX.
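Expressed as DDL, the index from this step might look like the following sketch, assuming the ORDER_SUMMARY table described above:

```sql
-- Unique local index backing the primary key; the partition key
-- ORDER_DATE is the leading column, as the Oracle server requires.
CREATE UNIQUE INDEX order_summary_pk_idx
  ON order_summary (order_date, item_id)
  LOCAL;
```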

Step 3: Primary/Unique Keys Use "USING INDEX" Option

In this step you must specify that all primary key and unique key constraints are created with the USING INDEX option. In the Project Explorer, right-click the table and select Configure. The Configuration Properties dialog box is displayed. Select the primary or unique key in the left panel and select Using Index in the right panel.

With the USING INDEX option, a constraint does not trigger automatic index creation when it is added to the table. Instead, the server searches existing indexes for one with the same column list as the constraint. Thus, each primary or unique key constraint must be backed by a user-defined unique local index. The index required by the constraint ORDER_SUMMARY_PK is ORDER_SUMMARY_PK_IDX, which was created in "Step 2: Create All Indexes Using the LOCAL Option".
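As DDL, the constraint from this step might be sketched as follows, again assuming the ORDER_SUMMARY objects described above:

```sql
-- Primary key that reuses the existing unique local index instead of
-- triggering creation of a new (global) index.
ALTER TABLE order_summary
  ADD CONSTRAINT order_summary_pk
  PRIMARY KEY (order_date, item_id)
  USING INDEX order_summary_pk_idx;
```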

Restrictions for Using PEL in Warehouse Builder

These are the restrictions for using PEL in Warehouse Builder:

  • Only One Date Partition Key: Only one partition key column of DATE data type is allowed. Numeric partition keys are not supported in Warehouse Builder.

  • Only Natural Calendar System: The current PEL method supports only the natural calendar system adopted worldwide. Specific business calendar systems with user-defined fiscal and quarter endings are currently not supported.

  • All Data Partitions Must Be In the Same Tablespace: All partitions of a target (table, dimension, or cube) must be created in the same tablespace.

  • All Index Partitions Must Be In the Same Tablespace: All indexes of a target (table, dimension, or cube) must be created in the same tablespace. However, the index tablespace can be different from the data tablespace.

High Performance Data Extraction from Remote Sources

Although you can design mappings to access remote sources through database links, performance is likely to be slow when you move large volumes of data. For mappings that move large volumes of data between sources and targets of the same Oracle Database version, you have an option for dramatically improving performance through the use of transportable modules.

See Also:

"Moving Large Volumes of Data" in the Oracle Warehouse Builder Installation and Administration Guide for instructions on using transportable modules

Configuring ETL Objects

Earlier in the design phase, you defined a logical model for your target system using Warehouse Builder design objects. This chapter includes reference information for assigning physical properties to mappings and process flows. This chapter presents configuration properties in the order they appear in the user interface.

This chapter includes:

Configuring Mappings Reference

When you configure mappings properly, you can improve ETL performance. Use this section as a reference for setting configuration parameters that govern how data is loaded and for optimizing the generated code for better performance.

This section includes the following topics:

Procedure to Configure Mappings

To configure physical properties for a mapping:

  1. Right-click a mapping from the Project Explorer and select Configure.

    Warehouse Builder displays the Configuration Properties dialog box for a mapping. You can configure properties for the mapping itself and the operators within the mapping.

  2. Select Deployable to enable the generation of a set of scripts for mapping entities marked as deployable.

    Deployable is enabled by default. If you disable it, scripts for that mapping are not generated.

  3. Set Language to the type of code you want to generate for the selected mapping.

    The options from which you can choose depend upon the design and use of the operators in the mapping. Warehouse Builder sets the correct value depending on the mapping: PL/SQL, SQL*Loader, ABAP (for an SAP source mapping).

  4. If you want to schedule the mapping to run based on a previously defined schedule, click Referred Calendar.

    For instructions on creating and using schedules, see Chapter 26, "Scheduling ETL Objects".

  5. Expand Runtime Parameters to configure your mapping for deployment.

    For a description of each runtime parameter, see "Runtime Parameters".

  6. Expand Code Generation Options to enable performance options that optimize the code generated for the mapping.

    For a description of each option, see "Code Generation Options".

  7. Go to the node for each operator in the mapping to set their physical properties.

    For information about configuring sources and targets in a mapping, see "Sources and Targets Reference". To configure mappings with flat file sources and targets, see "Configuring Flat File Operators".

Runtime Parameters

When you configure Runtime Parameters for a mapping, you set the default behaviors for the mapping. You can override these parameters when you execute the mapping either in the Control Center, the Process Flow Editor, or Oracle Enterprise Manager.

The Runtime Parameters include the following parameters:

Bulk Size

Use Bulk Size to specify the number of rows in each bulk for PL/SQL bulk processing. Warehouse Builder uses the Bulk Size parameter only when the Bulk Processing Code option is selected and the operating mode is set to row based. For more information, see the Oracle PL/SQL documentation.

Analyze Table Sample Percentage

When you select the Analyze Table Statements option, Warehouse Builder estimates statistics when it analyzes the target tables. After data is loaded into the target tables, statistics used for cost-based optimization are gathered on each target table. Set this parameter to the percentage of rows in each target table to sample for this analysis.

Commit Frequency

Commit frequency applies only to non-bulk mode mappings. Bulk mode mappings commit according to the bulk size.

If you set the Default Operating Mode to row based and deselect Bulk Processing Code, then use the Commit Frequency parameter to determine the number of rows to be processed before a commit. Warehouse Builder commits data to the database after processing the number of rows specified in this parameter.

If you select the Bulk Processing Code option, set the Commit Frequency equal to the Bulk Size. If the two values differ, then Bulk Size overrides Commit Frequency and a commit is implicitly performed for every bulk size.

Maximum Number of Errors

Use Maximum Number of Errors to indicate the maximum number of errors allowed while executing the package. Execution of the package terminates when the number of errors exceeds the maximum number of errors value.

Default Operating Mode

For mappings with a PL/SQL implementation, select a default operating mode. The operating mode you select can greatly affect mapping performance. For details on how operating modes affect performance, see "Set Based Versus Row Based Operating Modes". You can select one of the following operating modes:

  • Set based: A single SQL statement that inserts all data and performs all operations on the data is generated. This increases the speed of Data Manipulation Language (DML) operations. Set based mode offers optimal performance but minimal auditing details.

  • Row based: Statements that process data row by row are generated. The select statement is a SQL cursor. All subsequent statements are PL/SQL. Because data is processed row by row, the row based operating mode has the slowest performance but offers exhaustive auditing details.

  • Row based (Target Only): A cursor select statement is generated and attempts are made to include as many operations as possible in the cursor. For each target, Warehouse Builder generates a PL/SQL insert statement and inserts each row into the target separately.

  • Set based fail over row based: The mapping is executed in set based mode. If an error occurs, the execution fails and the mapping is restarted in row based mode. This mode is recommended for use only in test environments and is not recommended for use in production environments.

  • Set based fail over row based (Target Only): The mapping is first executed in set based mode. If an error occurs, the execution fails over to Row based (Target Only) mode. This mode is recommended for use only in test environments and is not recommended for use in production environments.
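The difference between the first two modes can be seen in the kind of code each produces. A set based mapping reduces the entire load to one DML statement, while a row based mapping processes a cursor row by row (both sketches assume hypothetical source and target tables with the same columns):

```sql
-- Set based: a single statement performs the entire load
INSERT INTO tgt_table (id, amount)
  SELECT id, amount FROM src_table;

-- Row based: a cursor loop processes and can audit each row separately
BEGIN
  FOR r IN (SELECT id, amount FROM src_table) LOOP
    INSERT INTO tgt_table (id, amount) VALUES (r.id, r.amount);
  END LOOP;
END;
/
```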

Default Audit Level

Use Default Audit Level to indicate the audit level used when executing the package. Audit levels dictate the amount of audit information captured in the runtime schema when the package is run. The audit level settings are:

  • None: No auditing information is recorded in runtime.

  • Statistics: Statistical auditing information is recorded in runtime.

  • Error Details: Error information and statistical auditing information is recorded in runtime.

  • Complete: All auditing information is recorded in runtime. Running a mapping with the audit level set to Complete generates a large amount of diagnostic data which may quickly fill the allocated tablespace.

Default Purge Group

Default Purge Group is used when executing the package. Each audit record in the runtime schema is assigned to the purge group specified.

Code Generation Options

The Code Generation Options include the following:

ANSI SQL Syntax

If you select this option, ANSI SQL syntax is generated. Otherwise, Oracle SQL syntax is generated.
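For example, the same outer join is expressed differently depending on this option (table and column names are illustrative):

```sql
-- ANSI SQL syntax
SELECT c.name, o.order_id
FROM   customers c
LEFT OUTER JOIN orders o ON c.cust_id = o.cust_id;

-- Oracle SQL syntax
SELECT c.name, o.order_id
FROM   customers c, orders o
WHERE  c.cust_id = o.cust_id(+);
```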

Commit Control

Automatic: This is the default setting. Warehouse Builder loads and then automatically commits data based on the mapping design. This setting is valid for all mapping types. For multiple targets in a single mapping, data is committed based on target by target processing (insert, update, delete).

Automatic Correlated: Automatic correlated commit is a specialized type of automatic commit that applies only to PL/SQL mappings with multiple targets. Warehouse Builder considers all targets collectively and commits or rolls back data uniformly across all targets.

The mapping behavior varies according to the operating mode you select. For more information about correlated commit, see "Committing Data from a Single Source to Multiple Targets".

Manual: Select manual commit control for PL/SQL mappings in which you want to interject complex business logic or perform validations before committing data.

You have the following options for specifying manual commits:

No Commit: If you set this option, the mapping does not issue a commit while it executes.

Analyze Table Statements

If you select this option, code is generated to analyze the target table after the target is loaded, but only if the resulting target table is double or half its original size.

If the target table is not in the same schema as the mapping and you want to analyze the table, you must grant ANALYZE ANY to the schema that owns the mapping.
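For example, if the mappings are owned by a hypothetical schema named OWB_RT, a DBA would issue:

```sql
GRANT ANALYZE ANY TO owb_rt;
```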

Enable Parallel DML

If you select this option, parallel DML is enabled at runtime. Executing DML statements in parallel improves the response time of data-intensive operations in the large databases typical of a data warehouse.
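When this option is selected, the generated session setup is equivalent in effect to enabling parallel DML before the load. A sketch, with hypothetical table names and an illustrative degree of parallelism:

```sql
-- Parallel DML must be enabled at the session level before the load
ALTER SESSION ENABLE PARALLEL DML;

-- A subsequent insert can then execute in parallel
INSERT /*+ APPEND PARALLEL(tgt_table, 4) */ INTO tgt_table
  SELECT id, amount FROM src_table;
```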

Optimized Code

Select this option to improve performance for mappings that include the Splitter operator and inserts into multiple target tables. When this option is selected and the mapping is executed by Oracle9i or higher, a single SQL statement is generated (multi_table_insert) that inserts data into multiple tables based on the same set of source data.

Note that the multiple table insert is performed only if this option is selected and the Oracle target module database is Oracle9i or later. The multiple table insert is performed only for mappings in set based mode that include a Splitter operator and do not include active operators, such as an Aggregator or Joiner, between the Splitter and the targets. In addition, the multiple table insert is available only for tables. It is not available for views, materialized views, dimensions, or cubes. Each target table must have fewer than 999 columns. For detailed instructions on how to create a mapping with multiple targets, see "Example: Creating Mappings with Multiple Targets".

Do not select this option for mappings run in row based mode or for mappings executed by Oracle8i server. Also, do not select this option when auditing is required.

When this option is selected, one total SELECT and INSERT count is returned for all targets.
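The single statement generated for such a mapping is an Oracle multitable insert. A simplified sketch, assuming a Splitter that routes rows on a hypothetical amount column into two hypothetical targets:

```sql
-- One pass over the source feeds both targets, as a Splitter would
INSERT FIRST
  WHEN amount >= 1000 THEN
    INTO large_orders (order_id, amount) VALUES (order_id, amount)
  WHEN amount < 1000 THEN
    INTO small_orders (order_id, amount) VALUES (order_id, amount)
SELECT order_id, amount FROM src_orders;
```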

Authid

Specifies the AUTHID option to be used while generating the code. The options you can select are Current_User, Definer, or None.
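The selected value appears in the AUTHID clause of the generated package specification. For example, with Current_User selected (package name illustrative):

```sql
-- Invoker's rights: the package runs with the calling user's privileges
CREATE OR REPLACE PACKAGE load_pkg AUTHID CURRENT_USER AS
  PROCEDURE main;
END load_pkg;
/
```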

Use Target Load Ordering

For PL/SQL mappings with multiple targets, you can generate code that defines an order for loading the targets. This is important when a parent-child relationship exists between two or more targets in a mapping. The option is selected by default.

Error Trigger

Specify the name of the error trigger procedure in this field.

Bulk Processing Code

If this configuration parameter is selected and the operating mode is set to row based, Warehouse Builder generates PL/SQL bulk processing code. PL/SQL bulk processing improves row-based ETL performance by collecting, processing, and writing rows in bulk, instead of doing it row by row. The size of each bulk is determined by the configuration parameter Bulk Size. Set based mode offers optimal performance, followed by bulk processing, and finally by row based mode. For more information, see the Oracle PL/SQL Reference Guide.

Generation Mode

By default, when code is generated for a mapping, the code for all possible operating modes is generated. That is, if you set Default Operating Mode to set based, Warehouse Builder still generates code for all possible operating modes when Generation Mode is set to All Operating Modes. This enables you to switch the operating modes for testing purposes at runtime.

Sources and Targets Reference

For relational and dimensional sources and targets such as tables, views, and cubes, Warehouse Builder displays the following set of properties for each operator:

Use LCR APIs

By default, this setting is enabled and DML is performed using LCR APIs if available. If no LCR APIs are available, then the standard DML is used.

Database Link

This parameter is maintained for backward compatibility only.

In previous releases, you could select a database link by name from the list. Source operators can be configured for schemas and database links, but targets can be configured for schemas only. Sources and targets can reside in different schemas, but they must reside in the same database instance.

Location

This setting specifies the location that is used to access the source or target operator.

Conflict Resolution

Enable this setting to detect and resolve any conflicts that may arise during DML using the LCR APIs.

Schema

This parameter is maintained for backward compatibility only.

In previous releases, you could link the mapping to a particular schema by clicking on the Schema field and typing a name.

Partition Exchange Loading

Use the settings in this section to enable Partition Exchange Loading (PEL) into a target table. For specific information about each of these settings and additional information about how to design mappings for PEL, see "Improved Performance through Partition Exchange Loading".

Hints

Define loading or extraction hints. Application developers often develop insights into their data. For example, they know that a query runs much faster if a set of tables is joined in one order rather than another. Warehouse Builder can incorporate these insights into the generated SQL code packages as SQL Optimizer Hints.

When you select a hint from the Hints dialog box, the hint appears in the Existing Hints field. Type additional text as appropriate in the Extra Text column. The editor includes the hint in the mapping definition as is. There is no validation or checking on this text.

You can define loading hints for mappings that load data in INSERT or UPDATE mode. By default, commonly used hints such as APPEND and PARALLEL are added. For all loading modes other than INSERT, the APPEND hint has no effect and you can remove it.
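For example, a configured extraction hint and loading hints might surface in the generated SQL as follows (hints, tables, and columns are illustrative):

```sql
-- Extraction hint: join the tables in the order listed in the FROM clause
SELECT /*+ ORDERED */ f.fact_id, d.dim_name
FROM   small_dim d, big_fact f
WHERE  d.dim_id = f.dim_id;

-- Loading hints: direct-path, parallel insert (APPEND applies to INSERT only)
INSERT /*+ APPEND PARALLEL(tgt_table, 4) */ INTO tgt_table
  SELECT fact_id, dim_name FROM stage_table;
```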

Hints are available during mapping configuration. To configure a hint:

  1. In Project Explorer, expand the Database folder.

  2. In Database, expand the Repository module.

  3. In Repository module, expand Mappings.

  4. In Mappings, select the required mapping.

  5. Right-click the mapping and select Configure.

  6. In Configuration Properties window, expand the required operator.

  7. Select the operator function.

  8. To open the Inline View Hint window, click the Ellipsis button next to the Inline View Hint option in the Configuration Properties window.

For information about optimizer hints and how to use them, see Oracle 9i Designing and Tuning for Performance.

Constraint Management

Configure the following Constraint Management parameters:

  • Exceptions Table Name: All rows that violated their foreign key constraints during re-enabling are logged into the specified exceptions table. No automatic truncation of this table is performed either before or after the load. Constraint violations are also loaded into the runtime audit error tables.

    For SQL and PL/SQL loading, if you do not specify an exceptions table, invalid rows load into a temporary table located in the default tablespace and then load into the Runtime Audit error table. The table is dropped at the end of the load.

    If you are using SQL*Loader direct path loading, you must specify an exception table. Consult the SQL*Loader documentation for more information.

  • Enable Constraints: If you select this option, the foreign key constraints on the target table are maintained before loading data. If you do not select this option, the foreign key constraints on target tables are disabled before loading the data and then re-enabled after loading. Constraint violations found during re-enable are identified in the runtime audit error table and, if specified, loaded into an exceptions table.

    When you disable constraints, loading is quicker because a constraint check is not performed. However, if exceptions occur for any rows during re-enabling, the constraints for those rows will remain in a non-validated state. These rows are logged in the runtime audit error table by their ROWID. You must manually inspect the error rows to take any necessary corrective action.

    The disabling and enabling of constraints happen on the target table. When Enable Constraints property is disabled, the constraints on the target table will be disabled prior to the loading of data, and will be re-enabled after loading of data. When the constraints are re-enabled, the entire table is scanned and rows that violate the constraints are logged in the exceptions table and these are reported as constraint violation errors in the audit browser.

    Consider a scenario where the target table is empty and the Enable Constraints property is disabled. Initially suppose the source table has 10 rows, of which 2 rows violate the constraint on the target table. When the mapping is executed, the constraints on the target table are first disabled, then data is loaded (all 10 rows), and then constraints on the target table are re-enabled. When the constraints are re-enabled, the 2 rows that violate the constraints are logged into the exceptions table. The audit browser reports that there are 2 constraint violation errors.

    Later, the mapping is executed again with a new source table containing 20 rows, of which 5 rows violate the constraint on the target table. After the data is loaded into the target table (all 20 rows), the target table has 30 rows. When the constraints on the target table are re-enabled, 7 rows are logged into the exceptions table and reported as constraint violation errors in the audit browser: the 5 new violations as well as the 2 rows reported initially. This is because Warehouse Builder scans the entire target table, so all 30 rows are checked and the 2 violating rows from the first data load are still included. Warehouse Builder cannot identify only the rows added when the mapping was executed the second time. Therefore, unless you truncate the target table before each data load, the constraint violations from previous data loads are reported each time.

    Deselecting the Enable Constraints option is subject to the following restrictions:

    • For set based operating mode, the deselect setting disables foreign key constraints on the targets before loading, and then re-enables the constraints after loading. This property has no effect on foreign key constraints on other tables referencing the target table. If the load is done using SQL*Loader instead of a SQL or PL/SQL package, then a re-enable clause is added to the .ctl file.

    • For set based fail over to row based and set based fail over to row based (target only) operating modes, the deselect setting disables the foreign key constraints on the targets before loading and then re-enables them if the load succeeds in set based mode. This setting has no effect on foreign keys referencing other tables. If the load fails over to row-based, then loading will repeat in row based mode and all constraints remain enabled.

      Note:

      Constraint violations created during re-enabling will not cause the load to fail over from set based to row based mode.
    • For row based or row based (target only) operating modes, all foreign key constraints remain enabled even if the option is not selected.

    • For the TRUNCATE/INSERT DML type, the deselect setting disables foreign key constraints on other tables referencing the target table before loading, and then re-enables the constraints after loading, regardless of the default operating mode.
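The disable, load, re-enable cycle described above corresponds to the standard EXCEPTIONS INTO mechanism. A sketch of the statements involved, assuming a hypothetical target table, foreign key constraint, and an exceptions table created with the utlexcpt.sql script:

```sql
-- Disable the foreign key before the load
ALTER TABLE tgt_table DISABLE CONSTRAINT tgt_fk;

-- ... data is loaded into tgt_table here ...

-- Re-enable after the load; rows that violate the constraint
-- are logged by ROWID into the exceptions table
ALTER TABLE tgt_table ENABLE CONSTRAINT tgt_fk
  EXCEPTIONS INTO exceptions;
```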

SQL*Loader Parameters

When you have a table operator that contains inputs from a flat file, you need to configure the following SQL*Loader Parameters properties:

  • Partition Name: Indicates that the load is a partition-level load. Partition-level loading enables you to load one or more specified partitions or subpartitions in a table. Full database, user, and transportable tablespace mode loading does not support partition-level loading. Because incremental loading (incremental, cumulative, and complete) can be done only in full database mode, partition-level loading cannot be specified for incremental loads. In all modes, partitioned data is loaded in a format such that partitions or subpartitions can be selectively loaded.

  • Sorted Indexes Clause: Identifies the indexes on which the data is presorted. This clause is allowed only for direct path loads. Because the data sorted for one index is not usually in the right order for another index, you specify only one index in the SORTED INDEXES clause. When the data is in the same order for multiple indexes, all indexes can be specified at once. All indexes listed in the SORTED INDEXES clause must be created before you start the direct path load.

  • Singlerow: Intended for use during a direct path load with APPEND on systems with limited memory, or when loading a small number of records into a large table. This option inserts each index entry directly into the index, one record at a time. By default, SQL*Loader does not use SINGLEROW to append records to a table. Index entries are stored in a temporary area and merged with the original index at the end of the load. Although this method achieves better performance and produces an optimal index, it requires extra storage space. During the merge, the original index, the new index, and the space for new entries all simultaneously occupy storage space. With the SINGLEROW option, storage space is not required for new index entries or for a new index. Although the resulting index may not be as optimal as a freshly sorted one, it takes less space to produce. It also takes more time because additional UNDO information is generated for each index insert. This option is recommended when the available storage is limited. It is also recommended when the number of records to be loaded is small compared to the size of the table. A ratio of 1:20 or less is considered small.

  • Trailing Nullcols: Sets SQL*Loader to treat any relatively positioned columns that are not present in the record as null columns.

  • Records To Skip: Invokes the SKIP command in SQL*Loader. SKIP specifies the number of logical records from the beginning of the file that should not be loaded. By default, no records are skipped. This parameter continues loads that have been interrupted for some reason. It is used for all conventional loads, for single-table direct loads, and for multiple-table direct loads when the same number of records are loaded into each table. It is not used for multiple-table direct loads when a different number of records are loaded into each table.

  • Database File Name: Specifies the names of the export files to import. The default extension is .dmp. Because you can export multiple export files, you may need to specify multiple file names to be imported. You must have read access to the imported files. You must also have the IMP_FULL_DATABASE role.
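Several of these parameters correspond to clauses in the generated SQL*Loader control file. An illustrative fragment, with hypothetical file, table, index, and column names:

```
-- Skip the header record (Records To Skip); direct path for SORTED INDEXES
OPTIONS (SKIP=1, DIRECT=TRUE)
LOAD DATA
INFILE 'orders.dat'
APPEND
INTO TABLE tgt_orders
-- The data is presorted on this index (Sorted Indexes Clause)
SORTED INDEXES (tgt_orders_pk)
FIELDS TERMINATED BY ','
-- Missing trailing fields are loaded as NULL (Trailing Nullcols)
TRAILING NULLCOLS
(order_id, amount, order_date DATE 'YYYY-MM-DD')
```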

Configuring Flat File Operators

The Configuration Properties dialog box contains additional settings for Mapping Flat File operators, depending on how the operators are used in the mapping.

Flat File Operators as a Target

To configure properties unique to mappings with flat file targets:

  1. Select a mapping from the Project Explorer, select Design from the menu bar, and select Configure.

    Or, right-click the mapping you want to configure and select Configure.

    Warehouse Builder displays the Configuration Properties dialog box.

  2. Choose the parameters you want to configure and click the space to the right of the parameter name to edit its value.

    For each parameter, you can either select an option from a list, type a value, or click the Ellipsis button to display another properties dialog box.

  3. Select the Deployable option to generate a set of scripts for mapping objects marked as deployable. If this option is not selected for a mapping, scripts are not generated for that mapping.

  4. Set Language to the type of code you want to generate for the selected mapping. The options you can choose from depend upon the design and use of the operators in the mapping. Depending on the mapping, you can select from PL/SQL, ABAP (for an SAP source mapping), or SQL*Loader.

  5. Specify the location to deploy the mapping.

  6. Under Runtime Parameters, set the Default Operating Mode to Row based (target only). This type of mapping will not generate code in any other default operating mode. For a description of each runtime parameter, see "Runtime Parameters".

  7. Set the Code Generation Options as described in "Code Generation Options".

  8. Set the Sources and Targets Reference as described in "Sources and Targets Reference".

  9. For Access Specification, specify the name of the flat file target in Target Data File Name. For the Target Data File Location, specify a target file located on the computer where you installed the Runtime Platform. Select Output as XML file if you want the output to be in an xml file.

Flat File Operator as a Source

To configure a mapping with a flat file operator as a source:

  1. Select a mapping from the Project Explorer, select Design from the menu bar, and select Configure. Or, right-click the mapping you want to configure and select Configure.

  2. Select the parameters you want to configure and click the space to the right of the parameter name to edit its value.

    For each parameter, you can specify whether you want the parameter to be selected, select an option from a list, type a value, or click the Ellipsis button to display another properties dialog box.

  3. Select the Deployable option to generate SQL*Loader script.

  4. Specify the Log File Location and Log File Name.

  5. Select Continue Load.

    If SQL*Loader runs out of space for data rows or index entries, the load is discontinued. If the Continue Load option is selected, an attempt is made to continue discontinued loads.

  6. In Nls Characterset, specify the character set to place in the CHARACTERSET clause.

  7. Select Direct Mode to indicate that a direct path load will be done. If this option is not set, a conventional load will be done. In general, direct mode is faster.

  8. Select Operation Recoverable to indicate that the load is recoverable. If this option is not selected, the load is not recoverable and records are not recorded in the redo log.

  9. Configure the following parameters, which affect the OPTIONS clause in the SQL*Loader script that is generated for mappings with flat file sources.

    Perform Parallel Load: If this option is selected, direct loads can operate in multiple concurrent sessions.

    Errors Allowed: If the value specified is greater than 0, then the ERRORS = n option is generated. SQL*Loader terminates the load at the first consistent point after it reaches this error limit.

    Records To Skip: If the value specified is greater than 0, then the SKIP = n option is generated. This value indicates the number of records from the beginning of the file that should not be loaded. If the value is not specified, no records are skipped.

    Records To Load: If the value specified is greater than 0, then the LOAD = n option is generated. This value specifies the maximum number of records to load. If a value is not specified, then all of the records are loaded.

    Rows Per Commit: If the value specified is greater than 0, then the ROWS = n option is generated. For direct path loads, the value identifies the number of rows to read from the source before a data save. For conventional path loads, the value specifies the number of rows in the bind array.

    Read Size: If the value specified is greater than 0, then the READSIZE = n option is generated. The value is used to specify the size of the read buffer.

    Bind Size: If the value specified is greater than 0, then the BINDSIZE = n option is generated. The value indicates the maximum size, in bytes, of the bind array.

    Read Buffers: If the value specified is greater than 0, then the READBUFFERS = n clause is generated. READBUFFERS specifies the number of buffers to use during a direct path load. Do not specify a value for READBUFFERS unless it is necessary.

    Preserve Blanks: If this option is selected, then the PRESERVE BLANKS clause is generated. PRESERVE BLANKS retains leading white space when optional enclosure delimiters are not present. It also leaves the trailing white space intact when fields are specified with a predetermined size.

    Database File Name: This parameter enables you to specify the characteristics of the physical files to be loaded. The initial values for these parameters are set from the properties of the flat file used in the mapping.

    If this parameter is set to a non-blank value, then the FILE= option is generated. The value specified is enclosed in single quotes in the generated code.

    Control File Location and Control File Name: The location and name of the control file, which is necessary for audit details.

    For more information about each SQL*Loader option and clause, see Oracle Database Utilities.

  10. Expand the Runtime Parameters to configure your mapping for deployment.

    Audit: Select this option to perform an audit when the package is executed.

    Default Purge Group: The Default Purge Group is used when executing the package. Each audit record in the runtime schema is assigned to the purge group specified.

  11. Expand Sources and Targets Reference to set the physical properties of the operators in the mapping as described in "Sources and Targets Reference".
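The parameters configured in step 9 surface as an OPTIONS clause at the top of the generated control file, one keyword per parameter. An illustrative line (all values are hypothetical):

```
-- Perform Parallel Load, Errors Allowed, Records To Skip, Records To Load,
-- Rows Per Commit, Read Size, and Bind Size, in that order
OPTIONS (PARALLEL=TRUE, ERRORS=50, SKIP=100, LOAD=10000, ROWS=5000,
         READSIZE=1048576, BINDSIZE=1048576)
```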

Configuring Process Flows Reference

To configure a process flow module:

  1. Right-click the process flow module and select Configure.

    Warehouse Builder displays the Configuration Properties dialog box for the process flow module.

  2. Set the properties for Evaluation Location and Identification Location.

    Evaluation Location is the location from which this process flow is evaluated.

    Identification Location is the location to which the generated code is deployed.

To configure a process flow package:

  1. Right-click the process flow package and select Configure.

    Warehouse Builder displays the Configuration Properties dialog box for the process flow package.

  2. Set the properties for Referred Calendar and Generation Comments.

    Referred Calendar provides the schedule to associate with this package.

    Generation Comments provides additional comments for the generated code.

Click any of the activities of a package to view its properties.

Under Path Settings, set the following properties for each activity in the process flow:

Execution Location: The location from which this activity is executed. If you configured Oracle Enterprise Manager, you can select an OEM agent to execute the process flow.

Remote Location: The remote location for FTP activities only.

Working Location: The working location for FTP, FILE EXISTS, and External Process activities only.

Deployed Location: The deployment location. This setting applies to transformation activities only. For activities referring to pre-defined transformations, you must change the setting from Use Default Location and specify a valid location.

Under General Properties, you can view the bound name, which is the name of the object that the activity represents in the process flow. Only mapping, transformation, and subprocess activities have bound names.

Under Execution Settings, select the option Use Return as Status.

This setting governs the behavior for activities that return NUMBER in their output. These activities include the FTP, User Defined, and Transform activities. When you select Use Return as Status, the Process Flow Editor assigns the outgoing transition conditions based on the following numerical return values for the activity:

1 = Success Transition

2 = Warning Transition

3 = Error Transition
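For example, a user defined activity implemented as a PL/SQL function could drive the outgoing transitions by returning one of these values (the function, its logic, and the audit table it queries are illustrative):

```sql
CREATE OR REPLACE FUNCTION check_load_status RETURN NUMBER IS
  l_bad_rows NUMBER;
BEGIN
  -- Count rejected rows in a hypothetical audit table
  SELECT COUNT(*) INTO l_bad_rows FROM load_errors;
  IF l_bad_rows = 0 THEN
    RETURN 1;  -- Success Transition
  ELSIF l_bad_rows < 10 THEN
    RETURN 2;  -- Warning Transition
  ELSE
    RETURN 3;  -- Error Transition
  END IF;
END;
/
```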