11 Configuring ETL Objects

Earlier in the design phase, you defined a logical model for your target system using Warehouse Builder design objects. This chapter includes reference information and recommendations for assigning physical properties to mappings and process flows.

This chapter includes the following topics:

  • Configuring Mappings Reference

  • Strategies for Configuring PL/SQL Mappings

  • Using Partition Exchange Loading

  • Configuring Process Flows

Configuring Mappings Reference

Correctly configuring mappings can improve ETL performance. Use this section as a reference for setting the configuration parameters that govern how Warehouse Builder loads data and optimizes code for better performance.

This section includes the following topics:

  • Procedure for Configuring Mappings

  • Runtime Parameters Reference

  • Code Generation Options Reference

  • Sources and Targets Reference

  • Configuring Flat File Operators

Procedure for Configuring Mappings

To configure physical properties for a mapping:

  1. Select a mapping from the navigation tree, select Object from the menu bar, and select Configure.

    Or, right-click the mapping you want to configure and select Configure from the pop-up menu.

    Warehouse Builder displays the Configuration Properties dialog for the mapping, as shown in Figure 11-1.

    Figure 11-1 Configuration Properties Dialog for Mapping Tables

  2. Set Deployable to true to enable Warehouse Builder to generate a set of scripts for mapping entities marked as deployable.

    The default setting is true. If you set Deployable to false for a mapping, Warehouse Builder does not generate scripts for that mapping.

  3. Set Step Type to the type of code you want to generate for the selected mapping.

    The options you can choose from depend upon the design and use of the operators in the mapping. Depending on the mapping, Warehouse Builder sets the correct value: PL/SQL, SQL*Loader, or ABAP (for an SAP source mapping).

  4. Expand Runtime Parameters to configure your mapping for deployment.

    For a description of each runtime parameter, see "Runtime Parameters Reference".

  5. Expand Code Generation Options to enable performance options that optimize the code generated for the mapping.

    For a description of each option, see "Code Generation Options Reference".

  6. Expand Sources and Targets to set the physical properties of the operators in the mapping.

    The Configuration Properties dialog displays all source and target operators in a mapping under Sources and Targets. Each operator contains a set of properties you can edit. For information on configuring sources and targets in a mapping, see "Sources and Targets Reference".

  7. If the mapping contains a flat file as a source or target, configure additional parameters. For information on configuring mappings with flat file sources and targets, see "Configuring Flat File Operators".

Runtime Parameters Reference

When you configure Runtime Parameters for a mapping, you set the default behaviors for the mapping. You can override these parameters when you execute the mapping in the Deployment Manager, the Process Flow Editor, or Oracle Enterprise Manager.

The Runtime Parameters include the following parameters:

Default Operating Mode

For mappings with a PL/SQL implementation, select a default operating mode. The operating mode you select can greatly affect mapping performance. For details on how operating modes affect performance, see "Selecting a Default Operating Mode". You can select one of the following operating modes:

  • Set based: Warehouse Builder generates a single SQL statement that inserts all data and performs all operations on the data. This increases the speed of Data Manipulation Language (DML) operations. Set based mode offers optimal performance but minimal auditing details.

  • Row based: Warehouse Builder generates statements that process data row by row. The select statement is a SQL cursor. All subsequent statements are PL/SQL. Since data is processed row by row, the row based operating mode has the slowest performance but offers the most auditing details.

  • Row based (Target Only): Warehouse Builder generates a cursor select statement and attempts to include as many operations as possible in the cursor. For each target, Warehouse Builder generates a PL/SQL insert statement and inserts each row into the target separately.

  • Set based fail over row based: Warehouse Builder executes the mapping in set based mode. If an error occurs, the execution fails and Warehouse Builder starts the mapping over again in row based mode.

  • Set based fail over row based (Target Only): The mapping is first executed in set based mode. If any error occurs, the execution fails over to Row based (Target Only) mode.

Bulk Size

Use Bulk Size to specify the number of rows in each bulk for PL/SQL Bulk Processing. Warehouse Builder uses the Bulk Size parameter only when Generate Bulk Processing Mode is set to true. For more information, see the Oracle PL/SQL Reference Guide.
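The following PL/SQL fragment is a minimal sketch of the bulk processing pattern, assuming source and target tables SRC_TABLE and TGT_TABLE with columns ID and AMOUNT and a Bulk Size of 50. It illustrates the pattern only; it is not the code Warehouse Builder actually emits.

DECLARE
  CURSOR c_src IS
    SELECT id, amount FROM src_table;
  TYPE num_tab_t IS TABLE OF NUMBER;
  l_ids     num_tab_t;
  l_amounts num_tab_t;
  c_bulk_sz CONSTANT PLS_INTEGER := 50;   -- corresponds to Bulk Size
BEGIN
  OPEN c_src;
  LOOP
    -- Fetch one bulk of rows instead of a single row
    FETCH c_src BULK COLLECT INTO l_ids, l_amounts LIMIT c_bulk_sz;
    EXIT WHEN l_ids.COUNT = 0;
    -- Write the whole bulk with a single FORALL statement
    FORALL i IN 1 .. l_ids.COUNT
      INSERT INTO tgt_table (id, amount) VALUES (l_ids(i), l_amounts(i));
    COMMIT;   -- with bulk processing, commits align with Bulk Size
  END LOOP;
  CLOSE c_src;
END;
/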

Default Audit Level

Use Default Audit Level to indicate the audit level used when executing the package. Audit levels dictate the amount of audit information captured in the runtime schema when the package is run. The audit level settings are:

  • None: No auditing information is recorded in the runtime schema.

  • Statistics: Statistical auditing information is recorded in the runtime schema.

  • Error Details: Error information and statistical auditing information are recorded in the runtime schema.

  • Complete: All auditing information is recorded in the runtime schema. Running a mapping with the audit level set to Complete generates a large amount of diagnostic data, which may quickly fill the allocated tablespace.

Maximum Number of Errors

Use Maximum Number of Errors to indicate the maximum number of errors allowed when executing the package. Execution of the package terminates when the number of errors exceeds this value.

Commit Frequency

Commit frequency only applies to non-bulk mode mappings. Bulk mode mappings commit according to the bulk size.

When you set the Default Operating Mode to row based and Bulk Processing Code to false, Warehouse Builder uses the Commit Frequency parameter when executing the package. Warehouse Builder commits data to the database after processing the number of rows specified in this parameter.

If you set Bulk Processing Code to true, set the Commit Frequency equal to the Bulk Size. If the two values are different, Bulk Size overrides the commit frequency and Warehouse Builder implicitly performs a commit for every bulk size.

Default Purge Group

The Default Purge Group is used when executing the package. Each audit record in the runtime schema is assigned to the purge group specified.

Analyze Table Sample Percentage

When you set Analyze Table Statements to true, Warehouse Builder uses estimation when gathering statistics on the target tables. After data is loaded into the target tables, statistics used for cost-based optimization are gathered on each target table. Set this parameter to the percentage of rows in each target table to be used for this analysis.
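For example, with Analyze Table Statements set to true and a sample percentage of 5, the generated code would issue a statement along the following lines; the table name is an assumption:

ANALYZE TABLE order_summary
  ESTIMATE STATISTICS SAMPLE 5 PERCENT;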

Code Generation Options Reference

The Code Generation Options include the following:

Bulk Processing Code

If this configuration parameter is set to true, Warehouse Builder generates PL/SQL bulk processing code. PL/SQL bulk processing improves row-based ETL performance by collecting, processing, and writing rows in bulk, instead of doing it row by row. The size of each bulk is determined by the configuration parameter Bulk Size. Set based mode offers optimal performance, followed by bulk processing, and finally by row based mode. For more information, see the Oracle PL/SQL Reference Guide.

Row Based Mode

Row based mode is an operating mode option. Row based mode allows for the maximum amount of runtime auditing. A mapping that runs in row-based mode processes data from a source record by record.

Parallel Row Code

Set this parameter to true to generate code using the table function feature supported in Oracle Database. This setting improves performance by enabling table functions to use parallel execution in the Oracle Server for Row based and Row based (Target only) operating modes.

You can set this parameter to true when the mapping generates PL/SQL code and is executed by Oracle Database server.

If the mapping generates SQL*Loader code, set the parameter to false (the default). This property is available only for mappings that generate PL/SQL code; it does not appear as a configuration property for mappings that generate SQL*Loader code. The property does not apply in set based mode. In the set based fail over row based and set based fail over row based (target only) modes, it applies only when the set based execution fails and the code runs in row based or row based (target only) mode. This property is not available for mappings that include a Mapping Output Parameter, a Flat File operator, or a Sequence operator as a source.

Parallelism is lost for table functions in the following scenarios (a sketch of a parallel-enabled table function follows this list):

  1. When the input cursor refers to objects (such as tables or views) in remote schemas. For example, a mapping containing a table operator bound to a table in a remote database schema cannot be executed in parallel.

  2. When the input cursor passed to the table function refers to objects in non-Oracle databases. The Oracle database server's parallel engine cannot extract parallelism when the objects are accessed through an Oracle gateway. For example, when a mapping contains a table operator that is bound to a table in a DB2 database, the mapping cannot be executed in parallel.

  3. Parallel DML operations cannot be performed on tables with triggers. For example, a mapping loses parallelism if its target table has pre-map or post-map triggers that operate on the table.

  4. Updates and deletes can only be parallelized on partitioned tables. Update and delete parallelism is not possible within a partition or on a non-partitioned table. For example, if a mapping contains a table operator with a delete/update load, the target table must be partitioned.
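As a sketch of the feature this parameter relies on, the following parallel-enabled pipelined table function accepts a ref cursor over the source rows. All names, types, and the transformation itself are illustrative assumptions, not generated code:

CREATE OR REPLACE TYPE tgt_row_t AS OBJECT (id NUMBER, val VARCHAR2(30));
/
CREATE OR REPLACE TYPE tgt_tab_t AS TABLE OF tgt_row_t;
/
CREATE OR REPLACE FUNCTION transform_rows (p_src SYS_REFCURSOR)
  RETURN tgt_tab_t PIPELINED
  PARALLEL_ENABLE (PARTITION p_src BY ANY)   -- allows parallel execution slaves
IS
  l_id  NUMBER;
  l_val VARCHAR2(30);
BEGIN
  LOOP
    FETCH p_src INTO l_id, l_val;
    EXIT WHEN p_src%NOTFOUND;
    PIPE ROW (tgt_row_t(l_id, UPPER(l_val)));   -- row by row transformation
  END LOOP;
  RETURN;
END;
/

Parallelism is preserved only when the input cursor reads local Oracle tables, for example:

INSERT INTO tgt_table
  SELECT * FROM TABLE(transform_rows(CURSOR(SELECT id, val FROM src_table)));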

Analyze Table Statements

If you set this parameter to true, Warehouse Builder generates code that analyzes the target table after it is loaded, but only if the resulting target table is at least double or half its original size.

Optimized Code

Set this parameter to true to improve performance for mappings that include the Splitter operator and insert into multiple target tables. When this parameter is set to true and the mapping is executed by Oracle Database, Warehouse Builder generates a single SQL statement (multi_table_insert) that inserts data into multiple tables based on the same set of source data.

Warehouse Builder performs the multiple table insert only if this parameter is set to true and the Oracle target module database is Oracle Database. The multiple table insert is performed only for mappings in set based mode that include a Splitter operator and no active operators, such as an Aggregator or Joiner operator, between the Splitter and the targets. Also, the multiple table insert is available only for tables; it is not available for views, materialized views, dimensions, or cubes. Each target table must have fewer than 999 columns. For detailed instructions on how to create a mapping with multiple targets, see "Example: Following a Record Through the Name and Address Operator".

Set this parameter to false for mappings run in row based mode or for mappings executed by Oracle8i server. You may also want to set this parameter to false when auditing is required; when set to true, Warehouse Builder returns one total SELECT and INSERT count for all targets.
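As an illustration, a multi_table_insert generated for a Splitter with two targets has the following general shape; the tables, columns, and split conditions are assumptions:

INSERT ALL
  WHEN order_type = 'WEB' THEN
    INTO web_orders (order_id, amount) VALUES (order_id, amount)
  WHEN order_type = 'PHONE' THEN
    INTO phone_orders (order_id, amount) VALUES (order_id, amount)
SELECT order_id, order_type, amount
FROM   orders_src;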

Correlated Commit

This parameter applies to mappings with multiple targets. When set to true, Warehouse Builder commits data based on the source records selected. When set to false, Warehouse Builder commits data on a target-by-target basis (insert, update, delete).

The mapping behavior varies according to the operating mode you select. For more information about correlated commit, see "Committing Data from a Single Source to Multiple Targets".

Sources and Targets Reference

The Configuration Properties dialog displays all source and target operators in a mapping under Sources and Targets. The properties that are available for configuration vary according to the types of source and target operators used in the mapping.

For relational and dimensional sources and targets such as tables, views, and cubes, Warehouse Builder displays the following set of properties for each operator:

Bound Name

When Warehouse Builder generates code, it identifies each source and target operator by its bound name. By default, the bound name is the same as the operator name.

Schema

This parameter is maintained for backward compatibility only.

In previous versions of Warehouse Builder, you could link the mapping to a particular schema by clicking the Schema field and typing a name.

Database Link

This parameter is maintained for backward compatibility only.

In previous versions of Warehouse Builder, you could select a database link by name from the drop-down list. Source operators can be configured for schemas and database links, but targets can be configured for schemas only. Sources and targets can reside in different schemas, but they must reside in the same database instance.

Partition Exchange Loading

Set this parameter to true to enable partition exchange loading (PEL) into a target table. For information on how to design mappings for PEL, see "Using Partition Exchange Loading".

Hints

Define extraction or loading hints. Application developers often develop insights into their data. For example, they know that a query runs much faster if a set of tables is joined in one order rather than another. Warehouse Builder can incorporate these insights into the generated SQL code packages as SQL Optimizer Hints.

When you select a hint from the Hints dialog, the hint appears in the Existing Hints field. Type additional text as appropriate in the Extra Text column. The editor includes the hint in the mapping definition as is. There is no validation or checking on this text.

For information on optimizer hints and how to use them, see Oracle Database Designing and Tuning for Performance.
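For example, an extraction hint and a loading hint might surface in a generated statement as follows; the hint choices and all names are illustrative assumptions:

INSERT /*+ APPEND */ INTO order_summary (order_date, total)   -- loading hint
SELECT /*+ USE_HASH(o l) */                                   -- extraction hint
       o.order_date, SUM(l.amount)
FROM   orders o, order_lines l
WHERE  o.order_id = l.order_id
GROUP BY o.order_date;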

Constraint Management

Configure the following Constraint Management parameters; a sketch of the resulting constraint DDL follows the list:

  • Enable Constraints: If this property is set to true, Warehouse Builder maintains the foreign key constraints on the target table before loading data. If this property is set to false, Warehouse Builder disables the foreign key constraints on target tables before data loading and then re-enables the constraints after loading. Constraint violations found during re-enable are identified in the runtime audit error table and, if specified, in an exceptions table.

    When you disable constraints, loading time is decreased since constraint checking is not performed. However, if exceptions occur for any rows during re-enabling, the constraints for those rows will remain in a non-validated state. These rows are logged in the runtime audit error table by their ROWID. You must manually inspect the error rows to take any necessary corrective action.

    Setting Enable Constraints to false is subject to the following restrictions:

    • For set based operating mode, the false setting disables foreign key constraints on the targets before loading, and then re-enables the constraints after loading. This property has no effect on foreign key constraints on other tables referencing the target table. If the load is done using SQL*Loader instead of a SQL or PL/SQL package, then a re-enable clause is added to the .ctl file.

    • For set based fail over to row based and set based fail over to row based (target only) operating modes, the false setting disables the foreign key constraints on the targets before loading and then re-enables them if the load succeeds in set based mode. This setting has no effect on foreign keys referencing other tables. If the load fails over to row-based, then loading will repeat in row based mode and all constraints remain enabled.

      Note:

      Constraint violations created during re-enabling will not cause the load to fail from set based over to row based mode.
    • For row based or row based (target only) operating modes, all foreign key constraints remain enabled even if the property is set to false.

    • For the TRUNCATE/INSERT DML type, the false setting disables foreign key constraints on other tables referencing the target table before loading, and then re-enables the constraints after loading, regardless of the default operating mode.

  • Exceptions Table Name: All rows that violated their foreign key constraints during re-enabling are logged into the specified exceptions table. No automatic truncation of this table is performed either before or after the load. Constraint violations are also loaded into the runtime audit error tables.

    For SQL and PL/SQL loading, if you do not specify an exceptions table, invalid rows load into a temporary table located in the default tablespace and then load into the Runtime Audit error table. The table is dropped at the end of the load.

    If you are using SQL*Loader direct path loading, you must specify an exception table. Consult the SQL*Loader documentation for more information.
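A minimal sketch of the constraint handling described above, assuming a target ORDER_SUMMARY with a foreign key ORDER_SUMMARY_FK and an existing exceptions table MY_EXCEPTIONS (an exceptions table is conventionally created with the utlexcpt.sql script):

ALTER TABLE order_summary DISABLE CONSTRAINT order_summary_fk;

-- the load of ORDER_SUMMARY runs here with no constraint checking

ALTER TABLE order_summary ENABLE CONSTRAINT order_summary_fk
  EXCEPTIONS INTO my_exceptions;   -- violating rows are logged by ROWID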

SQL*Loader Parameters

Configure the following SQL*Loader Parameters properties; a sketch of a control file fragment using several of them follows the list:

  • Partition Name: Indicates that the load is a partition-level load. Partition-level loading enables you to load one or more specified partitions or subpartitions in a table. Full database, user, and transportable tablespace mode loading does not support partition-level loading. Because incremental loading (incremental, cumulative, and complete) can be done only in full database mode, partition-level loading cannot be specified for incremental loads. In all modes, partitioned data is loaded in a format such that partitions or subpartitions can be selectively loaded.

  • Sorted Indexes Clause: Identifies the indexes on which the data is presorted. This clause is allowed only for direct path loads. Because data sorted for one index is not usually in the right order for another index, you specify only one index in the SORTED INDEXES clause. When the data is in the same order for multiple indexes, all indexes can be specified at once. All indexes listed in the SORTED INDEXES clause must be created before you start the direct path load.

  • Singlerow: Intended for use during a direct path load with APPEND on systems with limited memory, or when loading a small number of records into a large table. This option inserts each index entry directly into the index, one record at a time. By default, SQL*Loader does not use SINGLEROW to append records to a table. Index entries are stored in a temporary area and merged with the original index at the end of the load. Although this method achieves better performance and produces an optimal index, it requires extra storage space. During the merge, the original index, the new index, and the space for new entries all simultaneously occupy storage space. With the SINGLEROW option, storage space is not required for new index entries or for a new index. Although the resulting index may not be as optimal as a freshly sorted one, it takes less space to produce. It also takes more time because additional UNDO information is generated for each index insert. This option is recommended when the available storage is limited. It is also recommended when the number of records to be loaded is small compared to the size of the table. A ratio of 1:20 or less is considered small.

  • Trailing Nullcols: Sets SQL*Loader to treat any relatively positioned columns that are not present in the record as null columns.

  • Records To Skip: Invokes the SKIP command in SQL*Loader. SKIP specifies the number of logical records from the beginning of the file that should not be loaded. By default, no records are skipped. Use this parameter to continue loads that have been interrupted. It applies to all conventional loads, to single-table direct loads, and to multiple-table direct loads when the same number of records are loaded into each table. It does not apply to multiple-table direct loads when a different number of records are loaded into each table.

  • Database File Name: Specifies the names of the export files to import. The default extension is .dmp. Because you can export multiple export files, you may need to specify multiple filenames to be imported. You must have read access to the imported files. You must also have the IMP_FULL_DATABASE role.
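To illustrate where several of these parameters land, the following is a hedged sketch of a control file fragment; the file, table, partition, and index names are assumptions:

LOAD DATA
INFILE 'orders.dat'
INTO TABLE order_summary
PARTITION (y2002_q2_m05)                -- Partition Name
APPEND
SORTED INDEXES (order_summary_pk_idx)   -- Sorted Indexes Clause (direct path only)
SINGLEROW                               -- Singlerow
FIELDS TERMINATED BY ','
TRAILING NULLCOLS                       -- Trailing Nullcols
(order_date DATE 'DD-MM-YYYY',
 item_id,
 amount)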

Configuring Flat File Operators

The Configuration Properties dialog contains additional settings for Mapping Flat File operators, depending on how the operators are used in the mapping.

Flat File Operators as a Target

To configure properties unique to mappings with flat file targets:

  1. Select a mapping from the navigation tree, select Object from the menu bar, and select Configure.

    Or, right-click the mapping you want to configure and select Configure from the pop-up menu.

    Warehouse Builder displays the Configuration Properties dialog as shown in Figure 11-2.

    Figure 11-2 Configuration Properties Dialog (Flat File Target)

  2. Choose the parameters you want to configure and click the space to the right of the parameter name to edit its value.

    For each parameter, you can either select an option from a list, type a value, or click ... to display another properties dialog.

  3. Set Deployable to true to enable Warehouse Builder to generate a set of scripts for mapping objects marked as deployable. The default setting is true. If you set Deployable to false for a mapping, Warehouse Builder does not generate scripts for that mapping.

  4. Set Step Type to the type of code you want to generate for the selected mapping. The options you can choose from depend upon the design and use of the operators in the mapping. Depending on the mapping, you can select from PL/SQL, ABAP (for an SAP source mapping), or SQL*Loader.

  5. Specify the location to deploy the mapping.

  6. Under Runtime Parameters, set the Default Operating Mode to Row based (target only). This type of mapping does not generate code in any other default operating mode. For a description of each runtime parameter, see "Runtime Parameters Reference".

  7. Set the Code Generation Options as described in "Code Generation Options Reference".

  8. Set the Sources and Targets properties as described in "Sources and Targets Reference".

  9. For Access Specification, specify the name of the flat file target in Target Data File Name. For Target Data File Location, specify a location on the machine where you installed the Warehouse Builder Runtime Platform.

Flat File Operator as a Source

To configure a mapping with a flat file operator as a source:

  1. Select a mapping from the navigation tree, select Object from the menu bar, and select Configure. Or, right-click the mapping you want to configure and select Configure from the pop-up menu.

    The Configuration Properties dialog appears as shown in Figure 11-3.

    Figure 11-3 Configuration Properties Dialog (Flat File Source)

  2. Choose the parameters you want to configure and click the space to the right of the parameter name to edit its value.

    For each parameter, you can either select an option from a list, type a value, or click ... to display another properties dialog.

  3. Set Deployable to true to generate a SQL*Loader script.

  4. Specify the Execution Location, the location where the mapping is executed.

  5. For the Source Data File, enter the full physical filename of the data file to be loaded.

    Type a complete path name for the file in the syntax of the operating system on which the SQL*Loader script is deployed. The value you provide is placed in the INFILE clause.

  6. Specify the Log File Location and Log File Name.

  7. Set Continue Load.

    If SQL*Loader runs out of space for data rows or index entries, the load is discontinued. If Continue Load is set to true, Warehouse Builder generates CONTINUE_LOAD instead of LOAD so that it can attempt to continue discontinued loads. CONTINUE_LOAD is used when a direct load of multiple tables is discontinued and needs to be restarted, and it is used in conjunction with the operator-level SKIP option.

  8. In Nls Characterset, specify the character set to place in the CHARACTERSET clause.

  9. In Direct Mode, specify the value of the DIRECT option as either TRUE or FALSE. TRUE indicates that a direct path load is performed. FALSE indicates that a conventional load is performed. In general, a direct path load is faster.

  10. In Operation Recoverable, select true to indicate that the load is recoverable, or false to indicate that the load is not recoverable and records are not recorded in the redo log. This parameter controls the output of the RECOVERABLE clause.

  11. Configure the following parameters that affect the OPTIONS clause in the SQL*Loader scripts Warehouse Builder generates for mappings with flat file sources; an illustrative OPTIONS clause follows the list.

    Perform Parallel Load: Specifies the value of the PARALLEL option as either TRUE or FALSE. TRUE indicates that direct loads can operate in multiple concurrent sessions.

    Errors Allowed: If the value specified is greater than 0, then the ERRORS = n option is generated. SQL*Loader terminates the load at the first consistent point after it reaches this error limit.

    Records To Skip: If the value specified is greater than 0, then the SKIP = n option is generated. This value indicates the number of records from the beginning of the file that should not be loaded. If the value is not specified, no records are skipped.

    Records To Load: If the value specified is greater than 0, then the LOAD = n option will be generated. This value specifies the maximum number of records to load. If a value is not specified all of the records are loaded.

    Rows Per Commit: If the value specified is greater than 0, then the ROWS = n option is generated. For direct path loads, the value identifies the number of rows to read from the source before a data save. For conventional path loads, the value specifies the number of rows in the bind array.

    Read Size: If the value specified is greater than 0, then the READSIZE = n option is generated. The value is used to specify the size of the read buffer.

    Bind Size: If the value specified is greater than 0, then the BINDSIZE = n option is generated. The value indicates the maximum size in bytes of the bind array.

    Read Buffers: If the value specified is greater than 0, then the READBUFFERS n clause is generated. READBUFFERS specifies the number of buffers to use during a direct path load. Do not specify a value for READBUFFERS unless it becomes necessary.

    Preserve Blanks: If this parameter is set to TRUE, then the PRESERVE BLANKS clause is generated. PRESERVE BLANKS retains leading white space when optional enclosure delimiters are not present. It also leaves the trailing white space intact when fields are specified with a predetermined size.

    Database File Name: This parameter enables you to specify the characteristics of the physical files to be loaded. The initial values for these parameters are set from the properties of the flat file used in the mapping.

    If this parameter is set to a non-blank value, then the FILE= option is generated. The value specified is enclosed in single quotes in the generated code.

    Control File Location and Control File Name: The location and name of the control file, which is necessary for audit details.

    For more information on each SQL*Loader option and clause, see Oracle Database Utilities.
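    For instance, the settings above might assemble into an OPTIONS clause like the following at the top of the generated control file; the values and file name are illustrative assumptions:

    OPTIONS (DIRECT=TRUE, PARALLEL=TRUE, ERRORS=50, SKIP=0, LOAD=100000, ROWS=5000)
    LOAD DATA
    CHARACTERSET WE8ISO8859P1
    INFILE 'orders.dat'
    -- remainder of the control file follows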

  12. Expand Runtime Parameters to configure your mapping for deployment.

    Audit: Specifies whether to perform an audit when the step is executed.

    Default Purge Group: The Default Purge Group is used when executing the package. Each audit record in the runtime schema is assigned to the purge group specified.

  13. Expand Sources and Targets to set the physical properties of the operators in the mapping, as described in "Sources and Targets Reference".

Strategies for Configuring PL/SQL Mappings

This section discusses various strategies for configuring mappings. Use this section to determine which configuration settings are appropriate for a given scenario.

This section includes the following topics:

  • Selecting a Default Operating Mode

  • Committing Data from a Single Source to Multiple Targets

  • Avoiding Invalid Designs for PL/SQL Mappings

Selecting a Default Operating Mode

For mappings with a PL/SQL implementation, select one of the following operating modes:

  • Set based

  • Row based

  • Row based (Target Only)

  • Set based fail over row based

  • Set based fail over row based (Target Only)

The default operating mode you select depends upon the performance you expect, the amount of auditing data you require, and how you design the mapping. Mappings have at least one and as many as three valid operating modes, excluding the options for failing over to row based modes. During code generation, Warehouse Builder generates code for the specified default operating mode as well as the unselected modes. Therefore, at runtime, you can select to run in the default operating mode or any one of the other valid operating modes.

The types of operators in the mapping may limit the operating modes you can select. As a general rule, mappings run in set based mode can include any of the operators except Match-Merge, Name-Address, and Transformations used as procedures. Although you can include any of the operators in row based and row based (target only) modes, there are important restrictions on how you use SQL based operators such as Aggregator, Joiner, and Key Lookup. To use SQL based operators in either of the row based modes, ensure that the operation associated with the operator can be included in the cursor.

These general rules are explained in the following sections. For more information on how to use operators, see "Avoiding Invalid Designs for PL/SQL Mappings".

Set based

In set based mode, Warehouse Builder generates a single SQL statement that processes all data and performs all operations. Although processing data as a set improves performance, the available auditing information is limited. Runtime auditing is limited to reporting the execution error only. In set based mode, you cannot view details on which rows contain errors.

Figure 11-4 shows a simple mapping and the associated logic Warehouse Builder uses to generate code for the mapping when run in set based operating mode. TAB1, FLTR, and TAB2 are processed as a set using SQL.

Figure 11-4 Simple Mapping Run in Set Based Mode


To correctly design a mapping for the set based mode, avoid operators that require row by row processing such as Match-Merge and Name-Address operators. If you include an operator in the dataflow that cannot be performed in SQL, Warehouse Builder does not generate set based code and issues an error when you execute the package in set based mode.

For target operators in a mapping, set their loading types to either INSERT/UPDATE or UPDATE/INSERT. Warehouse Builder does not support UPDATE or DELETE loading in set based mode. For a complete listing of how Warehouse Builder handles operators in set based mappings, see Table 11-3.
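The following is a sketch of the set based code for the mapping in Figure 11-4, assuming TAB1 and TAB2 each have columns ID and AMOUNT and that FLTR passes rows with a positive amount:

INSERT INTO tab2 (id, amount)
SELECT id, amount
FROM   tab1
WHERE  amount > 0;   -- the FLTR condition runs inside the same SQL statement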

Row based

In row based mode, Warehouse Builder generates statements that process data row by row. The select statement is in a SQL cursor. All subsequent statements are PL/SQL. You can access full runtime auditing information for all operators performed in PL/SQL and only limited information for operations performed in the cursor.

Figure 11-5 shows a simple mapping and the associated logic Warehouse Builder uses to generate code for the mapping when run in row based operating mode. TAB1 is included in the cursor and processed as a set using SQL. FLTR and TAB2 are processed row by row using PL/SQL.

Figure 11-5 Simple Mapping Run in Row Based Mode


If the mapping includes any SQL based operators, which cannot be performed in PL/SQL, Warehouse Builder attempts to generate code with those operations in the cursor. To generate valid row based code, design your mapping so that, if you include any of the following SQL based operators, Warehouse Builder can include the operations in the cursor:

  • Aggregation

  • Deduplicator

  • Join

  • Key Lookup

  • Sequence

  • Set

  • Sorter

For any of the preceding operators to be included in the cursor, it must not be directly preceded by an operator that generates PL/SQL code. In other words, you cannot run the mapping in row based mode if it contains a Transformation implemented as a procedure, a Flat File used as a source, a Match-Merge operator, or a Name-Address operator directly followed by any of the seven SQL based operators. For the design to be valid, include a staging table between the PL/SQL generating operator and the SQL based operator. For more information about how to design mappings, see "Avoiding Invalid Designs for PL/SQL Mappings".
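Under the same assumptions as the set based sketch earlier, row based code for the mapping in Figure 11-5 takes roughly the following shape; the select is a SQL cursor, and the filter and insert are PL/SQL:

DECLARE
  CURSOR c_src IS
    SELECT id, amount FROM tab1;     -- TAB1: processed in the cursor (SQL)
BEGIN
  FOR r IN c_src LOOP
    IF r.amount > 0 THEN             -- FLTR: evaluated row by row (PL/SQL)
      INSERT INTO tab2 (id, amount)
      VALUES (r.id, r.amount);       -- TAB2: loaded row by row (PL/SQL)
    END IF;
  END LOOP;
  COMMIT;
END;
/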

Row based (Target Only)

In row based (Target Only) mode, Warehouse Builder generates a cursor select statement and attempts to include as many operations as possible in the cursor. For each target, Warehouse Builder inserts each row into the target separately. You can access full runtime auditing information for all operators performed in PL/SQL and only limited information for operations performed in the cursor. Use this mode when you expect fast set based operations to extract and transform the data but need extended auditing for loading the data, which is where errors are most likely to occur.

Figure 11-6 shows a simple mapping and the associated logic Warehouse Builder uses to generate code for the mapping when run in row based (target only) operating mode. TAB1 and FLTR are included in the cursor and processed as a set using SQL. TAB2 is processed row by row.

Figure 11-6 Simple Mapping Run in Row Based (Target Only) Mode


Row based (target only) places the same restrictions on SQL based operators as the row based operating mode. Additionally, for mappings with multiple targets, Warehouse Builder generates code with a cursor for each target.

Committing Data from a Single Source to Multiple Targets

If you want to populate multiple targets based on a common source, you may also want to ensure that every row from the source is represented correctly across all targets.

Figure 11-7 shows a mapping that illustrates this case. The target tables all depend upon the source table. For every row added to TARGET_1, one row should be added to the other targets. If this relationship is not maintained when you reload data, the data becomes inaccurate and possibly unusable.

Figure 11-7 Mapping with Multiple Targets Dependent on One Source


If the number of rows from SOURCE is relatively small, maintaining the three targets may not be difficult. Manually maintaining targets dependent on a common source, however, becomes more tedious as you increase the number of rows from the source, or as you design more complex mappings with more targets and transformations.

To use Warehouse Builder to ensure that every row in the source is properly represented in every target, configure the mapping to use a correlated commit strategy.

Commit Strategies

When you assign a commit strategy for a mapping in Warehouse Builder, you determine the commit and rollback behaviors for the multiple targets dependent on a single source. You can choose between the following strategies:

  • Independent Commit: Warehouse Builder commits and rolls back each target separately and independently of other targets. Use the independent commit when the consequences of targets being loaded unequally are not great or are irrelevant.

  • Correlated Commit: Warehouse Builder considers all targets collectively and commits or rolls back data uniformly across all targets. Use the correlated commit when it is important to ensure that every row in the source impacts all affected targets uniformly.

The combination of the commit strategy and operating mode determines mapping behavior. Table 11-1 shows the valid combinations you can select.

Table 11-1 Valid Commit Strategies for Operating Modes

Operating Mode             Correlated Commit     Independent Commit
Set based                  Valid                 Valid
Row based                  Valid                 Valid
Row based (target only)    Not Recommended       Valid


Correlated commit is not recommended for row based (target only). By definition, this operating mode places the cursor as close to the target as possible. In most cases, this results in only one target for each select statement and negates the purpose of committing data to multiple targets. If you design a mapping with the row based (target only) and correlated commit combination, Warehouse Builder runs the mapping but, in most cases, does not perform the correlated commit.

To understand the effects each operating mode and commit strategy combination has on a mapping, consider the mapping from Figure 11-7. Assume the data from the source table equates to 1,000 new rows. When the mapping runs successfully, Warehouse Builder loads 1,000 rows to each of the targets. If the mapping fails to load the 100th new row to Target_2, you can expect the following results, ignoring the influence from other configuration settings such as Commit Frequency and Number of Maximum Errors:

  • Set based/Correlated Commit: A single error anywhere in the mapping triggers the rollback of all data. When Warehouse Builder encounters the error inserting into Target_2, it reports an error for the table and does not load the row. Warehouse Builder rolls back all the rows inserted into Target_1 and does not attempt to load rows to Target_3. No rows are added to any of the target tables. For error details, Warehouse Builder reports only that it encountered an error loading to Target_2.

  • Row based/Correlated Commit: Beginning with the first row, Warehouse Builder evaluates each row separately and loads it to all three targets. Loading continues in this way until Warehouse Builder encounters an error loading row 100 to Target_2. Warehouse Builder reports the error and does not load the row. It rolls back row 100 previously inserted into Target_1 and does not attempt to load row 100 to Target_3. Next, Warehouse Builder continues loading the remaining rows, resuming with loading row 101 to Target_1. Assuming Warehouse Builder encounters no other errors, the mapping completes with 999 new rows inserted into each target. The source rows are accurately represented in the targets.

  • Set based/Independent Commit: When Warehouse Builder encounters the error inserting into Target_2, it does not load any rows and reports an error for the table. It does, however, continue to insert rows into Target_3 and does not roll back the rows from Target_1. Assuming Warehouse Builder encounters no other errors, the mapping completes with one error message for Target_2, no rows inserted into Target_2, and 1,000 rows inserted into Target_1 and Target_3. The source rows are not accurately represented in the targets.

  • Row based/Independent Commit: Beginning with the first row, Warehouse Builder evaluates each row separately for loading into the targets. Loading continues in this way until Warehouse Builder encounters an error loading row 100 to Target_2 and reports the error. Warehouse Builder does not roll back row 100 from Target_1, does insert it into Target_3, and continues to load the remaining rows. Assuming Warehouse Builder encounters no other errors, the mapping completes with 999 rows inserted into Target_2 and 1,000 rows inserted into each of the other targets. The source rows are not accurately represented in the targets.

Using the Correlated Commit Strategy

Correlated commit may impact the size of your rollback segments. Space for rollback segments may be a concern when you merge data (insert/update or update/insert) in set based mode.

Correlated commit operates transparently with PL/SQL bulk processing code.

When you use Correlated Commit to ensure all source rows are represented in multiple targets, consider using the Set based fail over to row based operating mode. In this way, you can find a balance between performance and error handling.

The correlated commit strategy is not available for mappings run in any mode that are configured for Partition Exchange Loading or include an Advanced Queue, Match-Merge, or Table Function operator.

Avoiding Invalid Designs for PL/SQL Mappings

Warehouse Builder generates code for PL/SQL mappings that meet the following criteria:

  • The output code of each operator satisfies the input code requirement of its next downstream operator.

  • If the mapping contains an operator that generates only PL/SQL output, all downstream dataflow operators must also be implementable by PL/SQL. You can use SQL operators in such a mapping only after loading the PL/SQL output to a target.

As you design a mapping, you can evaluate its validity by taking note of the input and output code types for each operator in the mapping. For example, you can see that the mapping in Figure 11-8 is invalid because the Match-Merge operator MM generates PL/SQL output but the subsequent Join operator accepts SQL input only.

Figure 11-8 Mapping Violates Input Requirement for Join Operator


To achieve the desired results for the mapping, consider joining the source tables before performing the Match-Merge, as shown in Figure 11-9, or loading the results from the Match-Merge to a staging table before performing the join, as shown in Figure 11-10.

Figure 11-9 Valid Mapping Design with Sources Joined Before Match-Merge


Figure 11-10 Valid Mapping Design with Staging Table


Table 11-2 and Table 11-3 list the implementation types for each Warehouse Builder operator. These tables also indicate whether or not PL/SQL code includes the operation associated with the operator in the cursor. This information is relevant in determining which operating modes are valid for a given mapping design. It also determines what auditing details are available during error handling.

Table 11-2 Source-Target Operators Implementation in PL/SQL Mappings

Operator                          Implementation Types      Set Based Mode              Row Based Mode   Row Based (Target Only)
Source Operators: Tables, Cubes,  SQL                       Yes                         Yes              Yes. Part of cursor.
Views, External Tables
Target Operators: Tables, Cubes,  SQL, PL/SQL               Yes, except when loading    Yes              Yes. Not part of cursor.
Views                                                       is UPDATE or DELETE.
Flat File as source               For PL/SQL, create        Yes                         Yes              Yes. Part of cursor.
                                  External Table.
Flat File as target               SQL                       Yes, but not targets with   Yes              Yes. Not part of cursor.
                                                            loading UPDATE or DELETE.
Advanced Queue as source          SQL                       Yes                         Yes              Yes. Part of cursor.
Advanced Queue as target          SQL                       Yes, but not targets with   Yes              Yes. Not part of cursor.
                                                            loading UPDATE or DELETE.
Sequence as source                SQL                       Yes                         Yes              Yes. Part of cursor.


Table 11-3 Data Flow Operator Implementation in PL/SQL Mappings

Operator Name                    Implementation Types         Set Based Mode        Row Based Mode             Row Based (Target Only) Mode
Aggregators                      SQL                          Yes                   Yes, only if part of       Yes, only if part of
                                                                                    the cursor                 the cursor
Constant Operator                SQL, PL/SQL                  Yes                   Yes                        Yes
Data Generator                   SQL*Loader only              N/A                   N/A                        N/A
Deduplicator                     SQL                          Yes                   Yes, only if part of       Yes, only if part of
                                                                                    the cursor                 the cursor
Expression                       SQL, PL/SQL                  Yes                   Yes                        Yes
Filter                           SQL, PL/SQL                  Yes                   Yes                        Yes
Joiner                           SQL                          Yes                   Yes, only if part of       Yes, only if part of
                                                                                    the cursor                 the cursor
Key Lookup                       SQL                          Yes                   Yes, only if part of       Yes, only if part of
                                                                                    the cursor                 the cursor
Mapping Input Parameter          SQL, PL/SQL                  Yes                   Yes                        Yes
Mapping Output Parameter         SQL, PL/SQL                  Yes                   Yes                        Yes
Match-Merge                      SQL input, PL/SQL output     No                    Yes                        Yes. Not part of cursor.
                                 (PL/SQL input from XREF
                                 group only)
Name-Address                     PL/SQL                       No                    Yes                        Yes. Not part of cursor.
Pivot                            SQL, PL/SQL                  Yes                   Yes                        Yes
Post-Mapping Process             Irrelevant                   Yes, independent      Yes                        Yes
                                                              of dataflow
Pre-Mapping Process              Irrelevant                   Yes, independent      Yes                        Yes
                                                              of dataflow
Set                              SQL                          Yes                   Yes, only if part of       Yes, only if part of
                                                                                    the cursor                 the cursor
Sorter                           SQL                          Yes                   Yes, only if part of       Yes, as part of
                                                                                    the cursor                 the cursor
Splitter                         SQL, PL/SQL                  Yes                   Yes                        Yes
Table Function                   SQL or PL/SQL input,         Yes                   Yes                        Yes
                                 SQL output only
Transformation as a procedure    PL/SQL                       No                    Yes                        Yes. Not part of cursor.
Transformation as a function     SQL, PL/SQL                  Yes                   Yes                        Yes, included in
that does not perform DML                                                           the cursor.


Using Partition Exchange Loading

Data partitioning can improve performance when loading or purging data in a target system. This performance feature is referred to as Partition Exchange Loading (PEL).

PEL is recommended when loading a relatively small amount of data into a target containing a much larger volume of historical data. The target can be a table, a dimension, or a cube in a data warehouse.

This section includes the following topics:

  • About Partition Exchange Loading

  • Configuring a Mapping for PEL

  • Direct and Indirect PEL

  • Using PEL Effectively

  • Configuring Targets in a Mapping

  • Restrictions for Using PEL in Warehouse Builder

About Partition Exchange Loading

By manipulating partitions in your target system, you can use Partition Exchange Loading (PEL) to instantly add or delete data. When a table is exchanged with an empty partition, new data is added.

You can use PEL to load new data by exchanging it into a target table as a partition. For example, a table that holds the new data assumes the identity of a partition from the target table and this partition assumes the identity of the source table. This exchange process is a DDL operation with no actual data movement. Figure 11-11 illustrates this example.

Figure 11-11 Overview of Partition Exchange Loading


In Figure 11-11, data from a source table Source is inserted into a target table consisting of four partitions (Target_P1, Target_P2, Target_P3, and Target_P4). If the new data needs to be loaded into Target_P3, the partition exchange operation only exchanges the names on the data objects without moving the actual data. After the exchange, the formerly labeled Source is renamed to Target_P3, and the former Target_P3 is now labeled as Source. The target table still contains four partitions: Target_P1, Target_P2, Target_P3, and Target_P4. The partition exchange operation available in Oracle Database completes the loading process without data movement.
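The swap in Figure 11-11 corresponds to a single DDL statement of the following general form. The table and partition names come from the example; the INCLUDING INDEXES and WITHOUT VALIDATION clauses are illustrative assumptions:

ALTER TABLE target
  EXCHANGE PARTITION target_p3 WITH TABLE source
  INCLUDING INDEXES
  WITHOUT VALIDATION;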

Configuring a Mapping for PEL

To configure a mapping for partition exchange loading, complete the following steps:

  1. In the navigation tree, right-click a mapping and select Configure.

    Warehouse Builder displays the Configuration Properties window as shown in Figure 11-12.

    Figure 11-12 PEL Configuration Properties

  2. By default, PEL is disabled for all mappings. Set PEL Enabled to True to use Partition Exchange Loading.

  3. Use Data Collection Frequency to specify the amount of new data to be collected for each run of the mapping. Set this parameter to specify if you want the data collected by Year, Quarter, Month, Day, Hour, or Minute. This determines the number of partitions.

  4. Set Direct to False if you want Warehouse Builder to create a temporary table to stage the collected data before performing the partition exchange (Indirect PEL). Set this parameter to True to swap the source table directly into the target table as a partition without creating a temporary table (Direct PEL). For more information, see "Direct and Indirect PEL".

  5. If you set Replace Data to True, Warehouse Builder replaces the existing data in the target partition with the newly collected data. If set to False (default), Warehouse Builder preserves the existing data in the target partition. The new data is inserted into a non-empty partition. This parameter affects the local partition and can be used to remove or swap a partition out of a target table. At the table level, you can set Truncate/Insert properties.

Direct and Indirect PEL

When you use Warehouse Builder to load a target by exchanging partitions, you can load the target indirectly or directly.

  • Indirect PEL: By default, Warehouse Builder creates and maintains a temporary table that stages the source data before initiating the partition exchange process. For example, use Indirect PEL when the mapping includes a remote source or a join of multiple sources.

  • Direct PEL: You design the source for the mapping to match the target structure. For example, use Direct PEL in a mapping to instantaneously publish fact tables that you loaded in a previously executed mapping.

Using Indirect PEL

If you design a mapping using PEL and it includes remote sources or a join of multiple sources, Warehouse Builder must perform source processing and stage the data before partition exchange can proceed. Therefore, configure such mappings with Direct PEL set to False. Warehouse Builder transparently creates and maintains a temporary table that stores the results from source processing. After performing the PEL, Warehouse Builder drops the table.

Figure 11-13 shows a mapping that joins two sources and performs an aggregation. If all new data loaded into the ORDER_SUMMARY table is always loaded into the same partition, then you can use Indirect PEL on this mapping to improve load performance. In this case, Warehouse Builder transparently creates a temporary table after the Aggregator and before ORDER_SUMMARY.

Figure 11-13 Mapping with Multiple Sources


Warehouse Builder creates the temporary table using the same structure as the target table, with the same columns, indexes, and constraints. For the fastest performance, Warehouse Builder loads the temporary table using a parallel direct path INSERT. After the INSERT, Warehouse Builder indexes and constrains the temporary table in parallel.

Example: Using Direct PEL to Publish Fact Tables

Use Direct PEL when the source table is local and the data is of good quality. You must design the mapping such that the source and target are in the same database and have exactly the same structure. The source and target must have the same indexes and constraints, the same number of columns, and the same column types and lengths.

For example, assume that you have the same mapping from Figure 11-13 but would like greater control on when data is loaded into the target. Depending on the amount of data, it could take hours to load and you would not know precisely when the target table would be updated.

To instantly load data to a target using Direct PEL:

  1. Design one mapping to join source data, if necessary, transform data, ensure data validity, and load it to a staging table. Do not configure this mapping to use PEL.

    Design the staging table to exactly match the structure of the final target that you will load in a separate mapping. For example, the staging table in Figure 11-13 is ORDER_SUMMARY and should be of the same structure as the final target, ORDER_CUBE in Figure 11-14.

  2. Create a second mapping that loads data from the staging table to the final target such as shown in Figure 11-14. Configure this mapping to use Direct PEL.

    Figure 11-14 Publish_Sales_Summary Mapping

  3. Use either the Warehouse Builder Process Flow Editor or Oracle Workflow to launch the second mapping after the completion of the first.

Using PEL Effectively

You can use PEL effectively for scalable loading performance if the following conditions are true:

  • Table partitioning and tablespace: The target table must be partitioned by one DATE column. All partitions must be created in the same tablespace. All tables are created in the same tablespace.

  • Existing historical data: The target table must contain a huge amount of historical data. An example use for PEL is for a click stream application where the target collects data every day from an OLTP database or Web log files. New data is transformed and loaded into the target that already contains historical data.

  • New data: All new data must be loaded into the same partition in a target table. For example, if the target table is partitioned by day, then the daily data should be loaded into one partition.

  • Loading Frequency: The loading frequency should be equal to or less than the data collection frequency.

  • No global indexes: There must be no global indexes on the target table.

Configuring Targets in a Mapping

To configure targets in a mapping for PEL:

Step 1: Create All Partitions

Warehouse Builder does not automatically create partitions during runtime. You must create all partitions before you can use PEL.

For example, if you select Month as the frequency of new data collection, you need to create all the required partitions for each month of new data. Use the Configuration Properties window to create partitions for a table, dimension, or cube. Figure 11-15 shows the property inspector window for table ORDER_SUMMARY. This figure shows six partitions that have been added for this table.

To use PEL, all partition names must follow a naming convention. For example, for a partition that will hold data for May 2002, the partition name must be in the format Y2002_Q2_M05.

For PEL to recognize a partition, its name must fit one of the following formats.

Ydddd

Ydddd_Qd

Ydddd_Qd_Mdd

Ydddd_Qd_Mdd_Ddd

Ydddd_Qd_Mdd_Ddd_Hdd

Ydddd_Qd_Mdd_Ddd_Hdd_Mdd

In these formats, d represents a decimal digit. All letters must be in upper case; lower case is not recognized.

Figure 11-15 Configuration Properties for Table ORDER_SUMMARY


If all partitions are added with correct names, Warehouse Builder automatically computes the Value Less Than property for each partition. Otherwise, Value Less Than must be configured manually for each partition in order for Warehouse Builder to generate a DDL statement. The following is an example of a DDL statement generated by Warehouse Builder:

. . .
PARTITION A_PARTITION_NAME 
      VALUES LESS THAN (TO_DATE('01-06-2002','DD-MM-YYYY')),
. . .

Figure 11-16 shows automatically generated configuration values for the Value Less Than parameter.

Figure 11-16 Automatically Generated "Value Less Than" Setting


Step 2: Create All Indexes Using the LOCAL Option

Figure 11-17 shows an index (ORDER_SUMMARY_PK_IDX) added to the ORDER_SUMMARY table. This index has two columns, ORDER_DATE and ITEM_ID. Configure the following:

  • Set the Index Type parameter to UNIQUE.

  • Set the Local Index parameter to True.

Now Warehouse Builder can generate a DDL statement for a unique local index on table ORDER_SUMMARY.

Using local indexes provides the most important PEL performance benefit. Local indexes are partitioned in the same way as the table. When the temporary table is swapped into the target table using PEL, the identities of the index segments are swapped as well.

Figure 11-17 Configure an Index as a Local Index


If an index is created as a local index, the Oracle server requires the partition key column to be the leading column of the index. In the preceding example, the partition key is ORDER_DATE, and it is the leading column in the index ORDER_SUMMARY_PK_IDX.
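The configuration in Figure 11-17 corresponds to DDL along the following lines, with the partition key ORDER_DATE as the leading column:

CREATE UNIQUE INDEX order_summary_pk_idx
  ON order_summary (order_date, item_id)
  LOCAL;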

Step 3: Primary/Unique Keys Use "USING INDEX" Option

In this step you must specify that all primary key and unique key constraints are created with the USING INDEX option. Figure 11-18 shows an example where the primary key constraint ORDER_SUMMARY_PK on the ORDER_SUMMARY table is specified with the USING INDEX option.

With the USING INDEX option, a constraint does not trigger automatic index creation when it is added to the table; instead, the server searches existing indexes for one with the same column list as the constraint. Thus, each primary or unique key constraint must be backed by a user-defined unique local index. The index required by the constraint ORDER_SUMMARY_PK is ORDER_SUMMARY_PK_IDX, which was created in "Step 2: Create All Indexes Using the LOCAL Option".

Figure 11-18 Specify a Constraint with USING INDEX option

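Expressed as DDL, the constraint from Figure 11-18 is backed by the local index created in Step 2:

ALTER TABLE order_summary
  ADD CONSTRAINT order_summary_pk
  PRIMARY KEY (order_date, item_id)
  USING INDEX order_summary_pk_idx;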

Restrictions for Using PEL in Warehouse Builder

These are the restrictions for using PEL in Warehouse Builder:

  • Only One Date Partition Key: Only one partition key column of DATE data type is allowed. Numeric partition keys are not supported in Warehouse Builder.

  • Only Natural Calendar System: The current PEL method supports only the natural calendar system adopted worldwide. Specific business calendar systems with user-defined fiscal and quarter endings are currently not supported.

  • All Data Partitions Must Be In The Same Tablespace: All partitions of a target (table, dimension, or cube) must be created in the same tablespace.

  • All Index Partitions Must Be In The Same Tablespace: All indexes of a target (table, dimension, or cube) must be created in the same tablespace. However, the index tablespace can be different from the data tablespace.

Configuring Process Flows

To configure a process flow:

  1. Navigate to a process flow package, right-click a process flow and select Configure.

    Warehouse Builder displays the Configuration Properties sheet for the process flow as shown in Figure 11-19.

    Figure 11-19 Process Flow Configuration Properties Sheet

  2. Expand Execution Settings and set the property Use Return as Status.

    This setting governs the behavior for activities that return NUMBER in their output. These activities include the FTP, External Process, and Transformation activities. When you set Use Return as Status to true, the Process Flow Editor assigns the outgoing transition conditions based on the following numerical return values for the activity:

    1 = Success Transition

    2 = Warning Transition

    3 = Error Transition
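    For example, a transformation activity might wrap a function like the following hypothetical sketch, whose NUMBER return value drives the outgoing transition:

    CREATE OR REPLACE FUNCTION nightly_load_check RETURN NUMBER IS
    BEGIN
      -- application logic runs here; the return value selects the transition
      RETURN 1;   -- 1 = Success, 2 = Warning, 3 = Error
    END;
    /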

  3. Expand Path Settings and set the following properties for each activity in the process flow:

    Execution Location: The location from which this activity is executed. If you configured Oracle Enterprise Manager, you can select an OEM agent to execute the process flow.

    Remote Location: The remote location for FTP activities only.

    Working Location: The working location for FTP, FILE EXISTS and External Process activities only.

    Deployed Location: The deployment location. This setting applies to transformation activities only. For activities referring to pre-defined transformations, you must change the setting from Use Default Location and specify a valid location.

  4. Expand General Properties. You can view the bound name which is the name of the object that the activity represents in the process flow. Only mapping, transformation, and subprocess activities have bound names.