24 ETL Objects Configuration

Earlier in the design phase, you defined a logical model for your target system using Warehouse Builder design objects. This chapter includes reference information for assigning physical properties to mappings and process flows. This chapter presents configuration properties in the order they appear in the Warehouse Builder user interface.

This chapter includes:

Configuring Mappings Reference
Configuring Process Flows

Configuring Mappings Reference

When you correctly configure mappings, you can improve the ETL performance. Use this section as a reference for setting configuration parameters that govern how Warehouse Builder loads data and optimizes code for better performance.

This section includes the following topics:

Procedure for Configuring Mappings
Runtime Parameters
Code Generation Options
Sources and Targets Reference

Procedure for Configuring Mappings

To configure physical properties for a mapping:

Right-click a mapping from the Project Explorer and select Configure from the pop-up menu.

Warehouse Builder displays the configuration Properties dialog for a mapping. You can configure properties for the mapping itself and the operators within the mapping.

Figure 24-1 displays the Configuration Properties dialog.

Figure 24-1 Configuration Properties Dialog For A Mapping

Description of "Figure 24-1 Configuration Properties Dialog For A Mapping"
Select Deployable to enable Warehouse Builder to generate a set of scripts for mapping entities marked as deployable.

Deployable is enabled by default. If you disable it, Warehouse Builder does not generate scripts for that mapping.
Set Language to the type of code you want to generate for the selected mapping.

The options from which you can choose depend upon the design and use of the operators in the mapping. Warehouse Builder sets the correct value depending on the mapping: PL/SQL, SQL*Loader, ABAP (for an SAP source mapping).
If you want to schedule the mapping to run based on a previously defined schedule, click Referred Calendar.

For instructions on creating and using schedules, see Chapter 28, "Scheduling ETL Objects".
Expand the Runtime Parameters to configure your mapping for deployment.

For a description of each runtime parameter, see "Runtime Parameters".
Expand Code Generation Options to enable performance options that optimize the code generated for the mapping.

For a description of each option, see "Code Generation Options".
Go to the node for each operator in the mapping to set their physical properties.

For information on configuring sources and targets in a mapping, see "Sources and Targets Reference". To configure mappings with flat file sources and targets, see "Configuring Flat File Operators".

Runtime Parameters

When you configure Runtime Parameters for a mapping, you set the default behaviors for the mapping. You can override these parameters when you execute the mapping either in the Control Center, the Process Flow Editor, or Oracle Enterprise Manager.

The Runtime Parameters include the following parameters:

Bulk Size
Analyze Table Sample Percentage
Commit Frequency
Maximum Number of Errors
Default Operating Mode
Default Audit Level
Default Purge Group

Bulk Size

Use Bulk Size to specify the number of rows in each bulk for PL/SQL Bulk Processing. Warehouse Builder uses the Bulk Size parameter only when Bulk Processing Code option is selected and the operating mode is set to row based. For more information, see the Oracle PL/SQL Reference Guide.

Analyze Table Sample Percentage

When you select the Analyze Table Statements option, Warehouse Builder estimates when gathering statistics on the target tables. After data is loaded into the target tables, statistics used for cost-based optimization are gathered on each target table. You can set this parameter to the percentage of rows in each target table used for this analysis.

Commit Frequency

Commit frequency applies only to non-bulk mode mappings. Bulk mode mappings commit according to the bulk size.

When you set the Default Operating Mode to row based and Bulk Processing Code to false, Warehouse Builder uses the Commit Frequency parameter when executing the package. Warehouse Builder commits data to the database after processing the number of rows specified in this parameter.

If you select the Bulk Processing Code option, set the Commit Frequency equal to the Bulk Size. If the two values are different, Bulk Size overrides the commit frequency and Warehouse Builder implicitly performs a commit for every bulk size.

Maximum Number of Errors

Use Maximum Number of Errors to indicate the maximum number of errors allowed when executing the package. Execution of the package terminates when the number of errors exceeds the maximum number of errors value.

Default Operating Mode

For mappings with a PL/SQL implementation, select a default operating mode. The operating mode you select can greatly affect mapping performance. For details on how operating modes affect performance, see "Set Based Versus Row Based Operating Modes". You can select one of the following operating modes:

Set based: Warehouse Builder generates a single SQL statement that inserts all data and performs all operations on the data. This increases the speed of Data Manipulation Language (DML) operations. Set based mode offers optimal performance but minimal auditing details.
Row based: Warehouse Builder generates statements that process data row by row. The select statement is a SQL cursor. All subsequent statements are PL/SQL. Since data is processed row by row, the row based operating mode has the slowest performance but offers the most auditing details.
Row based (Target Only): Warehouse Builder generates a cursor select statement and attempts to include as many operations as possible in the cursor. For each target, Warehouse Builder generates a PL/SQL insert statement and inserts each row into the target separately.
Set based fail over row based: Warehouse Builder executes the mapping in set based mode. If an error occurs, the execution fails and Warehouse Builder starts the mapping over again in the row based mode. This mode is recommended for use only in test environments and is not recommended for use in production environments.
Set based fail over row based (Target Only): Warehouse Builder first executes the mapping in set based mode. If an error occurs, the execution fails over to Row based (Target Only) mode. This mode is recommended for use only in test environments and is not recommended for use in production environments.

Default Audit Level

Use Default Audit Level to indicate the audit level used when executing the package. Audit levels dictate the amount of audit information captured in the runtime schema when the package is run. The audit level settings are:

None: No auditing information is recorded in runtime.
Statistics: Statistical auditing information is recorded in runtime.
Error Details: Error information and statistical auditing information is recorded in runtime.
Complete: All auditing information is recorded in runtime. Running a mapping with the audit level set to Complete generates a large amount of diagnostic data which may quickly fill the allocated tablespace.

Default Purge Group

The Default Purge Group is used when executing the package. Each audit record in the runtime schema is assigned to the purge group specified.

Code Generation Options

The Code Generation Options include the following:

ANSI SQL Syntax
Commit Control
Analyze Table Statements
Enable Parallel DML
Optimized Code
Authid
Use Target Load Ordering
Error Trigger
Bulk Processing Code
Generation Mode

ANSI SQL Syntax

If you select this option, Warehouse Builder generates ANSI SQL syntax. If the option is not selected, the Warehouse Builder generates Oracle SQL syntax.

Commit Control

Automatic: This is the default setting. Warehouse Builder loads and then automatically commits data based on the mapping design. This setting is valid for all mapping types. For multiple targets in a single mapping, Warehouse Builder commits data based on target by target processing (insert, update, delete).

Automatic Correlated: Automatic correlated commit is a specialized type of automatic commit that applies only to PL/SQL mappings with multiple targets. Warehouse Builder considers all targets collectively and commit or rolls back data uniformly across all targets.

The mapping behavior varies according to the operating mode you select. For more information about correlated commit, see "Committing Data from a Single Source to Multiple Targets".

Manual: Select manual commit control for PL/SQL mappings that you want to interject complex business logic or perform validations before committing data.

You have two options for specifying manual commits. You can define the commit logic within the mapping as described in "Embedding Commit Logic into the Mapping".

Alternatively, you can commit data in a process flow or from a SQL Plus session as described in "Committing Data Independently of Mapping Design".

Analyze Table Statements

If you select this option, Warehouse Builder generates code for analyzing the target table after the target is loaded, if the resulting target table is double or half its original size.

Enable Parallel DML

If you select this option, Warehouse Builder enables Parallel DML at runtime. Executing DML statements in parallel improves the response time of data-intensive operations in large databases that are present in a data warehouse.

Optimized Code

Select this option to improve performance for mappings that include the Splitter operator and inserts into multiple target tables. When this option is selected and the mapping is executed by Oracle9i or higher, Warehouse Builder generates a single SQL statement (multi_table_insert) that inserts data into multiple tables based on same set of source data.

Warehouse Builder performs the multiple table insert only if this parameter is selected and the Oracle target module database is Oracle9i or higher. The multiple tables insert is performed only for mappings in set based mode that include a Splitter operator and no active operators such as an Aggregator or Joiner operator between the Splitter and the target. Also, the multiple insert is available only for tables, not views, materialized views, dimensions, or cubes. Each target table must have fewer than 999 columns. For detailed instructions on how to create a mapping with multiple targets, see "Example: Creating Mappings with Multiple Targets".

Set this parameter to false for mappings run in row based mode or for mappings executed by Oracle8i server. You may also want to set this parameter to false when auditing is required. When this option is selected, Warehouse Builder returns one total SELECT and INSERT count for all targets.

Authid

Specifies the AUTHID option to be used while generating the code. The options you can select are Current_User, Definer, or None.

Use Target Load Ordering

For PL/SQL mappings with multiple targets, you can generate code that defines an order for loading the targets. This is important when a parent child relationship exists between two or more targets in a mapping. The option is selected by default.

Error Trigger

Specify the name of the error trigger procedure in this field.

Bulk Processing Code

If this configuration parameter is selected and the operating mode is set to row based, Warehouse Builder generates PL/SQL bulk processing code. PL/SQL bulk processing improves row-based ETL performance by collecting, processing, and writing rows in bulk, instead of doing it row by row. The size of each bulk is determined by the configuration parameter Bulk Size. Set based mode offers optimal performance, followed by bulk processing, and finally by row based mode. For more information, see the Oracle PL/SQL Reference Guide.

Generation Mode

By default, when Warehouse Builder generates code for a mapping, it generates the code for all possible operating modes. That is, if you set Default Operating Mode to set based, Warehouse Builder still generates code for all possible operating modes when Generation Mode is set to All Operating Modes. This enables you to switch the operating modes for testing purposes at runtime.

Sources and Targets Reference

For relational and dimensional sources and targets such as tables, views, and cubes, Warehouse Builder displays the following set of properties for each operator:

Use LCR APIs
Database Link
Location
Conflict Resolution
Schema
Partition Exchange Loading
Hints
Constraint Management
SQL*Loader Parameters

Use LCR APIs

By default, this setting is enabled and Warehouse Builder performs DML using LCR APIs if available. If no LCR APIs are available, Warehouse Builder uses the standard DML.

Database Link

This parameter is maintained for backward compatibility only.

In previous version of Warehouse Builder, you could select a database link by name from the drop-down list. Source operators can be configured for schemas and database links, but targets can be configured for schemas only. Sources and targets can reside in different schemas, but they must reside in the same database instance.

Location

This setting specifies the location Warehouse Builder uses to access the source or target operator.

Conflict Resolution

Enable this setting to detect and resolve any conflicts that may arise during DML using the LCR APIs.

Schema

This parameter is maintained for backward compatibility only.

In previous version of Warehouse Builder, you could link the mapping to a particular schema by clicking on the Schema field and typing a name.

Partition Exchange Loading

Use setting in this section to enable partition exchange loading (PEL) into a target table. For specific information on each of these settings and additional information on how to design mappings for PEL, see "Improved Performance Through Partition Exchange Loading".

Hints

Define loading or extraction hints. Application developers often develop insights into their data. For example, they know that a query runs much faster if a set of tables is joined in one order rather than another. Warehouse Builder can incorporate these insights into the generated SQL code packages as SQL Optimizer Hints.

When you select a hint from the Hints dialog, the hint appears in the Existing Hints field. Type additional text as appropriate in the Extra Text column. The editor includes the hint in the mapping definition as is. There is no validation or checking on this text.

You can define loading hints for mappings that load data in INSERT or UPDATE mode. By default, Warehouse Builder adds commonly used hints such as APPEND and PARALLEL. For all loading modes other than INSERT, the APPEND hint causes no effect and you can choose to remove it.

Hint is available during mapping configuration. To configure the hint:

In Project Explorer, expand Database folder.
In Database, expand the repository module.
In Repository module, expand Mappings.
In Mappings, select the required mapping.
Right-click Mapping and select Configure.
In Configuration Properties window, expand the required operator.
Select the operator function.
To open the Inline View Hint window, click the ellipses next to the Inline View Hint option available in the Configuration Properties window.

For information on optimizer hints and how to use them, see Oracle9i Designing and Tuning for Performance.

Constraint Management

Configure the following Constraint Management parameters:

Exceptions Table Name: All rows that violated their foreign key constraints during re-enabling are logged into the specified exceptions table. No automatic truncation of this table is performed either before or after the load. Constraint violations are also loaded into the runtime audit error tables.

For SQL and PL/SQL loading, if you do not specify an exceptions table, invalid rows load into a temporary table located in the default tablespace and then load into the Runtime Audit error table. The table is dropped at the end of the load.

If you are using SQL*Loader direct path loading, you must specify an exception table. Consult the SQL*Loader documentation for more information.
Enable Constraints: If you select this option, Warehouse Builder maintains the foreign key constraints on the target table before loading data. If this option is not selected, Warehouse Builder disables the foreign key constraints on target tables before data loading and then re-enables the constraints after loading. Constraint violations found during re-enable are identified in the runtime audit error table and, if specified, in an exceptions table.

When you disable constraints, loading time is decreased since constraint checking is not performed. However, if exceptions occur for any rows during re-enabling, the constraints for those rows will remain in a non-validated state. These rows are logged in the runtime audit error table by their ROWID. You must manually inspect the error rows to take any necessary corrective action.

Setting the Enable Constraints to false is subject to the following restrictions:
- For set based operating mode, the false setting disables foreign key constraints on the targets before loading, and then re-enables the constraints after loading. This property has no effect on foreign key constraints on other tables referencing the target table. If the load is done using SQL*Loader instead of a SQL or PL/SQL package, then a re-enable clause is added to the .ctl file.
- For set based fail over to row based and set based fail over to row based (target only) operating modes, the false setting disables the foreign key constraints on the targets before loading and then re-enables them if the load succeeds in set based mode. This setting has no effect on foreign keys referencing other tables. If the load fails over to row-based, then loading will repeat in row based mode and all constraints remain enabled.
  
  Note:
  Constraint violations created during re-enabling will not cause the load to fail from set based over to row based mode.
- For row based or row based (target only) operating modes, all foreign key constraints remain enabled even if the property is set to false.
- For the TRUNCATE/INSERT DML type, the false setting disables foreign key constraints on other tables referencing the target table before loading, and then re-enables the constraints after loading, regardless of the default operating mode.

SQL*Loader Parameters

When you have a table operator that contains inputs from a flat file, you need to configure the following SQL*Loader Parameters properties:

Partition Name: Indicates that the load is a partition-level load. Partition-level loading enables you to load one or more specified partitions or subpartitions in a table. Full database, user, and transportable tablespace mode loading does not support partition-level loading. Because incremental loading (incremental, cumulative, and complete) can be done only in full database mode, partition-level loading cannot be specified for incremental loads. In all modes, partitioned data is loaded in a format such that partitions or subpartitions can be selectively loaded.
Sorted Indexes Clause: Identifies the indexes on which the data is presorted. This clause is allowed only for direct path loads. Because data sorted for one index is not usually in the right order for another index, you specify only one index in the SORTED INDEXES clause. When the data is in the same order for multiple indexes, all indexes can be specified at once. All indexes listed in the SORTED INDEXES clause must be created before you start the direct path load.
Singlerow: Intended for use during a direct path load with APPEND on systems with limited memory, or when loading a small number of records into a large table. This option inserts each index entry directly into the index, one record at a time. By default, SQL*Loader does not use SINGLEROW to append records to a table. Index entries are stored in a temporary area and merged with the original index at the end of the load. Although this method achieves better performance and produces an optimal index, it requires extra storage space. During the merge, the original index, the new index, and the space for new entries all simultaneously occupy storage space. With the SINGLEROW option, storage space is not required for new index entries or for a new index. Although the resulting index may not be as optimal as a freshly sorted one, it takes less space to produce. It also takes more time because additional UNDO information is generated for each index insert. This option is recommended when the available storage is limited. It is also recommended when the number of records to be loaded is small compared to the size of the table. A ratio of 1:20 or less is considered small.
Trailing Nullcols: Sets SQL*Loader to treat any relatively positioned columns that are not present in the record as null columns.
Records To Skip: Invokes the SKIP command in SQL*Loader. SKIP specifies the number of logical records from the beginning of the file that should not be loaded. By default, no records are skipped. This parameter continues loads that have been interrupted for some reason. It is used for all conventional loads, for single-table direct loads, and for multiple-table direct loads when the same number of records are loaded into each table. It is not used for multiple-table direct loads when a different number of records are loaded into each table.
Database File Name: Specifies the names of the export files to import. The default extension is .dmp. Because you can export multiple export files, you may need to specify multiple filenames to be imported. You must have read access to the imported files. You must also have the IMP_FULL_DATABASE role.

Configuring Flat File Operators

The Configuration Properties dialog contains additional settings for Mapping Flat File operators, depending on how the operators are used in the mapping.

Flat File Operators as a Target: Warehouse Builder generates a PL/SQL deployment code package. For information on configuring the parameters associated with a Mapping Flat File operator used as a target, see "Flat File Operators as a Target".
Flat File Operator as a Source: Warehouse Builder generates SQL*Loader scripts. For information on the parameters associated with a Mapping Flat File operator used as a source, see "Flat File Operator as a Source".

Flat File Operators as a Target

To configure properties unique to mappings with flat file targets:

Select a mapping from the Project Explorer, select Design from the menu bar, and select Configure.

Or, right-click the mapping you want to configure and select Configure from the pop-up menu.

Warehouse Builder displays the Configuration Properties dialog.
Choose the parameters you want to configure and click the space to the right of the parameter name to edit its value.

For each parameter, you can either select an option from a list, type a value, or click the Ellipsis button to display another properties dialog.
Select the Deployable option to enable Warehouse Builder to generate a set of scripts for mapping objects marked as deployable. If this option is not selected for a mapping, Warehouse Builder does not generate scripts for that mapping.
Set Language to the type of code you want to generate for the selected mapping. The options you can choose from depend upon the design and use of the operators in the mapping. Depending on the mapping, you can select from PL/SQL, ABAP (for an SAP source mapping), or SQL*Loader.
Specify the location to deploy the mapping.
Under Runtime Parameters, set the Default Operating Mode to Row based (target only). This type of mapping will not generate code in any other default operating mode. For a description of each runtime parameter, see "Runtime Parameters".
Set the Code Generation Options as described in "Code Generation Options".
Set the Sources and Targets Reference as described in "Sources and Targets Reference".
For Access Specification, specify the name of the flat file target in Target Data File Name. For the Target Data File Location, specify a target file located on the machine where you installed the Warehouse Builder Runtime Platform. Select Output as XML file if you want the output to be in an xml file.

Flat File Operator as a Source

To configure a mapping with a flat file operator as a source:

Select a mapping from the Project Explorer, select Design from the menu bar, and select Configure. Or, right-click the mapping you want to configure and select Configure from the pop-up menu.

The Configuration Properties dialog appears as shown in Figure 24-2.

Figure 24-2 Configuration Properties Dialog (Flat File Source)

Description of "Figure 24-2 Configuration Properties Dialog (Flat File Source)"
Choose the parameters you want to configure and click the space to the right of the parameter name to edit its value.

For each parameter, you can specify whether you want the parameter to be selected, select an option from a list, type a value, or click the Ellipsis button to display another properties dialog.
Select the Deployable option to generate SQL*Loader script.
Specify the Log File Location and Log File Name.
Select Continue Load.

If SQL*Loader runs out of space for data rows or index entries, the load is discontinued. If Continue Load option is selected, Warehouse Builder attempts to continue discontinued loads.
In Nls Characterset, specify the character set to place in the CHARACTERSET clause.
Select Direct Mode to indicate that a direct path load will be done. If this option is not set, a conventional load will be done. In general, direct mode is faster.
Select Operation Recoverable to indicate that the load is recoverable. If this option is not selected, the load is not recoverable and records are not recorded in the redo log.
Configure the following parameters that affect the OPTIONS clause in the SQL *Loader scripts Warehouse Builder generates for mappings with flat file sources.

Perform Parallel Load: If this option is selected, direct loads can operate in multiple concurrent sessions.

Errors Allowed: If the value specified is greater than 0, then the ERRORS = n option is generated. SQL*Loader terminates the load at the first consistent point after it reaches this error limit.

Records To Skip: If the value specified is greater than 0, then the SKIP = n option is generated. This value indicates the number of records from the beginning of the file that should not be loaded. If the value is not specified, no records are skipped.

Records To Load: If the value specified is greater than 0, then the LOAD = n option will be generated. This value specifies the maximum number of records to load. If a value is not specified all of the records are loaded.

Rows Per Commit: If the value specified is greater than 0, then the ROWS = n option is generated. For direct path loads, the value identifies the number of rows to read from the source before a data is saved. For conventional path loads, the value specifies the number of rows in the bind array.

Read Size: If the value specified is greater than 0, then the READSIZE = n option is generated. The value is used to specify the size of the read buffer.

Bind Size: If the value specified is greater than 0, then the BINDSIZE = n option is generated. The value indicates the maximum size in bytes of the bind array.

Read Buffers: If the value specified is greater than 0, then the READBUFFERS n clause is generated. READBUFFERS specifies the number of buffers to use during a direct path load. Do not specify a value for READBUFFERS unless it becomes necessary.

Preserve Blanks: If this option is selected, then the PRESERVE BLANKS clause is generated. PRESERVE BLANKS retains leading white space when optional enclosure delimiters are not present. It also leaves the trailing white space intact when fields are specified with a predetermined size.

Database File Name: This parameter enables you to specify the characteristics of the physical files to be loaded. The initial values for these parameters are set from the properties of the flat file used in the mapping.

If this parameter is set to a non-blank value, then the FILE= option is generated. The value specified is enclosed in single quotes in the generated code.

Control File Location and Control File Name: The control file name necessary for audit details.

For more information on each SQL*Loader option and clause, see Oracle Database Utilities 10g.
Expand the Runtime Parameters to configure your mapping for deployment.

Audit: Select this option to perform an audit when the package is executed.

Default Purge Group: The Default Purge Group is used when executing the package. Each audit record in the runtime schema is assigned to the purge group specified.
Expand Sources and Targets Reference to set the physical properties of the operators in the mapping as described in "Sources and Targets Reference".

Configuring Process Flows

To configure a process flow module:

Right-click the process flow module and select Configure.

Warehouse Builder displays the Configuration Properties sheet for the process flow module.
Set the properties for Evaluation Location and Identification Location.

Evaluation Location is the location from which this process flow is evaluated.

Identification Location provides the location where the generated code will be deployed to.

To configure a process flow package:

Right-click the process flow package and select Configure.

Warehouse Builder displays the Configuration Properties sheet for the process flow module.
Set the properties for Referred Calendar and Generation Comments.

Referred Calendar provides the schedule to associate with this package.

Generation Comments provides additional comments for the generated code.

Click any of the activities of a package to view its properties.

Figure 24-3 displays the Configuration Properties of an activity.

Figure 24-3 Activities Configuration Properties Sheet

This illustration is described in the surrounding text.

Description of "Figure 24-3 Activities Configuration Properties Sheet "

Under Path Settings, set the following properties for each activity in the process flow:

Execution Location: The location from which this activity is executed. If you configured Oracle Enterprise Manager, you can select an OEM agent to execute the process flow.

Remote Location: The remote location for FTP activities only.

Working Location: The working location for FTP, FILE EXISTS and External Process activities only.

Deployed Location: The deployment location. This setting applies to transformation activities only. For activities referring to pre-defined transformations, you must change the setting from Use Default Location and specify a valid location.

Under General Properties, you can view the bound name which is the name of the object that the activity represents in the process flow. Only mapping, transformation, and subprocess activities have bound names.

Under Execution Settings, select the option Use Return as Status.

This setting governs the behavior for activities that return NUMBER in their output. These activities include the FTP, User Defined, and Transform activities. When you select Use Return as Status, the Process Flow Editor assigns the outgoing transition conditions based on the following numerical return values for the activity:

1 = Success Transition

2 = Warning Transition

3 = Error Transition