JavaScript is required to for searching.
Skip Navigation Links
Exit Print View
Oracle Java CAPS Data Integrator User's Guide     Java CAPS Documentation
search filter icon
search icon

Document Information

Designing Data Integrator Projects

About Data Integrator

Extracting, Transforming, Loading: ETL

Oracle Java CAPS Data Integrator Overview

Extracting, Transforming, and Loading: ETL

Oracle Java CAPS Data Integrator Methodology

Oracle Java CAPS Data Integrator Features

Oracle Java CAPS Data Integrator Architecture

Oracle Java CAPS Data Integrator Design-Time Components

Data Integrator Editor

Oracle Java CAPS Data Integrator Project System

Data Integrator Service Engine

ETL Engine

ETL Service Engine

Data Integrator Monitor

Data Integrator Recovery

Creating Oracle Java CAPS Data Integrator Projects

Connecting to Source and Target Databases

Connecting to a JDBC-Compliant Database

Creating and Connecting to Data Mashup Services

Virtual Database Table Metadata Options

Virtual Database Column Properties

Creating a New Data Integrator Project

To Create a New Project

Creating an ETL Collaboration Using the Wizard

Creating a Basic ETL Collaboration

To Create a Basic ETL Collaboration

Creating an Advanced ETL Collaboration

To Create an Advanced ETL Collaboration

Creating an ETL Collaboration for a Master Index Staging Database

To Create an ETL Collaboration for a Master Index Staging Database

Creating a Bulk Loader ETL Collaboration

To Create a Bulk Loader ETL Collaboration

ETL Collaboration Overview

Execution Strategies

Direct/Simple Execution Strategy

One Pass Execution Strategy

Staging Execution Strategy

Pipeline Execution Strategy

Whitespace Considerations

Explicit and Implicit Joins

Runtime Properties

Data Validation Conditions

About the ETL Collaboration Editor

Configuring ETL Collaborations

Joining Source Tables

To Join Source Tables

To Join Source Tables During Mapping

Modifying an Existing Join

To Join Source Tables

Defining Extraction Conditions and Validations

To Define Extraction Conditions and Validation.

Adding Tables to an Existing Collaboration

To Add Tables to a Collaboration

Forcing Execution Strategies for Collaborations

To Force Execution Strategies for Collaborations

Changing the Database URL for Design Time

To Change the Database URL for Design Time

Configuring Source Table Properties

To Configure Source Table Properties

Configuring Target Table Properties

To Configure Target Table Properties

Using Pre-Created Temporary Staging Tables

Using Temporary Staging Tables

Viewing Table or Join Data

To View Table or Join Data

Viewing the SQL Code

To View SQL Code

Viewing Runtime Output Arguments

To View Runtime Output Arguments

Fine-Tuning the ETL Process

Filtering Source Data Using Runtime Inputs

To Filter Source Data Using Runtime Inputs

Setting the Batch Size for Joined Tables

To Set the Batch Size for Joined Tables

Using Table Aliases with Multiple Source Table Views

Grouping Input Data

To Group Input Data

Viewing and Modifying Table Data

To View and Modify Table Data

ETL Collaboration Overview

Once you have created an ETL collaboration using the Data Integrator Wizard, you can configure and customize the collaboration as needed to meet your data processing requirements. The following topics provide information about ETL collaborations and the ETL Collaboration Editor.

Execution Strategies

Oracle Java CAPS Data Integrator automatically employs the most optimum execution strategy for collaborations. The strategy that is employed depends on the specific nature of the collaboration. If you do not want Data Integrator to determine the best execution strategy, you can configure a collaboration for either the staging or the pipeline execution strategy depending on what your Collaboration is set up to do. For example, if your collaboration business logic contains Java operators, you can only use the pipeline strategy. The following describes the criteria that Data Integrator uses to determine the best execution strategy, and these are also the criteria to use when you force an execution strategy.

Execution Strategy Selection

The execution strategy configuration for collaborations is set to the Best Fit option by default. When Data Integrator determines what execution strategy to use for a collaboration, it evaluates the collaboration for specific attributes. For example, it takes tables and columns into consideration. In addition, the selected execution strategy depends on whether a collaboration contains Java operators, which are operators that are not available across all supported databases. Examples of Java operators are date transformation operators and operators that parse business names and addresses and that normalize names. Projects with Java operators must be executed with a pipeline strategy.

You can force the execution strategy for a collaboration by changing its setting from Best Fit to Staging or Pipeline, as described in "Forcing Execution Strategies for Collaborations” .

Data Integrator uses the following execution methods depending on a collaboration's attributes:

Direct/Simple Execution Strategy

With the direct/simple execution strategy, all extraction, transformation, and loading happens in a single database. When the Best Fit option is enabled, Data Integrator uses this strategy under the following conditions:

One Pass Execution Strategy

With one pass execution, extraction and transformation occur in the source database. When the Best Fit option is enabled, Data Integrator uses this strategy under the following conditions:

Staging Execution Strategy

With the staging execution strategy, all source tables are extracted from source databases and staged in temporary tables in the target database. Join and Transformation happens in the target database. This setting is used automatically when the Best Fit option is enabled and the conditions below occur. You can also select this option manually to force its use, in which case this execution strategy is recommended under the following conditions:

Pipeline Execution Strategy

With the Pipeline execution strategy, transformation and loading (indirectly to the target database table) occurs in the internal database engine. This setting is used automatically when the Best Fit option is enabled and the conditions below occur. You can also select this option manually to force its use, in which case this execution strategy is recommended under the following conditions:

Whitespace Considerations

Oracle Java CAPS Data Integrator handles whitespace differently depending on the execution strategy. When joining a flat file table and an Oracle table where the comparison column in the Oracle table contains whitespace, refer to the table below.

Table 1 Execution Strategies for Flat File and Oracle Table Joins

Strategy Specified
Description
Best Fit
Uses the staging execution strategy since the source tables are from different databases. The results will be the same as if staging was selected.
Staging
Data Integrator extracts source tables from source databases and stages the data in temporary tables in the target database. By default, whitespace is trimmed.
Pipeline
Data Integrator uses an internal database engine instead of temporary tables, accessing data directly from the source tables rather than extracting it to temporary tables. To avoid whitespace causing failure in the join condition, add LTRIM/RTRIM operators to the Oracle table column. The result will be the same as Staging/Best Fit.

Note - In future this feature will be made obsolete


Explicit and Implicit Joins

The join condition specified on source tables is an explicit join. The condition specified on target tables is an implicit join. The target condition is used differently in insert and update statements. For update statements, the condition from the target table is used to identify the proper rules to update and match the rules to the target. For insert statements, the condition from the target table is used to verify that no duplicate rules are inserted.

Runtime Properties

The Staging Table Name property is used for the staging execution strategy. When you use the staging strategy and specify a staging table name for each target table, the ETL collaboration does not create a temporary staging table for the source data. Instead the table in the default table space for the target database with the name specified for this property is used for staging.

When all the source tables in an ETL collaboration are configured with a valid table name in the Staging Table Name property, the ETL process does not create or delete any temporary tables at runtime. Also, the process does not modify or alter the target tables other than for updating records as per the ETL collaboration.

Be careful about changing the default settings. By default, the staging table are dropped after each run. If you do not want to drop the tables, you need to change the Drop Staging Table property to false. Also by default, the data in a temporary table is truncated before each run. If you do not want the table truncated, set the Drop Staging Table property to false.

Data Validation Conditions

Data Integrator provides operators to validate extracted data. You can validate multiple columns in a record through Data Validation Conditions. If the validation fails for at least one of the columns then the record is rejected, preventing it from being loaded into target tables. All windows that show conditions (for example, the Data Validation Condition window and the Extraction Condition window) provide the operators to enable you to model complex validation conditions. You can view rejected rows at design time. If a data validation condition is set, click Run Collaboration to see if any records fail validation. If rejected rows exist, right-click the target table and select Show Rejected Data. The rejected data displays in the Output pane.

About the ETL Collaboration Editor

You use the ETL Collaboration Editor to create the business logic for ETL processes. The table below describes the ETL Collaboration Editor toolbar.

Table 2 ETL Collaboration Editor Toolbar

Commands
Description
Source
Changes the editor display to show the Java source code of the collaboration.
Design
Changes the editor display to show a graphical representation of the collaboration.
Expand All Graph Objects
Expands the tables and graphical elements displayed on the editor to show all mapping elements and fields. This is the default view.
Collapse All Graph Objects
Collapses the tables and graphical elements displayed on the editor, making it easier to view the different components of the collaboration.
Toggle Output View
Toggles between a full-screen pane and a divided pane that shows output messages, like log entries, validations, source and target table data, SQL code, and rejected rows.
Drop and Recreate Tables
Drops all source and target database tables and then recreates them.
Refresh Metadata
Refreshes the metadata for the source and target tables.
Select Source and Target Tables
Enables you to select source and target tables to be used in the collaboration.
Create New Join
Launches the Create New Join View dialog box, where you can define source table relationships, or joins. The Create New Join View dialog box also appears when you do either of the following:
  • Map a specific source table column to a target table that is already mapped to a different source table, and the two source tables are not already joined.

  • Connect a specific source table column to an operator (such as concatenate) that is already connected to a different source table, and the two source tables are not already joined.

Edit Database Properties
Enables you to configure database OTDs to point the database URL to a different location for design time. This is a temporary setting for design time only; the setting is not saved with the OTD for runtime.
Add/Edit Runtime Inputs
Allows you to add input variables to the collaboration. Input variables that are assigned by an external system, such as through a business process, are called runtime inputs.

Add/Edit Runtime Outputs
Enables you to add runtime output variables to the collaboration.

Zoom In, Zoom Out, and Scale
Changes the scale of the objects in the collaboration. You can zoom in and out, and you can specify a scale percentage.
AutoLayout All Graph Objects
Automatically arranges all ETL Collaboration Editor window components.
Validate Collaboration
Validates the mapping logic without executing the project.
Run Collaboration
Executes the project and generates a message log that displays messages and errors if the execution fails.