Oracle Java CAPS Data Integrator User's Guide
Designing Data Integrator Projects
Extracting, Transforming, Loading: ETL
Oracle Java CAPS Data Integrator Overview
Extracting, Transforming, and Loading: ETL
Oracle Java CAPS Data Integrator Methodology
Oracle Java CAPS Data Integrator Features
Oracle Java CAPS Data Integrator Architecture
Oracle Java CAPS Data Integrator Design-Time Components
Oracle Java CAPS Data Integrator Project System
Data Integrator Service Engine
Creating Oracle Java CAPS Data Integrator Projects
Connecting to Source and Target Databases
Connecting to a JDBC-Compliant Database
Creating and Connecting to Data Mashup Services
Virtual Database Table Metadata Options
Virtual Database Column Properties
Creating a New Data Integrator Project
Creating an ETL Collaboration Using the Wizard
Creating a Basic ETL Collaboration
To Create a Basic ETL Collaboration
Creating an Advanced ETL Collaboration
To Create an Advanced ETL Collaboration
Creating an ETL Collaboration for a Master Index Staging Database
To Create an ETL Collaboration for a Master Index Staging Database
Creating a Bulk Loader ETL Collaboration
To Create a Bulk Loader ETL Collaboration
Configuring ETL Collaborations
To Join Source Tables During Mapping
Defining Extraction Conditions and Validations
To Define Extraction Conditions and Validations
Adding Tables to an Existing Collaboration
To Add Tables to a Collaboration
Forcing Execution Strategies for Collaborations
To Force Execution Strategies for Collaborations
Changing the Database URL for Design Time
To Change the Database URL for Design Time
Configuring Source Table Properties
To Configure Source Table Properties
Configuring Target Table Properties
To Configure Target Table Properties
Using Pre-Created Temporary Staging Tables
Using Temporary Staging Tables
Viewing Runtime Output Arguments
To View Runtime Output Arguments
Filtering Source Data Using Runtime Inputs
To Filter Source Data Using Runtime Inputs
Setting the Batch Size for Joined Tables
To Set the Batch Size for Joined Tables
Using Table Aliases with Multiple Source Table Views
Once you have created an ETL collaboration using the Data Integrator Wizard, you can configure and customize the collaboration as needed to meet your data processing requirements. The following topics provide information about ETL collaborations and the ETL Collaboration Editor.
Oracle Java CAPS Data Integrator automatically employs the optimal execution strategy for each collaboration, based on the specific nature of the collaboration. If you do not want Data Integrator to determine the execution strategy, you can configure a collaboration to use either the staging or the pipeline strategy, depending on what the collaboration is set up to do. For example, if your collaboration's business logic contains Java operators, you can only use the pipeline strategy. The following describes the criteria that Data Integrator uses to determine the best execution strategy; these are also the criteria to apply when you force an execution strategy.
Execution Strategy Selection
The execution strategy configuration for collaborations is set to the Best Fit option by default. When Data Integrator determines what execution strategy to use for a collaboration, it evaluates the collaboration for specific attributes. For example, it takes tables and columns into consideration. In addition, the selected execution strategy depends on whether a collaboration contains Java operators, which are operators that are not available across all supported databases. Examples of Java operators are date transformation operators and operators that parse business names and addresses and that normalize names. Projects with Java operators must be executed with a pipeline strategy.
You can force the execution strategy for a collaboration by changing its setting from Best Fit to Staging or Pipeline, as described in "Forcing Execution Strategies for Collaborations” .
Data Integrator uses the following execution methods depending on a collaboration's attributes:
With the direct/simple execution strategy, all extraction, transformation, and loading happens in a single database. When the Best Fit option is enabled, Data Integrator uses this strategy under the following conditions:
All source tables and target tables reside in the same database.
No Java operators are used.
The data validation condition is not used.
With one pass execution, extraction and transformation occur in the source database. When the Best Fit option is enabled, Data Integrator uses this strategy under the following conditions:
All source tables are in the same database.
No Java operators are used.
The data validation condition is not used.
With the staging execution strategy, all source tables are extracted from the source databases and staged in temporary tables in the target database. Joins and transformations happen in the target database. This setting is used automatically when the Best Fit option is enabled and the conditions below occur. You can also select this option manually to force its use, in which case this execution strategy is recommended under the following conditions:
Source tables are scattered across different databases.
No Java operators are used.
The data validation condition is not used.
With the pipeline execution strategy, transformation and loading (indirectly to the target database table) occur in the internal database engine. This setting is used automatically when the Best Fit option is enabled and the conditions below occur. You can also select this option manually to force its use, in which case this execution strategy is recommended under the following conditions:
All tables are flat file database tables.
Java operators are used.
The data validation condition is used.
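The selection rules above can be summarized in a short sketch. This is purely illustrative: the class, field, and strategy names below are invented for the example and are not Data Integrator APIs.

```python
# Hypothetical sketch of the Best Fit selection criteria described above.
# All names here are illustrative, not part of Data Integrator.
from dataclasses import dataclass

@dataclass
class Collaboration:
    source_dbs: set            # databases hosting the source tables
    target_db: str             # database hosting the target tables
    uses_java_operators: bool  # e.g. date transforms, name/address parsing
    uses_data_validation: bool # a data validation condition is defined

def best_fit_strategy(c: Collaboration) -> str:
    """Choose an execution strategy using the criteria listed above."""
    if c.uses_java_operators or c.uses_data_validation:
        # Java operators and validation conditions require the pipeline engine.
        return "pipeline"
    if c.source_dbs == {c.target_db}:
        # All source and target tables in one database: run everything there.
        return "direct"
    if len(c.source_dbs) == 1:
        # One source database: extract and transform at the source.
        return "one-pass"
    # Sources scattered across databases: stage them in the target database.
    return "staging"
```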
Oracle Java CAPS Data Integrator handles whitespace differently depending on the execution strategy. When joining a flat file table and an Oracle table where the comparison column in the Oracle table contains whitespace, refer to the table below.
Table 1 Execution Strategies for Flat File and Oracle Table Joins
The join condition specified on source tables is an explicit join. The condition specified on target tables is an implicit join. The target condition is used differently in insert and update statements. For update statements, the condition from the target table identifies the rows to update and matches them to the target. For insert statements, the condition from the target table verifies that no duplicate rows are inserted.
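The two uses of the target condition can be sketched with plain SQL. This is a minimal illustration using an in-memory SQLite database; the table and column names are invented for the example.

```python
# Illustrative only: how a target-table condition might drive an update
# (identify the rows to change) versus an insert (avoid duplicate rows).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (cust_id INTEGER, name TEXT)")
conn.execute("INSERT INTO target VALUES (1, 'old name')")

def upsert(cust_id, name):
    # Update: the target condition (cust_id match) identifies rows to update.
    cur = conn.execute(
        "UPDATE target SET name = ? WHERE cust_id = ?", (name, cust_id))
    if cur.rowcount == 0:
        # Insert: the same condition verified that no matching row exists,
        # so inserting cannot create a duplicate.
        conn.execute("INSERT INTO target VALUES (?, ?)", (cust_id, name))

upsert(1, "new name")   # condition matches an existing row: it is updated
upsert(2, "other")      # no match: a new row is inserted
```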
The Staging Table Name property is used for the staging execution strategy. When you use the staging strategy and specify a staging table name for each target table, the ETL collaboration does not create a temporary staging table for the source data. Instead, the table with the specified name in the default tablespace of the target database is used for staging.
When all the source tables in an ETL collaboration are configured with a valid table name in the Staging Table Name property, the ETL process does not create or delete any temporary tables at runtime. Also, the process does not modify or alter the target tables other than for updating records as per the ETL collaboration.
Be careful about changing the default settings. By default, the staging tables are dropped after each run. If you do not want the tables dropped, change the Drop Staging Table property to false. Also by default, the data in a temporary table is truncated before each run. If you do not want the table truncated, set the Truncate Staging Table property to false.
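The staging-table lifecycle described above can be sketched as follows. The function, table names, and parameter names are illustrative stand-ins for the Drop Staging Table and Truncate Staging Table properties, which in practice are set on the ETL collaboration, not in code.

```python
# Minimal sketch of the staging-table lifecycle, assuming hypothetical
# flags standing in for the Drop/Truncate Staging Table properties.
import sqlite3

def run_staged_load(conn, source_rows, drop_staging=True, truncate_staging=True):
    conn.execute("CREATE TABLE IF NOT EXISTS stg_orders (id INTEGER, amt REAL)")
    if truncate_staging:
        conn.execute("DELETE FROM stg_orders")   # truncated before each run
    # Extract: copy the source rows into the staging table.
    conn.executemany("INSERT INTO stg_orders VALUES (?, ?)", source_rows)
    # Join and transformation happen in the target database against the
    # staged copy; then the result is loaded into the target table.
    conn.execute("INSERT INTO orders SELECT id, amt FROM stg_orders")
    if drop_staging:
        conn.execute("DROP TABLE stg_orders")    # dropped after each run

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amt REAL)")
# Keep the staging table after the run, as when Drop Staging Table is false.
run_staged_load(conn, [(1, 9.5), (2, 3.0)], drop_staging=False)
```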
Data Integrator provides operators to validate extracted data. You can validate multiple columns in a record through data validation conditions. If the validation fails for at least one of the columns, the record is rejected, preventing it from being loaded into the target tables. All windows that show conditions (for example, the Data Validation Condition window and the Extraction Condition window) provide the operators you need to model complex validation conditions. You can view rejected rows at design time. If a data validation condition is set, click Run Collaboration to see whether any records fail validation. If rejected rows exist, right-click the target table and select Show Rejected Data. The rejected data appears in the Output pane.
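The reject-on-any-failure behavior can be sketched briefly. The column checks below are invented examples; a real collaboration models these conditions graphically with the operators in the Data Validation Condition window.

```python
# Hedged sketch: a record is rejected if ANY of its column checks fails,
# matching the validation behavior described above. Checks are examples.
def validate(record, checks):
    """Return True only if every column check passes for this record."""
    return all(check(record[col]) for col, check in checks.items())

checks = {
    "age":  lambda v: v is not None and 0 <= v <= 130,
    "name": lambda v: bool(v and v.strip()),
}

records = [
    {"name": "Ada", "age": 36},
    {"name": "",    "age": 25},   # empty name fails a check: rejected
]
loaded   = [r for r in records if validate(r, checks)]      # goes to target
rejected = [r for r in records if not validate(r, checks)]  # shown as rejected
```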
You use the ETL Collaboration Editor to create the business logic for ETL processes. The table below describes the ETL Collaboration Editor toolbar.
Table 2 ETL Collaboration Editor Toolbar