Skip Navigation Links | |
Exit Print View | |
![]() |
Oracle Java CAPS Data Integrator User's Guide Java CAPS Documentation |
Designing Data Integrator Projects
Extracting, Transforming, Loading: ETL
Oracle Java CAPS Data Integrator Overview
Extracting, Transforming, and Loading: ETL
Oracle Java CAPS Data Integrator Methodology
Oracle Java CAPS Data Integrator Features
Oracle Java CAPS Data Integrator Architecture
Oracle Java CAPS Data Integrator Design-Time Components
Oracle Java CAPS Data Integrator Project System
Data Integrator Service Engine
Creating Oracle Java CAPS Data Integrator Projects
Connecting to Source and Target Databases
Connecting to a JDBC-Compliant Database
Creating and Connecting to Data Mashup Services
Virtual Database Table Metadata Options
Virtual Database Column Properties
Creating a New Data Integrator Project
Creating an ETL Collaboration Using the Wizard
Creating a Basic ETL Collaboration
To Create a Basic ETL Collaboration
Creating an Advanced ETL Collaboration
To Create an Advanced ETL Collaboration
Creating an ETL Collaboration for a Master Index Staging Database
To Create an ETL Collaboration for a Master Index Staging Database
Creating a Bulk Loader ETL Collaboration
To Create a Bulk Loader ETL Collaboration
Direct/Simple Execution Strategy
About the ETL Collaboration Editor
Configuring ETL Collaborations
To Join Source Tables During Mapping
Defining Extraction Conditions and Validations
To Define Extraction Conditions and Validation.
Adding Tables to an Existing Collaboration
To Add Tables to a Collaboration
Forcing Execution Strategies for Collaborations
To Force Execution Strategies for Collaborations
Changing the Database URL for Design Time
To Change the Database URL for Design Time
Configuring Source Table Properties
To Configure Source Table Properties
Configuring Target Table Properties
To Configure Target Table Properties
Using Pre-Created Temporary Staging Tables
Using Temporary Staging Tables
Filtering Source Data Using Runtime Inputs
To Filter Source Data Using Runtime Inputs
Setting the Batch Size for Joined Tables
To Set the Batch Size for Joined Tables
Using Table Aliases with Multiple Source Table Views
Once you create an ETL collaboration using the Data Integrator Wizard, you can modify the collaboration to customize the processing logic. The ETL Collaboration Editor provides a variety of tools, commands, and operators to configure the ETL process.
Perform any of the following tasks to configure your ETL collaborations:
Data Integrator allows you to join data from multiple sources before extraction. You can create join views by creating a join condition that joins source tables.
Note - For optimal performance, join the most unique columns in the first join and the least unique columns in a second join.
The Create New Join View dialog box appears.
The tables move to the Selected Tables column, and the join is represented graphically in the Preview panel.
Inner – Use this if all tables to be joined contain the same column.
Left Outer – Use this if the results should always include the records from the left table in the join clause.
Right Outer – Use this if the results should always include the records from the right table in the join clause.
Full Outer – Use this if the results should always include the records from both the right and left tables in the join clause. Full outer joins are only supported for tables from the same relational database. Flat files and the Axion database do not support full outer joins.
In the example below (shown in both source code and graphical views), PERSONID was dragged from the SBYN_PHONE table first. Then the equals operator, located in the Comparison Operators menu, was dragged next to PERSONID. To complete the condition, PERSONID was dragged from the SBYN_PERSON table.
If two source tables are already joined and have columns that are mapped to a target table, you can add another source table to the join by mapping a column in that table to the target table. For example, if source tables S1 and S2 are joined and mapped to target table T1, you can add source table S3 to the join by mapping a column from S3 to T1.
A dialog box appears asking whether you want to add the new table to the join.
The Edit Join View dialog box appears.
Once you create a join between source tables, you can modify the join condition if needed.
Note - For optimal performance, join the most unique columns in the first join and the least unique columns in a second join.
The Create New Join View dialog box appears.
The tables move to the Selected Tables column, and the join is represented graphically in the Preview panel.
Inner – Use this if all tables to be joined contain the same column.
Left Outer – Use this if the results should always include the records from the left table in the join clause.
Right Outer – Use this if the results should always include the records from the right table in the join clause.
Full Outer – Use this if the results should always include the records from both the right and left tables in the join clause. Full outer joins are only supported for tables from the same relational database. Flat files and the Axion database do not support full outer joins.
Note - You can perform this step viewing either the source code or a graphical representation of the source code. For information about available operators, see . The figure below shows a simple example of a join condition.
You can set up collaborations to filter data from source tables using extraction conditions and validations. When the collaboration runs, it will only extract data based on the conditions and validations you define.
The Properties panel appears. By default, the extraction type is configured for conditional extraction. To leave the source data unfiltered, set the Extraction Type property to Full Extraction.
The Extraction Condition dialog box appears.
The Validation Condition dialog box appears.
Once you have defined source and target tables using the Data Integrator Wizard, you can add additional tables as needed. Adding tables is a simple drag and drop procedure.
The procedure below describes how to force an execution strategy for ETL Collaborations. If you are using Java operators, you must select the Pipeline option. For more information about execution strategies, see Execution Strategies.
The Properties panel appears in the right side of the window.
For database ETL collaborations, the design-time test run uses the same URL, catalog, or schema name to connect to the database table as when the collaboration was created.
You can change the database URL to point to a different location or even a different table name as long as the content structure is the same. Restarting the NetBeans IDE reverts the URL back to its original value.
Note - To change DB2 catalog and schema names, modify the table properties by adding user-defined information in the Expert tab.
The Edit Database Properties dialog box appears as shown below.
You can customize the ETL process by defining certain properties for the source tables. Several properties cannot be changed once they have been set. Changes in the Properties sheet are saved with the ETL collaboration.
Note - To change DB2 catalog and schema names, modify the table properties by adding user-defined information in the Expert tab.
The Source Table – Properties panel appears.
|
You can customize the ETL process by defining certain properties for the target tables. Several properties cannot be changed once they have been set. Changes in the Properties sheet are saved with the ETL Collaboration.
Note - To change DB2 catalog and schema names, modify the table properties by adding user-defined information in the Expert tab.
The Target Table – Properties panel appears.
|
You can manage temporary tables by configuring source table properties. When all the source tables in an ETL collaboration are configured with a valid table name for the Staging Table Name property, no create or drop privileges are required for the target environment.
Ensure that the source and staging table structures are the same, including column names and data types. If the staging table structure does not match the corresponding source table, the collaboration will fail with an error message.
On the ETL Collaboration Editor, you can view data contained in source and target tables. You can also view the output data from a join.
The contents of the selected table or the output data for the join appears in the Data Integrator Output panel.
You can view SQL code generated for each table and operator in the ETL collaboration canvas.
The Output section in the lower panel of the NetBeans window displays the generated SQL code, as shown in the following figure.
Oracle Java CAPS Data Integrator provides a constant list of output arguments for all ETL collaborations. Runtime outputs can be captured and displayed or written to a file. These messages are made available automatically by the system.
Runtime output arguments include the following:
Count – Shows the row count for the Insert, Update, or Delete statement.
Status – Shows whether the Insert, Update, or Delete operation was successful or if it failed.
Starttime: Shows the start time of the runtime ETL process.
Endtime: Shows the end time of the ETL process.
The Add Output Runtime Arguments dialog box appears.