Skip Navigation Links | |
Exit Print View | |
Oracle Java CAPS Data Integrator User's Guide Java CAPS Documentation |
Designing Data Integrator Projects
Extracting, Transforming, Loading: ETL
Oracle Java CAPS Data Integrator Overview
Extracting, Transforming, and Loading: ETL
Oracle Java CAPS Data Integrator Methodology
Oracle Java CAPS Data Integrator Features
Oracle Java CAPS Data Integrator Architecture
Oracle Java CAPS Data Integrator Design-Time Components
Oracle Java CAPS Data Integrator Project System
Creating Oracle Java CAPS Data Integrator Projects
Connecting to Source and Target Databases
Connecting to a JDBC-Compliant Database
Creating and Connecting to Data Mashup Services
Virtual Database Table Metadata Options
Virtual Database Column Properties
Creating a New Data Integrator Project
Creating an ETL Collaboration Using the Wizard
Creating a Basic ETL Collaboration
To Create a Basic ETL Collaboration
Creating an Advanced ETL Collaboration
To Create an Advanced ETL Collaboration
Creating an ETL Collaboration for a Master Index Staging Database
To Create an ETL Collaboration for a Master Index Staging Database
Creating a Bulk Loader ETL Collaboration
To Create a Bulk Loader ETL Collaboration
Direct/Simple Execution Strategy
About the ETL Collaboration Editor
Configuring ETL Collaborations
To Join Source Tables During Mapping
Defining Extraction Conditions and Validations
To Define Extraction Conditions and Validation.
Adding Tables to an Existing Collaboration
To Add Tables to a Collaboration
Forcing Execution Strategies for Collaborations
To Force Execution Strategies for Collaborations
Changing the Database URL for Design Time
To Change the Database URL for Design Time
Configuring Source Table Properties
To Configure Source Table Properties
Configuring Target Table Properties
To Configure Target Table Properties
Using Pre-Created Temporary Staging Tables
Using Temporary Staging Tables
Viewing Runtime Output Arguments
To View Runtime Output Arguments
Filtering Source Data Using Runtime Inputs
To Filter Source Data Using Runtime Inputs
Setting the Batch Size for Joined Tables
To Set the Batch Size for Joined Tables
Using Table Aliases with Multiple Source Table Views
Oracle Java CAPS Data Integrator has three primary components:
Figure 1 Data Integrator Architecture
The primary components of the Data Integrator design-time are the ETL Collaboration Editor and the project system.
The Data Integrator Editor allows you to configure your ETL processes by modifying the source code or by using a graphical editor. It has many predefined data transformation, validation, and cleansing functions, and also allows you to add user-defined functions. This editor is a design-time component that you use to design the ETL collaborations and to create the artifacts that can be deployed as a Data Integrator Service Engine.
The Data Integrator editor contains various modules and functions embedded in it, including the following:
Model (SQL Framework)
View (JGo Graph Library view)
Controller
Wizards (New ETL File wizard and Mashup wizard)
Database Evaluators
Code Generator
The project system acts as a container for holding the ETL files and provides ant-based build support. Building the project creates two types of artifacts: those related to the Service Engine and those related to the ETL Engine. For building the ETL Engine artifacts, the project system delegates the responsibility from the ETL file to the code generation module of the Data Integrator Editor. The project system builds the Service Engine artifacts on its own. Service Engine artifacts are the files etlmap.xml and jbi.xml. The jbi.xml file contains information about the provisioning and consuming endpoint related to the service unit. The etlmap.xml contains the map of the endpoint name and the engine file to be used for the particular endpoint. When an ETL service endpoint gets a request , the ETL Service Engine picks up the correct engine file using etlmap.xml and invokes the ETL Engine with this file.
The Data Integrator Service Engine is an implementation of a Java Business Integration (JBI) service engine and is compliant with JSR 208. When the service engine is deployed to a JBI container, the service unit (SU) JAR file that is produced by a Data Integrator project is consumed by the Data Integrator Service Engine.
The ability of the Oracle Java CAPS Data Integrator Service Engine to expose ETL operations as web services makes the tool suitable for business integration applications based on a Service Oriented Architecture (SOA). This engine is specially designed to work with high volume data with high performance. TheData Integrator Service Engine package is an embedded database engine and has the ability to execute SQL on non-database data sources.
The Data Integrator Service Engine includes the ETL Engine and the ETL Service Engine.
The ETL Engine is responsible for executing the ETL operations that were designed using the ETL Collaboration Editor or Data Integrator Wizard. The ETL Engine parses the engine file, substitutes all SQL scripting with the runtime parameters if any, and then starts the execution. SQL scripts generated during the design time can be parameterized and can be substituted in the runtime. The ETL task manager creates a thread for each task defined using the ETL task thread. The task manager waits for dependent tasks and maintains the work flow that was specified in the engine file. The ETL Engine supports batch processing and uses prepared statements to provide better performance.
The ETL Service Engine is an optional component. This component exposes the ETL operations as web services and also handles the service requests and responses. This component is installed separately.
The Data Integrator Monitor is a web application that you can use to monitor the progress and statistics of your ETL collaborations. When the ETL Engine executes the engine file, a task is defined for updating the statistics. The ETL Engine creates an Axion database table for keeping track of the collaboration statistics and updates it to track the progress of the ETL operation. The Axion table is queried by the ETL Monitor and the results are displayed in the web console.
On the Data Integrator Monitor, you can view detailed information about each record and about rejected records. You can also view a summary of the process. The monitor also provides the ability to purge obsolete messages.