JavaScript is required to for searching.
Skip Navigation Links
Exit Print View
Oracle Java CAPS Data Integrator User's Guide     Java CAPS Documentation
search filter icon
search icon

Document Information

Designing Data Integrator Projects

About Data Integrator

Extracting, Transforming, Loading: ETL

Oracle Java CAPS Data Integrator Overview

Extracting, Transforming, and Loading: ETL

Oracle Java CAPS Data Integrator Methodology

Oracle Java CAPS Data Integrator Features

Oracle Java CAPS Data Integrator Architecture

Oracle Java CAPS Data Integrator Design-Time Components

Data Integrator Editor

Oracle Java CAPS Data Integrator Project System

Data Integrator Service Engine

ETL Engine

ETL Service Engine

Data Integrator Monitor

Data Integrator Recovery

Creating Oracle Java CAPS Data Integrator Projects

Connecting to Source and Target Databases

Connecting to a JDBC-Compliant Database

Creating and Connecting to Data Mashup Services

Virtual Database Table Metadata Options

Virtual Database Column Properties

Creating a New Data Integrator Project

To Create a New Project

Creating an ETL Collaboration Using the Wizard

Creating a Basic ETL Collaboration

To Create a Basic ETL Collaboration

Creating an Advanced ETL Collaboration

To Create an Advanced ETL Collaboration

Creating an ETL Collaboration for a Master Index Staging Database

To Create an ETL Collaboration for a Master Index Staging Database

Creating a Bulk Loader ETL Collaboration

To Create a Bulk Loader ETL Collaboration

ETL Collaboration Overview

Execution Strategies

Direct/Simple Execution Strategy

One Pass Execution Strategy

Staging Execution Strategy

Pipeline Execution Strategy

Whitespace Considerations

Explicit and Implicit Joins

Runtime Properties

Data Validation Conditions

About the ETL Collaboration Editor

Configuring ETL Collaborations

Joining Source Tables

To Join Source Tables

To Join Source Tables During Mapping

Modifying an Existing Join

To Join Source Tables

Defining Extraction Conditions and Validations

To Define Extraction Conditions and Validation.

Adding Tables to an Existing Collaboration

To Add Tables to a Collaboration

Forcing Execution Strategies for Collaborations

To Force Execution Strategies for Collaborations

Changing the Database URL for Design Time

To Change the Database URL for Design Time

Configuring Source Table Properties

To Configure Source Table Properties

Configuring Target Table Properties

To Configure Target Table Properties

Using Pre-Created Temporary Staging Tables

Using Temporary Staging Tables

Viewing Table or Join Data

To View Table or Join Data

Viewing the SQL Code

To View SQL Code

Viewing Runtime Output Arguments

To View Runtime Output Arguments

Fine-Tuning the ETL Process

Filtering Source Data Using Runtime Inputs

To Filter Source Data Using Runtime Inputs

Setting the Batch Size for Joined Tables

To Set the Batch Size for Joined Tables

Using Table Aliases with Multiple Source Table Views

Grouping Input Data

To Group Input Data

Viewing and Modifying Table Data

To View and Modify Table Data

Oracle Java CAPS Data Integrator Architecture

Oracle Java CAPS Data Integrator has three primary components:

Figure 1 Data Integrator Architecture

image:Figure shows the different Data Integrator components and how they relate to one another.

Oracle Java CAPS Data Integrator Design-Time Components

The primary components of the Data Integrator design-time are the ETL Collaboration Editor and the project system.

Data Integrator Editor

The Data Integrator Editor allows you to configure your ETL processes by modifying the source code or by using a graphical editor. It has many predefined data transformation, validation, and cleansing functions, and also allows you to add user-defined functions. This editor is a design-time component that you use to design the ETL collaborations and to create the artifacts that can be deployed as a Data Integrator Service Engine.

The Data Integrator editor contains various modules and functions embedded in it, including the following:

Oracle Java CAPS Data Integrator Project System

The project system acts as a container for holding the ETL files and provides ant-based build support. Building the project creates two types of artifacts: those related to the Service Engine and those related to the ETL Engine. For building the ETL Engine artifacts, the project system delegates the responsibility from the ETL file to the code generation module of the Data Integrator Editor. The project system builds the Service Engine artifacts on its own. Service Engine artifacts are the files etlmap.xml and jbi.xml. The jbi.xml file contains information about the provisioning and consuming endpoint related to the service unit. The etlmap.xml contains the map of the endpoint name and the engine file to be used for the particular endpoint. When an ETL service endpoint gets a request , the ETL Service Engine picks up the correct engine file using etlmap.xml and invokes the ETL Engine with this file.

Data Integrator Service Engine

The Data Integrator Service Engine is an implementation of a Java Business Integration (JBI) service engine and is compliant with JSR 208. When the service engine is deployed to a JBI container, the service unit (SU) JAR file that is produced by a Data Integrator project is consumed by the Data Integrator Service Engine.

The ability of the Oracle Java CAPS Data Integrator Service Engine to expose ETL operations as web services makes the tool suitable for business integration applications based on a Service Oriented Architecture (SOA). This engine is specially designed to work with high volume data with high performance. TheData Integrator Service Engine package is an embedded database engine and has the ability to execute SQL on non-database data sources.

The Data Integrator Service Engine includes the ETL Engine and the ETL Service Engine.

ETL Engine

The ETL Engine is responsible for executing the ETL operations that were designed using the ETL Collaboration Editor or Data Integrator Wizard. The ETL Engine parses the engine file, substitutes all SQL scripting with the runtime parameters if any, and then starts the execution. SQL scripts generated during the design time can be parameterized and can be substituted in the runtime. The ETL task manager creates a thread for each task defined using the ETL task thread. The task manager waits for dependent tasks and maintains the work flow that was specified in the engine file. The ETL Engine supports batch processing and uses prepared statements to provide better performance.

ETL Service Engine

The ETL Service Engine is an optional component. This component exposes the ETL operations as web services and also handles the service requests and responses. This component is installed separately.

Data Integrator Monitor

The Data Integrator Monitor is a web application that you can use to monitor the progress and statistics of your ETL collaborations. When the ETL Engine executes the engine file, a task is defined for updating the statistics. The ETL Engine creates an Axion database table for keeping track of the collaboration statistics and updates it to track the progress of the ETL operation. The Axion table is queried by the ETL Monitor and the results are displayed in the web console.

On the Data Integrator Monitor, you can view detailed information about each record and about rejected records. You can also view a summary of the process. The monitor also provides the ability to purge obsolete messages.