Oracle Java CAPS Data Integrator User's Guide
Designing Data Integrator Projects
Extracting, Transforming, and Loading: ETL
Oracle Java CAPS Data Integrator Overview
Extracting, Transforming, and Loading: ETL
Oracle Java CAPS Data Integrator Architecture
Oracle Java CAPS Data Integrator Design-Time Components
Oracle Java CAPS Data Integrator Project System
Data Integrator Service Engine
Creating Oracle Java CAPS Data Integrator Projects
Connecting to Source and Target Databases
Connecting to a JDBC-Compliant Database
Creating and Connecting to Data Mashup Services
Virtual Database Table Metadata Options
Virtual Database Column Properties
Creating a New Data Integrator Project
Creating an ETL Collaboration Using the Wizard
Creating a Basic ETL Collaboration
To Create a Basic ETL Collaboration
Creating an Advanced ETL Collaboration
To Create an Advanced ETL Collaboration
Creating an ETL Collaboration for a Master Index Staging Database
To Create an ETL Collaboration for a Master Index Staging Database
Creating a Bulk Loader ETL Collaboration
To Create a Bulk Loader ETL Collaboration
Direct/Simple Execution Strategy
About the ETL Collaboration Editor
Configuring ETL Collaborations
To Join Source Tables During Mapping
Defining Extraction Conditions and Validations
To Define Extraction Conditions and Validations
Adding Tables to an Existing Collaboration
To Add Tables to a Collaboration
Forcing Execution Strategies for Collaborations
To Force Execution Strategies for Collaborations
Changing the Database URL for Design Time
To Change the Database URL for Design Time
Configuring Source Table Properties
To Configure Source Table Properties
Configuring Target Table Properties
To Configure Target Table Properties
Using Pre-Created Temporary Staging Tables
Using Temporary Staging Tables
Viewing Runtime Output Arguments
To View Runtime Output Arguments
Filtering Source Data Using Runtime Inputs
To Filter Source Data Using Runtime Inputs
Setting the Batch Size for Joined Tables
To Set the Batch Size for Joined Tables
Using Table Aliases with Multiple Source Table Views
ETL stands for Extract, Transform, and Load. ETL programs periodically extract data from source systems, transform the data into a common format, and then load the data into the target data store or warehouse. ETL processes bring together and combine data from multiple source systems into a data warehouse or other target database, enabling all users to work from a single, integrated set of data.
Extract – The process of reading data from a specified source database and extracting a desired subset of data.
Transform – The process of converting the data into the required form so that it can be placed into another database. Transformation occurs by applying rules or lookup tables, or by combining the data with other data.
Load – The process of writing the data into the target database.
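The three steps above can be sketched in a few lines of code. The following is a minimal, illustrative example using Python and SQLite; the table names, column names, and lookup values are hypothetical and are not part of Data Integrator itself.

```python
import sqlite3

# In-memory databases stand in for a source system and a target warehouse.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

source.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT, country_code TEXT, active INTEGER);
    INSERT INTO customers VALUES
        (1, 'acme corp', 'US', 1),
        (2, 'globex',    'DE', 1),
        (3, 'initech',   'US', 0);
""")
target.execute("CREATE TABLE dim_customer (id INTEGER, name TEXT, country TEXT)")

# Extract: read only the desired subset of data from the source.
rows = source.execute(
    "SELECT id, name, country_code FROM customers WHERE active = 1"
).fetchall()

# Transform: apply a rule (standardize case) and a lookup table (code -> name).
country_lookup = {"US": "United States", "DE": "Germany"}
transformed = [
    (cid, name.title(), country_lookup.get(code, "Unknown"))
    for cid, name, code in rows
]

# Load: write the transformed rows into the target database.
target.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", transformed)
target.commit()

print(target.execute("SELECT * FROM dim_customer ORDER BY id").fetchall())
```

In this sketch the extraction filter, the case rule, and the lookup table each correspond to one of the three ETL steps defined above.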
Extract, Transform, and Load (ETL) is a data integration methodology that extracts data from data sources, transforms and cleanses the data, and then loads the data in a uniform format into one or more target data sources.
Data Integrator provides high-volume extraction and loading of tabular data sets for Java CAPS, NetBeans, or OpenESB projects, or as a standalone product. You can use Data Integrator to acquire a temporary subset of data for reports or other purposes, or to acquire a more permanent data set to populate a data mart or data warehouse. You can also use ETL for database type conversions or to migrate data from one database or platform to another.
Data Integrator applies the following ETL methodology:
Extraction: The input data is extracted from data sources. Using Data Integrator, the data can be filtered and joined from multiple, heterogeneous sources, which results in a desired subset of data suitable for transformation.
Transformation: Data Integrator applies the operators specified for the process to transform and cleanse the data to the desired state. Oracle Java CAPS Data Integrator supports normalization and parsing of certain data.
Load: The transformed data is loaded into one or more databases or data warehouses.
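The extraction step above emphasizes filtering and joining data drawn from multiple, heterogeneous sources. A minimal sketch of that idea, again using Python and SQLite with hypothetical table and column names (two separate connections stand in for heterogeneous source systems):

```python
import sqlite3

# Two independent connections represent heterogeneous sources; a third
# connection represents the target warehouse (all names are illustrative).
orders_db = sqlite3.connect(":memory:")
crm_db = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")

orders_db.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (100, 1, 250.0), (101, 2, 75.0), (102, 1, 40.0);
""")
crm_db.executescript("""
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    INSERT INTO customers VALUES (1, 'Acme Corp'), (2, 'Globex');
""")
warehouse.execute(
    "CREATE TABLE fact_orders (order_id INTEGER, customer_name TEXT, amount REAL)"
)

# Extraction: filter one source, then join its rows with the second source
# on customer_id, yielding the desired subset for transformation.
big_orders = orders_db.execute(
    "SELECT order_id, customer_id, amount FROM orders WHERE amount >= 50"
).fetchall()
names = dict(crm_db.execute("SELECT customer_id, name FROM customers"))

# Transformation: resolve the foreign key into a customer name.
joined = [(oid, names[cid], amount) for oid, cid, amount in big_orders]

# Load: write the joined, filtered subset into the warehouse.
warehouse.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", joined)
warehouse.commit()
```

The filter (`amount >= 50`) and the cross-source join illustrate how extraction can reduce and combine data before the transformation and load steps run.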
Oracle Java CAPS Data Integrator provides the following features:
Requires little database expertise to build high-performing ETL processes.
Metadata auto-discovery enables users to design ETL processes faster.
Takes advantage of database bulk-load and no-logging tuning, where applicable, for faster data warehouse loads.
Supports automatic joins based on primary key and foreign key relationships, and generates code to ensure data integrity.
Takes advantage of the database engine by pushing as much of the workload as possible onto the source and target databases.
Supports extensive non-relational data formats.
Transforms, filters, and sorts at the source where appropriate.
Supports data cleansing operators to ensure data quality. Provides a dictionary driven system for complete parsing of names and addresses of individuals and organizations, products, and locations. Supports data normalization and de-normalization.
Converts data into a consistent, standardized form to enable loading into a conformed target database.
Supports built-in data integrity checks.
Supports data type conversion, null value handling, and customized transformation.
Provides a robust error handler to ensure data quality and a comprehensive system for reporting and responding to all error events.
Supports change management functions or versioning.
Allows concurrent or parallel processing of multiple source data streams.
Supports full refresh and incremental extraction.
Is fully integrated with NetBeans to provide a complete development environment.
Supports Data Federation, enabling you to use SQL to define ETL processes.
Provides near real-time click-stream data warehousing (in conjunction with the JDBC Binding Component).
Supports Enterprise Resource Planning (ERP) and Customer Relationship Management (CRM) data sources in conjunction with various components from Java CAPS.
Provides platform independence and scalability for enterprise data warehousing applications.
Allows you to define complex transformations using built-in transformation objects.
Allows you to schedule ETL sessions based on time or on the occurrence of a specified event (in conjunction with Java CAPS components).
Can participate as a partner in BPEL business processes. Oracle Java CAPS Data Integrator exposes the ETL process as a web service.
Can extract data from outside a firewall in conjunction with FTP and HTTP Connectors.
Provides reporting and analysis of transformations that failed or were rejected, and then allows you to resubmit them after correcting the data.
Provides extensive reporting of the results of ETL sessions, including automatic notification of significant failures of the process.