Sun Master Data Management Suite Primer

Sun Data Integrator

Extraction, transform, and load (ETL) is a data integration method that extracts data from various data sources, transforms the data into a common format, and then loads the data format into one or more target data sources. ETL processes bring together and combine data from multiple source systems into a data warehouse, enabling all users to work off a single, integrated set of data — a single version of the truth.

The following topics provide information about Sun Data Integrator and its components.

Sun Data Integrator Overview

Sun Data Integrator manages and orchestrates high-volume, high-performance data transformation from within the SOA tier. It is optimized for extracting, transforming, and loading bulk data between files and databases and provides connectivity to a vast range of heterogeneous and diversified data sources, including non-relational data sources. It is optimized for handling very large record sets. The Sun Data Integrator development and runtime environments are fully integrated into Open ESB, NetBeans Enterprise Pack, and Java CAPS.

You can use Sun Data Integrator to for many purposes. You might need to acquire a temporary subset of data for reports or other purposes, or you might need to acquire a more permanent data set in order to populate a data warehouse. You can also use Sun Data Integrator for database type conversions or to migrate data from one database or platform to another.

Sun Data Integrator applies the following ETL methodology:

Sun Data Integrator Features

Sun Data Integrator provides your business with a powerful assortment of design-time features that you can use to create and configure ETL processes. The runtime features allow you to monitor the ETL processes and to review any data errors.

Sun Data Integrator provides the following features:

Sun Data Integrator Architecture

The Sun Data Integrator design-time components allow you to specify the data source and target databases, map source fields and columns to target fields and columns, define custom processing, and test and validate the ETL collaboration. Design-time components include the NetBeans project system, a wizard to guide you through creating and configuring an ETL process, and a mapping editor where you can map source and target data and customize the transformation.

The runtime components include monitors to view the status of ETL processes and any rejected data. The Data Integrator Engine execute the ETL process. The following diagram shows the Sun Data Integrator components and their relationship to one another. Data Integrator clients could include technologies such as web services, Java EE or .Net applications, reporting tools, or MDM applications, such as Sun Master Index.

Figure 7 Data Integrator Architecture

Figure shows the components of Data Integrator and how
the work together.

Sun Data Integrator Development Phase

The development phase consists of standard tasks for specifying source and target databases and advanced tasks for further customizing the data transformation logic.

Standard Development Tasks

The following steps outline the basic procedure for developing an ETL process using Sun Data Integrator.

  1. Connect to the source and target databases from the Services window in NetBeans.

  2. Create a new Data Integrator Module project in NetBeans.

  3. Using the Data Integrator Wizard, specify the source database and tables and the target database and tables.

  4. Using either the Data Integrator Wizard or the ETL Collaboration Editor, specify join conditions and map the source fields or columns to the target fields or columns.

  5. Specify the execution strategy.

  6. Add the ETL service to a composite application.

  7. Build and deploy the composite application.

Advanced Development Tasks

You can perform additional tasks during the development phase to customize your ETL application further.

Data Integrator Wizard

The Data Integrator Wizard takes you through each step of the ETL setup process and, based on the information you specify, creates a collaboration that defines the configuration of the ETL process.

Figure 8 Select Target Table on the Data Integrator Wizard

Figure shows a step in the Data Integrator Wizard.

ETL Collaboration Editor

Once you define the data integration framework using the wizard, you use the ETL Collaboration Editor to further customize its configuration.

Figure 9 ETL Collaboration Editor

Figures shows a sample mapping on the ETL Collaboration
Editor.

Sun Data Integrator Runtime Phase

Once all of the development tasks are complete and the system is running, you can perform any of these maintenance tasks.

Monitoring and Maintenance

You can monitor ETL collaborations using the ETL Monitor, which is deployed on the application server Admin Console. The monitor allows you to specify a date range of events to monitor and also provides a purge function so you can remove outdated or obsolete events. For each event, the monitor displays the target table, start and end dates, the number of records extracted and loaded, the number of rejected records, and any exception messages. You can also view a summary, and drill down into the details of rejected records.

Figure 10 ETL Monitor on the Admin Console

Figure shows the ETL Monitor on the Admin Console.