ETL stands for Extract, Transform, and Load. ETL programs periodically extract data from source systems, transform the data into a common format, and then load the data into the target data store or warehouse. ETL processes bring together data from multiple source systems into a data warehouse or other target database, enabling all users to work from a single, integrated set of data.
Extract – The process of reading data from a specified source database and extracting a desired subset of data.
Transform – The process of transforming the data into the required form so that it can be placed into another database. Transformation occurs by using rules or lookup tables, or by combining the data with other data.
Load – The process of writing or loading the data into the target database.
Extract, Transform, and Load (ETL) is a data integration methodology that extracts data from data sources, transforms and cleanses the data, and then loads it in a uniform format into one or more target data stores.
Data Integrator provides high-volume extraction and loading of tabular data sets for Java CAPS, NetBeans, or OpenESB projects, or as a standalone product. You can use Data Integrator to acquire a temporary subset of data for reports or other purposes, or to acquire a more permanent data set for populating a data mart or data warehouse. You can also use ETL for database type conversions or to migrate data from one database or platform to another.
Data Integrator applies the following ETL methodology:
Extraction: The input data is extracted from data sources. Using Data Integrator, the data can be filtered and joined from multiple, heterogeneous sources, which results in a desired subset of data suitable for transformation.
Transformation: Data Integrator applies the operators specified for the process to transform and cleanse the data to the desired state. Sun Data Integrator supports normalization, as well as parsing of certain types of data such as names and addresses.
Load: The transformed data is loaded into one or multiple databases or data warehouses.
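The three-stage methodology above can be sketched in plain Python. This is an illustrative sketch only, not Data Integrator code (the tool itself is configured graphically in NetBeans); the table and field names are hypothetical, and in-memory lists stand in for the source and target databases.

```python
# Illustrative ETL sketch: extract (filter and join heterogeneous sources),
# transform (cleanse and normalize), load (write to the target store).

def extract(customers, orders):
    """Join two hypothetical sources and keep only customers with orders."""
    orders_by_id = {}
    for o in orders:
        orders_by_id.setdefault(o["customer_id"], []).append(o)
    return [
        {**c, "orders": orders_by_id[c["id"]]}
        for c in customers
        if c["id"] in orders_by_id
    ]

def transform(rows):
    """Cleanse and normalize: trim and title-case names, uppercase countries."""
    for row in rows:
        row["name"] = row["name"].strip().title()
        row["country"] = row["country"].upper()
    return rows

def load(rows, target):
    """Append the transformed rows to the target store."""
    target.extend(rows)
    return target

# Usage: run the pipeline end to end against sample data.
customers = [{"id": 1, "name": " ada lovelace ", "country": "uk"},
             {"id": 2, "name": "alan turing", "country": "uk"}]
orders = [{"customer_id": 1, "amount": 42}]
warehouse = []
load(transform(extract(customers, orders)), warehouse)
```

In a real deployment, each stage would read from and write to actual databases, and the filtering and joining in the extract stage would typically be pushed down to the source database engine.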
Sun Data Integrator provides the following features:
Requires little database expertise to build high-performing ETL processes.
Metadata auto-discovery enables users to design ETL processes faster.
Takes advantage of database bulk-loading and no-logging tuning, where applicable, for faster data warehouse loads.
Supports creating automatic joins based on primary key and foreign key relationships, and generates code to ensure data integrity.
Takes advantage of the database engine by pushing as much of the workload as possible onto the source and target databases.
Supports extensive non-relational data formats.
Transforms, filters, and sorts at the source where appropriate.
Supports data cleansing operators to ensure data quality. Provides a dictionary-driven system for complete parsing of names and addresses of individuals and organizations, products, and locations. Supports data normalization and de-normalization.
Converts data into a consistent, standardized form to enable loading into conformed target databases.
Supports built–in data integrity checks.
Supports data type conversion, null value handling, and customized transformation.
Provides a robust error handler to ensure data quality and a comprehensive system for reporting and responding to all error events.
Supports change management functions or versioning.
Allows concurrent or parallel processing of multiple source data streams.
Supports full refresh and incremental extraction.
Is fully integrated with NetBeans to provide a complete development environment.
Supports Data Federation, enabling you to use SQL to define ETL processes.
Provides near real-time click-stream data warehousing (in conjunction with the JDBC Binding Component).
Supports Enterprise Resource Planning (ERP) and Customer Relationship Management (CRM) data sources in conjunction with various components from Java CAPS.
Provides platform independence and scalability to enterprise data warehousing applications.
Allows you to define complex transformations using built-in transformation objects.
Allows you to schedule ETL sessions based on time or on the occurrence of a specified event (in conjunction with Java CAPS components).
Can participate as a partner in BPEL business processes; Sun Data Integrator exposes the ETL process as a web service.
Can extract data from outside a firewall in conjunction with FTP and HTTP Connectors.
Provides reporting and analysis of transformations that failed or were rejected, and then allows you to resubmit them after correcting the data.
Provides extensive reporting of the results of ETL sessions, including automatic notification of significant failures of the process.
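The distinction between full refresh and incremental extraction noted in the feature list can be illustrated with a common watermark pattern. This is a generic sketch of the concept, not Data Integrator's actual implementation; the row structure and the `updated_at` column are hypothetical.

```python
# Generic sketch of full-refresh vs. incremental extraction using a
# "last updated" watermark. Illustrative only; not Data Integrator code.

def full_refresh(source_rows):
    """Extract every row, regardless of when it last changed."""
    return list(source_rows)

def incremental_extract(source_rows, last_watermark):
    """Extract only rows changed since the previous run, and advance
    the watermark to the newest change seen."""
    changed = [r for r in source_rows if r["updated_at"] > last_watermark]
    new_watermark = max((r["updated_at"] for r in changed),
                        default=last_watermark)
    return changed, new_watermark

# Usage: rows 2 and 3 changed after the previous run's watermark of 200.
rows = [{"id": 1, "updated_at": 100},
        {"id": 2, "updated_at": 205},
        {"id": 3, "updated_at": 310}]
delta, wm = incremental_extract(rows, last_watermark=200)
```

A full refresh reloads the entire source each run, which is simple but costly at volume; incremental extraction moves only the delta, at the price of tracking a reliable change indicator in the source.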