Skip Navigation Links | |
Exit Print View | |
Oracle Java CAPS Data Integrator User's Guide Java CAPS Documentation |
Designing Data Integrator Projects
Extracting, Transforming, Loading: ETL
Oracle Java CAPS Data Integrator Overview
Extracting, Transforming, and Loading: ETL
Oracle Java CAPS Data Integrator Methodology
Oracle Java CAPS Data Integrator Features
Oracle Java CAPS Data Integrator Architecture
Oracle Java CAPS Data Integrator Design-Time Components
Oracle Java CAPS Data Integrator Project System
Data Integrator Service Engine
Creating Oracle Java CAPS Data Integrator Projects
Connecting to Source and Target Databases
Connecting to a JDBC-Compliant Database
Creating and Connecting to Data Mashup Services
Virtual Database Table Metadata Options
Virtual Database Column Properties
Creating a New Data Integrator Project
Creating an ETL Collaboration Using the Wizard
Creating a Basic ETL Collaboration
To Create a Basic ETL Collaboration
Creating an Advanced ETL Collaboration
To Create an Advanced ETL Collaboration
Creating an ETL Collaboration for a Master Index Staging Database
To Create an ETL Collaboration for a Master Index Staging Database
Direct/Simple Execution Strategy
About the ETL Collaboration Editor
Configuring ETL Collaborations
To Join Source Tables During Mapping
Defining Extraction Conditions and Validations
To Define Extraction Conditions and Validation.
Adding Tables to an Existing Collaboration
To Add Tables to a Collaboration
Forcing Execution Strategies for Collaborations
To Force Execution Strategies for Collaborations
Changing the Database URL for Design Time
To Change the Database URL for Design Time
Configuring Source Table Properties
To Configure Source Table Properties
Configuring Target Table Properties
To Configure Target Table Properties
Using Pre-Created Temporary Staging Tables
Using Temporary Staging Tables
Viewing Runtime Output Arguments
To View Runtime Output Arguments
Filtering Source Data Using Runtime Inputs
To Filter Source Data Using Runtime Inputs
Setting the Batch Size for Joined Tables
To Set the Batch Size for Joined Tables
Using Table Aliases with Multiple Source Table Views
The following tasks describe how to create and add components to a Oracle Java CAPS Data Integrator project using the Data Integrator Wizard.
Before you can select databases and database tables to extract data from and load data to, you need to create and connect to the databases to use. Oracle Java CAPS Data Integrator supports JDBC-compliant databases, flat files, and data mashup services. You only need to define the connections for relational databases. If you are using flat files as your source, you do not need to create or connect to a database. The wizard provides the ability to connect to multiple source files.
This step requires that the database drivers for the database platforms you are working with are installed. Some database drivers are already installed by default, but you might need to add the database driver depending on which database platform you are using. For example, if you are using Oracle or Microsoft SQL Server, you need to copy the driver to the application server and add it to the Services window.
Before You Begin
Make sure the database you are connecting to has already been created and is running. If the database drivers for the platforms you are using have not been installed to app_server/lib, copy the drivers to that location.
The new database driver appears under Drivers in the Services window.
Note - Different database platforms use different connection URLs. Refer to your database documentation for the format to use.
A list of available schemas appears for you to choose from.
A new database connection appears under Databases.
If you want to use more than one data source for the data integration process, you can create a Data Mashup Service. Data mashup allows you to select multiple data sources of varying types and combine them into one target database. Source data can reside in files on your network or on the web in HTML, RSS, or Web Row Set format.
Note - New in Java CAPS Release 6 Update 1, you can specify multiple data sources using the Data Integrator Wizard. If you are using multiple source files, you can either create the mashup here or you can use the Data Integrator Wizard to specify the sources directly.
The Add External Tables Wizard appears.
The Choose Data Source window appears.
Tip - If you add a data source in error, highlight it in the table and then click Remove.
The Enter Tables Details window appears.
Depending on the type of file you selected, the Choose a Sheet, Choose a (HTML) Table, or Import Table MetaData window appears.
The Import Table MetaData window appears.
The available options on this window vary depending on the type of data source. For more information about the properties you can modify, see Virtual Database Table Metadata Options.
The Enter Column Properties window appears.
Tip - For more information about column properties, see Virtual Database Column Properties. If your data source does not contain field names, you should customize the column names for clarity.
When you add external tables to a virtual database, you can configure the metadata for each data source. All metadata properties are listed below, but some might not be available depending on the type of data source you are adding.
|
When you add external tables to a virtual database, you can configure the column attributes for each data source. All column properties are listed below.
|
Before you can begin to define and configure the components and ETL processes to use for your data integration you need to create a new project of the type Data Integrator Module.
The New Project Wizard appears.
The new project appears in the Projects window.
Next Steps
Create an ETL Collaboration following the instructions provided under Creating an ETL Collaboration Using the Wizard. You can also create an ETL Collaboration from scratch, but the wizard provides a quick and easy way to generate most of the collaboration code.
Note - The Data Integrator Wizard was enhanced in Java CAPS 6 Update 1. The instructions in this topic might differ from what is available in Release 6.
You can use the Data Integrator Wizard to create as much or as little of the ETL collaboration as you want. You can exit the wizard at any time once the basic framework is defined. After you complete the wizard, you can open the collaboration for further configuration.
The wizard provides three options for the collaboration:
Basic Extract – Allows you to generate an ETL Collaboration that extracts, transforms, and loads data between JDBC, virtual (mashup), and flat-file databases. To create a basic collaboration, follow the instructions under Creating a Basic ETL Collaboration.
Advanced Extract – Allows you to generate an ETL Collaboration that extracts, transforms, and loads data between data sources and targets when there are multiple sources of different types. This option can also be used for creating an MDM staging database using an Oracle Java CAPS Master Index schema to generate the database tables. The staging database can then be used by the Data Cleanser, Data Profiler, and Initial Bulk Match and Load tool for a Master Index. To create an advanced collaboration or a staging database for a master index application, follow the instructions under Creating an Advanced ETL Collaboration. To create a staging database for a master index application, follow the instructions under Creating an ETL Collaboration for a Master Index Staging Database.
Bulk Loader – Allows you to generate an ETL Collaboration that loads delimited data in a flat file that is structurally identical to a JDBC-compliant target database. This is specifically designed to load the data images produced by the Bulk Matcher into a master index database. To create a bulk loader collaboration for a master index application, follow the instructions under Creating a Bulk Loader ETL Collaboration.
Note - The Data Integrator Wizard was enhanced in Java CAPS 6 Update 1. The instructions in this topic might differ from what is available in Release 6.
A basic collaboration allows you to transfer data from a single data source to a data target. If you have multiple sources, you can create a virtual database before creating the basic collaboration. See Creating and Connecting to Data Mashup Services for more information. You could also use the Advanced option of the wizard instead.
You can click Finish at any time during the wizard to generate a collaboration with the information you specified to that point. Then you can complete the configuration using the ETL Collaboration Editor.
Before You Begin
Complete the following tasks:
Connecting to Source and Target Databases (if your source or target data is stored in a relational or virtual database)
The New File Wizard appears with the Name and Location window displayed.
The Select Source Tables window appears.
Data Integrator automatically selects a schema based on the login information. You only need to change this field if you are using a different schema.
Tip - You can use the Shift and Control keys to select multiple tables at once. If you add a table in error, select the table in the lower portion of the window and click Remove.
The Select Source Tables for Join window appears.
Inner – Use this if all tables to be joined contain the same column.
Left Outer – Use this if the results should always include the records from the left table in the join clause.
Right Outer – Use this if the results should always include the records from the right table in the join clause.
Full Outer – Use this if the results should always include the records from both the right and left tables in the join clause. Full outer joins are only supported for tables from the same relational database. Flat files and the Axion database do not support full outer joins.
The Select Target Tables window appears.
Tip - You can use the Shift and Control keys to select multiple tables at once. If you add a table in error, select the table in the lower portion of the window and click Remove.
The new ETL collaboration appears in the Projects window, and the Collaboration Editor opens with the source tables displayed on the left and target tables displayed on the right.
Next Steps
You can further configure the ETL collaboration using the ETL Collaboration Editor. For more information, see Configuring ETL Collaborations.
Note - The Data Integrator Wizard was enhanced in Java CAPS 6 Update 1. The instructions in this topic might differ from what is available in Release 6.
An advanced collaboration allows you to transfer data from multiple types of data sources to a data target. This procedure describes how to create an advanced collaboration using the automated wizard. Depending on the type of data source and the options you use, the wizard skips certain unnecessary steps. This option might product multiple ETL collaborations depending on the number of target tables.
You can click Finish at any time during the wizard to generate a collaboration with the information you specified to that point. Then you can complete the configuration using the ETL Collaboration Editor.
Before You Begin
Complete the following tasks:
Connecting to Source and Target Databases (if your source or target data is stored in a relational or virtual database)
The New File Wizard appears with the Name and Location window displayed.
The Select or Create Database window appears.
Note - This database is required and is used for internal processing only.
The Choose Data Source window appears.
The Enter Table Details window appears, with the information for the first data file displayed.
Data Integrator automatically fills in these fields based on the information from the previous window, so the existing values should be correct.
If the data file is a spreadsheet, the Choose a Sheet window appears; otherwise, the Import Table MetaData window appears.
Tip - To view the contents of a sheet, click the Preview button.
Data Integrator automatically fills in this information, but you might need to customize it. For more information about the properties you can configure, see Virtual Database Table Metadata Options.
The Enter Column Properties window appears.
For more information about these properties, see Virtual Database Column Properties.
The Select JDBC Source Tables window appears.
If there are tables to join, the Select Source Tables for Join window appears; otherwise, the Generate Target Database window appears.
Inner – Use this if all tables to be joined contain the same column.
Left Outer – Use this if the results should always include the records from the left table in the join clause.
Right Outer – Use this if the results should always include the records from the right table in the join clause.
Full Outer – Use this if the results should always include the records from both the right and left tables in the join clause. Full outer joins are only supported for tables from the same relational database. Flat files and the Axion database do not support full outer joins.
The Generate Target Database Master Index Model window appears. Using this page is described in a separate topic, Creating an ETL Collaboration for a Master Index Staging Database.
The Select JDBC Target Tables window appears.
Tip - You can use the Shift and Control keys to select multiple tables at once. If you add a table in error, select the table in the lower portion of the window and click Remove.
The Map Selected Collaboration Tables window appears.
Note - If you do not specify a mapping here, the source tables do not appear in the ETL collaboration. You can add the source tables directly to the collaboration using the Select Source and Target Tables function. To select multiple source tables for one target, hold down the Control key while you select the required source tables. If you select multiple source tables for one target, the source tables are automatically joined.
The new ETL collaboration appears in the Projects window. If multiple collaboration are created, they are given the name you specified for the collaboration with a target table name appended.
Next Steps
You can further configure the ETL collaboration using the ETL Collaboration Editor. For more information, see Configuring ETL Collaborations.
Note - The Data Integrator Wizard was enhanced in Java CAPS 6 Update 1. The instructions in this topic might differ from what is available in Release 6.
The Data Integrator Wizard helps you create and populate a staging database that stores the legacy data to be loaded into a master index database so you can cleanse and load the data in bulk. Data Integrator generates the staging database based on the object structure defined for the master index, so the data is automatically presented in a format that the Data Cleanser, Data Profiler, and Bulk Matcher can read. This procedure describes how to create the staging database using the automated wizard. Depending on the type of data source and the options you use, the wizard skips certain unnecessary steps.
You can click Finish at any time during the wizard to generate a collaboration with the information you specified to that point. Then you can complete the configuration using the ETL Collaboration Editor.
Before You Begin
Complete the following tasks:
Connecting to Source and Target Databases (if your source or target data is stored in a relational or virtual database)
The New File Wizard appears with the Name and Location window displayed.
The Select or Create Database window appears.
Note - This database is required and is used for internal processing only.
The Choose Data Source window appears.
The Enter Table Details window appears, with the information for the first data file displayed.
Data Integrator automatically fills in these fields based on the information from the previous window, so the existing values should be correct.
If the data file is a spreadsheet, the Choose a Sheet window appears; otherwise, the Import Table MetaData window appears.
Tip - To view the data in a sheet, click the Preview button.
Data Integrator automatically fills in this information, but you might need to customize it. For more information about the properties you can configure, see Virtual Database Table Metadata Options.
The Enter Column Properties window appears.
For more information about these properties, see Virtual Database Column Properties.
The Select Source Tables window appears.
If there are tables to join, the Select Source Tables for Join window appears; otherwise, the Generate Target Database window appears.
Inner – Use this if all tables to be joined contain the same column.
Left Outer – Use this if the results should always include the records from the left table in the join clause.
Right Outer – Use this if the results should always include the records from the right table in the join clause.
Full Outer – Use this if the results should always include the records from both the right and left tables in the join clause. Full outer joins are only supported for tables from the same relational database. Flat files and the Axion database do not support full outer joins.
The Generate Target Database Master Index Model window appears.
Note - This file is located in NetBeansProjects_Home/Project_Name/src/Configuration.
The Select JDBC Target Tables window appears. The target tables to load the extracted data into are already listed under Available Connections. It is not recommended you change these.
The Map Selected Collaboration Tables window appears.
Note - If you do not specify a mapping here, the source tables do not appear in the ETL collaboration. You can add the source tables directly to the collaboration using the Select Source and Target Tables function. To select multiple source tables for one target, hold down the Control key while you select the required source tables. If you select multiple source tables for one target, the source tables are automatically joined.
The new ETL collaboration appears in the Projects window. If multiple collaboration are created, they are given the name you specified for the collaboration with a target table name appended. To load the data into the staging database, run each of the collaborations. Make sure you are connected to both databases first.
Next Steps
You can further configure the ETL collaboration using the ETL Collaboration Editor. For more information, see Configuring ETL Collaborations.
Note - The Data Integrator Wizard was enhanced in Java CAPS 6 Update 1. The instructions in this topic might differ from what is available in Release 6.
You can use the Data Integrator Wizard to generate the Bulk Loader for a master index application. The Bulk Loader loads data that has already been cleansed, standardized, and matched into a master index database. The source files for the Bulk Loader are those generated by the Bulk Matcher.
Before You Begin
Complete the steps under Creating a New Data Integrator Project.
Make sure the master index database is running, and that your NetBeans IDE is connected to the master index database.
In order to specify the source files for the Bulk Loader, you need to run the Bulk Matcher first. For more information see, Loading the Initial Data Set for a Master Index.
The New File Wizard appears with the Name and Location window displayed.
The Select or Create Database window appears.
Note - This database is required and is used for internal processing only.
The Select JDBC Target Tables window appears.
Tip - You can use the Shift and Control keys to select multiple tables at once. If you select target tables that do not correspond to the Bulk Matcher files, collaborations without source table are generated and the project fails to build.
The Choose Bulk Loader Data Source window appears.
Note - These files are located in NetBeansProjects_Home/Project_Name/loader-generated/loader/work/masterindex, where work is the location you specified for the working directory in loader-config.xml.
The Map Selected Collaboration Tables window appears.
Note - Not every table on the left will be mapped. For example, system tables such as SBYN_COMMON_HEADER, SBYN_COMMON_DETAIL, SBYN_APPL, and SBYN_SYSTEMS do not need to be mapped.
An ETL collaboration is created for each target table. This might take a few minutes to generate.
Next Steps
You can further configure the ETL collaboration in the ETL Collaboration Editor. For more information, see Configuring ETL Collaborations.
To load the data into the master index database, you can either run each collaboration individually, or you can generate a batch file that will run all collaborations for you. For more information, see Loading the Initial Data Set for a Master Index.