Designing Data Integrator Projects

Creating Sun Data Integrator Projects

The following tasks describe how to create and add components to a Sun Data Integrator project using the Data Integrator Wizard.

Connecting to Source and Target Databases

Before you can select databases and database tables to extract data from and load data to, you need to create and connect to the databases to use. Sun Data Integrator supports JDBC-compliant databases, flat files, and data mashup services. You only need to define the connections for relational databases. If you are using flat files as your source, you do not need to create or connect to a database. The wizard provides the ability to connect to multiple source files.

Connecting to a JDBC-Compliant Database

This step requires that the database drivers for the database platforms you are working with are installed. Some database drivers are already installed by default, but you might need to add the database driver depending on which database platform you are using. For example, if you are using Oracle or Microsoft SQL Server, you need to copy the driver to the application server and add it to the Services window.

To Connect to a JDBC-Compliant Database

Before You Begin

Make sure the database you are connecting to has already been created and is running. If the database drivers for the platforms you are using have not been installed to app_server/lib, copy the drivers to that location.

In the NetBeans Services window, expand Databases.

If you do not see the driver for the database you are using, copy the driver from your database installation to AppServer_Home/lib and then do the following:
1. Right-click Drivers, and select New Driver.
2. On the New JDBC Driver dialog box, click Add.
3. Browse to and open the JAR or ZIP file you copied to the application server libdirectory.
4. Accept the default driver class or type in a new one. If no driver is entered, click Find to have the wizard search for an appropriate class.
5. Enter a name for the driver.
6. Click OK.
The new database driver appears under Drivers in the Services window.

Right-click the new driver, and select Connect Using.

In the New Database Connection dialog box, do the following:
1. Enter the database connection URL.
  
  Note –
  Different database platforms use different connection URLs. Refer to your database documentation for the format to use.
2. Enter the user name and password to use to log on to the database.
3. Select Remember Password.
4. To select a specific schema in the database, click the Advanced tab and then click Get Schemas.
  
  A list of available schemas appears for you to choose from.
5. Click OK.
A new database connection appears under Databases.

Right-click the new database, and select Connect.

Creating and Connecting to Data Mashup Services

If you want to use more than one data source for the data integration process, you can create a Data Mashup Service. Data mashup allows you to select multiple data sources of varying types and combine them into one target database. Source data can reside in files on your network or on the web in HTML, RSS, or Web Row Set format.

Note –

New in Java CAPS Release 6 Update 1, you can specify multiple data sources using the Data Integrator Wizard. If you are using multiple source files, you can either create the mashup here or you can use the Data Integrator Wizard to specify the sources directly.

To Create a Mashup Database for Source Data

In the NetBeans main menu, select Tools, point to Virtual Database, and then select Create Virtual Database.

On the New Virtual Database wizard, enter a name for the database and then click Finish.

Click OK on the confirmation dialog box that appears.

In the NetBeans main menu, select Tools, point to Virtual Database, and then select Add External Tables.

The Add External Tables Wizard appears.

Select the database you just created and then click Next.

The Choose Data Source window appears.

To add data sources, do any of the following:
1. If the data source is on the web (such as HTML or Web Row Set), enter a URL for the data source and click Add.
2. If the data source is a file on your network, brows to and select the input file. Click Add.
3. Repeat the above steps for each data source.
  
  Tip –
  If you add a data source in error, highlight it in the table and then click Remove.
4. Click Next.
  
  The Enter Tables Details window appears.

Enter table information for the table specified in the Table Name field, and then click Next.

Depending on the type of file you selected, the Choose a Sheet, Choose a (HTML) Table, or Import Table MetaData window appears.

If the Choose a Sheet or Choose a (HTML) Table window appears, do the following:
- Select the name of the sheet that contains the data to use.
- To view the data, click Preview.
- Click Next.
  
  The Import Table MetaData window appears.

If necessary, modify the information required to parse the data source.

The available options on this window vary depending on the type of data source. For more information about the properties you can modify, see Virtual Database Table Metadata Options.

Click Next.

The Enter Column Properties window appears.

Modify the properties for the database columns in the upper portion of the window.

Tip –
For more information about column properties, see Virtual Database Column Properties. If your data source does not contain field names, you should customize the column names for clarity.

Preview the source data in the lower portion of the window.

Do one of the following:
1. If there are additional data sources to configure, click Next. The wizard automatically returns to the Enter Table Details window so you can repeat the above steps for each data source to add.
2. If there are no more data sources to configure, click Finish.

Right–click the new database and select Connect.

Virtual Database Table Metadata Options

When you add external tables to a virtual database, you can configure the metadata for each data source. All metadata properties are listed below, but some might not be available depending on the type of data source you are adding.

Property	Description	Values
Default Data Type (or WIZARDDEFAULT SQLTYPE)	The default data type used for all fields in the data source (you can change the default in subsequent steps).	varchar numeric time timestamp
Record Length (or WIZARDDEFAULT PRECISION)	The maximum length of a record in number of characters. This option must be appropriate for the selected data type and must be the same for all fields.	Any integer greater than or equal to 0.
Field Count	The number of fields per record.	Any integer greater than 1.
Default Precision	The length of the database columns for each field. You can modify this value for each field at a later time.	For numeric data types, enter <= 38. For time/timestamp data types, enter the length of the format.
Type of XML File	An indicator of whether the XML file is read/write or read only.	READWRITE READONLY
ROWNAME
Record Delimiter	The character that separates each record.	newline (LF) carriage return (CR) CR LF CR LF or LF semicolon (;) comma (,) tab pipe (\|)
Field Delimiter	The character that separates each field in a record. Select User Defined if the character does not match any of the other options in the menu.	comma tab semicolon pipe User Defined
User-defined Field Delimiter	The custom character that separates each field in a record. Use this field to specify a delimiter that is not a comma, tab, semicolon, or pipe. Unless you select User Defined for the Field Delimiter, this field is ignored.
Text Qualifier	A qualifier used to indicate text.	none double quote: “ single quote: ”
First line contains field names?	An indicator of whether the names specified in the header row are used as field names or whether Data Integrator should assign default field names.	Select the check box to use column header names from the file. Deselect the check box if the file does not contain a header row.
Create data file if not exist?
Header Offset	The number of bytes to skip before reaching the start of the first record. This value is ignored if the First line contains field names? check box is deselected.	Any integer greater than or equal to 0.
Rows to Skip (or Records to Skip)	The number of rows or records to skip before the starting row or record for the data set. Specify 0 (zero) to include all rows or records from the source.	Any integer greater than or equal to 0.
Maximum # of Faults to Tolerate	The number of faults that can occur before Data Integrator generates an error message.	Any integer greater than or equal to 0.
Trim Whitespace	An indicator of whether to strip white space and tabs from the beginning and end of a string.	Select the check box to trim white space. Deselect the check box to leave white space in the string.

Virtual Database Column Properties

When you add external tables to a virtual database, you can configure the column attributes for each data source. All column properties are listed below.

Property	Description	Values
#	The number of each column (this value cannot be modified).
Length	The length for each column in the virtual database.	Any integer greater than 0. This value must be appropriate for the data type.
Name	The name of each column.	An unlimited number of characters.
Datatype	The type of data stored in each field.	varchar time numeric timestamp
Scale	The number of digits to the right of the decimal point in a number field; for example, 9876.543 has a scale of 3.	An integer greater than 0.
Null	An indicator of whether the field can be null.	Select the check box if the field can be null or deselect it if the field cannot be null.
PK	An indicator of whether the column is a primary key.	Select the check box if the column is a primary key or deselect it if the column is not a primary key.
Default	Any default data to add to a column.

Creating a New Data Integrator Project

Before you can begin to define and configure the components and ETL processes to use for your data integration you need to create a new project of the type Data Integrator Module.

To Create a New Project

Right-click in the NetBeans Projects window, and select New Project.

The New Project Wizard appears.

Under Categories, select SOA.

Under Projects, select Data Integrator Module.

Click Next.

Enter a unique name and a location for the project.

If this is not a main project, deselect Set as Main Project.

Click Finish.

The new project appears in the Projects window.

Next Steps

Create an ETL Collaboration following the instructions provided under Creating an ETL Collaboration Using the Wizard. You can also create an ETL Collaboration from scratch, but the wizard provides a quick and easy way to generate most of the collaboration code.

Creating an ETL Collaboration Using the Wizard

Note –

The Data Integrator Wizard was enhanced in Java CAPS 6 Update 1. The instructions in this topic might differ from what is available in Release 6.

You can use the Data Integrator Wizard to create as much or as little of the ETL collaboration as you want. You can exit the wizard at any time once the basic framework is defined. After you complete the wizard, you can open the collaboration for further configuration.

The wizard provides three options for the collaboration:

Basic Extract – Allows you to generate an ETL Collaboration that extracts, transforms, and loads data between JDBC, virtual (mashup), and flat-file databases. To create a basic collaboration, follow the instructions under Creating a Basic ETL Collaboration.
Advanced Extract – Allows you to generate an ETL Collaboration that extracts, transforms, and loads data between data sources and targets when there are multiple sources of different types. This option can also be used for creating an MDM staging database using a Sun Master Index schema to generate the database tables. The staging database can then be used by the Data Cleanser, Data Profiler, and Initial Bulk Match and Load tool for a Sun Master Index. To create an advanced collaboration or a staging database for a master index application, follow the instructions under Creating an Advanced ETL Collaboration. To create a staging database for a master index application, follow the instructions under Creating an ETL Collaboration for a Master Index Staging Database.
Bulk Loader – Allows you to generate an ETL Collaboration that loads delimited data in a flat file that is structurally identical to a JDBC-compliant target database. This is specifically designed to load the data images produced by the Bulk Matcher into a master index database. To create a bulk loader collaboration for a master index application, follow the instructions under Creating a Bulk Loader ETL Collaboration.

Creating a Basic ETL Collaboration

Note –

The Data Integrator Wizard was enhanced in Java CAPS 6 Update 1. The instructions in this topic might differ from what is available in Release 6.

A basic collaboration allows you to transfer data from a single data source to a data target. If you have multiple sources, you can create a virtual database before creating the basic collaboration. See Creating and Connecting to Data Mashup Services for more information. You could also use the Advanced option of the wizard instead.

You can click Finish at any time during the wizard to generate a collaboration with the information you specified to that point. Then you can complete the configuration using the ETL Collaboration Editor.

To Create a Basic ETL Collaboration

Before You Begin

Complete the following tasks:

Connecting to Source and Target Databases (if your source or target data is stored in a relational or virtual database)
Creating a New Data Integrator Project

On the NetBeans Projects window, expand the new Data Integrator project and right-click Collaborations.

Point to New, and then select ETL.

The New File Wizard appears with the Name and Location window displayed.

Enter name for the collaboration.

Click Next.

On the Select Type of ETL Loader window on the New File Wizard, select Basic Extract – Transform – Load (ETL).

Click Next.

The Select Source Tables window appears.

To select the source data, do the following:
1. Under Available Connections, select the database that contains the data to be extracted.
2. Under Schemas, select the name of the database schema that contains the data to be extracted.
  
  Data Integrator automatically selects a schema based on the login information. You only need to change this field if you are using a different schema.
3. Under Schemas, select the tables containing the source data and then click Select.
  
  Tip –
  You can use the Shift and Control keys to select multiple tables at once. If you add a table in error, select the table in the lower portion of the window and click Remove.
4. Click Next.
  
  The Select Source Tables for Join window appears.

To define join conditions, do the following. If there are no join conditions, click Next.
1. Under Available Tables, select the tables to join, and then click the right arrow to add them to the Selected Tables list.
2. In the Preview panel, click the drop-down menu at the top of the join box and select the type of join to use from one of the following options:
  - Inner – Use this if all tables to be joined contain the same column.
  - Left Outer – Use this if the results should always include the records from the left table in the join clause.
  - Right Outer – Use this if the results should always include the records from the right table in the join clause.
  - Full Outer – Use this if the results should always include the records from both the right and left tables in the join clause.
3. To specify columns to exclude from each joined table, click the Select Column tab in the Preview panel, expand the table list, and deselect any columns to exclude.
4. Click Next.
  
  The Select Target Tables window appears.

To choose the target tables to load the extracted data into, do the following:
1. Under Available Connections, select the database that contains the schema to load the data into.
2. Under Schemas, select the schema that contains the tables to load the data into.
3. Under Schemas, select the tables that will contain the target data and then click Select.
  
  Tip –
  You can use the Shift and Control keys to select multiple tables at once. If you add a table in error, select the table in the lower portion of the window and click Remove.
4. Click Finish.
The new ETL collaboration appears in the Projects window, and the Collaboration Editor opens with the source tables displayed on the left and target tables displayed on the right.

Next Steps

You can further configure the ETL collaboration using the ETL Collaboration Editor. For more information, see Configuring ETL Collaborations.

Creating an Advanced ETL Collaboration

Note –

The Data Integrator Wizard was enhanced in Java CAPS 6 Update 1. The instructions in this topic might differ from what is available in Release 6.

An advanced collaboration allows you to transfer data from multiple types of data sources to a data target. This procedure describes how to create an advanced collaboration using the automated wizard. Depending on the type of data source and the options you use, the wizard skips certain unnecessary steps. This option might product multiple ETL collaborations depending on the number of target tables.

To Create an Advanced ETL Collaboration

Before You Begin

Complete the following tasks:

Connecting to Source and Target Databases (if your source or target data is stored in a relational or virtual database)
Creating a New Data Integrator Project

On the NetBeans Projects window, expand the new Data Integrator project and right-click Collaborations.

Point to New, and then select ETL.

The New File Wizard appears with the Name and Location window displayed.

Enter name for the collaboration.

Click Next.

On the Select Type of ETL Loader window on the New File Wizard, select Advanced Extract – Transform – Load (ETL).

Click Next.

The Select or Create Database window appears.

To specify a staging database to use for external data sources (for this project only), do one of the following:
1. Select an existing database to use from the DB URL field.
2. Select Create and Use New Database, enter a name for a new database in the DB Name field, and then click Create Database. Select the new database in the DB URL field.
  
  Note –
  This database is required and is used for internal processing only.

Click Next.

The Choose Data Source window appears.

Do one of the following:
- If you do not have any file data sources, click Next and skip to step 15 (choosing JDBC data sources).
- To specify a file data source using a URL, enter the URL and click Add.
- To specify a file data source that is stored on your network, browse for and select a file containing source data in the Choose a File box, and then click Add.
- Repeat the above two steps until all file data sources are selected.

Click Next.

The Enter Table Details window appears, with the information for the first data file displayed.

If necessary, modify the table name, the type of data encoding, and the type of document that contains the source data.

Data Integrator automatically fills in these fields based on the information from the previous window, so the existing values should be correct.

Click Next.

If the data file is a spreadsheet, the Choose a Sheet window appears; otherwise, the Import Table MetaData window appears.

If the Choose a Sheet window appears, select the name of the sheet in the spreadsheet that contains the source data, and then click Next.

Tip –
To view the contents of a sheet, click the Preview button.

When the Import Table Metadata window appears, modify the information about the data file as needed.

Data Integrator automatically fills in this information, but you might need to customize it. For more information about the properties you can configure, see Virtual Database Table Metadata Options.

Preview the information in the bottom portion of the window, and then click Next.

The Enter Column Properties window appears.

In the upper portion of the window, customize any of the column properties.

For more information about these properties, see Virtual Database Column Properties.

Preview the information in the lower portion of the window, and then click Next.

Do one of the following:
1. If you selected multiple file data sources, the wizard returns to the Enter Table Details window with the attributes for a different file displayed. Repeat the above steps beginning with step 7.
2. If all the files you specified are configured, a dialog box appears confirming the database table creation. Click OK on the dialog box and continue to the next step.
  
  The Select JDBC Source Tables window appears.

If you specified file data sources, they are already listed under Selected Tables here. Click Next if you have no JDBC data sources to specify, or do the following to specify a JDBC data source:
1. Under Available Connections, select the database that contains the source data.
2. If there are multiple schemas in the database, select the schema to use.
3. Under Schemas, select the tables that contain the source data and then click Select.
4. Click Next.
  
  If there are tables to join, the Select Source Tables for Join window appears; otherwise, the Generate Target Database window appears.

To define join conditions, do the following. If there are no join conditions, click Next and skip to step 17.
1. Under Available Tables, select the tables to join, and then click the right arrow to add them to the Selected Tables list.
2. In the Preview panel, click the drop-down menu at the top of the join box and select the type of join to use from one of the following options:
  - Inner – Use this if all tables to be joined contain the same column.
  - Left Outer – Use this if the results should always include the records from the left table in the join clause.
  - Right Outer – Use this if the results should always include the records from the right table in the join clause.
  - Full Outer – Use this if the results should always include the records from both the right and left tables in the join clause.
3. To specify columns to exclude from each joined table, click the Select Column tab in the Preview pane and deselect any columns to exclude.
4. Click Next.
  
  The Generate Target Database Master Index Model window appears. Using this page is described in a separate topic, Creating an ETL Collaboration for a Master Index Staging Database.

Click Next.

The Select JDBC Target Tables window appears.

To choose the target tables to load the extracted data into, do the following:
1. Under Available Connections, select the database that contains the schema to load the data into.
2. Under Schemas, select the schema that contains the tables to load the data into.
3. Under Schema, select the tables that will contain the target data and then click Select.
  
  Tip –
  You can use the Shift and Control keys to select multiple tables at once. If you add a table in error, select the table in the lower portion of the window and click Remove.

Click Next.

The Map Selected Collaboration Tables window appears.

To map source and target data, do the following:
1. To disable constraints on the target tables, select Disable Target Table Constraints.
2. Select the SQL statement type to use for the transfer. You can select insert, update, or both.
3. For each target table listed on the right, select one or more source tables from the list directly to the left of the target table. These are the source tables that will be mapped to the target in the collaboration.
  
  Note –
  If you do not specify a mapping here, the source tables do not appear in the ETL collaboration. You can add the source tables directly to the collaboration using the Select Source and Target Tables function. To select multiple source tables for one target, hold down the Control key while you select the required source tables. If you select multiple source tables for one target, the source tables are automatically joined.

Click Finish.

The new ETL collaboration appears in the Projects window. If multiple collaboration are created, they are given the name you specified for the collaboration with a target table name appended.

Next Steps

You can further configure the ETL collaboration using the ETL Collaboration Editor. For more information, see Configuring ETL Collaborations.

Creating an ETL Collaboration for a Master Index Staging Database

Note –

The Data Integrator Wizard was enhanced in Java CAPS 6 Update 1. The instructions in this topic might differ from what is available in Release 6.

The Data Integrator Wizard helps you create and populate a staging database that stores the legacy data to be loaded into a master index database so you can cleanse and load the data in bulk. Data Integrator generates the staging database based on the object structure defined for the master index, so the data is automatically presented in a format that the Data Cleanser, Data Profiler, and Bulk Matcher can read. This procedure describes how to create the staging database using the automated wizard. Depending on the type of data source and the options you use, the wizard skips certain unnecessary steps.

To Create an ETL Collaboration for a Master Index Staging Database

Before You Begin

Complete the following tasks:

Connecting to Source and Target Databases (if your source or target data is stored in a relational or virtual database)
Creating a New Data Integrator Project

On the NetBeans Projects window, expand the new Data Integrator project and right-click Collaborations.

Point to New, and then select ETL.

The New File Wizard appears with the Name and Location window displayed.

Enter name for the collaboration.

Click Next.

On the Select Type of ETL Loader window on the New File Wizard, select Advanced Extract – Transform – Load (ETL).

Click Next.

The Select or Create Database window appears.

To specify a staging database to use for external data sources (for this project only), do one of the following:
1. Select an existing database to use from the DB URL field.
2. Select Create and Use New Database, enter a name for a new database in the DB Name field, and then click Create Database. Select the new database in the DB URL field.
  
  Note –
  This database is required and is used for internal processing only.

Click Next.

The Choose Data Source window appears.

Do one of the following:
- If you do not have any file data sources, click Next and skip to step 15 (choosing JDBC data sources).
- To specify a file data source using a URL, enter the URL and click Add.
- To specify a file data source that is stored on your network, browse for and select a file containing source data in the Choose a File box, and then click Add.
- Repeat the above two steps until all file data sources are selected.

Click Next.

The Enter Table Details window appears, with the information for the first data file displayed.

If necessary, modify the table name, the type of data encoding, and the type of document that contains the source data.

Data Integrator automatically fills in these fields based on the information from the previous window, so the existing values should be correct.

Click Next.

If the data file is a spreadsheet, the Choose a Sheet window appears; otherwise, the Import Table MetaData window appears.

If the Choose a Sheet window appears, select the name of the sheet in the spreadsheet that contains the source data, and then click Next.

Tip –
To view the data in a sheet, click the Preview button.

When the Import Table Metadata window appears, modify the information about the data file as needed.

Data Integrator automatically fills in this information, but you might need to customize it. For more information about the properties you can configure, see Virtual Database Table Metadata Options.

Preview the information in the bottom portion of the window, and then click Next.

The Enter Column Properties window appears.

In the upper portion of the window, customize any of the column properties.

For more information about these properties, see Virtual Database Column Properties.

Preview the information in the lower portion of the window, and then click Next.

Do one of the following:
1. If you selected multiple file data sources, the wizard returns to the Enter Table Details window with the attributes for a different file displayed. Repeat the above steps beginning with step 7.
2. If all the files you specified are configured, a dialog box appears confirming the database table creation. Click OK on the dialog box and continue to the next step.
  
  The Select Source Tables window appears.

If you specified file data sources, they are already listed under Selected Tables. Click Next if you have no JDBC data sources to specify, or do the following to specify a JDBC data source:
1. Under Available Connections, select the database that contains the source data.
2. If there are multiple schemas in the database, select the schema to use.
3. Under Schemas, select the tables that contain the source data and then click Select.
4. Click Next.
  
  If there are tables to join, the Select Source Tables for Join window appears; otherwise, the Generate Target Database window appears.

To define join conditions, do the following. If there are no join conditions, click Next and skip to step 17.
1. Under Available Tables, select the tables to join, and then click the right arrow to add them to the Selected Tables list.
2. In the Preview panel, click the drop-down menu at the top of the join box and select the type of join to use from one of the following options:
  - Inner – Use this if all tables to be joined contain the same column.
  - Left Outer – Use this if the results should always include the records from the left table in the join clause.
  - Right Outer – Use this if the results should always include the records from the right table in the join clause.
  - Full Outer – Use this if the results should always include the records from both the right and left tables in the join clause.
3. To specify columns to exclude from each joined table, click the Select Column tab in the Preview pane and deselect any columns to exclude.
4. Click Next.
  
  The Generate Target Database Master Index Model window appears.

To create the staging database, do the following:
1. Deselect the check box for Use Existing Database Target Tables.
2. In the Object Definition File field, browse to and select the object.xml file generated for the Master Index project.
  
  Note –
  This file is located in NetBeansProjects_Home/Project_Name/src/Configuration.
3. In the Target Database Folder field, select or enter the path where you want to store the database.
4. In the Target Database Name field, enter a name for the database.
5. Click Generate Database.

Click Next.

The Select JDBC Target Tables window appears. The target tables to load the extracted data into are already listed under Available Connections. It is not recommended you change these.

Click Next.

The Map Selected Collaboration Tables window appears.

To map source and target data, do the following:
1. To disable constraints on the target tables, select Disable Target Table Constraints.
2. Select the SQL statement type to use for the transfer. You can select insert, update, or both.
3. For each target table listed on the right, select one or more source tables from the list directly to the left of the target table. These are the source tables that will be mapped to the target in the collaboration.
  
  Note –
  If you do not specify a mapping here, the source tables do not appear in the ETL collaboration. You can add the source tables directly to the collaboration using the Select Source and Target Tables function. To select multiple source tables for one target, hold down the Control key while you select the required source tables. If you select multiple source tables for one target, the source tables are automatically joined.

Click Finish.

The new ETL collaboration appears in the Projects window. If multiple collaboration are created, they are given the name you specified for the collaboration with a target table name appended. To load the data into the staging database, run each of the collaborations. Make sure you are connected to both databases first.

Next Steps

You can further configure the ETL collaboration using the ETL Collaboration Editor. For more information, see Configuring ETL Collaborations.

Creating a Bulk Loader ETL Collaboration

Note –

The Data Integrator Wizard was enhanced in Java CAPS 6 Update 1. The instructions in this topic might differ from what is available in Release 6.

You can use the Data Integrator Wizard to generate the Bulk Loader for a master index application. The Bulk Loader loads data that has already been cleansed, standardized, and matched into a master index database. The source files for the Bulk Loader are those generated by the Bulk Matcher.

To Create a Bulk Loader ETL Collaboration

Before You Begin

Complete the steps under Creating a New Data Integrator Project.
Make sure the master index database is running, and that your NetBeans IDE is connected to the master index database.
In order to specify the source files for the Bulk Loader, you need to run the Bulk Matcher first. For more information see, Loading the Initial Data Set for a Sun Master Index.

On the NetBeans Projects window, expand the new Data Integrator project and right-click Collaborations.

Point to New, and then select ETL.

The New File Wizard appears with the Name and Location window displayed.

Enter name for the collaboration.

Click Next.

On the Select Type of ETL Loader window on the New File Wizard, select Bulk Loader.

Click Next.

The Select or Create Database window appears.

To specify a staging database to use for external data sources (for this project only), do one of the following:
1. Select an existing database to use from the DB URL field.
2. Select Create and Use New Database, enter a name for a new database in the DB Name field, and then click Create Database. Select the new database in the DB URL field.
  
  Note –
  This database is required and is used for internal processing only.

Click Next.

The Select JDBC Target Tables window appears.

To choose the target tables to load the extracted data into, do the following:
1. Under Available Connections, select the master index database.
2. Under Schemas, select the schema that contains the tables to load the data into.
3. Under Schemas, select only the tables that correspond to the data files produced by the Bulk Matcher, and then click Select.
  
  Tip –
  You can use the Shift and Control keys to select multiple tables at once. If you select target tables that do not correspond to the Bulk Matcher files, collaborations without source table are generated and the project fails to build.

Click Next.

The Choose Bulk Loader Data Source window appears.

To specify the source data for the Bulk Loader, do the following:
1. In the upper portion of the window, browse to the location of the of the output files from the Bulk Matcher.
  
  Note –
  These files are located in NetBeansProjects_Home/Project_Name/loader-generated/loader/work/masterindex, where work is the location you specified for the working directory in loader-config.xml.
2. Select all of the data files in the masterindex directory, and then click Add.

Click Next.

The Map Selected Collaboration Tables window appears.

To map source and target data, do the following:
1. To disable constraints on the target tables, select Disable Target Table Constraints.
2. Select the SQL statement type to use for the transfer. You can select insert, update, or both.
3. The wizard automatically maps the source and target tables for you. Review the mapping to verify its accuracy.
  
  Note –
  Not every table on the left will be mapped. For example, system tables such as SBYN_COMMON_HEADER, SBYN_COMMON_DETAIL, SBYN_APPL, and SBYN_SYSTEMS do not need to be mapped.

Click Finish.

An ETL collaboration is created for each target table. This might take a few minutes to generate.

Next Steps

You can further configure the ETL collaboration in the ETL Collaboration Editor. For more information, see Configuring ETL Collaborations.

To load the data into the master index database, you can either run each collaboration individually, or you can generate a batch file that will run all collaborations for you. For more information, see Loading Matched Data Using the Data Integrator Wizard Bulk Loader in Loading the Initial Data Set for a Sun Master Index.