Designing Data Integrator Projects

Creating an Advanced ETL Collaboration


Note –

The Data Integrator Wizard was enhanced in Java CAPS 6 Update 1. The instructions in this topic might differ from what is available in Release 6.


An advanced collaboration allows you to transfer data from multiple types of data sources to a data target. This procedure describes how to create an advanced collaboration using the automated wizard. Depending on the type of data source and the options you use, the wizard skips certain unnecessary steps. This option might product multiple ETL collaborations depending on the number of target tables.

You can click Finish at any time during the wizard to generate a collaboration with the information you specified to that point. Then you can complete the configuration using the ETL Collaboration Editor.

ProcedureTo Create an Advanced ETL Collaboration

Before You Begin

Complete the following tasks:

  1. On the NetBeans Projects window, expand the new Data Integrator project and right-click Collaborations.

  2. Point to New, and then select ETL.

    The New File Wizard appears with the Name and Location window displayed.

  3. Enter name for the collaboration.

    Figure shows the Name and Location window of the Data
Integrator Wizard.
  4. Click Next.

  5. On the Select Type of ETL Loader window on the New File Wizard, select Advanced Extract – Transform – Load (ETL).

    Figure shows the Select Type of ETL Loader window of
the Data Integrator Wizard.
  6. Click Next.

    The Select or Create Database window appears.

  7. To specify a staging database to use for external data sources (for this project only), do one of the following:

    1. Select an existing database to use from the DB URL field.

    2. Select Create and Use New Database, enter a name for a new database in the DB Name field, and then click Create Database. Select the new database in the DB URL field.


      Note –

      This database is required and is used for internal processing only.


    Figure shows the Select or Create Database window of
the Data Integrator Wizard.
  8. Click Next.

    The Choose Data Source window appears.

  9. Do one of the following:

    • If you do not have any file data sources, click Next and skip to step 15 (choosing JDBC data sources).

    • To specify a file data source using a URL, enter the URL and click Add.

    • To specify a file data source that is stored on your network, browse for and select a file containing source data in the Choose a File box, and then click Add.

    • Repeat the above two steps until all file data sources are selected.

    Figure shows the Choose Data Source window of the Data
Integrator Wizard.
  10. Click Next.

    The Enter Table Details window appears, with the information for the first data file displayed.

  11. If necessary, modify the table name, the type of data encoding, and the type of document that contains the source data.

    Data Integrator automatically fills in these fields based on the information from the previous window, so the existing values should be correct.

    Figure shows the Enter Table Details window of the Data
Integrator Wizard.
  12. Click Next.

    If the data file is a spreadsheet, the Choose a Sheet window appears; otherwise, the Import Table MetaData window appears.

  13. If the Choose a Sheet window appears, select the name of the sheet in the spreadsheet that contains the source data, and then click Next.


    Tip –

    To view the contents of a sheet, click the Preview button.


    Figure shows the Choose a Sheet window of the Data Integrator
Wizard.
  14. When the Import Table Metadata window appears, modify the information about the data file as needed.

    Data Integrator automatically fills in this information, but you might need to customize it. For more information about the properties you can configure, see Virtual Database Table Metadata Options.

    Figure shows the Import Table Metadata window of the
Data Integrator Wizard.
  15. Preview the information in the bottom portion of the window, and then click Next.

    The Enter Column Properties window appears.

  16. In the upper portion of the window, customize any of the column properties.

    For more information about these properties, see Virtual Database Column Properties.

    Figure shows the Enter Column Properties window of the
Data Integrator Wizard.
  17. Preview the information in the lower portion of the window, and then click Next.

  18. Do one of the following:

    1. If you selected multiple file data sources, the wizard returns to the Enter Table Details window with the attributes for a different file displayed. Repeat the above steps beginning with step 7.

    2. If all the files you specified are configured, a dialog box appears confirming the database table creation. Click OK on the dialog box and continue to the next step.

      The Select JDBC Source Tables window appears.

  19. If you specified file data sources, they are already listed under Selected Tables here. Click Next if you have no JDBC data sources to specify, or do the following to specify a JDBC data source:

    Figure shows the Select Source Tables window of the Data
Integrator Wizard.
    1. Under Available Connections, select the database that contains the source data.

    2. If there are multiple schemas in the database, select the schema to use.

    3. Under Schemas, select the tables that contain the source data and then click Select.

    4. Click Next.

      If there are tables to join, the Select Source Tables for Join window appears; otherwise, the Generate Target Database window appears.

  20. To define join conditions, do the following. If there are no join conditions, click Next and skip to step 17.

    1. Under Available Tables, select the tables to join, and then click the right arrow to add them to the Selected Tables list.

    2. In the Preview panel, click the drop-down menu at the top of the join box and select the type of join to use from one of the following options:

      • Inner – Use this if all tables to be joined contain the same column.

      • Left Outer – Use this if the results should always include the records from the left table in the join clause.

      • Right Outer – Use this if the results should always include the records from the right table in the join clause.

      • Full Outer – Use this if the results should always include the records from both the right and left tables in the join clause.

      Figure shows the Select Source Tables for Join window
of the Data Integrator Wizard.
    3. To specify columns to exclude from each joined table, click the Select Column tab in the Preview pane and deselect any columns to exclude.

    4. Click Next.

      The Generate Target Database Master Index Model window appears. Using this page is described in a separate topic, Creating an ETL Collaboration for a Master Index Staging Database.

  21. Click Next.

    The Select JDBC Target Tables window appears.

  22. To choose the target tables to load the extracted data into, do the following:

    1. Under Available Connections, select the database that contains the schema to load the data into.

    2. Under Schemas, select the schema that contains the tables to load the data into.

    3. Under Schema, select the tables that will contain the target data and then click Select.


      Tip –

      You can use the Shift and Control keys to select multiple tables at once. If you add a table in error, select the table in the lower portion of the window and click Remove.


    Figure shows the Select Target Tables window of the Data
Integrator Wizard.
  23. Click Next.

    The Map Selected Collaboration Tables window appears.

  24. To map source and target data, do the following:

    1. To disable constraints on the target tables, select Disable Target Table Constraints.

    2. Select the SQL statement type to use for the transfer. You can select insert, update, or both.

    3. For each target table listed on the right, select one or more source tables from the list directly to the left of the target table. These are the source tables that will be mapped to the target in the collaboration.


      Note –

      If you do not specify a mapping here, the source tables do not appear in the ETL collaboration. You can add the source tables directly to the collaboration using the Select Source and Target Tables function. To select multiple source tables for one target, hold down the Control key while you select the required source tables. If you select multiple source tables for one target, the source tables are automatically joined.


    Figure shows the Map Selected Collaboration Tables window
of the Data Integrator Wizard.
  25. Click Finish.

    The new ETL collaboration appears in the Projects window. If multiple collaboration are created, they are given the name you specified for the collaboration with a target table name appended.

Next Steps

You can further configure the ETL collaboration using the ETL Collaboration Editor. For more information, see Configuring ETL Collaborations.