Analyzing and Cleansing Data for Sun Master Index

Creating the Staging Database and ETL Collaboration

This procedure describes how to create the staging database using the automated wizard. Depending on the type of data source and the options you use, the wizard skips certain unnecessary steps.

ProcedureTo Create the Staging Database and ETL Collaboration

Before You Begin

Complete the steps under Connecting to the Source Database.

  1. On the Select Type of ETL Loader on the New File Wizard, select Advanced Extract – Transform – Load (ETL).

  2. Click Next.

    The Select or Create Database window appears.

  3. To select a staging database to use for external data sources (for this project only), do one of the following:

    1. Select an existing database to use from the DB URL field.

    2. Select Create and Use New Database, enter a name for a new database in the DB Name field, and then click Create Database. Select the new database in the DB URL field.


      Note –

      This database is required and is used for internal processing only.


  4. Click Next.

    The Choose Data Source window appears.

  5. Do any of the following:

    • If you do not have any file data sources, click Next and skip to step 15 (choosing JDBC data sources).

    • To specify a file data source using a URL, enter the URL and click Add.

    • To specify a file data source that is stored on your network, browse for and select a file containing source data in the Choose a File box, and then click Add.

    • Repeat the above two steps until all file data sources are selected.

  6. Click Next.

    The Enter Table Details window appears, with the information for the first data file displayed.

  7. If necessary, modify the table name, the type of data encoding, and the type of document that contains the source data.

    Data Integrator automatically fills in these fields based on the information from the previous window, so the existing values should be correct.

  8. Click Next.

    If the data file is a spreadsheet, the Choose a Sheet window appears; otherwise, the Import Table MetaData window appears.

  9. If the Choose a Sheet window appears, select the name of the sheet in the spreadsheet that contains the source data, and then click Next.

  10. When the Import Table Metadata window appears, modify the information about the data file as needed.

    Data Integrator automatically fills in this information, but you might need to customize it. For more information about the properties you can configure, see Virtual Database Table Metadata Options in Designing Data Integrator Projects.

  11. Preview the information in the bottom portion of the window, and then click Next.

    The Enter Column Properties window appears.

  12. In the upper portion of the window, customize any of the column properties.

    For more information about these properties, see Virtual Database Column Properties in Designing Data Integrator Projects.

  13. Preview the information in the lower portion of the window, and then click Next.

  14. Do one of the following:

    1. If you selected multiple file data sources, the wizard returns to the Enter Table Details window with the attributes for a different file displayed. Repeat the above steps beginning with step 7.

    2. If all the files you specified are configured, a dialog box appears confirming the database table creation. Click OK on the dialog box and continue to the next step.

      The Select JDBC Source Tables window appears.

  15. If you specified file data sources, they are already listed under Selected Tables. Click Next if you have no JDBC data sources to specify, or do the following to specify a JDBC data source:

    1. Under Available Connections, select the database that contains the source data.

    2. If there are multiple schemas in the database, select the schema to use.

    3. Under Schemas, select the tables that contain the source data and then click Select.

    4. Click Next.

      If there are tables to join, the Select Source Tables for Join window appears; otherwise, the Generate Target Database window appears.

  16. To define join conditions, do the following. If there are no join conditions, click Next and skip to step 17.

    1. Under Available Tables, select the tables to join, and then click the right arrow to add them to the Selected Tables list.

    2. In the Preview panel, click the drop-down menu at the top of the join box and select the type of join to use from one of the following options:

      • Inner – Use this if all tables to be joined contain the same column.

      • Left Outer – Use this if the results should always include the records from the left table in the join clause.

      • Right Outer – Use this if the results should always include the records from the right table in the join clause.

      • Full Outer – Use this if the results should always include the records from both the right and left tables in the join clause.

    3. To specify columns to exclude from each joined table, click the Select Column tab in the Preview pane and deselect any columns to exclude.

    4. Click Next.

      The Generate Target Database Master Index Model window appears.

  17. To create the staging database, do the following:

    1. Deselect the check box for Use Existing Database Target Tables.

    2. In the Object Definition File field, browse to and select the object.xml file generated for the Master Index project.


      Note –

      This file is located in NetBeansProjects_Home/Project_Name/src/Configuration.


    3. In the Target Database Folder field, select or enter the path where you want to store the database.

    4. In the Target Database Name field, enter a name for the database.

    5. Click Generate Database.

  18. Click Next.

    The Select JDBC Target Tables window appears. The target tables to load the extracted data into are already listed under Available Connections. It is not recommended you change these.

  19. Click Next.

    The Map Selected Collaboration Tables window appears.

  20. To map source and target data, do the following:

    1. To disable constraints on the target tables, select Disable Target Table Constraints.

    2. Select the SQL statement type to use for the transfer. You can select insert, update, or both.

    3. For each target table listed on the right, select one or more source tables from the list directly to the left of the target table. These are the source tables that will be mapped to the target in the collaboration.


      Note –

      If you do not specify a mapping here, the source tables do not appear in the ETL collaboration. You can add the source tables directly to the collaboration using the Select Source and Target Tables function. To select multiple source tables for one target, hold down the Control key while you select the required source tables. If you select multiple source tables for one target, the source tables are automatically joined.


    4. Click Finish.

      The new ETL collaboration appears in the Projects window. If multiple collaboration are created, they are given the name you specified for the collaboration with a target table name appended.

Next Steps

Continue to Connecting to the Staging Database.