Analyzing and Cleansing Data for Sun Master Index

Extracting Legacy Data Using the Generate Schema Tool

When you use the Generate Schema tool to create the master index staging database, you need to manually create the collaboration and mapping that will extract the legacy data.

Perform the following tasks in the order given to use the Generate Schema tool to extract legacy data for a master index:

Creating the Staging Database

For this task, you create a new Data Integrator Module. Data Integrator uses the master index object definition file to define the structure of the staging database schema.

ProcedureTo Create the Staging Database

  1. Right-click in the NetBeans Projects window and select New Project.

  2. Under Categories, select SOA, and then select Data Integrator Module under Projects.

  3. Click Next.

  4. On the Name and Location page, enter a name for the project and the location of the project files.

  5. Click Finish.

  6. In the NetBeans window, right-click the project you just created and select Generate Schema.

  7. On the Choose DB Location dialog box, fill in the following fields. When you are finished, click OK.

    • Master Index Object Definition – The ojbect.xml file in the master index project for which you are analyzing data. This is located in NetBeans_Projects/Project_Name/src/Configuration.

    • Staging Database Location – The path to the location where you want to store the staging database.

    • Database Name – A unique name for the staging database.


      Note –

      Make a note of the name and location for the database. You will need to use these values to define the database URL in a later step.


  8. Verify that the database files were created in the path you specified.

Next Steps

Continue to Connecting to the Staging Database.

Connecting to the Staging Database

To connect to the staging database, you need to use the Virtual DB database driver. This requires entering information you entered for the database in Creating the Staging Database.

ProcedureTo Connect to the Staging Database

Before You Begin

Complete the steps under Creating the Staging Database.

  1. On the NetBeans Services window, expand Databases and then expand Drivers.

  2. Right-click Virtual DB, and then select Connect Using.

  3. On the New Database Connection dialog box, fill in the following fields:

    • Database URL – The URL to the staging database in this format: jdbc:axiondb:DatabaseName:DatabaseLocation.

      DatabaseName is the name you gave the database in Creating the Staging Database; DatabaseLocation is the path to the staging database.

    • User Name – The user login ID for the staging database.

    • Password – The password for the staging database.

      The default user name and password for the staging database is “sa” for both.

  4. Click OK.

  5. Under Databases in the Services window, expand the URL you entered above and view the tables in the database.

    If there are no database tables present, the connection failed. Recheck the values you entered for the database connection.

Next Steps

Continue to Connecting to the Source Database.

Connecting to the Source Database

For this task, you might need to install the driver for your source database type. Once a driver is available, you can create a connection to the database that contains the data to be analyzed and cleansed.

ProcedureTo Connect to the Source Database

Before You Begin

Complete the instructions under Connecting to the Staging Database.

  1. In the NetBeans Services window, expand Databases and then expand Drivers.

  2. If you do not see the driver for your source database, do the following:

    1. Right-click on Drivers and then select New Driver.

    2. On the Driver dialog box, click Add, and then browse to the location of the driver file.

      The driver you use depends on the database vendor for your source database. If you are unsure of how to find the driver, refer to your database documentation.

  3. Right-click the driver for your database, and then select Connect Using.

  4. On the New Database Connection dialog box, fill in the following fields:

    • Database URL - The URL to the source database. If you do not know the format for this, refer to your database documentation.

    • User Name – The user login ID for the source database.

    • Password – The password for the source database.

  5. Click OK.

  6. Under Databases in the Services window, expand the URL you entered above and view the tables in the database.

    If there are no database tables present, the connection failed. Recheck the values you entered for the database connection.

Next Steps

Continue to Creating the ETL Collaboration and Extracting the Data.

Creating the ETL Collaboration and Extracting the Data

In this task, you create the mapping from the source database to the staging database and then extract the data from source to staging. Before beginning this step, make sure your source database is running.


Note –

For more information about extracting and loading data using Data Integrator, see Designing Data Integrator Projects.


ProcedureTo Create the ETL Collaboration and Extract the Data

Before You Begin

Complete the instructions under Connecting to the Source Database.

  1. In the NetBeans Projects window, right-click the Data Integrator Module you created in Creating the Staging Database.

  2. Point to New, and then select Other.

    The New File Wizard appears.

  3. Select SOA under categories, and select ETL under File Types.

  4. Click Next.

  5. On the Name and Location window, enter a name for the collaboration and enter a name for the collaboration directory .

  6. Click Next.

  7. Select Basic Extract – Transform – Load (ETL), and then click Next.

  8. On the Select Source Tables window, do the following:

    1. Under Available Connections, select your source database name or URL.

    2. Under Schemas, select a table containing data to extract and then click Select.

    3. Repeat the above step for each table containing data to extract.

    4. Click Next.

      If there are multiple tables, the Select Source Tables to Create Join window appears.

  9. Do one of the following:

    • If the tables do not need to be joined, click Next.

    • To join tables, select the tables to be joined Under Available Tables, and then click the right arrow to move them under Selected Tables. Click Next.

  10. On the Select Target Tables window, do the following:

    1. Under Available Connections, select your target database name or URL.

    2. Under Schemas, select a table that will store extracted data and then click Select.

    3. Repeat the above step for each table that will store extracted data.

    4. Click Finish.

      The collaboration canvas appears with source tables on the left and target tables on the right.

  11. To map the data fields to be extracted from the source tables to the appropriate fields in the target tables, do one of the following:

    • To map fields individually, click the arrow next to the source field name and drag it to the target field name. Repeat for each field.

    • To map fields automatically, right-click a source table to map and then click Automap. If there is only one table available to map to, Data Integrator performs the mapping; otherwise, a dialog box appears so you can select the target table. Repeat for each table.


      Caution – Caution –

      You must map a unique value to the ObjectID column in each target table (where Object is the name of the object for which the table stores data). You can concatenate the source system and local ID of each incoming record to create a unique value, or you can define a custom function to populate the unique values.


  12. To configure the target tables, do the following for each target table:

    1. Select the target table and open its Properties panel (select Properties in the Window menu).

    2. Deselect Create Target Table.

    3. (Optional) Select Truncate Target Table.

  13. Save your work, and then click Run Collaboration on the collaboration toolbar.


    Tip –

    If any errors occur, the error message appears in the output panel at the bottom of the window. To fix the collaboration, select a target table, open its Properties panel, and delete the value of the Target Join Condition property. Repeat this step for each target table. Save and rerun the collaboration.


  14. After the extraction process is complete, verify the data in the target database table by right-clicking each table on the canvas and then selecting Show Data.

    The data in the tables appears in the output panel.

Next Steps

Continue to Determining the Fields to Analyze.