Using the Sun Data Mashup Engine

Designing Data Mashup Projects

The following exercise leads you through the steps of designing a Data Mashup project, and includes the following steps:

For this exercise you will create a federated join using two data files, a Microsoft Excel spreadsheet (company-data.xls) and a delimited text file (supplier-address.txt). Ensure that you have the data files accurately set up before beginning the procedure.

Perform the following steps in the order given to develop the Data Mashup project:

ProcedureTo Create the External Data Files

You can either create the spreadsheet and delimited text file or you can download the files from http://www.esnips.com/web/DataMashupSandraUseCaseFiles. Note that the name of the XLS file that is available to download has been changed from sandra-data.xls to company-data.xls for this exercise.

  1. Create an Excel spreadsheet named company-data.xls that contains the following information:

    EQ162 

    Funnel 

    Shelly glass and labs 

    LC05647 

    EQ763 

    Spirit 

    Torry Harris Equipment 

    LC23456 

    EQ785 

    Beaker 

    Torry Harris Equipment 

    LC23456 

    EQ232 

    Wire mesh 

    Angels Enterprises 

    LC34267 

    EQ892 

    Stirrer 

    Angels Enterprises 

    LC34267 

    EQ232 

    Wire mesh 

    Kittens Inc. 

    LC45634 

    EQ763 

    Asbestos sheets 

    Labkron Instruments 

    LC56473 

    EQ785 

    Beaker 

    Labkron Instruments 

    LC56473 

    EQ162 

    Funnel 

    Labtech Inc. 

    LC56743 

    EQ763 

    Asbestos sheets 

    Labtech Inc. 

    LC56743 

    EQ763 

    Spirit 

    Peter Labs and Instruments 

    LC63234 

    EQ563 

    Test tube 

    Peter Labs and Instruments 

    LC63234 

    EQ092 

    Pipette 

    Patel Lab Exports 

    LC67384 

    EQ243 

    Bunsen burner 

    Alaska Lab Equipment 

    LC76483 

    EQ892 

    Stirrer 

    Alaska Lab Equipment 

    LC76483 

    EQ162 

    Funnel  

    Safe Labs Inc. 

    LC87635 

    EQ563 

    Test tube 

    Davidson Equipment Exports 

    LC87645 

    EQ092 

    Pipette 

    Andrew Instruments 

    LC97627 

    EQ243 

    Bunsen burner 

    Andrew Instruments 

    LC97627 

    EQ162 

    Funnel 

    Andrew Instruments 

    LC97627 

  2. Create a delimited text file named supplier-address.txt that contains the following information:


    LC76434|Yahoo Labs and Instruments|1285, 1st Avenue 69th and 70th, New York, NY 10021
    LC76323|Isabel Instruments|330 SUNRISE HIGHWAY, ROCKVILLE CTR, NY 11570
    LC76483|Alaska Lab Equipment|275 SOUTH STREET, OYSTER BAY, NY 11771
    LC63234|Peter Labs and Instruments|430 E MAIN ST, BAY SHORE, NY 11706
    LC34267|Angels Enterprises|2950 AVENTURA BLVD, AVENTURA, FL 33180
    LC45634|Kittens Inc.|South Town Center, 10450 S State St, Sandy, UT 84070
    LC87635|Safe Labs Inc.|Guadalupe Center, 333 Montezuma Ave, Santa Fe, NM 87501
    LC56473|Labkron Instruments|Casis Village, 2700 Exposition, Austin, TX 78703
    LC56743|Labtech Inc.|Midtown, 911 W 38th St, Austin, TX 78705
    LC67384|Patel Lab Exports|3 University Dr, Augusta, ME 04330
    LC05647|Shelly glass and labs|633 W 5th St Los Angeles, CA 90071
    LC87645|Davidson Equipment Exports|South Town Center, 10450 S State St, Sandy, UT 84070
    LC23456|Torry Harris Equipment|Fairlawn Drive Up, 5325 Sw 21st St, Topeka, KS 66604
    LC97627|Andrew Instruments|Valero 4550, 115 N Greeley Hwy, Cheyenne, WY 82001

ProcedureTo Create the Data Mashup Database

Before You Begin

Before beginning this procedure, you must have the above-mentioned files ready and stored on your computer, and have started the NetBeans IDE.

  1. From the NetBeans IDE tool bar select Tools->Virtual Database->Create Virtual Database.

  2. In the Create Virtual Database window, type a Database Name that represents the project you are creating.

    For this exercise, name the database VirtualMashupDB.

  3. After naming the database, click Finish.

  4. Click OK on the configuration message that appears.

    You are now ready to start adding external data to the virtual database you just created.

ProcedureTo Add External Data to the Virtual Database (Excel Spreadsheet)

Before You Begin

Make sure you have the two data sources, a spreadsheet and a delimited text file. These will be merged into one table. For this step add the spreadsheet to the table, and then fine-tune some of its fields.

  1. From the NetBeans IDE tool bar, select Tools->Virtual Database->Add External Tables.

  2. In the Choose Virtual Database window, select the appropriate database from the list of available databases.

    For this exercise, select VirtualMashupDB, which is the database you just created.

  3. In the File field on the Choose Data source window, navigate to the directory that contains the spreadsheet and double-click the file.

    For this exercise, select company-data.xls. The file now appears in the Selected Table Source list. You can remove it by highlighting the file name in the Selected Table Source and clicking Remove.


    Note –

    Although you are selecting a file that is located in a local directory, you can also select data located on the web by entering the URL.


  4. Click Next.

  5. In the Enter Table Details window, confirm that the table name, encoding, table type, and resource URL are correct.

    The resource URL is the directory path and file name of the spreadsheet you selected.

  6. Click Next.

  7. In the Choose a Sheet window, select the sheet containing the data to use, and then click Preview to view the information.

    In the example data, the information is in Sheet1.

  8. Click Next.

    Step 5, Choose a (HTML) Table is skipped since no HTML sources were selected. The Import Table MetaData window appears.

  9. In the Import Table MetaData window, define how to parse the file.

    If the data in the preview does not contain column names, and in the example it does not, deselect the First line Contains Field Names? check box. This is also the case for .csv files, and you must deselect this check box prior to setting the FIELD names.


    Note –

    If a data file does not exist, you can select to create one. The available fields on this window vary based on the type of data you are importing into the database.


  10. Click Next.

  11. In the Enter Column Properties window, double-click on a column field to edit it.

    The # symbol represents the numerical order of the columns, and in the example FIELD_1 is the name of the first column. For this example, change the column names to names that more appropriately represent the values. This makes it easier to know what data is in which column when you merge the spreadsheet and the text file fields together. Use the following column names:

    FIELD_1

    PRODUCT_IDENTIFIER

    FIELD_2

    PRODUCT

    FIELD_3

    VENDOR

    FIELD_4

    VENDOR_CODE

  12. To populate the Preview Table Content columns, click Back once, and then click Next.

  13. When you are satisfied that the table represented in the Preview Table Content in the lower portion of the Enter Column Properties window is correct, click Finish.

    A message confirms that you created the table successfully. You are now ready to repeat the process and add the text file to the database.

ProcedureTo Add External Data to the Virtual Database (Delimited Text File)

Before You Begin

Make sure you have the two data sources, a spreadsheet and a delimited text file. For this step, you will add the delimited text file to the table and then fine-tune some of the fields.

  1. From the NetBeans IDE tool bar select Tools->Virtual Database->Add External Tables.

  2. In the Add External Tables window, select the appropriate database from the list of available databases.

    In this exercise, VirtualMashupDB is the database you created.

  3. Click Next.

  4. In the File field on the Choose Data Source window, navigate to the directory that contains the delimited text file and then double-click the file name.

    The file now appears in the Selected Table Source list. You can remove the file by highlighting the file name in the Selected Table Source and clicking Remove.


    Note –

    Although you are selecting a file that is located in a local directory, you can also select data on the web by entering the URL.


  5. Click Next.

  6. In the Enter Table Details window, confirm that the table name, encoding, table type, and resource URL are correct.

    For a delimited text file, which you are using in this example, the table type should be Delimited Flatfile. The resource URL is the directory and file name of the delimited text file you selected.

  7. Click Next.

  8. In the Import Table MetaData window, define how to parse the file.


    Note –

    The available fields on this window vary based on the type of data you are importing into the database.


    • In the example the pipe represents the delimiter, so change the field delimiter from a comma (,) to a pipe (|).

    • If your delimited data does not contain column names, as in our sample file, deselect the First Line Contains Field Names? check box.

    • Review the text file in the Preview panel to ensure the selections are correct.

  9. Click Next.

  10. In the Add External Tables/Enter Column Properties window, edit the Column Definitions if desired.

    In this exercise, if there are columns that are similar between the two data files, the names you use for the related columns in the delimited data file must match the names you used in the spreadsheet. Change the column names to names that more appropriately represent the column values. For this exercise, use the following values:

    FIELD_1

    VENDOR_CODE

    FIELD_2

    VENDOR

    FIELD_3

    VENDOR_ADDRESS

  11. To populate the Preview Table Content columns, click Back once, and then click Next.

  12. Click Finish when ready.

    A message confirms that you created the table successfully. You are now ready to verify that your Data Mashup database has the correct tables.

ProcedureTo Verify the Virtual Database Tables

  1. Under the Services tab expand the Databases node.

  2. Right–click the database you created and select Connect.

    You are now connected to the database you created.

  3. Expand the database, expand the Tables node, right click one of the tables you are using, and select View Data.

    Repeat this step to view the other table. The tables used in this procedure are COMPANY_DATA and SUPPLIER_ADDRESS.

ProcedureTo Create the Data Mashup Project

Once the virtual database is populated, you can create the Data Mashup project. This project will include an EDM collaboration that defines how the data is mashed up into one table.

  1. From the NetBeans IDE tool bar select File->New Project.

    The New Project Wizard appears.

  2. In the Choose Project window, select SOA under Categories, and select Data Mashup Module under Projects.

  3. Click Next.

  4. In the New Business Process Application window, name the Data Mashup project DemoDMProject and enter the location where you want to store the project files.

  5. Click Finish.

    The project is created and appears as an active project under the Projects tab.

ProcedureTo Create the EDM Collaboration

Before You Begin

Before beginning this procedure, you must have the above-mentioned files ready and stored on your computer, and have started the NetBeans IDE.


Note –

To move forward or backward in the procedure, click Next or Back.


  1. In the NetBeans IDE Projects panel, right–click the DemoDMProject project you just created, and select New->EDM.

    The New File Wizard appears.

  2. On the Name and Location window, type a name for the Data Mashup file and click Finish.

    For this exercise, name the file demoDMfile; the program adds the .edm extension.


    Note –

    The Collaborations directory has already been created and is in the path, which is listed in the Created File field.


  3. Click Finish.


    Note –

    You could click Next instead to add the virtual database tables to the collaboration. This will be done in a later step for this exercise.


    The file name you created, the EDM Editor opens in a new blank canvas, and the Tool Operators palette appears.

Next Steps

You are now ready to configure your Data Mashup project. For instructions on how to do this see Configuring Data Mashup Projects Using Joins.