The following exercise leads you through the steps of designing a Data Mashup project, and includes the following steps:
Creating a virtual database
Adding a minimum of two types of external data, for example a spreadsheet and a text file
Creating a Data Mashup project
For this exercise you will create a federated join using two data files, a Microsoft Excel spreadsheet (company-data.xls) and a delimited text file (supplier-address.txt). Ensure that you have the data files accurately set up before beginning the procedure.
Perform the following steps in the order given to develop the Data Mashup project:
To Add External Data to the Virtual Database (Excel Spreadsheet)
To Add External Data to the Virtual Database (Delimited Text File)
You can either create the spreadsheet and delimited text file or you can download the files from http://www.esnips.com/web/DataMashupSandraUseCaseFiles. Note that the name of the XLS file that is available to download has been changed from sandra-data.xls to company-data.xls for this exercise.
Create an Excel spreadsheet named company-data.xls that contains the following information:
EQ162 |
Funnel |
Shelly glass and labs |
LC05647 |
EQ763 |
Spirit |
Torry Harris Equipment |
LC23456 |
EQ785 |
Beaker |
Torry Harris Equipment |
LC23456 |
EQ232 |
Wire mesh |
Angels Enterprises |
LC34267 |
EQ892 |
Stirrer |
Angels Enterprises |
LC34267 |
EQ232 |
Wire mesh |
Kittens Inc. |
LC45634 |
EQ763 |
Asbestos sheets |
Labkron Instruments |
LC56473 |
EQ785 |
Beaker |
Labkron Instruments |
LC56473 |
EQ162 |
Funnel |
Labtech Inc. |
LC56743 |
EQ763 |
Asbestos sheets |
Labtech Inc. |
LC56743 |
EQ763 |
Spirit |
Peter Labs and Instruments |
LC63234 |
EQ563 |
Test tube |
Peter Labs and Instruments |
LC63234 |
EQ092 |
Pipette |
Patel Lab Exports |
LC67384 |
EQ243 |
Bunsen burner |
Alaska Lab Equipment |
LC76483 |
EQ892 |
Stirrer |
Alaska Lab Equipment |
LC76483 |
EQ162 |
Funnel |
Safe Labs Inc. |
LC87635 |
EQ563 |
Test tube |
Davidson Equipment Exports |
LC87645 |
EQ092 |
Pipette |
Andrew Instruments |
LC97627 |
EQ243 |
Bunsen burner |
Andrew Instruments |
LC97627 |
EQ162 |
Funnel |
Andrew Instruments |
LC97627 |
Create a delimited text file named supplier-address.txt that contains the following information:
LC76434|Yahoo Labs and Instruments|1285, 1st Avenue 69th and 70th, New York, NY 10021 LC76323|Isabel Instruments|330 SUNRISE HIGHWAY, ROCKVILLE CTR, NY 11570 LC76483|Alaska Lab Equipment|275 SOUTH STREET, OYSTER BAY, NY 11771 LC63234|Peter Labs and Instruments|430 E MAIN ST, BAY SHORE, NY 11706 LC34267|Angels Enterprises|2950 AVENTURA BLVD, AVENTURA, FL 33180 LC45634|Kittens Inc.|South Town Center, 10450 S State St, Sandy, UT 84070 LC87635|Safe Labs Inc.|Guadalupe Center, 333 Montezuma Ave, Santa Fe, NM 87501 LC56473|Labkron Instruments|Casis Village, 2700 Exposition, Austin, TX 78703 LC56743|Labtech Inc.|Midtown, 911 W 38th St, Austin, TX 78705 LC67384|Patel Lab Exports|3 University Dr, Augusta, ME 04330 LC05647|Shelly glass and labs|633 W 5th St Los Angeles, CA 90071 LC87645|Davidson Equipment Exports|South Town Center, 10450 S State St, Sandy, UT 84070 LC23456|Torry Harris Equipment|Fairlawn Drive Up, 5325 Sw 21st St, Topeka, KS 66604 LC97627|Andrew Instruments|Valero 4550, 115 N Greeley Hwy, Cheyenne, WY 82001 |
Before beginning this procedure, you must have the above-mentioned files ready and stored on your computer, and have started the NetBeans IDE.
From the NetBeans IDE tool bar select Tools->Virtual Database->Create Virtual Database.
In the Create Virtual Database window, type a Database Name that represents the project you are creating.
For this exercise, name the database VirtualMashupDB.
After naming the database, click Finish.
Click OK on the configuration message that appears.
You are now ready to start adding external data to the virtual database you just created.
Make sure you have the two data sources, a spreadsheet and a delimited text file. These will be merged into one table. For this step add the spreadsheet to the table, and then fine-tune some of its fields.
From the NetBeans IDE tool bar, select Tools->Virtual Database->Add External Tables.
In the Choose Virtual Database window, select the appropriate database from the list of available databases.
For this exercise, select VirtualMashupDB, which is the database you just created.
In the File field on the Choose Data source window, navigate to the directory that contains the spreadsheet and double-click the file.
For this exercise, select company-data.xls. The file now appears in the Selected Table Source list. You can remove it by highlighting the file name in the Selected Table Source and clicking Remove.
Although you are selecting a file that is located in a local directory, you can also select data located on the web by entering the URL.
Click Next.
In the Enter Table Details window, confirm that the table name, encoding, table type, and resource URL are correct.
The resource URL is the directory path and file name of the spreadsheet you selected.
Click Next.
In the Choose a Sheet window, select the sheet containing the data to use, and then click Preview to view the information.
In the example data, the information is in Sheet1.
Click Next.
Step 5, Choose a (HTML) Table is skipped since no HTML sources were selected. The Import Table MetaData window appears.
In the Import Table MetaData window, define how to parse the file.
If the data in the preview does not contain column names, and in the example it does not, deselect the First line Contains Field Names? check box. This is also the case for .csv files, and you must deselect this check box prior to setting the FIELD names.
If a data file does not exist, you can select to create one. The available fields on this window vary based on the type of data you are importing into the database.
Click Next.
In the Enter Column Properties window, double-click on a column field to edit it.
The # symbol represents the numerical order of the columns, and in the example FIELD_1 is the name of the first column. For this example, change the column names to names that more appropriately represent the values. This makes it easier to know what data is in which column when you merge the spreadsheet and the text file fields together. Use the following column names:
PRODUCT_IDENTIFIER
PRODUCT
VENDOR
VENDOR_CODE
To populate the Preview Table Content columns, click Back once, and then click Next.
When you are satisfied that the table represented in the Preview Table Content in the lower portion of the Enter Column Properties window is correct, click Finish.
A message confirms that you created the table successfully. You are now ready to repeat the process and add the text file to the database.
Make sure you have the two data sources, a spreadsheet and a delimited text file. For this step, you will add the delimited text file to the table and then fine-tune some of the fields.
From the NetBeans IDE tool bar select Tools->Virtual Database->Add External Tables.
In the Add External Tables window, select the appropriate database from the list of available databases.
In this exercise, VirtualMashupDB is the database you created.
Click Next.
In the File field on the Choose Data Source window, navigate to the directory that contains the delimited text file and then double-click the file name.
The file now appears in the Selected Table Source list. You can remove the file by highlighting the file name in the Selected Table Source and clicking Remove.
Although you are selecting a file that is located in a local directory, you can also select data on the web by entering the URL.
Click Next.
In the Enter Table Details window, confirm that the table name, encoding, table type, and resource URL are correct.
For a delimited text file, which you are using in this example, the table type should be Delimited Flatfile. The resource URL is the directory and file name of the delimited text file you selected.
Click Next.
In the Import Table MetaData window, define how to parse the file.
The available fields on this window vary based on the type of data you are importing into the database.
In the example the pipe represents the delimiter, so change the field delimiter from a comma (,) to a pipe (|).
If your delimited data does not contain column names, as in our sample file, deselect the First Line Contains Field Names? check box.
Review the text file in the Preview panel to ensure the selections are correct.
Click Next.
In the Add External Tables/Enter Column Properties window, edit the Column Definitions if desired.
In this exercise, if there are columns that are similar between the two data files, the names you use for the related columns in the delimited data file must match the names you used in the spreadsheet. Change the column names to names that more appropriately represent the column values. For this exercise, use the following values:
VENDOR_CODE
VENDOR
VENDOR_ADDRESS
To populate the Preview Table Content columns, click Back once, and then click Next.
Click Finish when ready.
A message confirms that you created the table successfully. You are now ready to verify that your Data Mashup database has the correct tables.
Under the Services tab expand the Databases node.
Right–click the database you created and select Connect.
You are now connected to the database you created.
Expand the database, expand the Tables node, right click one of the tables you are using, and select View Data.
Repeat this step to view the other table. The tables used in this procedure are COMPANY_DATA and SUPPLIER_ADDRESS.
Once the virtual database is populated, you can create the Data Mashup project. This project will include an EDM collaboration that defines how the data is mashed up into one table.
From the NetBeans IDE tool bar select File->New Project.
The New Project Wizard appears.
In the Choose Project window, select SOA under Categories, and select Data Mashup Module under Projects.
Click Next.
In the New Business Process Application window, name the Data Mashup project DemoDMProject and enter the location where you want to store the project files.
Click Finish.
The project is created and appears as an active project under the Projects tab.
Before beginning this procedure, you must have the above-mentioned files ready and stored on your computer, and have started the NetBeans IDE.
To move forward or backward in the procedure, click Next or Back.
In the NetBeans IDE Projects panel, right–click the DemoDMProject project you just created, and select New->EDM.
The New File Wizard appears.
On the Name and Location window, type a name for the Data Mashup file and click Finish.
For this exercise, name the file demoDMfile; the program adds the .edm extension.
The Collaborations directory has already been created and is in the path, which is listed in the Created File field.
Click Finish.
You could click Next instead to add the virtual database tables to the collaboration. This will be done in a later step for this exercise.
The file name you created, the EDM Editor opens in a new blank canvas, and the Tool Operators palette appears.
You are now ready to configure your Data Mashup project. For instructions on how to do this see Configuring Data Mashup Projects Using Joins.