Oracle by Example brandingCreate a JSON Data Miner Workflow

section 0Before You Begin

This 15-minute tutorial shows you how to create a workflow for querying JSON data and persisting the results in a table.

Background

JSON is a popular lightweight data structure commonly used by Big Data. For example, web logs generated in the middle tier web servers are likely in JSON format. NoSQL database vendors have chosen JSON as their primary data representation. Moreover, the JSON format is widely used in the RESTful style Web services responses generated by most popular social media websites like Facebook, Twitter, LinkedIn, etc. This JSON data could potentially contain a wealth of information that is valuable for business use.

Oracle Database provides ability to store and query JSON data. To take advantage of the database JSON support, Oracle Data Miner provides a JSON Query node that allows users to query JSON data as a relational format.

In additional, the current Data Source and Create Table nodes are enhanced to allow users to specify JSON data in the input data source.

Scenario

In this lesson, you create a workflow that imports JSON data by using the JSON Query node. The JSON Query node enables you to selectively query desirable attributes and project the result in relational format. Once the data is in relational format, you can treat it as a normal relational data source and start analyzing and mining it immediately.

In the workflow, you:

  • Identify the JSON data
  • Specify the desirable attributes for query purposes
  • Examine the JSON data using the JSON Query node
  • Build Classification models to provide predictive analysis on the JSON data

The completed workflow looks like this:

Description of completed-workflow.jpg
Description of the illustration completed-workflow.jpg

What Do You Need?


section 1Create a Data Miner Project

  1. Right-click your Data Miner connection and select New Project.
  2. In the Create Project window, enter Big Data Analysis as the project name and then click OK.

section 2Create a Workflow and Add the Data Source

  1. Right-click the Big Data Analysis project and select New Workflow from the menu.
  2. In the Create Workflow window, enter JSON Attribute Analysis as the name and click OK.
  3. In the Components tab, open the Data category.
  4. Drag and drop the Data Source node onto the Workflow pane.

    The Define Data Source wizard opens automatically.

  5. In the Available Tables/View list, scroll down and select the ODMR_SALES_JSON_DATA table.
    Description of define-json-data-source.jpg
    Description of the illustration define-json-data-source.jpg

    Notice that there is only one column (JSON_DATA) in this table, of JSON(CLOB) data type. The JSON prefix indicates that the data stored is in JSON format, and that CLOB is the original data type.

  6. Click Finish to at the data source node to the workflow.
  7. Right-click the data source node and select Run from the menu.

section 3Create a JSON Query

  1. Drag and drop a JSON Query node from the Transforms group to the workflow.
  2. Connect the data source node to the JSON Query node.
  3. Double-click the JSON Query node to open the JSON Query Node Editor window.
  4. Select JSON_DATA as the Column.
  5. Select all of the JSON attributes down to, but not including SALES.
    Description of select-json-attributes-except-sales.jpg
    Description of the illustration select-json-attributes-except-sales.jpg
  6. Select the Aggregate tab. You will define aggregations for QUANTITY_SOLD and AMOUNT_SOLD attributes (within the SALES array):
  7. Click the Preview tab and then the Output Data tab.
    Description of json-query-node-editor-preview.png
    Description of the illustration json-query-node-editor-preview.png

    The Output Data tab shows the actual data in the generated relational output.

  8. Click OK to close the JSON Query Node Editor.
  9. Right-click the JSON Query node and select Run from the menu.

section 4Persist JSON Query Results in a Table

  1. Drag and drop a Create Table node to the workflow.
  2. Connect the JSON Query node to the OUTPUT node.
  3. Double-click OUTPUT node, and change the table name to JSON_OUTPUT. Then click OK.
  4. Rename the OUTPUT node to Table for Performance.
    Description of add-table-node.jpg
    Description of the illustration add-table-node.jpg

next stepNext Tutorial

In the next tutorial, you'll buid the JSON data classification models and analyze the results.