5 Data Nodes

Data nodes specify data for a mining operation, to transform data, or to save data to a table. The input for Oracle Data Mining operations is a table or view in Oracle Database or an Oracle Data Miner node that is part of a data flow.

The Data nodes are available in the Data section in the Components pane. The following Data nodes enable you to specify data in a workflow and to create and modify tables:

Create Table or View Node

The Create Table or View node is a type of node that enables you to save the results in a table in the schema to which you are connected.

For example, use a Create Table or View node to save the results of an Apply node to a table. The Create Table node automatically uses compression when possible. The Create Table node can run in parallel.

The benefits of Create Table or View node are:

  • Data Persistence: A Create Table or View node saves the data that flow into the node as a view or table in the database. If you create a table, then the actual data persists. If you create a view, then the SQL definition (full lineage) persists. Output from this node is the data provided by the view or table.

  • Performance Improvement: If you perform one or more complex transformations, such as Joins and Aggregations, and save the result of the transformations as a table, then the subsequent operations are faster. For example, you can perform an Aggregation and a Join, create a table that contains the results of the transformation, and then use the table as input for building the model. Therefore, you will create a table from the Join node. The Classification models are built against this table.

Working with the Create Table or View Node

Lists the tasks you can perform using a Create Table or View node.

Creating a Create Table or View Node

You create a Create Table or View node to save a data flow to a table or view.

You can connect a Create Table or View node to any node that create a data flow, such as an Apply node. To create a Create Table or View node:

  1. In the Components pane, go to Workflow Editor. If the Components pane is not visible, then in the SQL Developer menu bar, go to View and click Components. Alternately, press Ctrl+Shift+P to dock the Components pane.
  2. In the Data section, click the Create Table or View icon.
  3. Drag and drop the node from the Components pane to the Workflow pane.
    The node is added to the workflow. The GUI shows that the node has no data associated with it. Therefore, it cannot be run.
  4. Right-click the node from which to create the table, and click Connect in the context menu.
  5. Draw a line from the selected node to the Create Table or View node and click again.
  6. You can accept the default settings of the Create Table or View node or edit the default settings. Click Edit in the context menu.
  7. To create the table, right-click the Create Table or View node and select Run from the context menu.

    The table is automatically compressed, if possible.

  8. After running the node, right-click the node and select View Data to view the results.

Create Table Node and Compression

Oracle Data Miner creates tables in the Create Table node, and creates split data sets for testing in the Classification and Regression model build.

When Oracle Data Miner creates a table, and if Oracle Data Miner determines that the data does not have nested columns DM_NESTED_*, then it uses compression to create the table.

Table compression does the following:

  • Saves disk space

  • Reduces memory use in the buffer cache

  • Speeds up the running of queries during reads

For the Create Table node, Oracle Data Miner also verifies that a primary key is not defined and no indexes are defined.

Edit Create Table or View Node

You can modify the operation of the Create Table or View node in the Edit Create Table or View dialog box.

To edit a Create Table or View node:

  1. Double-click the node or right-click and select Edit. The Edit Create Table or View dialog box opens.
  2. In the Edit Create Table or View node dialog box, you can make the following changes:
    • Name: Displays the default table name. You can change the default name of the table or view to any unique and valid name.

    • Type: By default, the object type is Table. You can change the default to View by selecting the appropriate option.

    • Storage Settings: Click to set the data compression method and logging settings when creating a table.

    • Auto Input Column Selection: Deselect the check box to manually select and edit input columns. You can perform the following tasks:

      • Delete columns: Select the column and click delete.

      • Edit columns: Select the column and click edit. The Select Column dialog box opens.

        Note:

        If JSON data is present, then select the column and click edit to specify Data Guide settings in the Edit Data Guide dialog box.

      • Edit the data type entry in the Target Type column for JSON data only. Click the data type in the Target column to select an option from the in-place drop-down list.

      • Specify Key, Index and Alias for each column.

        Note:

        If you create a table that join another object, then adding an index will speed up the join.

    • JSON Settings: Click JSON Settings to specify node settings that determines how Data Guides are generated. This option is applicable only for JSON data.

  3. Click OK.
Edit Storage Settings

In the Edit Storage Settings dialog box, you can set the logging settings and data compression method.

To edit logging settings and data comprehension method:

  • Select Logging On to override any other settings specified as part of Parallel Processing settings.

  • In the Compression Level section, set the data compression method by selecting any one of the following options:

    • None: In this method, the data is not compressed.

    • Low: This method results in the best query performance.

    • Medium: This method optimizes the data for DML operations and compresses the IM column

    • High: This method results in excellent query performance.

    • Higher: This method results in good query performance.

    • Highest: This method results in a fair query performance.

Select Columns

By default, all columns are selected. You must select at least one attribute for the Create Table or View definition to be complete.

You can perform the following tasks:

  • Remove columns: Select the attribute in the Selected Attribute section and move it to the Available Attribute section. Click OK.

  • Add columns: Select the attribute in theAvailable Attribute section and move it to the Selected Attribute section. Click OK.

View Create Table or View Data

You can view the data of a node in the Data Source Node viewer.

After the node runs successfully, right-click the node and select View Data. The data is displayed in the Data Source Node viewer.

Related Topics

Create Table or View Context Menu

The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.

You can perform the following tasks:

Related Topics

Connect

Use the Connect option to link nodes in a workflow.

To connect nodes:

  1. Right-click a node and click Connect. Alternately, go to Diagram and click Connect.
  2. Use the cursor to draw a line from this node to the target node.
  3. Click to establish the connection. Note the following:
    • You can create only valid connections.

    • You can create only one connection between any two nodes.

    • You can remove a connection by pressing the ESC key.

Related Topics

Run

Use the Run option to execute the tasks specified in the nodes that comprise the workflow.

The Data Miner server runs workflows asynchronously. The client does not have to be connected. You can run one or more nodes in the workflow:

  • To run one node: Right-click the node and select Run.

  • To run multiple nodes simultaneously: Select the nodes by holding down the Ctrl key and click each individual node. Then right-click any selected node and select Run.

If a node depends on outputs of one or more parent nodes, then the parent node runs automatically only if the outputs required by the running node are missing.

Force Run

Use the Force Run option to rerun one or more nodes that are complete.

Force Run deletes any existing models before building them once again.

To select more than one node, click the nodes while holding down the Ctrl key.

You can Force Run a node at any location in a workflow. Depending on the location of the node in the workflow, you have the following choices for running the node using Force Run:

  • Selected Node

  • Selected Node and Children (available if the node has child nodes)

  • Child Node Only (available if the node one or more child nodes)

  • Selected Node and Parents (available if the node has parent nodes)

Create Schedule

Use the Create Schedule option to define a schedule to run a workflow at a definite time and date.

In the Create Schedule dialog box, you can create schedules for your workflows. To create workflow schedule:
  1. Start Date: Select a date to set as the start date of the schedule. Click calendar to select a date.
  2. Repeat: Select any one of the following options:
    • None: To schedule the workflow to run only once at the defined time.

    • Every Day: To schedule the workflow to run daily at the specified time.

    • Every Week: To schedule the workflow to run weekly at the specified time.

    • Custom: To customize your workflow schedule, click Custom. This opens the Repeat dialog box, where you can set how frequently the workflow should run.

  3. End Repeat: You can select any one of the following options:
    • None: To continue running the workflow every hour.

    • After: Select a number by clicking the arrows. This runs the workflow every hour, and would stop after the number of hours you have selected here. For example, if you select 8, then the workflow will run every hour, and after 8 hours, it will stop.

    • On Date: Select a particular date by clicking the calendar icon.

  4. Select Use Existing Schedule, and select a schedule from the drop-down list if you want to schedule the workflow as per the selected schedule.
    • Click edit to edit the selected schedule in the Schedule dialog box.

    • Click add to add a new schedule. You can also edit the selected schedule, and add it here.

    • Click delete to delete the selected schedule.

  5. Click OK.

To save the workflow schedule settings, click calendar. You can provide a name for the schedule in the Save a Schedule dialog box.

Related Topics

Edit

Use the Edit option to edit the default settings of a node.

Nodes have default algorithms and settings. When you edit a node, the default algorithms and settings are modified. You can edit a node in any one of the following ways:

  • Edit nodes using the Edit dialog box

  • Edit nodes through Properties UI

View Data

Use the View Data option to view the data contained in a Data node.

The Data nodes are Create Table or View node, Data Source node, Explore Data node, Graph node, SQL Query node, and Update Table node.

Related Topics

View Models

Use the View Models option to view the details of the models that are built after running the workflow.

To view models, you must select a model from the list to open the model viewer. A model must be built successfully before it can be viewed.

Generate Apply Chain

Use the Generate Apply Chain to create a new node that contains the specification of a node that performs a transformation.

If you have several transformations performed in sequence, for example, Sample followed by a Custom transform, then you must select Generate Apply Chain for each transformation in the sequence.You must connect the individual nodes and connect them to an appropriate data source.

Generate Apply Chain helps you create a sequence of transformations that you can use to ensure that new data is prepared in the same way as existing data. For example, to ensure that Apply data is prepared in the same way as Build data, use this option.

The Generate Apply Chain option is not valid for all nodes. For example, it does not copy the specification of a Build node.

Refresh Input Data Definition

Use the Refresh Input Data Definition option if you want to update the workflow with new columns, that are either added or removed.

The Refresh Input Data Definition option is equivalent to SELECT* capability in the input source. The option allows you to quickly refresh your workflow definitions to include or exclude columns, as applicable.

Note:

The Refresh Input Data Definition option is available as a context menu option in Data Source nodes and SQL Query nodes.
Show Event Log

Use the Show Event Log option to view information about events in the current connection, errors, warnings, and information messages.

Clicking the Show Event Log option opens the View and Event Log dialog box.

Related Topics

Deploy

Use the Deploy option to deploy a node or workflow by creating SQL scripts that perform the tasks specified in the workflow.

The scripts generated by Deploy are saved to a directory.

Note:

You must run a node before deploying it.

You can generate a script that replicates the behavior of the entire workflow. Such a script can serve as the basis for application integration or as a light-weight deployment than the alternative of installing the Data Miner repository and workflows in the target and production system.

To deploy a workflow or part of a workflow:

  1. Right-click a node and select Deploy.
  2. Select any one of the deployment options:
    • Selected node and dependent nodes

    • Selected node, dependent nodes, and child nodes

    • Selected node and connected nodes

  3. After selecting the deployment option, the Generate SQL Script wizard opens. In the wizard, enter details for the following:
Show Graph

The Show Graph option opens the Graph Node Editor.

All graphs are displayed in the Graph Node Editor.

Related Topics

Cut

Use the Cut option to remove the selected object, which could be a node or connection.

You can also delete objects by selecting them and pressing DELETE on your keyboard.

Copy

Use the Copy option to copy one or more nodes and paste them into the same workflow or a different workflow.

To copy and paste nodes:

  1. Select the nodes to copy. To select several nodes, hold down the Ctrl key when you click the nodes.

    The selected node is highlighted. In this example Classification is selected. The other node is not selected.

    copy
  2. Right-click and select Copy from the context menu. Alternately, you can press Ctrl+C to copy the selected nodes.

Note:

Copying and pasting nodes do not carry any mining models or results from the original node.

Paste

Use the Paste option to paste the copied object in the workflow.

To paste an object, right-click the workflow and click Paste. Alternately, you can press Ctrl+V.

Note:

Node names and model names are changed to avoid naming collisions. To preserve names, use the option Extended Paste.

Related Topics

Extended Paste

Use the Extended Paste option to preserve node and model names while pasting them.

The default behavior of Paste is to change node names and model names to avoid naming collisions.

To go to the Extended Paste option, right-click the workflow and click Extended Paste. Alternately, you can press Control+Shift+V.

Note:

If model names are not unique, then the models may be overwritten when they are rebuilt.

Related Topics

Select All

Use the Select All option to select all the nodes in a workflow.

The selected nodes and links are highlighted in a dark blue border.

Performance Settings

Use the Performance Settings option to edit Parallel settings and In-Memory settings of the nodes.

If you click Performance Settings in the context menu, or if you click Performance Options in the workflow toolbar, then the Edit Selected Node Settings dialog box opens. It lists all the nodes that comprise the workflow. To edit the settings in the Edit Selected Node Settings dialog box:

  • Click Parallel Settings and select:

    • Enable: To enable parallel settings in the selected nodes in the workflow.

    • Disable: To disable parallel settings in the selected nodes in the workflow.

    • All: To turn on parallel processing for all nodes in the workflow.

    • None: To turn off parallel processing for all nodes in the workflow.

  • Click In-Memory Settings and select:

    • Enable: To enable In-Memory settings for the selected nodes in the workflow.

    • Disable: To disable In-Memory settings for the selected nodes in the workflow.

    • All: To turn on In-Memory settings for the selected nodes in the workflow.

    • None: To turn off In-Memory settings for all nodes in the workflow

  • Click The pencil icon that indicated the option to edit to set the Degree of Parallel, and In-Memory settings such as Compression Method, and Priority Levels in the Edit Node Performance Settings dialog box.

    If you specify parallel settings for at least one node, then this indication appears in the workflow title bar:

    Performance Settings is either On for Selected nodes, On (for All nodes), or Off. You can click Performance Options to open the Edit Selected Node Settings dialog box.

  • Click edit to edit default the preferences for parallel processing.

    • Edit Node Default Settings: You can edit the Parallel Settings and In-Memory settings for the selected node in the Performance Options dialog box. You can access the Performance Options dialog box from the Preferences options in the SQL Developer Tools menu.

    • Change Settings to Default

Toolbar Actions

Use the Toolbar Action option to select actions in the toolbar from the context menu.

Current actions are Zoom In and Zoom Out.

Show Runtime Errors

Use the Show Runtime Errors to view errors related to node failure during runtime. This option is displayed only when running of the node fails at runtime.

The Event Log opens with a list of errors. Select the error to see the exact message and details.

Related Topics

Show Validation Errors

Use the Show Validation Errors option to view validation errors, if any.

This option is displayed only when there are validation errors. For example, if an Association node is not connected to a Data Source node, then select Show Validation Errors to view the validation error No build data input node connected.

You can also view validation errors by moving the mouse over the node. The errors are displayed in a tool tip.

Save SQL

Use the Save SQL option to generate SQL script for the selected node.

To generate SQL script for the selected node:

  1. Right-click the node and click Save SQL.
  2. Select any one of the options to save the generated SQL script:
    • SQL to Clipboard

    • SQL to File

    • SQL Script to Clipboard

    • SQL Script to File

    When you save to a file, the system provides a default location. You can browse to change this location. You can also create a folder for scripts.

    The saved SQL includes SQL generated by the current node and all of its parent nodes that are data providers. The SQL lineage ends when it encounters a node that represents persisted objects, such as tables or models.

    The generated script does not generate all behavior of the node. The script does not create any objects. For example, if you select Save SQLfor a Create Table node, then it does not generate a script to create the table. Instead, it generates a script to query the created table.

Validate Parents

Use the Validate Parents option to validate all parent nodes of the current node.

To validate parent nodes of a node, right-click the node and select Validate Parents.

You can validate parent nodes when the node is in Ready, Complete and Error state. All parent nodes must be in completed state.

Compare Test Results

Use the Compare Test Results option to view and compare test results of models that are built successfully.

For Classification and Regression models, this option displays the test results for all successfully built models to allow you to pick the model that best solves the problem.

View Test Results

Use the View Test Results option to view the test results of the selected model. This option is applicable only for Classification and Regression models.

The test results are displayed in the test viewer of the respective models:

Go to Properties

Use the Go to Properties option to open the Properties pane of the selected node.

Navigate

Use the Navigate option to view the links available from the selected node.

Note:

The Navigate option is enabled only if there are links to other nodes.

Navigate displays the collection of links available from this node. Selecting one of the links selects the link and the selected link is highlighted in the workflow. The link itself has context menu options as well so you can right click and continue with the Navigate option. You can also use the arrow keys to progress to the next node.

Create the Table or View Node Properties

In the Properties pane, you can examine and change the characteristics or properties of a node.

To manage the Create Table or View node from Properties pane:

  1. Right-click the node and click Go to Properties.
  2. The Properties pane opens in the lower right panel in the SQL Developer window. The name of the table or view identifies the Create Table or View property. The Properties pane consists of the following:
Table

Table displays the name of the Table or View.

You can perform the following tasks:

  • Change the name of the Table or View: If you change the default name of the table, then the name of the node changes to match the name of the table. For example, if you change the default name of the table to PREDICTIONS, then the name of the Create Table or View node also changes to PREDICTIONS.

  • Change the object type from table to view: The default type is Table. To create a view, click View.

Columns

Columns displays the columns of the table.

The default setting enables automatic behavior. If you deselect Auto Input Column Selection, then you can manually select and edit the columns in the table. You can:

  • Delete columns: Select the column to delete and click delete.

  • Edit columns: Select the column and click edit. Edit the required changes in:

    • Select Column dialog box

    • Edit Data Guide dialog box for JSON data

Automatic Behavior

If you select the Auto Input Column Selection option, then columns in the node are added or removed automatically, subject to certain scenarios.

If Auto Input Column Selection is selected, then the possible scenarios are:

  • Scenario 1: When input is connected, all columns in the input node are selected. The node becomes valid. It assumes that at least one column is included in the specification.

  • Scenario 2: When input is disconnected, all columns are automatically removed. The node becomes invalid.

  • Scenario 3: When the input node is edited, the possible edit scenarios are:

    • Add column: The column is added to the Create Table or View node, if it is compatible.

    • Remove column: The column is removed from the Create Table or View node.

    • Edit column: If a column is edited, then any change in the column data type could trigger an invalid state in the node.

Note:

If the Auto Input Column Selection option is deselected, then you must manually add and remove column specifications from the Create Table or View node.

Data Source Node

A Data Source node defines source data for the workflow. For example, a Data Source node specifies the build data for a model.

Any table or view accessible to the user can be selected as the source. This node does not allow any input node connections. Data Source nodes rely on database resources for their definitions. It may be necessary to refresh a node definition if the database resources change. For example, if the resources are deleted or re-created.

Data Source nodes can run in parallel.

Only certain data types are allowed in a Data Source node. Columns with other data types are excluded.

Related Topics

Supported Data Types for Data Source Nodes

Most of the basic Oracle data types are supported by the Data Source node.

Object-based data types can be included, but each object type has to be well understood. Object data types require storage clauses to be defined at appropriate levels with the object hierarchy.

These data types are fully supported:

  • VARCHAR2

  • CHAR

  • FLOAT

  • NUMBER

  • CLOB

  • NESTED_NUMERICALS

  • NESTED_CATEGORICALS

These data types are supported by Oracle Data Mining 12c Release 1 (12.1) and later:

  • BINARY_DOUBLE

  • BINARY_FLOAT

  • DM_NESTED_BINARY_DOUBLES

  • DM_NESTED_BINARY_DOUBLES

  • BLOB, for Text only

The BINARY data types and BLOB are supported by Oracle Data Miner if you are connected to Oracle Data Mining 12c Release 1 and later.

These data types for date and time are partially supported:

  • DATE

  • TIMESTAMP

  • TIMESTAMP_WITH_LOCAL_TIMEZONE

  • TIMESTAMP_WITH_TIMEZONE

  • TIMESTAMP_WITH_LOCAL_TIMEZONE

Oracle Database 12.1.0.2 supports JSON data in the following data types. Oracle Data Miner derives pseudo JSON data types from these data types:

  • VARCHAR2

  • CLOB

  • BLOB

  • RAW

  • NCLOB

  • NVARCHAR2

Oracle Data Miner 17.2 supports the following data types for input as columns in a Data Source node, and as new computed columns within the workflow:
  • RAW

  • ROWID

  • UROWID

  • URITYPE. The URITYPE data type provides many sub type instances, which are also supported by Oracle Data Miner 17.2. They are:
    • HTTPURITYPE

    • DBURITYPE

    • XDBURITYPE

Support for Date and Time Data

Data and time data types cannot be used for other functions in Oracle Data Miner. In particular, date and time data types cannot be the targets of model builds.

The Transform transformation partially supports the data and time data types DATE, TIMESTAMP, TIMESTAMP_WITH_LOCAL_TIMEZONE, and TIMESTAMP_WITH_TIMEZONE, as follows:

  • Attributes with data and time data types can be binned using Equal Width or Custom binning.

  • You can apply Statistic or Value missing values treatments to attributes with the data and time data types.

Attributes with data and time types are analyzed by the Explore Data node.

Working with the Data Source Node

Lists the tasks you can perform using the Data Source nodes.

Create a Data Source Node

You create a Data Source node after creating a workflow.

To create a Data Source node and attach data to it:

  1. In the Components pane, go to Workflow Editor. If the Components pane is not visible, then in the SQL Developer menu bar, go to View and click Components. Alternately, press Ctrl+Shift+P to dock the Components pane.
  2. In the Data section, click Data Source node icon.
  3. Drag and drop the Data Source node from the Components pane to the Workflow pane. This adds the Data Source node to the workflow. The Define Data Source dialog box opens.
  4. In the Define Data Source dialog box, you can select a table or view. By default, the tables in your schema are listed. You can add tables from other schemas to which you have access, in the Edit Schema List dialog box.
  5. Click Next.
  6. In the Define Data Source - Select Columns dialog box, add or remove attributes to the table.
  7. Click Finish.
  8. In the Select Table window, select the table or view to use. Click OK. The Properties pane displays information about the table or view that you selected. The node can now be run.

    The default name of the node is the name of the table or view that you select. If SH.CUSTOMERS is the name of the table, then the node is named CUSTOMERS.

Related Topics

Edit Schema List

By default, tables and views from the schema to which you are connected, are listed.

To add other schemas:

  • If this is the first time that you are adding schemas, then click Add Schemas.

  • If you have already added schemas, then click Edit Schemas.

The Edit Schema List dialog box opens. Available Schemas is a list of the schemas to which you have access. To display tables and views in one of these schemas, move the name to Selected Schemas.

For example, to display tables in the SH schema, move SH to the Selected Schemas list. Click OK.

You return to the Define Data Source wizard. Select Include Tables from Other Schemas. Tables and views in the added schemas are listed. For example, if you selected SH, then you can see tables, such as SH.CUSTOMERS, in the Available Tables/Views list. You can select these tables as data sources.

Define a Data Source Node

If you import a workflow that uses tables or views that are not available, then you can use the Define a Data Source Node wizard to define a data source to replace the missing one.

The wizard also detects input columns with JSON data types. You can use this interface to:

  • Define table and attributes for a new Data Source node.

  • Edit an existing data source node.

The wizard has two steps:

  1. Select Table: Here, the tables and views to which you have access, are displayed. The schemas are the schemas to which you are connected, along with the schemas added using the Edit Schema List. Select the table or view types. The columns and data are displayed in the lower pane in the following tabs:
    • Columns: Lists the columns of the selected table in a grid. For each column, Data Type, Mining Type, Length, Precision, Scale, and Column ID are displayed.

    • Data: Displays the data in the table arranged according to Column ID.

      • To include all attributes in the table, click Finish.

      • To manually exclude certain attributes in the table, click Next.

      • To change the table selection for an existing node, click Edit Data Source Node.

  2. Select Columns: By default, the Define Data Source wizard includes all columns in the Table or View. You can perform the following tasks:
    • Include a column by moving the attribute from Available Attribute to Selected Attributes section.

      A drop-down list is available for JSON data. If there is an input table with JSON data, then the wizard detects the column as a JSON column and displays it in the Data Type column. If the wizard cannot detect JSON data, then you can manually change the data type for the column by clicking the drop-down list.

    • Specify Data Guide Settings: Applicable only for JSON data type. Select the attribute and click edit to specify Data Guide settings in the Edit Data Guide dialog box.

    • JSON Settings: Click JSON Settings to specify node settings that determines how Data Guides are generated. This option is applicable only for JSON data.

Edit Data Guide

The Edit Data Guide dialog box allows you to specify how data guide should be generated for a selected JSON type column.

The Edit Data Guide dialog box has two tabs:

  • Structure: Displays the JSON data structure for the selected column

  • Data: Displays the content of the imported or generated Data Guide table

You can perform the following tasks:

  • Generate Data Guide: You can generate a new data guide table whenever the node is run or rerun. To generate a data guide, select any of the following options from the Data Guide Generation drop-down list:

    • Default: If the option Generate Data Guide if necessary in the JSON Settings dialog box is selected, then the data guide table is generated whenever the node is run or rerun. Otherwise, no data guide is generated.

    • On: Select this option to generate a new data guide table whenever the node is run or rerun.

    • Off: Select this option to not generate a new data guide table whenever the node is run or rerun.

    • Import from Workflow: Select this option to import a data guide from an existing JSON type column defined in a Data Source node or Create Table node within the same workflow or from a different workflow using the Select Data Guide dialog box.

    • Import from file: Select this option to import a data guide from a CSV file. The process validates the data guide, that is, it verifies the correctness of the column header, JSON path format, JSON types, and so on during an import operation.

  • Remove Data Guide: To delete the current Data Guide table, click delete.

  • Export Data Guide: Select this option to export the current data guide table to a CSV file. If you find the generated data guide does not fully represent the underlying JSON data, then you have the option to export the data guide and add any missing JSON paths. You can then import the data guide back to replace the generated one.

Related Topics

Select Data Guide

The Select Data Guide dialog box enables you to import a data guide table from an existing JSON type column defined in a Data Source node or Create Table node within the same workflow or from a different workflow.

Only completed nodes with generated JSON schema are displayed. To import a data guide table:

  1. In the Show field, select a workflow from the drop-down list.
  2. Select the nodes to import.
  3. Click OK.

JSON Settings

In the JSON Parsing Settings dialog box, you can specify node settings that determine how data guides are generated.

A data guide is used whenever a JSON structure is present in the UI, for example, JSON Query node. Because the data guide table generation could be time consuming, especially for large JSON data, the following settings offers some control on the table generation:

  • Generate Data Guide if necessary: By default, this option is selected. It generates a data guide for the JSON type columns. Deselect this option if JSON data is not used in the product. Therefore, no Data Guide will be generated.

  • Sampling: Defines how many JSON documents stored in a column are to be processed, to generate the data guide table. The JSON document contains the entire content of the JSON column for a given row.

    • Max number of documents: 2000 (Default). Use the arrows to change this setting.

  • Limit Document Values to Process: Defines how many JSON values (Number, String, Boolean) are to be parsed in the document to generate the data guide table.

    • Max number per document: 10,000 (Default). Use the arrows to change this setting.

Edit a Data Source Node

You can modify the operation of the Data Source node in the Edit Data Source Node dialog box.

To edit an existing Data Source node:

  1. Double-click or right-click the node and select Edit. The Edit Data Source Node opens.
  2. In the Edit Data Source Node dialog box, you can change the attributes selected in the current Data Source. You can perform the following tasks:
    • Change attribute selection: To change the attribute selection, move the attributes from the Available Attributes pane to the Selected Attributes pane by using the arrows. For example, to remove ATTRIBUTE1 from the data source, move ATTRIBUTE1 from the Selected Attributes list to the Available Attributes list. After you are done, click OK.

    • Select a different table: To select a different table or view, click Edit. The Define Data Source dialog box opens.

Related Topics

Running a Data Source Node

To run a valid Data Source node, right-click the node and select Run.

The Oracle Data Miner server generates a sample of the selected table or view. The size of the table and type of sampling used is determined by the sample settings of the node.

If the node is complete but is required to provide data to a child node that is being run, then the node is run for validation to ensure that the columns and table still exist. If there are any errors, then the node state is set to Error and affected attributes are set to Invalid.

Data Source Node Context Menu

The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.

Right-click the Data Source node. The following options are available in the context-menu:

Select Attributes

The Select Attributes dialog box enables you to move attributes between the Available Attributes list and the Selected Attributes list.

To deselect an attribute, move it to the Available Attributes list.

You can search the Available Attributes list for attributes.

Use the shuttle controls to move attributes between the lists.

When you have finished selecting attributes, click OK.

Data Source Node Viewer

The Data Viewer contains information related to data, graphs, columns and SQL queries contained in the node.

To view data, right-click the node and select View Data from the context menu. The Data Viewer opens.

Note:

You can view data only when the Data Source node is in the Valid state.

The data viewer has these tabs:

Data

The Data tab displays a sample of the data.

The Data Viewer provides a grid display of the rows of data either from the sampling, or pulled through the lineage of nodes back to the source tables.

You can perform the following tasks:

  • Refresh: Click refresh to refresh the data.

  • View: Click View and select either Actual Data or Cached Data. Cached data is available only if you cache data in the Cache section of the Properties pane.

  • Sort: Click Sort to sort data according to applicable criteria. Displays the Select Column to Sort By dialog box.

  • Filter: In the Filter field, enter a WHERE clause to select data.

Select Column to Sort By

This dialog box enables you to:

  • Select multiple columns to sort

  • Determine the column ordering

  • Determine ascending or descending order by column

  • Specify Nulls First so that null values appear before real data values

The sort order is preserved until you clear it.

Column headers are also sort-enabled to provide a temporary override to the sort settings.

Graph

In the Graph tab, you can create graphs based on the numeric data.

Related Topics

Columns

The Column tab is a list of all the columns that are output from the node.

For each column, the Name, Data Type, Mining Type, Length, Precision and Scale (for floating point), and Column ID are displayed.

  • If the node has not run, then the table or view structure provided by the database is displayed.

  • If the node has run successfully, then the structure of the sample table is displayed, based on the sampling defined when the node was specified.

There are several filtering options that limit the columns displayed. The filter settings with the (or)/(and) suffixes allow you to enter multiple strings separated by spaces. For example, if the Name/Data Type/Mining Type(or) is selected, then the filter string A B produces all columns where the name or the data type or the mining type starts with the letter A or B.

SQL

The SQL tab provides the SQL Details text area. The text area displays the SQL code that generates the data provided by the actual view shown in the Data tab.

The SQL can be a stacked expression that includes SQL from parent nodes, depending on what lineage is required to access the actual data.

You can copy the SQL and run it in a suitable SQL interface. Select All (Ctrl+A) and Copy (Ctrl+C) are enabled.

The Search control is a standard search control that highlights matching text and searches forward and backward.

Data Source Node Properties

In the Properties pane, you can examine and change the characteristics or properties of a node.

To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternately, right-click the node and click Go to Properties. The Properties pane for a Data Source node has these sections:

Related Topics

Data

The Data section in Properties displays information related to the data and the data source, which can be a table or view.

The Data section consists of the following:

  • Source Table: Displays the name of the source table or view for the Data Source node. If no source table is associated with the node, then click to the right of Source Table. A list of tables and views that are accessible from the data mining account is displayed. You can select the table or view. You can also use this process to change the table or view.

  • Data: Displays the attributes in a grid. For each attribute, the name, the alias, and the data type are displayed. You can perform the following tasks:

    • Create an alias for an attributed by entering the alias in the appropriate cell.

    • Filter attributes.

    • Delete attributes. Select the attribute and click delete.

    • Edit attributes. Select the attribute and click edit.

    • Select Attributes to include in the Data Source.

    • Refresh the node. Click refresh.

Related Topics

Cache

The Cache section provides the option to generate a cache for output data. You can change this default using the transform preference.

You can perform the following tasks:

  • Generate Cache of Output Data to Optimize Viewing of Results: Select this option to generate a cache. The default setting is to not generate a cache.

    • Sampling Size: You can select caching or override the default settings. Default sampling size is Number of Rows Default Value=2000

Related Topics

Details

The Details section displays the name of the node and any comments about it.

You can change the name and comments in the fields here:

  • Node Name

  • Node Comments

Explore Data Node

The Explore Data node provides the profile of any input data source. Explore Data Statistics can either be based on all data or on a sample of the data.

The Explore Data node enables you to do the following:

  • View common statistics and a histogram for each column. Optionally, you can select a Group By attribute and have multivariate view of histograms generated.

  • Generate an output flow containing the results of all the statistical analysis. You can connect any source of data input to an Explore Data node. For example, you can attach an Explore Data node to an Apply node.

  • Analyze attributes with all supported data types and the following data types for date and time:

    • DATE

    • TIMESTAMP

    • TIMESTAMP_WITH_TIMEZONE

    • TIMESTAMP_WITH_LOCAL_TIMEZONE

  • Run Explore Data node in parallel.

  • Create graphs.

  • Export the statistic generated by the Explore Data node using SQL Developer.

Create an Explore Data Node

You create an Explore Data node and connect it to a data source to analyze the data in the data source. You can connect an Explore Data node to any data source.

To create an Explore Data node:

  1. In the Components pane, go to Workflow Editor. If the Components pane is not visible, then in the SQL Developer menu bar, go to View and click Components. Alternately, press Ctrl+Shift+P to dock the Components pane.
  2. In the Data section, click Explore Data.
  3. Drag and drop the node from the Components pane to the Workflow pane.
    The node is added to the workflow. The GUI shows that the node has no data associated with it. Therefore, it cannot be run.
  4. Right-click the node from which to create the table, and click Connect in the context menu.
  5. Draw a line from the node to analyze to the Explore Data node and click again.
  6. To generate statistics and analyze data, right-click the Explore Data node, and click Run.
  7. After running the node, right-click the node and click View Data. The data is displayed in the Explore Data Node Data Viewer.

By default, all attributes in the data source are analyzed. You can specify specific attributes to analyze.

Edit the Explore Data Node

You can view and edit information related to inputs and statistics in the Edit Explore Data Node dialog box.

You can edit the Explore Data node by either double-clicking the node or by right-clicking the node and selecting Edit.

The Edit Explore Data Node editor has these tabs:

Input (Explore)

In the Input tab, you can specify the attributes to analyze. By default, all attributes in the data source are analyzed.

The default is to select no Group By attribute. Group By displays a sorted list of attributes that are limited to data types that can be binned by Explore Data. Selecting a Group By attribute enables you to analyze data based on the selected attribute. For example, suppose that the data contains AGE and GENDER as attributes. If you select AGE as the Group By attribute, then the histograms for GENDER show the age composition of each value of GENDER.

Change the list of attributes to analyze in either of these ways

  • Right-click the Explore Data node and select Edit. The Select Attributes dialog box opens.

  • Select the Explore Data node. The Explore tab in the Explore Data node Properties pane enables you to change the list of attributes.

Select Attributes

In the Select Attribute dialog box, you can choose the attributes of the data source for which you want to view the statistics. By default, all attributes of the data source are selected.

If you do not want to view statistics for an attribute, then move it from the Selected Attributes list to the Available Attributes list.

You can sort the lists by Name or by Data Type.

When you have finished, click OK.

Statistics (Explore)

The statistics tab contains a list of statistics available. For each statistic, there is a brief definition and an indication of the cost to calculate that statistic.

Click the Statistics tab to specify the statistics to calculate. You can also change statistics in the Statistics section of Explore node Properties pane.

If you change any of the default selections, then click Restore Defaults to change all selections to the default selection.

You can search for an individual statistic by name.

By default, Oracle Data Miner calculates these statistics:

  • Average

  • Distinct percent

  • Maximum

  • Median

  • Minimum

  • Mode sampled

  • Percent NULLs

  • Standard deviation

  • Variance

The default statistics have a low or medium cost to calculate.

Skewness and kurtosis have a high cost to calculate. They are not selected by default. You can select them if necessary.

Calculating mode using a sample is a low cost operation. Calculating the mode using all available data is a very high cost operation. To calculate mode using sample data or all available data, click Mode (Sampled) in the Edit Explore Data Node dialog box.

Mode (Sampled)

The default is to calculate mode using a sample of 2000 records. You can calculate the mode using all available data. This calculation has a very high cost.

To use all available data, click the Available Data option and click OK.

Explore Data Node Viewer

The Explore Data Node viewer displays the statistics and other analyses performed by the Explore Data node.

After the node runs successfully, you can view the data in the Explore Data Node viewer.

To view the data, right-click the node and select View Data. The viewer opens in a new tab.

The viewer displays the information in the following tabs:

Statistics

The Statistics tab displays the statistics calculated by the Explore Data node.

Attributes are listed the Statistics grid. For each attribute, the name, data type, histogram, and summary of the statistics are displayed.

To view a large version of the histogram for an attribute, select the attribute. The histogram is displayed below the grid. If the attribute has null values, then the histogram has a separate bin labeled Null bin. The histogram may also contain an Others bin consisting of values that are not NULL but are not in any other bin.

For more information about the histogram, go to the large histogram and right-click the attribute name. These choices are available:

  • Copy to Clipboard: Copies the histogram to the Microsoft Windows clipboard. You can paste this histogram in to any rich editor, such as a word processor or image editor.

  • Save Image As: Exports the image to a PNG file that can be stored in the file system.

  • View Data: Displays the data used to create the histogram in a pop up window. You can search for attribute name, attribute value, or attribute percent. Click Close when you have finished.

Histograms created by Oracle Data Miner use a sample of the data and can be slightly different each time they are displayed.

For all data types, a histogram is created. The number of distinct values, the percentage of NULL values, and the number of distinct values are displayed.

Additional statistics calculated depend on the data type of the attribute:

  • For character attributes VARCHAR2, the Mode is calculated.

  • For numeric attributes NUMBER or FLOAT,the following are calculated:

    • Average

    • Minimum value

    • Maximum value

    • Standard deviation

    • Skewness (a measure of the asymmetry of the distribution)

    • Kurtosis (a measure of flatness or peakedness of the curve describing a frequency distribution in the region about its mode)

Columns

The Column tab is a list of all the columns that are output from the node.

For each column, the Name, Data Type, Mining Type, Length, Precision and Scale (for floating point), and Column ID are displayed.

  • If the node has not run, then the table or view structure provided by the database is displayed.

  • If the node has run successfully, then the structure of the sample table is displayed, based on the sampling defined when the node was specified.

There are several filtering options that limit the columns displayed. The filter settings with the (or)/(and) suffixes allow you to enter multiple strings separated by spaces. For example, if the Name/Data Type/Mining Type(or) is selected, then the filter string A B produces all columns where the name or the data type or the mining type starts with the letter A or B.

Data

The Data section in Properties displays information related to the data and the data source, which can be a table or view.

The Data section consists of the following:

  • Source Table: Displays the name of the source table or view for the Data Source node. If no source table is associated with the node, then click to the right of Source Table. A list of tables and views that are accessible from the data mining account is displayed. You can select the table or view. You can also use this process to change the table or view.

  • Data: Displays the attributes in a grid. For each attribute, the name, the alias, and the data type are displayed. You can perform the following tasks:

    • Create an alias for an attributed by entering the alias in the appropriate cell.

    • Filter attributes.

    • Delete attributes. Select the attribute and click delete.

    • Edit attributes. Select the attribute and click edit.

    • Select Attributes to include in the Data Source.

    • Refresh the node. Click refresh.

Related Topics

SQL

The SQL tab provides the SQL Details text area. The text area displays the SQL code that generates the data provided by the actual view shown in the Data tab.

The SQL can be a stacked expression that includes SQL from parent nodes, depending on what lineage is required to access the actual data.

You can copy the SQL and run it in a suitable SQL interface. Select All (Ctrl+A) and Copy (Ctrl+C) are enabled.

The Search control is a standard search control that highlights matching text and searches forward and backward.

Export Node Calculations

When an Explore node runs, it writes the statistics that it calculates to a database table. You can export the statistics calculated by the Explore node to a Microsoft Excel spreadsheet.

To export the statistics:

  1. When an Explore node runs, it writes the statistics that it calculates to a database table. The name of the table is in the Output section of the Explore Data node Properties pane. Suppose that the name of the table is OUTPUT_8_3. If the Properties pane of the node is not visible, then right-click the Explore node and select Go to Properties.
  2. In SQL Developer, go to the Connections tab. Expand the connection that you used for data mining.
  3. Select Export in the context menu. The Export Wizard opens.
  4. In the Export Wizard:
    • Deselect Export DDL.

    • In the Export Data section, select the version of Microsoft Excel to which you want to export from the Format drop-down list.

    • Specify a file name. Alternately, you can accept the default.

    • Click Next.

  5. Click Finish. SQL Developer exports the table to the spreadsheet.

The spreadsheet contains the statistics. It contains the names of the histograms generated by the Explore Data node.

To export an individual histogram, right-click the histogram and save the graphic.

Perform Tasks from the Explore Data Node Context Menu

The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.

The context menu for an Explore Data node has these selections:

Explore Data Node Properties

In the Properties pane, you can examine and change the characteristics or properties of a node.

To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternately, right-click the node and click Go to Properties.

The Explore Data node Properties pane has these sections:

Input (Properties)

The Input section lists the attributes that are analyzed.

Attributes are listed in a grid. For each attribute, the name and the data type are displayed. To sort the list of attributes by name or data type, click the heading in the grid.

You can specify a Group By attribute. Select an attribute from the list.

You can either choose the default settings or select columns:

  • To select columns, deselect Auto Input Columns Selection.

  • To delete an attribute, select it and click delete.

  • To edit an attribute, click edit.

The Select Attributes dialog box opens.

Related Topics

Statistics

The Statistics section lists the calculated statistics.

Related Topics

Output

The Output section lists the columns in the data source.

The names and data type of each column is listed in a grid. You can search the grid by name (the default) or by data type.

To clear the search, click delete.

Histogram

Use bins to create histograms. The Histogram tab lists the default number of bins.

The default number of bins for the following types of bins:

  • Numerical Bins

  • Categorical Bins

  • Date Bins

  • Null Values

  • Other Values

The default number of bins for all of these bin types is 10. The default specifies a maximum number of bins.

Sample

The data is sampled to support data analysis.

The default is to use a sample. The Sample tab has the following selections:

  • Use All Data: By default, Use All Data is deselected.

  • Sampling Size: The default is Number of Rows with a value of 2000. You can change sampling size to Percent. The default is 60 percent.

Graph Node

A Graph node creates a two-dimensional graph of numeric data.

A Graph node is not a data provider, which means that it cannot be connected to another node.

Note:

You cannot generate code from a Graph node.

You can perform the following tasks:

  • Create and edit one or more graphs of different types such as line, scatter, bar, histogram, and box.

  • Create graphs with actual data and sample data.

  • Run a Graph node in parallel.

Related Topics

Types of Graphs

Graph node enables you to create two-dimensional graphs of numeric data in several ways.

You can create the following types of graphs:

  • Line Plot: Uses a line to connect the data points. Line plots are useful for identifying if two variable are correlated. Oracle Data Miner supports two-dimensional line plots.

  • Bar Plot: Compares values. The height of the bar represents the measured value or frequency.

  • Histogram: Shows frequencies, as adjacent rectangles, erected over discrete intervals (bins), with an area equal to the frequency of the observations in the interval.

  • Scatter Plot: Display values in two discrete sets. One variable determines the position on the horizontal axis and the other variable determines the position on the vertical axis.

  • Box Plot or Box Graph: Graphically depicts groups of numeric data using the quantiles of the data.

Graph nodes require you to specify one or more axes. Axes must consists of numeric data.

Box Plot or Box Graph

A box plot graphically depicts groups of numeric data using the quantiles of the data.

The bottom and top of the box are the first and third quartiles, and the band inside the box is the second quartile (the median). In the type of box plot created by Data Miner, the whiskers show the minimum and maximum values of all the data.

Supported Data Types for Graph Nodes

Graph nodes support NUMBER, FLOAT, DATE, AND TIMESTAMP data types.

The supported data types:

  • NUMBER

  • FLOAT

  • DATE

  • TIMESTAMP

  • TIMESTAMP_WITH_TIMEZONE

  • TIMESTAMP_WITH_LOCAL_TIMEZONE

Graph Node Context Menu

The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.

To view the context menu, right-click a Graph node. The context menu provides the following options:

Related Topics

Running Graph Node

Run the Graph node to generate sample data.

To run the Graph node, right-click the Graph node and click Run.If you do not generate sample data, then graphs are created using the data provided to the node.

Note:

If you import workflow that contains Graph nodes, then you must run the Graph node to view the graphs.

Show Graph

The Show Graph option opens the Graph Node Editor.

All graphs are displayed in the Graph Node Editor.

Related Topics

Create a Graph Node

You can create a Graph node to visualize numeric data and relationships between numeric variables.

To create a Graph node:

  1. In the Components pane, go to Workflow Editor. If the Components pane is not visible, then in the SQL Developer menu bar, go to View and click Components. Alternately, press Ctrl+Shift+P to dock the Components pane.
  2. In the Workflow Editor, expand Data and click Graph.
  3. Drag and drop the node from the Components pane to the Workflow pane.
    The node is added to the workflow. The GUI shows that the node has no data associated with it. Therefore, it cannot be run.
  4. Right-click the node from which to create the table, and click Connect in the context menu.
  5. Draw a line to the Graph node and click again.
  6. To create a graph using sample data, then right-click the graph node and click Run.
  7. Double-click the Graph node or right-click the node and select Edit from the context menu.
    • If no graphs are defined, then the New Graph dialog box opens. You can define a graph here.

    • If a graph is defined, then in the Graph Node Editor, you can add new graphs or edit existing graphs.

  8. After running of the graph node is complete, the graph is automatically displayed in the Graph Node Editor. You can edit the graph. You can also define new graphs, modify settings and attributes of graphs, and delete graphs.

Related Topics

New Graph

The New Graph dialog box defines a graph with a default name.

You can change the name of the graph here.

Select the type of graph to create, and follow the steps to define it.

The graph that you define is displayed in a container. After you have created a graph, you can edit the definition. You can also add graphs, delete graphs, view the data used to construct a graph, and save the graph as a graphic.

Line or Scatter

Line charts display continuous data over time that is set against a common scale. Scatter charts are commonly used for displaying and comparing numeric values, such as scientific, statistical, and engineering data.

To create a Line or Scatter:

  1. Click Line to create a Line graph. The Line graph is the default type. To create scatter graph, click Scatter.
  2. For a line graph or a scatter graph, enter the following details:
    • Title: This is the title of the graph. You can use the default name or you can provide a different name.

    • Comment: Type a description of the graph. This is an optional field.

    • For line graph settings, specify the following information:

      • X-Axis: Select an attribute for the x-axis of the graph.

      • Y-Axis: Select an attribute for the y-axis of the graph.

      • Group By: Click this option to select an optional Group By attribute. Use this option to create a series based on the Group By attribute values. Select an attribute for Group By. To specify how the grouping is performed, click Settings.

  3. Click OK.

Bar

A bar graph plots the values of a selected attribute (x-axis) against frequency counts (y-Axis).

To specify a bar graph:

  1. Click Bar.
  2. Specify the following information for Bar Graph settings:
    • Title: This is the name of the graph. You can use the default name or you can enter a different name.

    • Comment: Provide a description for the bar graph. This is an optional field.

    • Specify the following Bar Graph settings:

      • X-Axis: Select one attribute to be plotted along the x-axis of the graph. This attribute is handled using either Top N, or Binned (the default). To specify handling, click Settings.

      • Y-Axis: Select one attribute to be plotted along the y-axis of the graph. Specify the value for frequency count as statistic. For example, Average, Min, Max Median, Count. The statistic that you choose summarizes the values in the bin.

      • Group By: Click this option to select an optional Group By attribute. This attribute groups the values in the bin (stack the bar). Select an attribute for Group By. To specify how the grouping is performed, click Settings.

  3. Click OK.

Related Topics

Histogram

A histogram plots the values of a selected attribute along the x-axis against frequency counts along the y-axis.

To create a histogram:

  1. Click Histogram.
  2. Enter the following information for the histogram:
    • Title: This is the name of the histogram. You can use the default name or you can enter a different name.

    • Comment: Enter a description of the histogram. This is an optional field.

    • Specify the following settings for the Histogram:

      • X-Axis: Select an attribute to plot along the x-axis of the graph. This attribute is handled using either Top N, or Binned (the default). To specify handling, click Settings.

      • Group By: Click Group By to select an optional Group By attribute. This attribute groups values in the bin (stack the bar). To specify how Group By is performed, click Settings. By default, a histogram with a Group By attribute is displayed as a Stacked Bar Graph. Click stack, the Stacked Bar Graph icon, and select dual the Dual Y Bar Graph icon. If you specify an invalid value, then a message appears explaining the problem.

  3. Click OK.

Related Topics

Box

A box plot summarizes binned data of a selected attribute (X axis). It graphically depicts groups of numeric data using the quantiles of the data.

The bottom and top of the box are the first and third quartiles, and the band inside the box is the second quartile (the median). In the type of box plot created by Data Miner, the whiskers show the minimum and maximum values of all the data.

To specify a box plot or box graph:

  1. Click Box.
  2. Enter the following details of the Box Graph:
    • Title: This is the name of the Box Graph. You can use the default name or you can enter a different name.

    • Comment: Enter a description of the Box Graph. This is an optional field.

    • Specify the following settings for the box graph:

      • X-Axis: Select an attribute to be plotted along the x-axis of the graph. This attribute is binned using either Top N, or Binned (the default). To specify handling, click Settings.

      • Group By: Click Group By to select an optional group by attribute. This attribute provides values on the y-axis. To specify how the grouping is performed, click Settings. If you specify an invalid value, a message appears explaining the problem.

  3. Click OK.

Related Topics

Settings (Graph Node)

You can provide settings specific to the data type of the selected attribute in the graph.

Depending on the data type of the attribute selected, one of these dialog boxes is displayed when you click Settings.

Select Values to Display

In the Select Values to Display dialog box, you can provide specific attribute values when defining a graph.

If you specify a categorical attribute for an axis or the Group By attribute, then click Settings to display the Select Values to Display dialog box.

The values of the attribute are listed in the Values column. Select values in one of the following:

  • All

  • None

  • Default

    By default, the most frequent values using Top N is selected.

  • Select specific values by clicking the check box for the values.

You can search for values using the search box.

When you have finished, click OK to return to the definition of the graph.

Axis Treatment

If you specify a numeric attribute for an axis or Group By attribute, then click Settings to display the Axis Treatment Settings dialog box.

The options in the Axis Treatment Settings dialog box are:

  • Raw Values (as is)

  • Binned Automatically: Use Equal width binning for the axis values. The default number of bins is 10. You can change this value. The default is to not show null values. To show null values, click the check box.

Click OK to return to the definition of the graph.

Graph Node Editor

The Graph Node Editor displays all defined graphs. You can modify an existing graph or add graphs to the node.

If at least one graph is defined for a Graph node, then double-click the Graph node to open the Graph Node Editor. If no graphs are defined, then the New Graph dialog box opens.

Each graph is in a container. The graph containers are laid out in a grid.

You can also zoom in to view details of a graph.

These icons are applicable to all graphs in the node:

  • To add a new graph, click The green color plus sign to indicate the option to add. to open the New Graph dialog box.

  • To refresh the display, click refresh objects.

  • To select data used to create the graph, click View. The options are:

    • Actual Data: Displays the default unless you have run the Graph node. In this case, the node creates the graph using whatever data is provided.

    • Sample Data: Available only if you have run the Graph node to generate sample data.

The name of the new or existing graph is displayed at the top of the container for the graph, along with these controls:

  • To adjust the size of the graph (zoom in and out), click magnifying glass indicating zoom in.

  • To edit the current graph, click edit current object. When you edit or add a graph, the results are automatically displayed in the editor.

  • To delete the current graph, click delete current object.

  • To examine specific values of the graph, zoom the graph.

Related Topics

Zoom Graph

In the Graph node editor, you have the option to zoom in on selected values to view details.

To examine details of a graph, zoom in on the selected values. For example, to see more detail for selected x-axis values, draw a selection box with the mouse that encloses the values. The display shows values in the selection box in more detail and the axis expands. You can zoom in multiple times. When you have finished viewing the selected values, click the graph once for each time that you zoomed in to return to the original graph.

Viewing Data used to Create Graph

In the Graph node editor, you have the option to save the graph as a graphic.

To view the data used to create the graph or to save the graph as a graphic:

  1. Right-click the graph.
  2. Select any one of the following options:
    • Copy to Clipboard: Copies the graph to the Microsoft Windows clipboard

    • Save Image As: Saves the graph in a PNG file format

    • View Data: Displays the data used to create the graph

Edit Graph

You can edit a graph to change it current attributes and to specify a different graph type.

When you edit a graph, you can do the following:

  • Change the attributes of the existing graph type. For example, if you edit a line graph, then you can change the x-axis and y-axis or add a Group By attribute. You can change the axis treatment or values to display if the graph uses these items.

  • Specify a different graph type. You can change the type of graph. For example, you can change a line graph to a histogram. To change the type of the graph, click the button at the top of the window and specify the required information:

Bar

A bar graph plots the values of a selected attribute (x-axis) against frequency counts (y-Axis).

To specify a bar graph:

  1. Click Bar.
  2. Specify the following information for Bar Graph settings:
    • Title: This is the name of the graph. You can use the default name or you can enter a different name.

    • Comment: Provide a description for the bar graph. This is an optional field.

    • Specify the following Bar Graph settings:

      • X-Axis: Select one attribute to be plotted along the x-axis of the graph. This attribute is handled using either Top N, or Binned (the default). To specify handling, click Settings.

      • Y-Axis: Select one attribute to be plotted along the y-axis of the graph. Specify the value for frequency count as statistic. For example, Average, Min, Max Median, Count. The statistic that you choose summarizes the values in the bin.

      • Group By: Click this option to select an optional Group By attribute. This attribute groups the values in the bin (stack the bar). Select an attribute for Group By. To specify how the grouping is performed, click Settings.

  3. Click OK.

Related Topics

Box

A box plot summarizes binned data of a selected attribute (X axis). It graphically depicts groups of numeric data using the quantiles of the data.

The bottom and top of the box are the first and third quartiles, and the band inside the box is the second quartile (the median). In the type of box plot created by Data Miner, the whiskers show the minimum and maximum values of all the data.

To specify a box plot or box graph:

  1. Click Box.
  2. Enter the following details of the Box Graph:
    • Title: This is the name of the Box Graph. You can use the default name or you can enter a different name.

    • Comment: Enter a description of the Box Graph. This is an optional field.

    • Specify the following settings for the box graph:

      • X-Axis: Select an attribute to be plotted along the x-axis of the graph. This attribute is binned using either Top N, or Binned (the default). To specify handling, click Settings.

      • Group By: Click Group By to select an optional group by attribute. This attribute provides values on the y-axis. To specify how the grouping is performed, click Settings. If you specify an invalid value, a message appears explaining the problem.

  3. Click OK.

Related Topics

Histogram

A histogram plots the values of a selected attribute along the x-axis against frequency counts along the y-axis.

To create a histogram:

  1. Click Histogram.
  2. Enter the following information for the histogram:
    • Title: This is the name of the histogram. You can use the default name or you can enter a different name.

    • Comment: Enter a description of the histogram. This is an optional field.

    • Specify the following settings for the Histogram:

      • X-Axis: Select an attribute to plot along the x-axis of the graph. This attribute is handled using either Top N, or Binned (the default). To specify handling, click Settings.

      • Group By: Click Group By to select an optional Group By attribute. This attribute groups values in the bin (stack the bar). To specify how Group By is performed, click Settings. By default, a histogram with a Group By attribute is displayed as a Stacked Bar Graph. Click stack, the Stacked Bar Graph icon, and select dual the Dual Y Bar Graph icon. If you specify an invalid value, then a message appears explaining the problem.

  3. Click OK.

Related Topics

Line or Scatter

Line charts display continuous data over time that is set against a common scale. Scatter charts are commonly used for displaying and comparing numeric values, such as scientific, statistical, and engineering data.

To create a Line or Scatter:

  1. Click Line to create a Line graph. The Line graph is the default type. To create scatter graph, click Scatter.
  2. For a line graph or a scatter graph, enter the following details:
    • Title: This is the title of the graph. You can use the default name or you can provide a different name.

    • Comment: Type a description of the graph. This is an optional field.

    • For line graph settings, specify the following information:

      • X-Axis: Select an attribute for the x-axis of the graph.

      • Y-Axis: Select an attribute for the y-axis of the graph.

      • Group By: Click this option to select an optional Group By attribute. Use this option to create a series based on the Group By attribute values. Select an attribute for Group By. To specify how the grouping is performed, click Settings.

  3. Click OK.

Graph Node Properties

Graph node properties is identified by the node name that you are viewing.In the Properties pane, you can examine and change the characteristics or properties of a node.

To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternately, right-click the node and click Go to Properties.

The Properties pane of a Graph node consists of these sections:

Related Topics

Cache (Graph Node)

You have the option to generate cache of the output data.

After you generate sample data, the option Generate Cache of Output Data to Optimize Viewing Results is selected.

The default value is 2000 records. You can change this value.

Details

The Details section displays the node name and comments about the node.

You can change the name of the node and edit the comments in this section. The new node name and comments must meet the requirements.

Data

The Data section in Properties displays information related to the data and the data source, which can be a table or view.

The Data section consists of the following:

  • Source Table: Displays the name of the source table or view for the Data Source node. If no source table is associated with the node, then click to the right of Source Table. A list of tables and views that are accessible from the data mining account is displayed. You can select the table or view. You can also use this process to change the table or view.

  • Data: Displays the attributes in a grid. For each attribute, the name, the alias, and the data type are displayed. You can perform the following tasks:

    • Create an alias for an attributed by entering the alias in the appropriate cell.

    • Filter attributes.

    • Delete attributes. Select the attribute and click delete.

    • Edit attributes. Select the attribute and click edit.

    • Select Attributes to include in the Data Source.

    • Refresh the node. Click refresh.

Related Topics

SQL Query Node

SQL Query node writes SQL queries to perform special data preparation. Use the SQL Query node to provide input for a model build.

You can do the following:

  • Manually enter a SQL query with the least amount of constraints possible.

  • Incorporate a broader array of methodologies as part of an Oracle Data Miner workflow. You can insert any SQL query as a source of data or as a transformation by using existing data.

  • Run Oracle R Enterprise scripts that are registered for the database.

  • Run parallel processes or parallel queries.

Input for SQL Query Node

The input for a SQL Query node can be data provider nodes and model provider nodes.

A SQL Query node requires the following input:

  • Zero to many data provider nodes, such as Data Source nodes and Transform nodes.

  • Zero to many input model provider nodes, such as Model Build nodes and Model nodes.

Data provider nodes are used as follows:

  • If there are no data provider nodes, then you define an originating Data Source node using an SQL SELECT statement that is not constrained by any input sources that must be defined within Oracle Data Miner. The statement can contain its own internal table references, such as:

    Select * from a, b where a.id = b.id
    

    When there are no data provider nodes, the source tables or views are hidden from Data Miner. Code generation cannot parameterize such tables in the generated SQL script.

  • If there are one or more data provider nodes, then you can reference each data flow within the expression builder interface. You can continue to expose all data sources within the Oracle Data Miner workflow.

When a model provider node is connected as input, it enables you to see the list of model names contained in the node. This is useful in creating SQL that requires a model name.

Oracle Data Miner includes snippets that help you write SQL queries.

SQL Query Restriction

SQL is limited to a query that returns a flow of data.

A SQL Query node can provide data to any node that requires data, such as a node that builds a model.

Create a SQL Query Node

You create a SQL Query node to write a SQL query.

To create a SQL Query node:

  1. In the Components pane, go to Workflow Editor. If the Components pane is not visible, then in the SQL Developer menu bar, go to View and click Components. Alternately, press Ctrl+Shift+P to dock the Components pane.
  2. Create zero or more data provider nodes, such as Data Source nodes or Transform nodes.
  3. Create zero or more model provider nodes, such as Model Build nodes or Model nodes.
  4. In the Workflow Editor, expand Data, and click Create SQL Query.
  5. Drag and drop the node from the Components pane to the Workflow pane.
    The node is added to the workflow. The GUI shows that the node has no data associated with it. Therefore, it cannot be run.
  6. Right-click the data provider and or model provider nodes. For each node, select Connect from the context menu.
  7. Draw a line to the SQL Query node and click again. Ensure that you connect all the required nodes.
  8. Open SQL Query Node Editor by double-clicking the SQL Query node or by selecting Edit from the context node.
  9. Write the SQL query, and validate or preview it.
  10. Click OK.

Related Topics

SQL Query Node Editor

The SQL Query Node Editor enables you to define and validate a SQL query.

The editor provides specifications that can be dragged and dropped or double-clicked into the query build text area. The tabs on the left of the window provide help in writing a query:

  • Source is a list of input nodes that includes both Data Provider nodes and Model Provider nodes. When you select an item in the Source list, information appears in the Message box.

  • Snippets are standard SQL Developer Snippets (SQL code fragments) arranged by the type of calculation.

    The snippet category Predictive Query helps write queries using the DBMS_PREDICTIVE_ANALYTICS PL/SQL package.

    For information about snippets, see Oracle SQL Developer User's Guide.

    For a brief description of snippet functionality, move the cursor over the name. The information is displayed in a tool tip.

  • PL/SQL Functions is a list of PL/SQL functions by user schema.

  • R Script is a list of registered R Scripts; this tab is displayed if Oracle R Enterprise is installed.

Write SQL in the text area.

Below the text area are:

  • Columns—List the columns and data types.

  • Preview—Displays a small number of the rows returned by the query.

When you click OK or Validate, the following are verified:

  • The query generates at least one column of output.

  • All data types are data types that Oracle Data Miner supports. Most scalar data types are supported. The most common data types that are not supported are user custom object types.

    If unsupported data types are found, then a table is created and an error message is displayed above the column panel. All unacceptable data types in the column list are marked with an invalid icon next to the column name.

  • If no validation errors occur during the parsing of the query, then the Columns tab shows data types of the columns and the Data tab shows a small sample of results.

    If a validation error occurs, then the validation panel is displayed for the Column and Data tabs.

Oracle R Enterprise Script Support

Oracle R Enterprise, a component of the Oracle Advanced Analytics option, makes the open source R statistical programming language and environment ready for the enterprise and big data.

Oracle R Enterprise integrates R with Oracle Database. It is designed for problems that involve large amounts of data.

R users can develop, refine, and deploy R scripts that leverage the parallelism and scalability of the database to perform predictive analytics and data analysis.

The SQL Query node in Oracle Data Miner provides a simplified interface for integrating R scripts that have been registered with the database. This enables R developers to provide useful scripts for analyzing data.

You can run embedded R scripts that use theses interfaces:

  • rqEval

  • rqTableEval

  • rqRowEval

  • rqGroupEval

Oracle R Enterprise Database Roles

The Oracle R Enterprise database roles are added to the OMDRUSER role.

The OMDRUSER role includes both these two roles:

  • RQUSER

  • RQADMIN

If the RQUSER and RQAMIN roles are not available in the database configuration at the time of Oracle Data Miner repository installation, then the roles must be added by the DBA manually to the ODMRUSER role after Oracle R Enterprise is installed.

You must register R scripts before you can use them. You can register scripts using SQL*Plus or SQL Worksheet, using connection with DBA privileges.

Registered R Scripts are listed on the R Scripts tab of the SQL Query node. There are also R code snippets, which are code fragments that help you write scripts. Some snippets are just syntax, and others are examples. You can insert and edit snippets when you are using SQL Worksheet or creating or editing R code using SQL Query node.

SQL Query Node Context Menu

The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.

To see the context menu, right-click a Data Source node. The following options are available in the context menu:

SQL Query Node Properties

In the Properties pane, you can examine and change the characteristics or properties of a node.

To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternately, right-click the node and click Go to Properties.

SQL Query node Properties contains these sections:

  • SQL: Shows the SQL query. Click edit query to edit the query.

  • Output: Shows the columns and data types in the output of the query.

  • Cache

  • Details

Update Table Node

An Update Table node updates an existing table with selected columns from the data input to the node. Input columns are mapped to the existing table columns.

Update Table node can run in parallel. Update Table nodes rely on database resources for their definitions. It may be necessary to refresh a node definition if the database resources change. For example, if the resources are deleted or re-created.

Input and Output for Update Table Node

The input for an Update Table node is any node that produces a data flow. You can connect only one node to an Update Table node.

The output of an Update Table node is a data flow. You can use the data flow as a data source. To save the output as a table or view, use a Create Table or View node.

Related Topics

Data Types for Update Table Node

Update Table node supports certain data types. It can also support other types, such as B, and support these additional types with manual mapping.

You can manually map columns that do not have exact data type matches but the column data types have reasonable, safe implicit conversion default. For example, you can map BINARY_DOUBLE to NUMBER, or NVARCHAR2 to VARCHAR2. There could be some loss of data for such mappings:

  • Mapping BINARY_DOUBLE or BINARY_FLOAT to NUMBER could result in the loss of precision

  • Mapping NVARCHAR2 and NCHAR to VARCHAR2 can result in the loss of data because NVARCHAR2 and NCHAR are based on a potentially different character set than VARCHAR2. For mappings to work, your database must be set up so that NVARCHAR2 and NCHAR are based on the same character set as VARCHAR2.

Create Update Table Node

You create an Update Table node to update an existing table with data from selected columns in the table. You can connect an Update Table node to any node that creates a data flow, such as an Apply node.

To create an Update Table node:

  1. In the Components pane, go to Workflow Editor. If the Components pane is not visible, then in the SQL Developer menu bar, go to View and click Components. Alternately, press Ctrl+Shift+P to dock the Components pane.
  2. In Workflow Editor, expand Data, and click Update Table.
  3. Drag and drop the node from the Components pane to the Workflow pane.
    The node is added to the workflow. The GUI shows that the node has no data associated with it. Therefore, it cannot be run.
  4. Move the mouse to the node in the workflow that produces the data flow to update. Right-click the node, and select Connect from the context menu.
  5. Draw a line to the Update Table node and click again.
  6. The Edit Update Table Node dialog box opens. You can define the characteristics of the Update Table node.
  7. You can do either of the following:
    • Accept the default settings for the Table Update node.

    • Edit the default settings in Edit Update Table node.

  8. To update the table, right-click the Update Table node, and select Run.
  9. After running of the Update Table node is complete, you can view the results. Right-click the node and select View Data.

Update Table Node Automatic Behavior

The automatic behavior of Update Table node depends on the selection of the Auto Input Column Selection option.

If the Auto Input Column Selection option is selected, then the behavior under the following scenarios are:

  • When input is connected—If the Update Table node has a selected table to update, then any column that matches the existing columns in the table are mapped to that column automatically. The node becomes valid, if at least one column was included in the specification.

  • When input is disconnected—All columns are automatically removed. The node becomes invalid.

  • When the input node is edited:

    • If a column is added to the input node, then the column is added to the Update node, if there is a matching column in the existing table.

    • If a column is removed, then the column is removed from the node.

    • If a column is edited, then there are two possibilities:

      • If the edited column no longer matches an existing column, then it is removed.

      • If the edited column matches an existing column, then it is added.

If Auto Input Column Selection is not selected, then you must manually add and remove column specifications from the Update Table node.

Edit Update Table Node

The Edit Update Table dialog box provides the specification for the Update Table node.

To edit the Update Table node:

  • Name: Displays the name of the table. You can select an existing table by clicking Browse Table and select an existing table. Or you can create a new table by clicking New.

  • Auto Input Columns Selection: This option is selected by default. To select other columns, deselect this option. If you deselect this option, then click edit to select an input column. Use the arrows to move attributes from Available Attributes to Selected Attributes.

  • Drop Existing Rows: If this option is selected, then existing rows in the table are dropped before the table is updated. By default, the option is not selected.

  • Columns are listed in the Data grid.

Create Table (Update Table)

All of the attributes of the attached table are listed.

You can either accept the name for the new table or select a different name.

  • To delete an attribute, click delete.

  • To edit an attribute, click edit.

The Select Attributes dialog box opens.

Select Attributes (Update Table)

All columns are selected by default.

You have the option to select particular attributes. If you do not want to include a column in the data:
  1. Move the attribute from Selected Attributes to Available Attributes.

  2. Click OK.

Edit Columns (Update Table)

The Edit Columns tab displays the columns of the table to be updated. For each column, the Input Name, Target Name and Target (data) type are displayed.

By default, the option Auto Input Columns Selection is selected. If you deselect Auto Input Columns Selection, then you must manually select input columns by clicking edit.

Use the arrows to move the attributes from Available Attributes to Selected Attributes.

Update Table Node Data Viewer

After the node runs successfully, you can view the data in the Update Table Node Data Viewer.

To view the data, right-click the node and select View Data. The data viewer is the same as Data Source Node Viewer.

Related Topics

Update Table Node Context Menu

The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.

To see the context menu, right-click a Data Source node. The following options are available in the context menu:

Update Table Node Properties

In the Properties pane, you can examine and change the characteristics or properties of a node.

To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternately, right-click the node and click Go to Properties.

Update Table Properties consists of these sections:

Table (Update Table)

The Table tab displays the name of the table to update.

The default is to not drop existing rows. To drop existing rows, select Drop Existing Rows.

Columns (Update Table)

The Columns tab displays the columns of the table that are to be updated. For each column, the Input Name, Target Name and Target (data) type are displayed.

By default, the Auto Input Column Selection is selected.

If you deselect the Auto Input Column Selection then you must manually select input columns by clicking edit. Use the arrows to move the attributes from Available Attributes list to Selected Attributes list.

If data on the server changes, then it may be necessary to refresh the nodes. To refresh, click refresh node with data from server.

Cache

The Cache section provides the option to generate a cache for output data. You can change this default using the transform preference.

You can perform the following tasks:

  • Generate Cache of Output Data to Optimize Viewing of Results: Select this option to generate a cache. The default setting is to not generate a cache.

    • Sampling Size: You can select caching or override the default settings. Default sampling size is Number of Rows Default Value=2000

Related Topics

Details

The Details section displays the name of the node and any comments about it.

You can change the name and comments in the fields here:

  • Node Name

  • Node Comments

Target Values Selection

The Target Values Selection dialog box displays the number of target values selected. The default setting is Automatic.

It uses the top 10 Target Class Values by frequency. You can change the number of target values by changing Frequency Count. You can also select Use Lowest Occurring.

To select custom values:

  1. Select Custom.
  2. Move the values from Available Values to Selected Values.
  3. After you are done, click OK.