10 Predictive Query Nodes

Predictive Query nodes enable you to score data dynamically without a predefined model. Predictive Queries use in-database scoring technology.

Note:

Predictive Query Nodes require Oracle 12c Release 1 or later.

Scoring using Predictive Query nodes has the following limitations:

The transient models created while a Predictive Query node runs are not available for inspection or fine-tuning.
If it is necessary to inspect the model, correlate scoring results with the model, specify special algorithm settings, or run multiple scoring queries that use the same model, then a predefined model must be created.

The output of a Predictive Query is the output of an Apply operation.
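
As a rough illustration of what scoring without a predefined model can look like, Oracle Database 12c documents an analytic form of its SQL scoring functions, in which an OVER clause takes the place of a model name. The following Python sketch only assembles such a statement as text. The table and column names are hypothetical, the helper names are invented for this example, and the SQL that a Predictive Query node actually generates may differ.

```python
import re

# Plain, unquoted SQL identifiers only (a deliberately strict assumption).
_IDENT = re.compile(r"^[A-Za-z][A-Za-z0-9_$#]*$")


def _ident(name):
    """Reject anything that is not a simple identifier."""
    if not _IDENT.match(name):
        raise ValueError("not a simple identifier: %r" % (name,))
    return name


def build_predictive_query(table, case_id, kind, target=None,
                           partition_by=(), clusters=10):
    """Return SQL text that scores `table` on the fly, with no predefined model."""
    if kind == "anomaly":
        func = "PREDICTION(OF ANOMALY USING *)"
    elif kind == "clustering":
        func = "CLUSTER_ID(INTO %d USING *)" % int(clusters)
    elif kind == "prediction":
        if target is None:
            raise ValueError("a prediction query needs a target column")
        func = "PREDICTION(FOR %s USING *)" % _ident(target)
    else:
        raise ValueError("unknown query kind: %r" % (kind,))

    # An empty OVER () scores all rows with one transient model;
    # PARTITION BY requests one transient model per partition value.
    cols = ", ".join(_ident(c) for c in partition_by)
    over = "OVER (PARTITION BY %s)" % cols if cols else "OVER ()"
    return "SELECT %s, %s %s AS SCORE FROM %s" % (
        _ident(case_id), func, over, _ident(table))
```

For example, `build_predictive_query("CUSTOMERS", "CUST_ID", "anomaly")` returns a single SELECT statement over the hypothetical CUSTOMERS table. Because the model exists only for the duration of the query, there is nothing to inspect afterward, which matches the limitation described above.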

There are several Predictive Query nodes:

Anomaly Detection Query
An Anomaly Detection Query node analyzes the input for anomalies. That is, it detects unusual cases in data.
Clustering Query
A Clustering Query node returns the clusters in the input.
Feature Extraction Query
A Feature Extraction Query extracts features from the input.
Prediction Query
A Prediction Query node performs classification and regression using the input.

Related Topics

Apply Node Output

Anomaly Detection Query

An Anomaly Detection Query node analyzes the input for anomalies. That is, it detects unusual cases in data.

Note:

Predictive Query Nodes require Oracle 12c Release 1 or later.

Anomaly Detection Query can run in parallel.

Create an Anomaly Detection Query Node
You create an Anomaly Detection Query node to build an Anomaly Detection model to analyze and detect anomalous occurrences such as fraud.
Edit an Anomaly Detection Query
In the Edit Anomaly Detection Query Node dialog box, you can specify or change the characteristics of the models to build.
Anomaly Detection Query Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
Anomaly Detection Query Context Menu
The context menu options depend on the type of the node. The menu provides shortcuts to perform various tasks and to view information related to the node.

Create an Anomaly Detection Query Node

You create an Anomaly Detection Query node to build an Anomaly Detection model to analyze and detect anomalous occurrences such as fraud.

To create an Anomaly Detection Query in an existing workflow:

In the Components pane, go to Workflow Editor. If the Components pane is not visible, then in the SQL Developer menu bar, go to View and click Components. Alternately, press Ctrl+Shift+P to dock the Components pane.
Create the Data Source node containing the input data.
Expand the Predictive Query section in the Components pane.
Drag and drop the Anomaly Detection Query node from the Components pane to the Workflow pane.
The node is added to the workflow. The GUI shows that the node has no data associated with it. Therefore, it cannot be run.
Connect the Data Source node to the Anomaly Detection Query Node.
Edit the Anomaly Detection Query node.
Run the Anomaly Detection Query node and view its data.
To save the results of the query, use a Create Table or View node.

Edit an Anomaly Detection Query

In the Edit Anomaly Detection Query Node dialog box, you can specify or change the characteristics of the models to build.

To edit an Anomaly Detection Query node:

Double-click the Anomaly Detection Query node or right-click the node and click Edit. The Edit Anomaly Detection Query Node dialog box opens.
In the Edit Anomaly Detection Query Node dialog box, enter the details in the following tabs:
- Anomaly Predictions tab:
  - In the Case ID field, select a case ID from the drop-down list (Optional). It is recommended that you specify a case ID to uniquely define each record. The case ID helps with model repeatability and is consistent with good data mining practices.
  - You can edit the outputs of an Anomaly Prediction (optional). Oracle Data Miner automatically determines the output for the query. You can modify the output.
- Partition tab: You can perform the following tasks.
  - Add one or more Partition attributes. This is optional. Selecting a Partition attribute directs the predictive query to build a virtual model for each unique partition.
  - Select partitions. To select partitions, click the Partition tab and click the add icon. Use the Add Partitioning Columns dialog box to select the partitions. You can also specify partitioning expressions.
- Input tab: Click Input to perform the following tasks:
  - Add and modify input.
  - Remove input.
  - Change the mining type.
- Additional Output tab: You can add output (optional). By default, all target columns, the Case ID column, and partitioning columns are automatically added to additional output. To make changes, click Additional Output.
Click OK.
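
To make the effect of a Partition attribute concrete, the following sketch builds one small "virtual model" per unique partition value and scores each row against the model of its own partition. It is a conceptual illustration only: it substitutes a simple z-score rule for whatever anomaly detection algorithm the database applies, and the function name, column names, and threshold are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def score_by_partition(rows, partition_key, value_key, threshold=2.0):
    """Flag rows whose value is unusual within their own partition."""
    # Group the input rows by the value of the partition attribute.
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_key]].append(row)

    scored = []
    for members in groups.values():
        # The per-partition "model" here is just a mean and a standard deviation.
        values = [m[value_key] for m in members]
        mu, sigma = mean(values), pstdev(values)
        for m in members:
            z = 0.0 if sigma == 0 else abs(m[value_key] - mu) / sigma
            scored.append({**m, "anomaly": z > threshold})
    return scored
```

With this rule, an amount of 100 can be flagged in a partition where most amounts are 10 and left unflagged in a partition where every amount is 100, which is the practical reason for partitioning a query.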

Edit Anomaly Prediction Output
Oracle Data Miner automatically selects output for the query. You can select and edit the parameters of the output functions.
Add Anomaly Function
You can add an anomaly function in the Add Anomaly Function dialog box.
Edit Anomaly Function Dialog
You can edit the anomaly function in the Edit Anomaly Function dialog box.

Edit Anomaly Prediction Output

Oracle Data Miner automatically selects output for the query. You can select and edit the parameters of the output functions.

The default output is listed in the Anomaly Prediction Outputs section in the Anomaly Predictions tab. The defaults are:

Prediction
Prediction Details
Prediction Probability

You can also select Prediction Set and edit the parameters of the output functions. You can perform the following tasks:

Delete: To delete an output, select the output and click the delete icon.
Add: To add an output, click the add icon. Use the Add Anomaly Function dialog box to select an output.
Edit: To edit an output, either double-click the function or select the function and click the edit icon. Use the Edit Anomaly Function dialog box to make changes.

The output of a Predictive Query is the output of an Apply (Scoring) operation.

Add Anomaly Function

You can add an anomaly function in the Add Anomaly Function dialog box.

To add an anomaly function:

In the Function field, select a function from the drop-down list. The options are:
- Prediction
- Prediction Probability
- Prediction Details
- Prediction Set
To specify a name instead of using the default name, deselect Auto. This turns off automatic selection.
Click OK.

Edit Anomaly Function Dialog

You can edit the anomaly function in the Edit Anomaly Function dialog box.

To edit an anomaly function:

Select the change that you want to make to the function.
To specify a name instead of using the default name, deselect Auto. This turns off automatic selection.
Click OK.

Anomaly Detection Query Properties

In the Properties pane, you can examine and change the characteristics or properties of a node.

To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternately, right-click the node and click Go to Properties.

The Properties pane for an Anomaly Detection Query node has these sections:

Anomaly Predictions: Displays the predictions produced by the query.
Partition
Additional Output: Displays the output specified.
Cache
Details

Anomaly Detection Query Context Menu

The context menu options depend on the type of the node. The menu provides shortcuts to perform various tasks and to view information related to the node.

To view the Anomaly Detection Query node context menu, right-click the node. The following options are available in the context menu:

Connect
Use the Connect option to link nodes in a workflow.
Run
Use the Run option to execute the tasks specified in the nodes that comprise the workflow.
Force Run
Use the Force Run option to rerun one or more nodes that are complete.
Create Schedule
Use the Create Schedule option to define a schedule to run a workflow at a definite time and date.
Edit
Use the Edit option to edit the default settings of a node.
View Data
Use the View Data option to view the data contained in a Data node.
Generate Apply Chain
Use the Generate Apply Chain option to create a new node that contains the specification of a node that performs a transformation.
Show Event Log
Use the Show Event Log option to view information about events in the current connection, errors, warnings, and information messages.
Deploy
Use the Deploy option to deploy a node or workflow by creating SQL scripts that perform the tasks specified in the workflow.
Cut
Use the Cut option to remove the selected object, which could be a node or connection.
Copy
Use the Copy option to copy one or more nodes and paste them into the same workflow or a different workflow.
Paste
Use the Paste option to paste the copied object in the workflow.
Extended Paste
Use the Extended Paste option to preserve node and model names while pasting them.
Select All
Use the Select All option to select all the nodes in a workflow.
Performance Settings
Use the Performance Settings option to edit Parallel settings and In-Memory settings of the nodes.
Toolbar Actions
Use the Toolbar Actions option to select actions in the toolbar from the context menu.
Show Runtime Errors
Use the Show Runtime Errors option to view errors related to node failure during runtime. This option is displayed only when the node fails at runtime.
Show Validation Errors
Use the Show Validation Errors option to view validation errors, if any.
Save SQL
Use the Save SQL option to generate SQL script for the selected node.
Validate Parents
Use the Validate Parents option to validate all parent nodes of the current node.
Go to Properties
Use the Go to Properties option to open the Properties pane of the selected node.

Connect

Use the Connect option to link nodes in a workflow.

To connect nodes:

Right-click a node and click Connect. Alternately, go to Diagram and click Connect.
Use the cursor to draw a line from this node to the target node.
Click to establish the connection. Note the following:
- You can create only valid connections.
- You can create only one connection between any two nodes.
- You can remove a connection by pressing the ESC key.

Related Topics

Connect

Run

Use the Run option to execute the tasks specified in the nodes that comprise the workflow.

The Data Miner server runs workflows asynchronously. The client does not have to be connected. You can run one or more nodes in the workflow:

To run one node: Right-click the node and select Run.
To run multiple nodes simultaneously: Hold down the Ctrl key and click each node to select it. Then right-click any selected node and select Run.

If a node depends on outputs of one or more parent nodes, then the parent node runs automatically only if the outputs required by the running node are missing.
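
The rule above can be sketched as a small recursive function. This is a conceptual model only, not the Data Miner server's implementation. The `parents` mapping, the `completed` set, and the node names are assumptions made for this example.

```python
def run_node(node, parents, completed):
    """Run `node`, first running any parent whose output is missing.

    `parents` maps a node name to the names of its parent nodes, and
    `completed` is the set of nodes whose outputs already exist.
    Returns the nodes actually executed, in execution order.
    """
    executed = []
    for parent in parents.get(node, ()):
        # A parent is run only when its output is not already available.
        if parent not in completed:
            executed += run_node(parent, parents, completed)
    completed.add(node)
    executed.append(node)
    return executed
```

If a Data Source node has already run, then running a downstream Create Table node in this sketch executes only the intermediate query node and the Create Table node itself.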

Force Run

Use the Force Run option to rerun one or more nodes that are complete.

Force Run deletes any existing models before rebuilding them.

To select more than one node, click the nodes while holding down the Ctrl key.

You can Force Run a node at any location in a workflow. Depending on the location of the node in the workflow, you have the following choices for running the node using Force Run:

Selected Node
Selected Node and Children (available if the node has child nodes)
Child Node Only (available if the node has one or more child nodes)
Selected Node and Parents (available if the node has parent nodes)
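
One way to read these choices is as different sets of nodes taken from the workflow graph. The sketch below assumes that "children" and "parents" include indirect descendants and ancestors, which the text above does not state explicitly. The function names and the `children` mapping are hypothetical.

```python
def _reachable(start, edges):
    """Return every node reachable from `start` along `edges`, excluding `start`."""
    found, stack = set(), list(edges.get(start, ()))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(edges.get(current, ()))
    return found


def force_run_targets(node, children, choice):
    """Return the set of nodes that a Force Run choice would rerun."""
    # Derive the parent links by inverting the child links.
    parents = {}
    for parent, kids in children.items():
        for kid in kids:
            parents.setdefault(kid, []).append(parent)

    if choice == "Selected Node":
        return {node}
    if choice == "Selected Node and Children":
        return {node} | _reachable(node, children)
    if choice == "Child Node Only":
        return _reachable(node, children)
    if choice == "Selected Node and Parents":
        return {node} | _reachable(node, parents)
    raise ValueError("unknown Force Run choice: %r" % (choice,))
```

The same helper also shows why some choices are unavailable for certain nodes: for a node with no children, the child-based choices would produce no additional nodes to run.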

Create Schedule

Use the Create Schedule option to define a schedule to run a workflow at a definite time and date.

In the Create Schedule dialog box, you can create schedules for your workflows. To create a workflow schedule:

Start Date: Select a date to set as the start date of the schedule. Click the calendar icon to select a date.
Repeat: Select any one of the following options:
- None: To schedule the workflow to run only once at the defined time.
- Every Day: To schedule the workflow to run daily at the specified time.
- Every Week: To schedule the workflow to run weekly at the specified time.
- Custom: To customize your workflow schedule, click Custom. This opens the Repeat dialog box, where you can set how frequently the workflow should run.
End Repeat: You can select any one of the following options:
- None: To continue running the workflow every hour.
- After: Select a number by clicking the arrows. This runs the workflow every hour, and would stop after the number of hours you have selected here. For example, if you select 8, then the workflow will run every hour, and after 8 hours, it will stop.
- On Date: Select a particular date by clicking the calendar icon.
To run the workflow on an existing schedule, select Use Existing Schedule and select a schedule from the drop-down list.
- Click the edit icon to edit the selected schedule in the Schedule dialog box.
- Click the add icon to add a new schedule. You can also edit the selected schedule, and add it here.
- Click the delete icon to delete the selected schedule.
Click OK.

To save the workflow schedule settings, click the save icon. You can provide a name for the schedule in the Save a Schedule dialog box.

Edit

Use the Edit option to edit the default settings of a node.

Nodes have default algorithms and settings. When you edit a node, the default algorithms and settings are modified. You can edit a node in any one of the following ways:

Edit nodes using the Edit dialog box
Edit nodes through Properties UI

View Data

Use the View Data option to view the data contained in a Data node.

The Data nodes are Create Table or View node, Data Source node, Explore Data node, Graph node, SQL Query node, and Update Table node.

Related Topics

Data Source Node Viewer

Generate Apply Chain

Use the Generate Apply Chain option to create a new node that contains the specification of a node that performs a transformation.

If you have several transformations performed in sequence, for example, Sample followed by a Custom transform, then you must select Generate Apply Chain for each transformation in the sequence. You must then connect the individual nodes to each other and to an appropriate data source.

Generate Apply Chain helps you create a sequence of transformations that you can use to ensure that new data is prepared in the same way as existing data. For example, to ensure that Apply data is prepared in the same way as Build data, use this option.

The Generate Apply Chain option is not valid for all nodes. For example, it does not copy the specification of a Build node.

Show Event Log

Use the Show Event Log option to view information about events in the current connection, errors, warnings, and information messages.

Clicking the Show Event Log option opens the View and Event Log dialog box.

Related Topics

View an Event Log

Deploy

Use the Deploy option to deploy a node or workflow by creating SQL scripts that perform the tasks specified in the workflow.

The scripts generated by Deploy are saved to a directory.

Note:

You must run a node before deploying it.

You can generate a script that replicates the behavior of the entire workflow. Such a script can serve as the basis for application integration, or as a more lightweight deployment than the alternative of installing the Data Miner repository and workflows in the target and production system.

To deploy a workflow or part of a workflow:

Right-click a node and select Deploy.
Select any one of the deployment options:
- Selected node and dependent nodes
- Selected node, dependent nodes, and child nodes
- Selected node and connected nodes
After you select the deployment option, the Generate SQL Script wizard opens. Enter the required details in the wizard.

Cut

Use the Cut option to remove the selected object, which could be a node or connection.

You can also delete objects by selecting them and pressing DELETE on your keyboard.

Copy

Use the Copy option to copy one or more nodes and paste them into the same workflow or a different workflow.

To copy and paste nodes:

Select the nodes to copy. To select several nodes, hold down the Ctrl key when you click the nodes.

The selected nodes are highlighted.
Right-click and select Copy from the context menu. Alternately, you can press Ctrl+C to copy the selected nodes.

Note:

Copying and pasting nodes does not carry over any mining models or results from the original node.

Paste

Use the Paste option to paste the copied object in the workflow.

To paste an object, right-click the workflow and click Paste. Alternately, you can press Ctrl+V.

Note:

Node names and model names are changed to avoid naming collisions. To preserve names, use the option Extended Paste.

Extended Paste

Use the Extended Paste option to preserve node and model names while pasting them.

The default behavior of Paste is to change node names and model names to avoid naming collisions.

To use the Extended Paste option, right-click the workflow and click Extended Paste. Alternately, you can press Ctrl+Shift+V.

Note:

If model names are not unique, then the models may be overwritten when they are rebuilt.
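
The difference between Paste and Extended Paste can be pictured with a short sketch. The numeric-suffix scheme used here is purely hypothetical, because the text does not say how Oracle Data Miner derives the changed names. The function and parameter names are also assumptions.

```python
def paste_names(copied, existing, preserve=False):
    """Return the names that pasted nodes or models would receive.

    preserve=False imitates Paste, which renames on collision.
    preserve=True imitates Extended Paste, which keeps names as they are.
    """
    taken = set(existing)
    result = []
    for name in copied:
        new_name = name
        if not preserve:
            suffix = 1
            # Hypothetical scheme: append _1, _2, ... until the name is free.
            while new_name in taken:
                new_name = "%s_%d" % (name, suffix)
                suffix += 1
        taken.add(new_name)
        result.append(new_name)
    return result
```

With `preserve=True`, a pasted model keeps a name that already exists in the workflow, which is the situation in which a rebuild may overwrite the earlier model.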

Related Topics

Copy

Select All

Use the Select All option to select all the nodes in a workflow.

The selected nodes and links are highlighted with a dark blue border.

Performance Settings

Use the Performance Settings option to edit Parallel settings and In-Memory settings of the nodes.

If you click Performance Settings in the context menu, or if you click Performance Options in the workflow toolbar, then the Edit Selected Node Settings dialog box opens. It lists all the nodes that comprise the workflow. To edit the settings in the Edit Selected Node Settings dialog box:

Click Parallel Settings and select:
- Enable: To enable parallel settings in the selected nodes in the workflow.
- Disable: To disable parallel settings in the selected nodes in the workflow.
- All: To turn on parallel processing for all nodes in the workflow.
- None: To turn off parallel processing for all nodes in the workflow.
Click In-Memory Settings and select:
- Enable: To enable In-Memory settings for the selected nodes in the workflow.
- Disable: To disable In-Memory settings for the selected nodes in the workflow.
- All: To turn on In-Memory settings for all nodes in the workflow.
- None: To turn off In-Memory settings for all nodes in the workflow.
Click to set the Degree of Parallel, and In-Memory settings such as Compression Method, and Priority Levels in the Edit Node Performance Settings dialog box.

If you specify parallel settings for at least one node, then the workflow title bar indicates the Performance Settings status. Performance Settings is either On (for Selected nodes), On (for All nodes), or Off. You can click Performance Options to open the Edit Selected Node Settings dialog box.
Click to edit the default preferences for parallel processing.
- Edit Node Default Settings: You can edit the Parallel Settings and In-Memory settings for the selected node in the Performance Options dialog box. You can access the Performance Options dialog box from the Preferences options in the SQL Developer Tools menu.
- Change Settings to Default

Toolbar Actions

Use the Toolbar Actions option to select actions in the toolbar from the context menu.

Current actions are Zoom In and Zoom Out.

Related Topics

Managing Workflows using Workflow Controls

Show Runtime Errors

Use the Show Runtime Errors option to view errors related to node failure during runtime. This option is displayed only when the node fails at runtime.

The Event Log opens with a list of errors. Select the error to see the exact message and details.

Related Topics

View an Event Log

Show Validation Errors

Use the Show Validation Errors option to view validation errors, if any.

This option is displayed only when there are validation errors. For example, if an Association node is not connected to a Data Source node, then select Show Validation Errors to view the validation error No build data input node connected.

You can also view validation errors by moving the mouse over the node. The errors are displayed in a tool tip.

Save SQL

Use the Save SQL option to generate SQL script for the selected node.

To generate SQL script for the selected node:

Right-click the node and click Save SQL.
Select any one of the options to save the generated SQL script:
- SQL to Clipboard
- SQL to File
- SQL Script to Clipboard
- SQL Script to File
When you save to a file, the system provides a default location. You can browse to change this location. You can also create a folder for scripts.

The saved SQL includes SQL generated by the current node and all of its parent nodes that are data providers. The SQL lineage ends when it encounters a node that represents persisted objects, such as tables or models.

The generated script does not reproduce all behavior of the node, and it does not create any objects. For example, if you select Save SQL for a Create Table node, then it does not generate a script to create the table. Instead, it generates a script to query the created table.
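
The lineage rule can be sketched as a walk up the workflow graph that stops at persisted objects. This is a conceptual model under stated assumptions, not the actual generator: the `parents` mapping, the `persisted` set, and the node names are hypothetical.

```python
def sql_lineage(node, parents, persisted):
    """Return the nodes whose SQL would be included, with the selected node last.

    `parents` maps a node to its data-providing parent nodes.
    `persisted` holds nodes that represent stored objects such as tables.
    """
    order, seen = [], set()

    def visit(current):
        if current in seen:
            return
        seen.add(current)
        # A persisted node is referenced (queried) but not traced any further.
        if current not in persisted:
            for parent in parents.get(current, ()):
                visit(parent)
        order.append(current)

    visit(node)
    return order
```

In this sketch, selecting a node that sits below a Create Table node yields only the table reference and the selected node. Selecting the Create Table node itself yields just that node, which corresponds to a script that queries the table rather than one that creates it.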

Related Topics

Deploy Workflows using Data Query Scripts

Validate Parents

Use the Validate Parents option to validate all parent nodes of the current node.

To validate parent nodes of a node, right-click the node and select Validate Parents.

You can validate parent nodes when the node is in the Ready, Complete, or Error state. All parent nodes must be in the Complete state.

Go to Properties

Use the Go to Properties option to open the Properties pane of the selected node.

Related Topics

Managing Workflows and Nodes in the Properties Pane

Clustering Query

A Clustering Query node returns the clusters in the input.

Note:

Predictive Query nodes require Oracle 12c Release 1 or later.

A Clustering Query can run in parallel.

Create a Clustering Query
You create a Clustering Query node to build clustering models.
Edit a Clustering Query
In the Edit Clustering Query Node dialog box, you can specify or change the characteristics of the models to build.
Clustering Query Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
Clustering Query Context Menu
The context menu options depend on the type of the node. The menu provides shortcuts to perform various tasks and to view information related to the node.

Create a Clustering Query

You create a Clustering Query node to build clustering models.

To create a Clustering Query in an existing workflow:

In the Components pane, go to Workflow Editor. If the Components pane is not visible, then in the SQL Developer menu bar, go to View and click Components. Alternately, press Ctrl+Shift+P to dock the Components pane.
Create a Data Source node containing the input data.
Expand the Predictive Queries section in the Components pane.
Drag and drop the Clustering Query node from the Components pane to the Workflow pane.
The node is added to the workflow. The GUI shows that the node has no data associated with it. Therefore, it cannot be run.
Connect the Data Source node to the Clustering Query node.
Edit the Clustering Query node.
Run the Predictive Query node and view the data.
To save the results of the query, use a Create Table or View node.

Edit a Clustering Query

In the Edit Clustering Query Node dialog box, you can specify or change the characteristics of the models to build.

To edit a Clustering Query node:

Double-click the Clustering Query node or right-click the node and click Edit. The Edit Clustering Query Node dialog box opens.
In the Edit Clustering Query Node dialog box, enter the details in the following tabs:
- Cluster Predictions tab: You can perform the following tasks:
  - In the Case ID field, select a case ID from the drop-down list. The case ID is optional. It is recommended that you specify a case ID to uniquely define each record. The case ID helps with model repeatability and is consistent with good data mining practices.
  - In the Number of Clusters to Compute field, specify the number to compute. Default is 10.
  - Edit Cluster Prediction Outputs. Oracle Data Miner automatically determines the output for the query. You can modify the output.
- Partition tab: You can perform the following tasks:
  - Add one or more Partition attributes. This is optional. Selecting a Partition attribute directs the predictive query to build a virtual model for each unique partition.
  - Select Partitions: To select partitions, click the Partition tab and click the add icon. Use the Add Partitioning Columns dialog box to select the partitions. You can also specify partitioning expressions.
- Input tab: You can modify Input. This is optional. You can add or remove inputs and change the mining types of inputs. Click Input.
- Additional Output tab: You can add outputs (optional). By default, all target columns, the Case ID column, and partitioning columns are automatically added to Additional Output. To make changes, click Additional Output.
Click OK.

Edit Cluster Prediction Outputs
Oracle Data Miner automatically selects output for the query. The default outputs are listed in the Cluster Prediction Outputs section in the Cluster Predictions tab.
Add Cluster Function
In the Add Cluster Function dialog box, you can add cluster functions.
Edit Cluster Function
In the Edit Cluster Function dialog box, you can edit the function.

Edit Cluster Prediction Outputs

Oracle Data Miner automatically selects output for the query. The default outputs are listed in the Cluster Prediction Outputs section in the Cluster Predictions tab.

The output of a Predictive Query is the output of an Apply (scoring) operation. The defaults are:

Cluster Details
Cluster Distance
Cluster ID
Cluster Probability

You can also select Cluster Set, and edit parameters of the output functions. You can perform the following tasks:

Delete: To delete an output, select the output and click the delete icon.
Add: To add an output, click the add icon. Use the Add Cluster Function dialog box to select an output.
Edit: To edit an output, either double-click the function or select the function and click the edit icon. Use the Edit Cluster Function dialog box to make changes.
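
The cluster outputs listed above have namesakes among the Oracle SQL scoring functions (CLUSTER_ID, CLUSTER_PROBABILITY, CLUSTER_DETAILS, CLUSTER_DISTANCE, and CLUSTER_SET). As a hedged sketch, the following Python code turns a selection of outputs into SELECT-list expressions in the analytic form of those functions. The mapping from output names to functions is an assumption based on the matching names, and the `_OUT` column aliases are invented for this example, so they do not represent the names that the Auto option produces.

```python
# Assumed mapping from output names in the dialog box to SQL scoring functions.
CLUSTER_FUNCTIONS = {
    "Cluster ID": "CLUSTER_ID",
    "Cluster Probability": "CLUSTER_PROBABILITY",
    "Cluster Details": "CLUSTER_DETAILS",
    "Cluster Distance": "CLUSTER_DISTANCE",
    "Cluster Set": "CLUSTER_SET",
}

# The default outputs named in this topic.
DEFAULT_OUTPUTS = ("Cluster Details", "Cluster Distance",
                   "Cluster ID", "Cluster Probability")


def cluster_select_list(outputs=DEFAULT_OUTPUTS, clusters=10, names=None):
    """Return one SELECT-list expression per requested cluster output."""
    names = names or {}
    items = []
    for output in outputs:
        func = CLUSTER_FUNCTIONS[output]
        # Hypothetical default alias; a user-supplied name overrides it.
        alias = names.get(output, func + "_OUT")
        items.append("%s(INTO %d USING *) OVER () AS %s"
                     % (func, int(clusters), alias))
    return items
```

The `clusters` argument plays the role of the Number of Clusters to Compute field, and the `names` argument stands in for deselecting Auto and typing a name.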

Add Cluster Function

In the Add Cluster Function dialog box, you can add cluster functions.

To add a cluster function:

In the Function field, select a function from the drop-down list. The options are:
- Cluster ID
- Cluster Probability
- Cluster Details
- Cluster Distance
- Cluster Set
To specify a name instead of using the default name, deselect Auto. This turns off automatic selection.
Click OK.

Edit Cluster Function

In the Edit Cluster Function dialog box, you can edit the function.

To edit a cluster function:

Select the change that you want to make to the function.
To specify a name instead of using the default name, deselect Auto. This turns off automatic selection.
Click OK.

Clustering Query Properties

In the Properties pane, you can examine and change the characteristics or properties of a node.

The Clustering Query Properties pane has these sections:

Cluster Predictions: Displays the predictions produced by the query.
Partition
Additional Output: Displays the output specified.
Cache
Details