9 Model Operations

Oracle Data Mining enables you to test Classification and Regression models.

A Test node is one of several ways to test a model. After you build a model, you apply the model to new data using an Apply node. Evaluate and Apply data must be prepared in the same way that build data was prepared.

The nodes related to model operations are described in the following topics.

Apply Node

The Apply node takes a collection of models, both partitioned models and non-partitioned models, and returns a single score. The Apply node produces a query as the result.

The result can be further transformed, or connected to a Create Table or View node to save the data as a table. To make predictions using a model, you must apply the model to new data. This process is also called scoring the new data.

An Apply node generates the SQL for Scoring using one or more models. The SQL includes pass-through (supplemental) attributes and columns generated using Scoring functions.
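
As an illustration, a generated scoring query combines pass-through columns with calls to SQL scoring functions. The sketch below assembles a query of that general shape in Python; the model name, table name, and column list are hypothetical, and the exact SQL that Data Miner emits may differ.

```python
# Hypothetical sketch of the kind of scoring SQL an Apply node produces.
# Model, table, and column names are made up for illustration only.

def build_scoring_query(model, table, pass_through):
    # Pass-through (supplemental) columns appear first by default
    # (the "Data Columns First" column order).
    cols = list(pass_through)
    # Scoring functions generate the Apply columns; default names use
    # a function abbreviation plus the model name.
    cols.append(f'PREDICTION({model} USING *) "PRED_{model}"')
    cols.append(f'PREDICTION_PROBABILITY({model} USING *) "PROB_{model}"')
    return f"SELECT {', '.join(cols)} FROM {table}"

query = build_scoring_query("CLAS_GLM_MODEL", "MINING_DATA_APPLY_V", ["CUST_ID"])
print(query)
```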

Note:

You cannot apply Association or Attribute Importance models.

Apply Preferences

In the Preferences dialog box, you can view and change preferences for Apply operations.

To apply preferences to an Apply Node:

  1. In the Tools menu, click Preferences.
  2. In the Preferences dialog box, click Data Miner. You can view and change preferences for Apply operations. The default preferences for Data Miner are:
    • Automatic Apply Settings

    • Data Columns First

  3. Click OK.

Apply Node Input

Inputs for an Apply node can be Model nodes, Model Build nodes, or any node that generates data, such as a Data node.

An Apply node requires the following input:

  • One or more of the following:

    • Model node

    • Model Build node

    You must specify at least one model to apply. You can apply several models at the same time.

  • Any node that generates data as an output such as a Data node, a Transforms node, or an appropriate Text node.

    Only one input node is permitted.

    When you apply a model to new data, the new data must be transformed in the same way as the data used to build the model.

Note:

You cannot apply Association or Attribute Importance models.

Apply Node Output

An Apply node generates a data flow based on the Apply and Output specifications.

You can provide specifications for Apply and Output in different ways, such as with the Automatic Settings option or the Define Apply Columns wizard.

Creating an Apply Node

You create an Apply node to score data based on the models.

Before creating an Apply node, you must connect a Data node and a Model node or Build node to the Apply node.
To create an Apply node:
  1. In the Components pane, go to Workflow Editor. If the Components pane is not visible, then in the SQL Developer menu bar, go to View and click Components. Alternately, press Ctrl+Shift+P to dock the Components pane.
  2. Either identify Apply Data or create a Data Source node containing the Apply Data.

    Note:

    The Apply Data must be prepared in the same way as the Build Data.
  3. Create a Model node, a Model Build node (such as a Classification node), or a combination of these nodes. At least one model must be successfully built before it can be applied. You cannot apply Association models.
  4. In the Workflow Editor, expand Evaluate and Apply, and click Apply.
  5. Drag and drop the node from the Components pane to the Workflow pane.
    The node is added to the workflow. The GUI shows that the node has no data associated with it. Therefore, it cannot be run.
  6. Link the Data node, Model nodes, and Build nodes to the Apply Node.

Apply and Output Specifications

There are several ways to create Apply and Output specifications for an Apply node.

You can use any one of the ways described in the following topics.

Edit Apply Node

In the Edit Apply Node dialog box, you can edit settings for predictions, additional output, and automatic settings.

To edit or view an Apply specification, either double-click the Apply node or right-click the Apply node and select Edit. The Edit Apply Node dialog box has two tabs:

  • Predictions: Defines the Apply Scoring specifications.

    An Apply specification consists of several Output Apply columns. The column names are generated automatically.

    • You can specify names. The names must not be more than 30 characters.

    • You can then select a model from the list of models in all Input nodes and an Apply function. The Apply functions that you can select depend on the selected models.

  • Additional Output: Specifies pass-through columns from the Input node. You can select as many columns as you want. You can specify that these selected columns are displayed before the Apply columns (the default) or after the Apply columns.

    These columns are often used to identify the Apply output. For example, you can use the Case ID column to identify the Apply output.

    The default is to not specify any additional output.

The Default Column Order, at the bottom of the Edit Apply Node dialog box, is Data Columns First in the output. You can change this to Apply Columns First.

Select the Order Partitions option if you want predictions for partitioned models. By default, this option is selected, even when there are non-partitioned models as inputs to the Apply node.

Predictions

In the Predictions tab, you can define Apply Scoring specifications.

To define specific Apply settings or to edit the default settings, deselect Automatic Settings. You then add new Apply functions or edit existing ones.

Select a case ID from the Case ID drop-down list, as applicable.

You can edit settings in several ways:

  • Show Partition Columns: Select a column and click partition to view the partition keys.

  • Add a setting: Click add to open the Add Output Apply Column dialog box.

  • Edit an existing setting: Select the setting and click edit. The Edit Output Data Column dialog box opens.

  • Delete a specification: Select it and click delete.

  • Define Apply Columns: Click define. In the Define Apply Columns Wizard, click the Define Apply Columns icon.

Apply Functions

The Apply functions that you can choose depend on the models that you apply.

Note:

Certain Apply functions are available only if you are connected to Oracle Database 12c or later.

The Apply functions, arranged according to Model node are:

  • Anomaly Detection Models

    • Prediction: An automatic setting that returns the best prediction for the model. The data type returned depends on the target value type used during the build of the model. If a stored cost matrix exists, then the function returns the lowest cost prediction; otherwise, it returns the highest probability prediction.

    • Prediction Details: Returns prediction details. The return value describes the attributes of the prediction. For Anomaly Detection, the returned details refer to the highest probability class or the specified class value.

      Note:

      Prediction Details requires a connection to Oracle Database 12c or later.

      The defaults for Prediction Details are:

      • Target Value: Most Likely

      • Sort by Weights: Absolute value

      • Maximum Length of Ranked Attribute List: 5

      Prediction Details output is in XML format (XMLType data type). You must parse the output to find the data that you need.

    • Prediction Probability: An automatic setting that returns the probability associated with the best prediction.

    • Prediction Set: Returns a varray of objects containing all classes in a multiclass classification scenario. The object fields are named PREDICTION, PROBABILITY, and COST. The data type of the PREDICTION field depends on the target value type used during the build of the model. The other two fields are both Oracle NUMBER. The elements are returned in the order of best prediction to worst prediction.

  • Clustering Models

    • Cluster Details: The return value describes the attributes of the highest probability cluster or the specified cluster ID. If you specify a value for TopN, then the function returns the N attributes that most influence the cluster assignment (the score). If you do not specify TopN, then the function returns the five most influential attributes.

      Note:

      Cluster Details requires a connection to Oracle Database 12c or later.

      The defaults for Cluster Details are as follows:

      • Cluster ID: Most Likely

      • Sort by Weight: Absolute value

      • Maximum Length of Ranked Attribute List: 5

      The returned attributes are ordered by weight. The weight of an attribute expresses its positive or negative impact on cluster assignment. A positive weight indicates an increased likelihood of assignment. A negative weight indicates a decreased likelihood of assignment.

      Cluster Details output is in XML format (XMLType data type). You must parse the output to find the data that you need.

    • Cluster Distance: Returns a cluster distance for each row in the selection. The cluster distance is the distance between the row and the centroid of the highest probability cluster or the specified cluster ID.

      Note:

      Cluster Distance requires a connection to Oracle Database 12c or later.

      The defaults for Cluster Distance are as follows:

      • Cluster ID: Most Likely

    • Cluster ID: An automatic setting that returns the NUMBER of the most probable cluster ID. If the cluster ID has been renamed, then a VARCHAR2 is returned instead.

    • Cluster Probability: An automatic setting that returns a measure of the degree of confidence of membership (NUMBER) of an input row in a cluster associated with the specified model.

    • Cluster Set: Returns an array of objects containing all possible clusters that a given row belongs to, given the parameter specifications. Each object in the array is a pair of scalar values containing the cluster ID and the cluster probability. The object fields are named CLUSTER_ID and PROBABILITY, and both are Oracle NUMBER. This function applies to Clustering models only.

  • Feature Extraction Models

    • Feature ID: Returns an Oracle NUMBER that is the identifier of the feature with the highest value for the row.

    • Feature Set: An automatic setting that is similar to Cluster Set.

    • Feature Value: Returns the value of a given feature. If you omit the feature ID argument, then the function returns the highest feature value.

    • Feature Details: The return value describes the attributes of the highest value feature or the specified feature ID. If you specify a value for TopN, then the function returns the N attributes that most influence the feature value. If you do not specify TopN, then the function returns the five most influential attributes.

      Note:

      Feature Details requires a connection to Oracle Database 12c or later.

      The returned attributes are ordered by weight. The weight of an attribute expresses its positive or negative impact on the value of the feature. A positive weight indicates a higher feature value. A negative weight indicates a lower feature value.

      The defaults for Feature Details are as follows:

      • Feature ID: Most Likely

      • Sort by Weight: Absolute value

      • Maximum Length of Ranked Attribute List: 5

      Feature Details output is in XML format (XMLType data type). You must parse the output to find the data that you need.

  • Classification and Regression Models

    • Prediction: An automatic setting that returns the best prediction for the model. The data type returned depends on the target value type used during the build of the model.

      • For Regression models, this function returns the expected value.

      • For Classification models, the returned details refer to the highest probability class or the specified class value.

        The function returns the lowest cost prediction using the stored cost matrix if a cost matrix exists. If no stored cost matrix exists, then the function returns the highest probability prediction.

    • Prediction Bounds: For Generalized Linear Models, it returns an object with two NUMBER fields, LOWER and UPPER. If the GLM was built using Ridge Regression, or if the covariance matrix is found to be singular during the build, then this function returns NULL for both fields.

      • For a Regression mining function, the bounds apply to the value of the prediction.

      • For a Classification mining function, the bounds apply to the probability value.

    • Prediction Bounds Lower: Same as Prediction Bounds but only returns the lower bounds as a scalar column. Automatic Setting for GLM models.

    • Prediction Bounds Upper: Same as Prediction Bounds but only returns the upper bounds as a scalar column. Automatic Setting for GLM models.

    • Prediction Details: Requires a connection to Oracle Database 12c or later, except for Decision Tree.

      The defaults for Prediction Details for Classification are as follows:

      • Target Value: Most Likely

      • Sort by Weights: Absolute value

      • Maximum Length of Ranked Attribute List: 5

      The defaults for Prediction Details for Regression are as follows:

      • Sort by Weights: Absolute value

      • Maximum Length of Ranked Attribute List: 5

    • DT Prediction Details: Returns a string containing model-specific information related to the scoring of the input row. In Oracle Data Miner releases earlier than 4.0, the return value is in the form <Node id = "integer"/>.

      Note:

      DT Prediction Details requires a connection to Oracle Database 11g Release 2 (11.2) or later.

  • Classification

    • Prediction Costs: Returns a measure of cost for a given prediction as a NUMBER. Classification models only. Automatic Setting for DT models.

    • Prediction Probability: Returns the probability associated with the best prediction. The Automatic Setting is Most Likely.

    • Prediction Set: Returns an array of objects containing all classes in a multiclass classification scenario. The object fields are named PREDICTION, PROBABILITY, and COST. The data type of the PREDICTION field depends on the target value type used during the build of the model. The other two fields are both Oracle NUMBER. The elements are returned in the order of best prediction to worst prediction.
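
Several of the Details functions above return XML that you must parse. The following Python sketch parses a hypothetical sample of the XMLType output; the actual element and attribute names in your database output may differ, so inspect a real row before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of Prediction Details output (XMLType).
# Attribute names and values are made up for illustration.
details = """
<Details algorithm="Decision Tree">
  <Attribute name="AGE" actualValue="35" weight="0.42" rank="1"/>
  <Attribute name="INCOME" actualValue="HIGH" weight="-0.17" rank="2"/>
</Details>
"""

root = ET.fromstring(details)
# Attributes are ranked by weight; a negative weight indicates a
# negative impact on the prediction.
ranked = [(a.get("name"), float(a.get("weight"))) for a in root.iter("Attribute")]
print(ranked)
```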

Apply Functions Parameters

You can specify the following Apply Function parameters:

  • Cluster ID: The default is Most Probable. No other parameters are supported.

  • Cluster Probability: The default is Most Probable. You can also select a specific cluster ID, or specify NULL or Most Likely to return the probability of the most likely cluster.

  • Cluster Set: The default is All Clusters. You can also specify either or both of the following:

    • TopN: Where N is between one and the number of clusters. The optional TopN argument is a positive integer that restricts the set of clusters to those that have one of the top N probability values. If there is a tie at the Nth value, then the database still returns only N values. If you omit this argument, then the function returns all clusters.

    • Probability Cutoff: A number strictly greater than zero and less than or equal to one. The optional cutoff argument restricts the returned clusters to only those that have a probability greater than or equal to the specified cutoff. To filter only by cutoff, specify NULL for TopN and the desired cutoff for cutoff.

  • Feature ID: The default is Most Probable. No other values are supported.

  • Feature Set: The default is All Feature IDs. You can also specify either or both of the following:

    • TopN: Where N is between one and the number of features. The optional TopN argument is a positive integer that restricts the set of features to those that have one of the top N values. If there is a tie at the Nth value, then the database still returns only N values. If you omit this argument, then the function returns all features.

    • Probability Cutoff: A number strictly greater than zero and less than or equal to one. The optional cutoff argument restricts the returned features to only those that have a feature value greater than or equal to the specified cutoff. To filter only by cutoff, specify NULL for TopN and the desired cutoff.

  • Feature Value: The default is Highest Value. You can also select a specific feature ID value, or specify either of the following values to return the value of the most likely feature:

    • NULL

    • Most Likely

  • Prediction: The default is Best Prediction, which takes the stored cost matrix into account if one exists.

  • Prediction Upper Bounds or Prediction Lower Bounds: The default is Best Prediction with Confidence Level 95%. You can change Confidence Level to any number strictly greater than zero and less than or equal to one. For Classification models only, you can use the Target Value Selection dialog box option to pick a specific target value. You can also specify Null or Most Likely to return the bounds for the most likely target value.

  • Prediction Costs: The default is Best Prediction. Applicable for Classification models only. You can use the Target Value Selection option to pick a specific target value.

  • Prediction Details: The only value is the details for the Best Prediction.

  • Prediction Probability: The default is Best Prediction. Applicable for Classification models only. You can use the Target Value Selection option to pick a specific target value.

  • Prediction Set: The default is All Target Values. You can also specify one or both of the following:

    • bestN: Where N is between one and the number of targets. The optional bestN argument is a positive integer that restricts the returned target classes to the N having the highest probability, or the lowest cost if a cost matrix clause is specified. If multiple classes are tied at the Nth value, then the database still returns only N values. To filter only by cutoff, specify NULL for this parameter.

    • Probability Cutoff: A number strictly greater than zero and less than or equal to one. The optional cutoff argument restricts the returned target classes to those with a probability greater than or equal to (or a cost less than or equal to, if a cost matrix clause is specified) the specified cutoff value. You can filter solely by cutoff by specifying NULL for this value.
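
The TopN/bestN and Probability Cutoff parameters compose as described above. The following Python sketch illustrates that filtering logic on made-up (target, probability) pairs; it is an approximation of the semantics, not Data Miner's implementation.

```python
# Sketch of the TopN and Probability Cutoff semantics, applied to a
# hypothetical prediction set of (target, probability) pairs.

def filter_prediction_set(pairs, top_n=None, cutoff=None):
    # Elements are returned in order of best prediction to worst.
    ranked = sorted(pairs, key=lambda p: p[1], reverse=True)
    if top_n is not None:
        # Even if there is a tie at the Nth value, only N values return.
        ranked = ranked[:top_n]
    if cutoff is not None:
        # Cutoff keeps only probabilities greater than or equal to it.
        ranked = [p for p in ranked if p[1] >= cutoff]
    return ranked

pairs = [("LOW", 0.15), ("HIGH", 0.55), ("MEDIUM", 0.30)]
print(filter_prediction_set(pairs, top_n=2))      # best two predictions
print(filter_prediction_set(pairs, cutoff=0.30))  # probability >= 0.30
```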

Default Apply Column Names

The syntax of the default Apply column name is:

     <FUNCTION ABBREVIATION>_<MODEL NAME><SEQUENCE>

SEQUENCE is used only if necessary to avoid a conflict. A sequence number may force the model name to be partially truncated.

FUNCTION ABBREVIATION is one of the following:

  • Cluster Details: CDET

  • Cluster Distance: CDST

  • Cluster ID: CLID

  • Cluster Probability: PROB

  • Cluster Set: CSET

  • Feature Details: FDET

  • Feature ID: FEID

  • Feature Set: FSET

  • Feature Value: FVAL

  • Prediction: PRED

  • Prediction Bounds: PBND

  • Prediction Upper Bounds: PBUP

  • Prediction Lower Bounds: PBLW

  • Prediction Costs: PCST

  • Prediction Details: PDET

  • Prediction Probability: PROB

  • Prediction Set: PSET

Specific target, feature, or cluster default names are abbreviated in one of two ways.

  • The first approach attempts to integrate the value of the target, feature, or cluster into the column name. This approach is used if the maximum value of the target, cluster, or feature does not exceed the remaining character spaces available in the name. The name must be 30 or fewer characters.

  • The second approach substitutes the target, cluster, or feature with a sequence ID. This approach is used if the first approach is not possible.
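
The naming scheme can be sketched as follows. This Python function is an illustrative approximation of the rules above (abbreviation, model name, optional sequence, 30-character limit), not Data Miner's exact truncation algorithm.

```python
# Illustrative sketch of default Apply column naming:
# <ABBREVIATION>_<MODEL><SEQUENCE>, kept within 30 characters.

MAX_NAME = 30

def default_column_name(abbrev, model, sequence=None):
    # SEQUENCE is used only if necessary to avoid a conflict.
    suffix = str(sequence) if sequence is not None else ""
    # A sequence number may force the model name to be partially
    # truncated so the whole name fits in 30 characters.
    room = MAX_NAME - len(abbrev) - 1 - len(suffix)
    return f"{abbrev}_{model[:room]}{suffix}"

print(default_column_name("PRED", "CLAS_GLM_MODEL"))
print(default_column_name("PROB", "A_VERY_LONG_CLASSIFICATION_MODEL", 2))
```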

Add or Edit Apply Output Column

In the Add Apply Output dialog box or the Edit Apply Output dialog box, you can manually add or edit a single column Apply definition, one at a time.

Before you add or edit columns, you must deselect Automatic Settings.

You can perform the following tasks:

  • Add an Apply Output column: Click add.

  • Edit an Apply Output column: Click edit. When you edit a column, only the Function selection box and its parameters can be edited.

The following controls are available:

  • Column: Name of column to be generated.

  • Auto:

    • If selected, then you cannot edit the name of the column.

    • If deselected, then auto naming is disabled, and you can rename the column. Column names are validated to ensure that they are unique.

  • Node: List of Model Input nodes connected to the node. If there is only one Input node, then it is selected by default.

  • Model: List of models for the selected node. If there is only one model, then it is selected by default.

  • Function: List of model scoring functions for the selected model.

  • Parameters: Displays 0 or more controls necessary to support the parameter requirements of the selected function.

When you have finished defining the output column, click OK.

Add Output Apply Column Dialog

In the Add Output Apply Column dialog box, you can manually add and edit columns in a node that is connected to the Apply node.

By default, a name is automatically assigned to the output column.

To add a column:

  1. Deselect Auto.
  2. In the Column field, provide a name.
  3. In the Node field, select one of the nodes connected to the Apply node. The type of the node that you select determines the choices in the Model and Function fields.
  4. In the Model field, select a model.
  5. In the Function field, select a function.
  6. After you are done, click OK.

Define Apply Columns Wizard

The Define Apply Column wizard is a two-step wizard that allows you to define Apply and Output specifications.

The Define Apply Column wizard comprises the following steps:

Models

In the Models section of the Apply Columns wizard, you can select models and define the output specifications for them.

To select a model:

  1. Select the models for which you want to define output specifications.
  2. Click Next.
Output Specifications

In the Output Specifications section, the possible output specifications for the selected model are listed with the default settings selected.

To define output specifications for the selected model:

  1. Select Most Likely. To define the parameters under Most Likely:
    • Select Prediction Details and click Edit to define the prediction details in the model.

    • Select Prediction Bounds and enter the percentage for Confidence. This setting is applicable only for models based on the Generalized Linear Model algorithm.

  2. Select Top N to define the N values in the Define Top N dialog box.
  3. Select Partition Name if you want the name of the partitioned models in the output.
  4. Click Finish to complete the definition.
Define Top N

If you select the Top N option, then you must define the settings related to it.

To define the settings related to Top N option:

  1. Click Use Best N and select the N Value from the drop-down list.
  2. Click Cut Off and select a value from the Prob Value drop-down list.
  3. Click OK.

Additional Output

Additional Output consists of columns that are passed unchanged through the Apply operation.

You can specify pass-through columns from the Input node. You can select as many columns as you want. You can specify that these selected columns are displayed before the Apply columns (the default) or after the Apply columns. These columns are often used to identify the Apply output. For example, you can use the Case ID column to identify the Apply output.

Evaluate and Apply Data

Test and Apply data for a model must be prepared in the same way that Build data for the model was prepared.

To properly prepare Test and Apply data, duplicate the transformation chains of build data for Test and Apply data by copying and pasting build Transforms nodes.

Edit Apply Node

In the Edit Apply Node dialog box, you can edit settings for predictions, additional output, and automatic settings.

The default value for Default Column Order is Data Columns First, which means that any data columns that you add come first in the output. You can change this to Apply Columns First.

Select the Order Partitions option if you want predictions for partitioned models. By default, this option is selected, even when there are non-partitioned models as inputs to the Apply node.

The Edit Apply Node dialog box has the following tabs:

Predictions

In the Predictions tab, you can define Apply Scoring specifications.

To define specific Apply settings or to edit the default settings, deselect Automatic Settings. You then add new Apply functions or edit existing ones.

Select a case ID from the Case ID drop-down list, as applicable.

You can edit settings in several ways:

  • Show Partition Columns: Select a column and click partition to view the partition keys.

  • Add a setting: Click add to open the Add Output Apply Column dialog box.

  • Edit an existing setting: Select the setting and click edit. The Edit Output Data Column dialog box opens.

  • Delete a specification: Select it and click delete.

  • Define Apply Columns: Click define. In the Define Apply Columns Wizard, click the Define Apply Columns icon.

Additional Output

In the Additional Output tab, you can specify pass-through attributes from Data Source nodes.

To add columns:

  1. Click add. The Edit Output Data Column dialog box opens.
  2. In the Edit Apply Node dialog box, the Default Column Order is Data Columns First. You can change this to Apply Columns First.
  3. After you are done, click OK.

Apply Columns

To create an Apply specification, deselect Automatic Settings. By default, Automatic Settings is selected.

You can perform the following tasks:

  • To define Apply columns, click define. The Define Apply Column wizard opens.

  • To add an Output Apply column, click add.

    The Add Output Apply Column dialog box opens.

  • To delete an Output Apply column, click delete.

  • To edit an Output Apply column specification, select the specification. Click edit. The Add or Edit Apply Output Column dialog box opens.

Apply Node Properties

In the Properties pane, you can examine and change the characteristics or properties of a node.

To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternately, right-click the node and click Go to Properties.

Apply Node Properties has the following sections:

  • Predictions: Displays the Output Apply columns defined on the Apply Columns tab. You can edit these details. The Automatic Settings option is selected if the selections were not modified.

    For each Output Apply Column, the Name, Function, Parameters, and Node are listed.

  • Additional Output: Lists the Output Data Columns that are passed through. For each column, the Name, Alias (if any), and Data Type are listed.

  • Cache

  • Details: Displays the name of the node and comments.


Apply Node Context Menu

The context menu options depend on the type of the node. The menu provides shortcuts to perform various tasks and view information related to the node.

To view the Apply node context menu, right-click the node. The following options are available in the context menu:

Connect

Use the Connect option to link nodes in a workflow.

To connect nodes:

  1. Right-click a node and click Connect. Alternately, go to Diagram and click Connect.
  2. Use the cursor to draw a line from this node to the target node.
  3. Click to establish the connection. Note the following:
    • You can create only valid connections.

    • You can create only one connection between any two nodes.

    • You can remove a connection by pressing the ESC key.


Run

Use the Run option to execute the tasks specified in the nodes that comprise the workflow.

The Data Miner server runs workflows asynchronously. The client does not have to be connected. You can run one or more nodes in the workflow:

  • To run one node: Right-click the node and select Run.

  • To run multiple nodes simultaneously: Select the nodes by holding down the Ctrl key and click each individual node. Then right-click any selected node and select Run.

If a node depends on outputs of one or more parent nodes, then the parent node runs automatically only if the outputs required by the running node are missing.

Force Run

Use the Force Run option to rerun one or more nodes that are complete.

Force Run deletes any existing models before building them once again.

To select more than one node, click the nodes while holding down the Ctrl key.

You can Force Run a node at any location in a workflow. Depending on the location of the node in the workflow, you have the following choices for running the node using Force Run:

  • Selected Node

  • Selected Node and Children (available if the node has child nodes)

  • Child Node Only (available if the node has one or more child nodes)

  • Selected Node and Parents (available if the node has parent nodes)

Create Schedule

Use the Create Schedule option to define a schedule to run a workflow at a specific date and time.

In the Create Schedule dialog box, you can create schedules for your workflows. To create a workflow schedule:
  1. Start Date: Select a date to set as the start date of the schedule. Click calendar to select a date.
  2. Repeat: Select any one of the following options:
    • None: To schedule the workflow to run only once at the defined time.

    • Every Day: To schedule the workflow to run daily at the specified time.

    • Every Week: To schedule the workflow to run weekly at the specified time.

    • Custom: To customize your workflow schedule, click Custom. This opens the Repeat dialog box, where you can set how frequently the workflow should run.

  3. End Repeat: You can select any one of the following options:
    • None: To continue running the workflow at the scheduled frequency indefinitely.

    • After: Select a number by clicking the arrows. The workflow runs at the scheduled frequency and stops after the selected number of runs. For example, if the schedule repeats every hour and you select 8, then the workflow runs every hour and stops after 8 runs.

    • On Date: Select a particular date by clicking the calendar icon.

  4. Select Use Existing Schedule, and select a schedule from the drop-down list if you want to schedule the workflow as per the selected schedule.
    • Click edit to edit the selected schedule in the Schedule dialog box.

    • Click add to add a new schedule. You can also edit the selected schedule, and add it here.

    • Click delete to delete the selected schedule.

  5. Click OK.

To save the workflow schedule settings, click save. In the Save a Schedule dialog box, you can provide a name for the schedule.


Edit

Use the Edit option to edit the default settings of a node.

Nodes have default algorithms and settings. When you edit a node, the default algorithms and settings are modified. You can edit a node in any one of the following ways:

  • Edit nodes using the Edit dialog box

  • Edit nodes through Properties UI

View Data

Use the View Data option to view the data contained in a Data node.

The Data nodes are Create Table or View node, Data Source node, Explore Data node, Graph node, SQL Query node, and Update Table node.


Generate Apply Chain

Use the Generate Apply Chain to create a new node that contains the specification of a node that performs a transformation.

If you have several transformations performed in sequence, for example, a Sample transform followed by a Custom transform, then you must select Generate Apply Chain for each transformation in the sequence. You must connect the individual nodes and then connect them to an appropriate data source.

Generate Apply Chain helps you create a sequence of transformations that you can use to ensure that new data is prepared in the same way as existing data. For example, to ensure that Apply data is prepared in the same way as Build data, use this option.

The Generate Apply Chain option is not valid for all nodes. For example, it does not copy the specification of a Build node.

Show Event Log

Use the Show Event Log option to view information about events in the current connection, including errors, warnings, and information messages.

Clicking the Show Event Log option opens the View an Event Log dialog box.

Related Topics

Show Validation Errors

Use the Show Validation Errors option to view validation errors, if any.

This option is displayed only when there are validation errors. For example, if an Association node is not connected to a Data Source node, then select Show Validation Errors to view the validation error No build data input node connected.

You can also view validation errors by moving the mouse over the node. The errors are displayed in a tool tip.

Validate Parents

Use the Validate Parents option to validate all parent nodes of the current node.

To validate parent nodes of a node, right-click the node and select Validate Parents.

You can validate parent nodes when the node is in the Ready, Complete, or Error state. All parent nodes must be in the Complete state.

Deploy

Use the Deploy option to deploy a node or workflow by creating SQL scripts that perform the tasks specified in the workflow.

The scripts generated by Deploy are saved to a directory.

Note:

You must run a node before deploying it.

You can generate a script that replicates the behavior of the entire workflow. Such a script can serve as the basis for application integration, or as a lighter-weight deployment alternative to installing the Data Miner repository and workflows in the target and production system.

To deploy a workflow or part of a workflow:

  1. Right-click a node and select Deploy.
  2. Select any one of the deployment options:
    • Selected node and dependent nodes

    • Selected node, dependent nodes, and child nodes

    • Selected node and connected nodes

  3. After selecting the deployment option, the Generate SQL Script wizard opens. In the wizard, enter details for the following:

Save SQL

Use the Save SQL option to generate SQL script for the selected node.

To generate SQL script for the selected node:

  1. Right-click the node and click Save SQL.
  2. Select any one of the options to save the generated SQL script:
    • SQL to Clipboard

    • SQL to File

    • SQL Script to Clipboard

    • SQL Script to File

    When you save to a file, the system provides a default location. You can browse to change this location. You can also create a folder for scripts.

    The saved SQL includes SQL generated by the current node and all of its parent nodes that are data providers. The SQL lineage ends when it encounters a node that represents persisted objects, such as tables or models.

    The generated script does not replicate all behavior of the node. The script does not create any objects. For example, if you select Save SQL for a Create Table node, then the script does not create the table. Instead, it generates a script to query the created table.

Cut

Use the Cut option to remove the selected object, which could be a node or connection.

You can also delete objects by selecting them and pressing DELETE on your keyboard.

Copy

Use the Copy option to copy one or more nodes and paste them into the same workflow or a different workflow.

To copy and paste nodes:

  1. Select the nodes to copy. To select several nodes, hold down the Ctrl key when you click the nodes.

    The selected nodes are highlighted.
  2. Right-click and select Copy from the context menu. Alternately, you can press Ctrl+C to copy the selected nodes.

Note:

Copying and pasting nodes does not carry over any mining models or results from the original nodes.

Paste

Use the Paste option to paste the copied object in the workflow.

To paste an object, right-click the workflow and click Paste. Alternately, you can press Ctrl+V.

Note:

Node names and model names are changed to avoid naming collisions. To preserve names, use the option Extended Paste.

Related Topics

Select All

Use the Select All option to select all the nodes in a workflow.

The selected nodes and links are highlighted in a dark blue border.

Performance Settings

Use the Performance Settings option to edit Parallel settings and In-Memory settings of the nodes.

If you click Performance Settings in the context menu, or if you click Performance Options in the workflow toolbar, then the Edit Selected Node Settings dialog box opens. It lists all the nodes that comprise the workflow. To edit the settings in the Edit Selected Node Settings dialog box:

  • Click Parallel Settings and select:

    • Enable: To enable parallel settings in the selected nodes in the workflow.

    • Disable: To disable parallel settings in the selected nodes in the workflow.

    • All: To turn on parallel processing for all nodes in the workflow.

    • None: To turn off parallel processing for all nodes in the workflow.

  • Click In-Memory Settings and select:

    • Enable: To enable In-Memory settings for the selected nodes in the workflow.

    • Disable: To disable In-Memory settings for the selected nodes in the workflow.

    • All: To turn on In-Memory settings for the selected nodes in the workflow.

    • None: To turn off In-Memory settings for all nodes in the workflow.

  • Click the pencil icon to set the Degree of Parallelism, and In-Memory settings such as Compression Method and Priority Levels, in the Edit Node Performance Settings dialog box.

    If you specify parallel settings for at least one node, then this indication appears in the workflow title bar:

    Performance Settings is either On (for Selected nodes), On (for All nodes), or Off. You can click Performance Options to open the Edit Selected Node Settings dialog box.

  • Click edit to edit the default preferences for parallel processing.

    • Edit Node Default Settings: You can edit the Parallel Settings and In-Memory settings for the selected node in the Performance Options dialog box. You can access the Performance Options dialog box from the Preferences options in the SQL Developer Tools menu.

    • Change Settings to Default

Go to Properties

Use the Go to Properties option to open the Properties pane of the selected node.

Navigate

Use the Navigate option to view the links available from the selected node.

Note:

The Navigate option is enabled only if there are links to other nodes.

Navigate displays the collection of links available from this node. Selecting a link highlights it in the workflow. The link itself also has context menu options, so you can right-click it and continue navigating. You can also use the arrow keys to move to the next node.

Apply Data Viewer

The Apply Data Viewer displays the data, columns, and the SQL queries used to generate the Apply output.

The Apply Data viewer opens in a new tab. The viewer has these tabs:

  • Data: Displays rows of data. The default is to view the cache data. You can perform the following tasks:

    • View actual data.

    • Sort data.

    • Filter data with a SQL expression.

    • Refresh the display. To refresh, click refresh.

  • Columns: Lists the columns in the Apply output.

  • SQL: Lists the SQL queries used to generate the Apply Output.

Feature Compare Node

The Feature Compare node enables you to perform semantic comparisons between text data contained in one Data Source node and text data contained in another Data Source node.

The requirements of a Feature Compare node are:

  • Two input data sources. A data source can be a flow of records, such as the output of a connected Data Source node, or a single record entered by the user inside the node. For user-entered data, no input data provider is needed.

  • One input Feature Extraction model provider node, where a model can be selected for calculations related to semantics.

To compare the features of the two data input sources, right-click the node and select Edit.
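Conceptually, the comparison scores both inputs with the same Feature Extraction model and then measures how close the resulting feature vectors are. A minimal Python sketch of that idea, using cosine similarity on hypothetical feature vectors (this illustrates the concept only, not Oracle Data Miner's internal calculation):

```python
import math

def cosine_similarity(u, v):
    # Cosine of the angle between two feature vectors:
    # close to 1.0 means the texts share semantics, near 0.0 means unrelated.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical feature vectors produced by scoring two text inputs
# with the same Feature Extraction model.
doc1 = [0.9, 0.1, 0.0]
doc2 = [0.8, 0.2, 0.0]
print(round(cosine_similarity(doc1, doc2), 3))  # high similarity, about 0.991
```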

Create Feature Compare Node

You create a Feature Compare node to perform semantic comparisons of text data.

Before creating a Feature Compare node, first create a workflow. Then, identify or create a Data Source node.
To create a Feature Compare node:
  1. In the Components pane, go to Workflow Editor. If the Components pane is not visible, then in the SQL Developer menu bar, go to View and click Components. Alternately, press Ctrl+Shift+P to dock the Components pane.
  2. In the Workflow Editor, expand Models, and click Feature Compare.
  3. Drag and drop the node from the Components pane to the Workflow pane.
    The node is added to the workflow. The GUI shows that the node has no data associated with it. Therefore, it cannot be run.
  4. Move to the node that provides data for the build. Right-click and click Connect. Drag the line to the Feature Compare node and click again.
  5. You can edit the node. To edit the node, right-click the node and click Edit. The Feature Compare dialog box opens.
  6. The node is ready to build. Right-click the node and click Run.

Related Topics

Feature Compare

In the Feature Compare dialog box, you can specify or change the characteristics of the models to build.

In the Feature Compare dialog box, you can perform the following tasks.

  • In the Feature Compare tab, you can select a Feature Extraction model and specify the data sources to be used for feature comparison. To specify data sources:

    1. In the Model field, select a model from the drop-down list. The drop-down list displays all the Feature Extraction models that are connected to the model provider.

    2. Deselect Auto if you want to enter a custom column name. If Auto is selected, then the Column field automatically displays a column name based on the selected model.

    3. In the Data Input 1 and Data Input 2 fields, select a data provider node from the respective drop-down lists. To enter a custom input, select User Defined from the drop-down list, and enter the custom entry by clicking the corresponding Data Input cell in the model grid below.

    4. In the Case ID fields, select a supported column for each data provider node. If the Data Input field is set as User Defined, then the Case ID field is disabled.

    5. Click OK.

    The model grid displays the following:

    • Model Attribute: Displays the input attributes from the model signature of the selected model.

    • Data Type: Displays the data type of the attribute.

    • Data Input 1: Displays the matching attribute or user defined data for Data Input 1.

    • Data Input 2: Displays the matching attribute or user defined data for Data Input 2.

  • In the Additional Outputs tab, when Automatic Settings is set to On, the Case IDs selected in the Feature Compare tab are also added here. You can also add any model attributes as additional output columns.

Feature Compare Node Context Menu

The context menu options depend on the type of the node. The context menu provides shortcuts to perform various tasks and view information related to the node.

To view the Feature Compare node context menu, right-click the node. The following options are available in the context menu:

Connect

Use the Connect option to link nodes in a workflow.

To connect nodes:

  1. Right-click a node and click Connect. Alternately, go to Diagram and click Connect.
  2. Use the cursor to draw a line from this node to the target node.
  3. Click to establish the connection. Note the following:
    • You can create only valid connections.

    • You can create only one connection between any two nodes.

    • You can remove a connection by pressing the ESC key.

Related Topics

Run

Use the Run option to execute the tasks specified in the nodes that comprise the workflow.

The Data Miner server runs workflows asynchronously. The client does not have to be connected. You can run one or more nodes in the workflow:

  • To run one node: Right-click the node and select Run.

  • To run multiple nodes simultaneously: Select the nodes by holding down the Ctrl key and click each individual node. Then right-click any selected node and select Run.

If a node depends on outputs of one or more parent nodes, then the parent node runs automatically only if the outputs required by the running node are missing.
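This dependency behavior can be pictured as a small graph walk that runs a parent only when its output is missing. A hedged Python sketch (the node names and data structures are invented for illustration, not the Data Miner API):

```python
def run(node, outputs, parents):
    """Run `node`, first running any parent whose output is missing.
    `outputs` is the set of nodes whose outputs already exist."""
    order = []
    for parent in parents.get(node, []):
        if parent not in outputs:              # missing output -> run the parent
            order.extend(run(parent, outputs, parents))
    order.append(node)
    outputs.add(node)                          # the node's output now exists
    return order

# Hypothetical workflow: data -> transform -> model.
parents = {"model": ["transform"], "transform": ["data"]}

# The Data node already has output, so only transform and model run.
print(run("model", {"data"}, parents))
```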

Force Run

Use the Force Run option to rerun one or more nodes that are complete.

Force Run deletes any existing models before building them once again.

To select more than one node, click the nodes while holding down the Ctrl key.

You can Force Run a node at any location in a workflow. Depending on the location of the node in the workflow, you have the following choices for running the node using Force Run:

  • Selected Node

  • Selected Node and Children (available if the node has child nodes)

  • Child Node Only (available if the node has one or more child nodes)

  • Selected Node and Parents (available if the node has parent nodes)

Create Schedule

Use the Create Schedule option to define a schedule to run a workflow at a definite time and date.

In the Create Schedule dialog box, you can create schedules for your workflows. To create a workflow schedule:
  1. Start Date: Select a date to set as the start date of the schedule. Click calendar to select a date.
  2. Repeat: Select any one of the following options:
    • None: To schedule the workflow to run only once at the defined time.

    • Every Day: To schedule the workflow to run daily at the specified time.

    • Every Week: To schedule the workflow to run weekly at the specified time.

    • Custom: To customize your workflow schedule, click Custom. This opens the Repeat dialog box, where you can set how frequently the workflow should run.

  3. End Repeat: You can select any one of the following options:
    • None: To continue running the workflow according to the repeat setting, with no defined end.

    • After: Select a number by clicking the arrows. The schedule ends after the workflow has run the selected number of times. For example, if the workflow repeats every hour and you select 8, then it runs every hour and stops after 8 runs.

    • On Date: Select a particular date by clicking the calendar icon.

  4. To use a saved schedule, select Use Existing Schedule, and then select a schedule from the drop-down list.
    • Click edit to edit the selected schedule in the Schedule dialog box.

    • Click add to create a new schedule, which you can then edit and select here.

    • Click delete to delete the selected schedule.

  5. Click OK.

To save the workflow schedule settings, click calendar. You can provide a name for the schedule in the Save a Schedule dialog box.
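The Start Date, Repeat, and End Repeat settings combine to produce a series of run times. The following Python sketch illustrates that combination (a simplified model for illustration, not the Data Miner scheduler):

```python
from datetime import datetime, timedelta

def expand_schedule(start, interval, end_after=None, end_date=None):
    """Expand a schedule into concrete run times: begin at `start`,
    repeat every `interval`, stop after `end_after` runs (After)
    or once `end_date` is passed (On Date)."""
    if end_after is None and end_date is None:
        raise ValueError("specify end_after or end_date for this sketch")
    runs, t = [], start
    while True:
        if end_after is not None and len(runs) >= end_after:
            break
        if end_date is not None and t > end_date:
            break
        runs.append(t)
        t += interval
    return runs

# Hourly repeat that stops after 3 runs (End Repeat -> After = 3).
runs = expand_schedule(datetime(2024, 1, 1, 9, 0), timedelta(hours=1), end_after=3)
print([r.hour for r in runs])  # [9, 10, 11]
```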

Related Topics

Edit

Use the Edit option to edit the default settings of a node.

Nodes have default algorithms and settings. When you edit a node, the default algorithms and settings are modified. You can edit a node in any one of the following ways:

  • Edit nodes using the Edit dialog box

  • Edit nodes through Properties UI

View Data

Use the View Data option to view the data contained in a Data node.

The Data nodes are Create Table or View node, Data Source node, Explore Data node, Graph node, SQL Query node, and Update Table node.

Related Topics

Generate Apply Chain

Use the Generate Apply Chain option to create a new node that contains the specification of a node that performs a transformation.

If you have several transformations performed in sequence, for example, a Sample transform followed by a Custom transform, then you must select Generate Apply Chain for each transformation in the sequence. You must then connect the individual nodes and connect them to an appropriate data source.

Generate Apply Chain helps you create a sequence of transformations that you can use to ensure that new data is prepared in the same way as existing data. For example, to ensure that Apply data is prepared in the same way as Build data, use this option.

The Generate Apply Chain option is not valid for all nodes. For example, it does not copy the specification of a Build node.

Show Event Log

Use the Show Event Log option to view information about events in the current connection, including errors, warnings, and information messages.

Clicking the Show Event Log option opens the View an Event Log dialog box.

Related Topics

Show Validation Errors

Use the Show Validation Errors option to view validation errors, if any.

This option is displayed only when there are validation errors. For example, if an Association node is not connected to a Data Source node, then select Show Validation Errors to view the validation error No build data input node connected.

You can also view validation errors by moving the mouse over the node. The errors are displayed in a tool tip.

Validate Parents

Use the Validate Parents option to validate all parent nodes of the current node.

To validate parent nodes of a node, right-click the node and select Validate Parents.

You can validate parent nodes when the node is in the Ready, Complete, or Error state. All parent nodes must be in the Complete state.

Deploy

Use the Deploy option to deploy a node or workflow by creating SQL scripts that perform the tasks specified in the workflow.

The scripts generated by Deploy are saved to a directory.

Note:

You must run a node before deploying it.

You can generate a script that replicates the behavior of the entire workflow. Such a script can serve as the basis for application integration, or as a lighter-weight deployment alternative to installing the Data Miner repository and workflows in the target and production system.

To deploy a workflow or part of a workflow:

  1. Right-click a node and select Deploy.
  2. Select any one of the deployment options:
    • Selected node and dependent nodes

    • Selected node, dependent nodes, and child nodes

    • Selected node and connected nodes

  3. After selecting the deployment option, the Generate SQL Script wizard opens. In the wizard, enter details for the following:

Save SQL

Use the Save SQL option to generate SQL script for the selected node.

To generate SQL script for the selected node:

  1. Right-click the node and click Save SQL.
  2. Select any one of the options to save the generated SQL script:
    • SQL to Clipboard

    • SQL to File

    • SQL Script to Clipboard

    • SQL Script to File

    When you save to a file, the system provides a default location. You can browse to change this location. You can also create a folder for scripts.

    The saved SQL includes SQL generated by the current node and all of its parent nodes that are data providers. The SQL lineage ends when it encounters a node that represents persisted objects, such as tables or models.

    The generated script does not replicate all behavior of the node. The script does not create any objects. For example, if you select Save SQL for a Create Table node, then the script does not create the table. Instead, it generates a script to query the created table.

Cut

Use the Cut option to remove the selected object, which could be a node or connection.

You can also delete objects by selecting them and pressing DELETE on your keyboard.

Copy

Use the Copy option to copy one or more nodes and paste them into the same workflow or a different workflow.

To copy and paste nodes:

  1. Select the nodes to copy. To select several nodes, hold down the Ctrl key when you click the nodes.

    The selected nodes are highlighted.
  2. Right-click and select Copy from the context menu. Alternately, you can press Ctrl+C to copy the selected nodes.

Note:

Copying and pasting nodes does not carry over any mining models or results from the original nodes.

Paste

Use the Paste option to paste the copied object in the workflow.

To paste an object, right-click the workflow and click Paste. Alternately, you can press Ctrl+V.

Note:

Node names and model names are changed to avoid naming collisions. To preserve names, use the option Extended Paste.

Related Topics

Select All

Use the Select All option to select all the nodes in a workflow.

The selected nodes and links are highlighted in a dark blue border.

Performance Settings

Use the Performance Settings option to edit Parallel settings and In-Memory settings of the nodes.

If you click Performance Settings in the context menu, or if you click Performance Options in the workflow toolbar, then the Edit Selected Node Settings dialog box opens. It lists all the nodes that comprise the workflow. To edit the settings in the Edit Selected Node Settings dialog box:

  • Click Parallel Settings and select:

    • Enable: To enable parallel settings in the selected nodes in the workflow.

    • Disable: To disable parallel settings in the selected nodes in the workflow.

    • All: To turn on parallel processing for all nodes in the workflow.

    • None: To turn off parallel processing for all nodes in the workflow.

  • Click In-Memory Settings and select:

    • Enable: To enable In-Memory settings for the selected nodes in the workflow.

    • Disable: To disable In-Memory settings for the selected nodes in the workflow.

    • All: To turn on In-Memory settings for the selected nodes in the workflow.

    • None: To turn off In-Memory settings for all nodes in the workflow.

  • Click the pencil icon to set the Degree of Parallelism, and In-Memory settings such as Compression Method and Priority Levels, in the Edit Node Performance Settings dialog box.

    If you specify parallel settings for at least one node, then this indication appears in the workflow title bar:

    Performance Settings is either On (for Selected nodes), On (for All nodes), or Off. You can click Performance Options to open the Edit Selected Node Settings dialog box.

  • Click edit to edit the default preferences for parallel processing.

    • Edit Node Default Settings: You can edit the Parallel Settings and In-Memory settings for the selected node in the Performance Options dialog box. You can access the Performance Options dialog box from the Preferences options in the SQL Developer Tools menu.

    • Change Settings to Default

Go to Properties

Use the Go to Properties option to open the Properties pane of the selected node.

Navigate

Use the Navigate option to view the links available from the selected node.

Note:

The Navigate option is enabled only if there are links to other nodes.

Navigate displays the collection of links available from this node. Selecting a link highlights it in the workflow. The link itself also has context menu options, so you can right-click it and continue navigating. You can also use the arrow keys to move to the next node.

Test Node

Oracle Data Mining enables you to test Classification and Regression models. You cannot test other kinds of models.

A Test node can test several models using the same test set. If the option Automatic Settings is set to ON, then the Test node specification is generated when you connect the input nodes.

A Test node can run in parallel.

Note:

All models tested in a node must be either Classification or Regression Models. You cannot test both kinds of models in the same test node.

Support for Testing Classification and Regression Models

Oracle Data Miner supports testing of Classification and Regression models.

Oracle Data Miner supports testing of Classification and Regression models in the following ways:

  • Test the model as part of the Build node using any one of the following ways:

    • Split the Build data into build and test subsets.

    • Use all of the Build data as test data.

    • Connect a second Data Source node, the test Data Source node, to the Build node.

  • Test the model in a Test node. In this case, the test data can be any table that is compatible with the Build data.

  • After you have tested a Classification Model, you can tune it.

    Note:

    You cannot tune Regression models.

Test Node Input

The input for a Test node can be a Model node, a Classification node, or a Regression node.

A Test node has the following input:

  • At least one node that identifies one or more models. The nodes can be a Model node, a Classification node, or a Regression node. A Model node must contain either a Classification model or a Regression model, but not both.

  • Any node that generates data as an output such as a Data node, a Transform node, or an appropriate Text node. This node contains the test data.

  • It is recommended that you specify a case ID. If you do not specify a case ID, then processing takes longer.

You can test several Classification or several Regression Models at the same time. The models to be tested can be in different nodes. The models to be tested must satisfy these conditions:

  • The nodes that contain the models must have the same function type. That is, they must be all Classification Build nodes or all Regression Build nodes.

    Classification Models must also have the same list of target attribute values.

  • The models must have the same target attribute with the same data type.

  • The Data Source node for Test must contain the target of the models.

  • The test data must be compatible with the models. That is, it should have been transformed in the same way as the data used to build the model.
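The conditions above amount to a few mechanical checks. A hedged Python sketch, with invented model descriptors, shows how such a compatibility check could look (an illustration of the rules, not Oracle Data Miner's validation code):

```python
def compatible_for_test(models, test_columns):
    """Check the compatibility rules: same mining function, same target
    attribute and data type, target present in the test data, and, for
    classification, the same list of target values."""
    first = models[0]
    for m in models[1:]:
        if m["function"] != first["function"]:
            return False, "mixed Classification and Regression models"
        if (m["target"], m["target_type"]) != (first["target"], first["target_type"]):
            return False, "different target attribute or data type"
        if first["function"] == "CLASSIFICATION" and \
                sorted(m["target_values"]) != sorted(first["target_values"]):
            return False, "different target value lists"
    if first["target"] not in test_columns:
        return False, "test data is missing the target column"
    return True, "ok"

# Invented descriptors for two compatible classification models.
m1 = {"function": "CLASSIFICATION", "target": "AFFINITY_CARD",
      "target_type": "NUMBER", "target_values": [0, 1]}
m2 = dict(m1)
print(compatible_for_test([m1, m2], {"AFFINITY_CARD", "AGE"}))  # (True, 'ok')
```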

Automatic Settings

By default, the option Automatic Settings is selected for a Test node.

Automatic Settings result in the following behavior:

  • When a model input node is connected, all models are added to the specification.

  • When a model input node is disconnected, all models are removed from the specification. The test node may become invalid.

  • When a model input node is edited in the following ways, the resultant behavior is as follows:

    • If models are added, then model specifications are automatically added to the Test node.

    • If models are removed, then the specifications are removed from the Test node.

    • If models are changed, then the following is done:

      • The Test node is updated to ensure the algorithm is consistent.

      • If the target changes and there is only one node as input to the Test node, then the node is updated to reflect the new target and keep all the models. Also, the test input data is validated to ensure that it still has the new target column.

      • If there are multiple Model nodes as input to the Test node, then the models with the changed target are automatically removed.

If Automatic Settings is deselected, then you must edit the node to reflect all changes to the input. Models are validated if they are added.

Creating a Test Node

You create a test node to test Classification and Regression models.

Before you create a Test node, identify or create the Data Source node and the Model node or Build node that you will connect to the Test node.
To create a Test node:
  1. In the Components pane, go to Workflow Editor. If the Components pane is not visible, then in the SQL Developer menu bar, go to View and click Components. Alternately, press Ctrl+Shift+P to dock the Components pane.
  2. Either identify or create a Data Source node containing the test data. Ensure that the test data is prepared in the same way as the build data.
  3. Select at least one Model node, Classification node, or Regression node. A model must be successfully built before it can be tested.

    Note:

    You can test either Classification or Regression models but not both kinds of models in one Test node.

  4. In the Workflow Editor, expand Evaluate and Apply, and click Test.
  5. Drag and drop the node from the Components pane to the Workflow pane.
    The node is added to the workflow. The GUI shows that the node has no data associated with it. Therefore, it cannot be run.
  6. Link the Data node and the Model node or Build node to the Test node.
  7. Characteristics of the Test node are set by default. You can also edit the node.

Edit Test Node

In the Edit Test Node dialog box, you can specify or change the characteristics of the models to test.

To edit the Test Node, right-click the node and select Edit or double-click the node. The Edit Test Node dialog box opens.

The Edit Test Node dialog box displays the following:

  • Function (CLASSIFICATION or REGRESSION)

  • Target and the Data Type (data type of the target)

  • The Case ID (if there is one)

    It is recommended that you specify a case ID. If you do not specify a case ID, then processing will be slower. The case ID that you specify for the Test node should be the same as the case ID specified for the Build node.

  • Automatic Settings: By default, Automatic Settings is selected.

You can perform the following tasks:

  • Compare test results and view individual models even when Automatic Settings is selected. The models tested are listed in the Selected Models grid.

  • Make changes to the list of models: deselect Automatic Settings and make changes in the Selected Models grid.

Select Model

The Select Model dialog box lists the models that are available for testing. To select models:

  1. Move the Models from Available Models to Selected Models.
  2. Click OK.

Compare Test Results Viewer

The Compare Test Result viewer displays test results for one or more models in the same node.

The following test results are displayed:

Test Node Properties

In the Properties pane, you can examine and change the characteristics or properties of a node.

To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternately, right-click the node and click Go to Properties.

The Test node Properties pane has these sections:

Models

The Models tab lists the models to test in the Selected Models grid.

Selected Models

The Selected Models dialog box displays the details of the selected model. You can also add and delete models here.

For each model, the grid lists the following:

  • Model Name: Lists the model names. Partitioned models have the partitioned model icon next to them to indicate that they are partitioned.

    Note:

    If the Partitioned columns are incompatible among the selected models, then only global test results are generated.

  • Partition Columns: Lists the partition columns for each partitioned model.

  • Node: Lists the node containing the model.

  • Test: Indicates the test status of the model.

  • Algorithm: Lists the algorithm that is used to build the model.

You can perform the following tasks:

  • View Partition Column: Click partitioned model to view the details of the partitioned columns in the selected model. The name, data type and source of the partitioned columns are displayed in the Partition Columns Definition dialog box.

  • Add Model: Click add. You can only add models that have the same function. Before adding a model, deselect Automatic Settings.

  • Delete Model: Select the model and click delete. Before deleting a model, deselect Automatic Settings.

Test

The Test section describes how testing is performed.

The Test section contains this information:

  • Function: CLASSIFICATION or REGRESSION.

  • Target: The name of the target.

  • Data Type: The data type of the target.

  • For CLASSIFICATION, these test results are calculated by default:

    • Performance Metrics

    • ROC Curve (Binary Target Only)

    • Lift and Profit

    You can deselect Metrics.

    By default, the top 100 target values by frequency are specified. To change this value, click Edit, and edit the value in the Target Values Selection dialog box.

  • For REGRESSION, Accuracy Matrix and Residuals are selected. You can deselect Metrics.

    • The Performance Metrics are the metrics displayed on the Performance tab of the Test Viewer.

    • Residuals are displayed on the Residual tab of the Test Viewer.
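The meaning of these default test results can be illustrated with small computations: an accuracy figure derived from a confusion matrix for classification, and residuals (actual minus predicted) for regression. This is a generic sketch of what the metrics measure, not Oracle Data Miner's implementation:

```python
def confusion_and_accuracy(actual, predicted):
    # Confusion matrix counts (actual, predicted) pairs; accuracy is
    # the share of cases predicted correctly.
    matrix = {}
    for a, p in zip(actual, predicted):
        matrix[(a, p)] = matrix.get((a, p), 0) + 1
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return matrix, correct / len(actual)

def residuals(actual, predicted):
    # Residual = actual minus predicted; the Residual plot shows these.
    return [a - p for a, p in zip(actual, predicted)]

_, acc = confusion_and_accuracy([1, 0, 1, 1], [1, 0, 0, 1])
print(acc)                                     # 3 of 4 correct -> 0.75
print(residuals([10.0, 12.0], [9.5, 13.0]))    # [0.5, -1.0]
```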

Details

The Details section displays the node name and comments about the node.

You can change the name of the node and edit the comments in this section. The new node name and comments must meet the requirements.

Test Node Context Menu

The context menu options depend on the type of the node. The context menu provides shortcuts to perform various tasks and view information related to the node.

To view the Test node context menu, right-click the node. The following options are available in the context menu: