8 Model Nodes

Model nodes specify the models to build and the models to add to the workflow.

The Models section in the Components pane contains the Model nodes. The types of models available in the Components pane are described in the following topics.

8.1 Types of Models

Lists the types of Model nodes supported by Oracle Data Miner.

The types of models available are:

  • Anomaly Detection Node: Builds Anomaly Detection models using a one-class Support Vector Machine (SVM).

  • Association Node: Builds models for market basket analysis.

  • Classification Node: Builds and tests classification models with the same target, case ID, cost, and split settings, where relevant. The models use the classification algorithms: Support Vector Machine (SVM), Naive Bayes (NB), Decision Tree (DT), and Generalized Linear Model (GLM).

  • Clustering Node: Builds clustering models using the clustering algorithms: k-Means, O-Cluster, and Expectation Maximization (EM). EM requires Oracle Database 12c Release 1 (12.1) or later.

  • Explicit Feature Extraction Node: Builds feature extraction models using the Explicit Semantic Analysis algorithm.

  • Feature Extraction Node: Builds feature extraction models using the feature extraction algorithms: nonnegative matrix factorization, principal components analysis (PCA), and singular value decomposition (SVD). PCA and SVD require Oracle Database 12c Release 1 (12.1) or later.

  • Model Node: Adds models to a workflow that were not built in the current workflow. This node has no input data.

  • Model Details Node: Extracts model details from a model build node, a Model node, or any node that produces a model.

  • Regression Node: Builds and tests a collection of Regression models with the same target, case ID, cost, and split settings, where relevant. The models use the regression algorithms: SVM and GLM.

8.2 Automatic Data Preparation (ADP)

Automatic Data Preparation (ADP) transforms the build data according to the requirements of the algorithm, embeds the transformation instructions in the model, and uses the instructions to transform the test or scoring data when the model is applied.

Data used for building a model must be properly prepared. Different algorithms have different input requirements. For example, Naive Bayes requires binned data.

If you are connected to Oracle Database 12c or later, then ADP prepares text data.

8.2.1 Numerical Data Preparation

Automatic Data Preparation prepares numerical data for different algorithms in different ways.

Here are some examples of how ADP prepares numerical data; a conceptual sketch follows the list:

  • For algorithms that require binned data (such as Naive Bayes), ADP performs supervised binning. Supervised binning is a special binning approach that takes into account the target to find good cut-points in the predictor.

  • For algorithms that require normalized data (such as Support Vector Machines), the numerical data is normalized.

  • For algorithms that can handle untransformed data (such as Decision Tree), the numerical data is used as-is to find splitters in the tree, with an approach similar to supervised binning.
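The following Python sketch mimics the first two preparations with scikit-learn. It is only a conceptual stand-in, not the ADP implementation inside the database; the sample data, the shallow-tree approach to finding cut-points, and min-max scaling are assumptions made for illustration.

# Conceptual sketch only: mimics the kinds of transformations ADP applies
# in the database. It is not Oracle Data Miner's implementation.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import MinMaxScaler

age = np.array([[18], [22], [25], [31], [40], [47], [52], [60], [63], [70]])
target = np.array([0, 0, 0, 1, 1, 1, 1, 0, 0, 0])  # binary target

# Supervised binning: a shallow tree finds target-aware cut-points in the predictor.
tree = DecisionTreeClassifier(max_depth=2).fit(age, target)
cut_points = sorted(t for t in tree.tree_.threshold if t != -2)  # -2 marks leaf nodes
binned = np.digitize(age.ravel(), cut_points)   # roughly what a Naive Bayes build would see

# Normalization: algorithms such as SVM receive values scaled to a common range.
normalized = MinMaxScaler().fit_transform(age)

print("cut points:", cut_points)
print("bins:", binned)
print("normalized:", normalized.ravel().round(2))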

8.2.2 Manual Data Preparation

For manual data preparation, you must understand the requirements of each algorithm, transform the build data accordingly, and apply the same transformations to prepare the test data or scoring data.

Perform manual binning when the bins carry business meaning, for example, recoding a numeric column of ages into ranges such as YOUTH and ADULT. Otherwise, Automatic Data Preparation is recommended.
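A minimal sketch of such a business-meaning recode, assuming a pandas DataFrame with an AGE column. The bin edges and labels are illustrative, and this is not an Oracle Data Miner API.

# Conceptual sketch of manual binning with business meaning (not an Oracle API).
# The column name, bin edges, and labels are illustrative assumptions.
import pandas as pd

customers = pd.DataFrame({"AGE": [15, 23, 37, 45, 68, 81]})

# Recode the numeric AGE column into ranges that carry business meaning.
customers["AGE_GROUP"] = pd.cut(
    customers["AGE"],
    bins=[0, 18, 65, 120],
    labels=["YOUTH", "ADULT", "SENIOR"],
)
print(customers)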

8.3 Data Used for Model Building

Oracle Data Miner does not necessarily use all the columns in a Data Source when it builds models.

Model nodes use a set of heuristics to determine whether to exclude columns from the model building process, or to change the mining type of a column (for example, from numerical to categorical).

  • There are several reasons for not using a particular column for model building. If a column does not contain useful information, then it is usually not used.

    The exact list of attributes used as input to build the model depends on the algorithm used to build the model. If an algorithm does not support a data type, then Oracle Data Miner does not use attributes with that data type as input.

    For models that have targets, such as Classification models, the target cannot be text.

  • The same mining types are used for all models.

If you are connected to Oracle Database 12c Release 1 (12.1) or later, then you can specify the characteristics of Text attributes when you edit the Build node.

8.3.1 Viewing and Changing Data Usage

You can view and change data usage in the Input tab of the Build Editor and in the Advanced Settings dialog box.

8.3.1.1 Input Tab of Build Editor

In the Input tab, the setting Determine inputs automatically (using heuristics) controls the automatic selection of attributes to be used as inputs, and the automatic selection of mining types.

To edit a Build node:

  1. Double-click the node or right-click the node and select Edit.
  2. Click the Input tab. In the Input tab, the field Determine inputs automatically (using heuristics) is selected by default for all models. Oracle Data Miner determines which attributes to use for input and characteristics of the attributes. Oracle Data Miner also determines the mining type, and specifies that auto data preparation is performed for all attributes. After the model is run, Oracle Data Miner generates rules describing the changes that it made, such as excluding an attribute or changing the mining type. To see detailed information about the heuristics, click Show.

Note:

You cannot view and edit data usage for an Association model using these steps.

8.3.1.1.1 Automatic Input

When Automatic Input is selected, Oracle Data Miner does not use attributes that do not provide useful information. For example, attributes that are almost constant may not be suitable for input.

After the node runs, rules describe the heuristics used. Click Show to see detailed information.
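As a rough illustration of one such heuristic, the Python sketch below flags constant or nearly constant columns. The threshold and the sample data are assumptions made for this example; the heuristics Oracle Data Miner actually applied are reported in the Rules after the node runs.

# Conceptual sketch of one possible heuristic (not Oracle Data Miner's actual rule set):
# drop columns that are constant or nearly constant, since they carry little signal.
import pandas as pd

def is_nearly_constant(col: pd.Series, threshold: float = 0.99) -> bool:
    """Return True if a single value accounts for more than `threshold` of the rows."""
    col = col.dropna()
    if col.empty or col.nunique() <= 1:
        return True
    return col.value_counts(normalize=True).iloc[0] > threshold

df = pd.DataFrame({
    "CUST_ID": [1, 2, 3, 4, 5],
    "COUNTRY": ["US", "US", "US", "US", "US"],   # constant: excluded
    "AGE":     [23, 45, 31, 62, 54],             # informative: kept
})
# A real rule set also considers other factors, such as unique identifiers and data types.
inputs = [c for c in df.columns if not is_nearly_constant(df[c])]
print(inputs)  # ['CUST_ID', 'AGE']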

8.3.1.1.2 Manual Input

To specify inputs manually, deselect Determine Inputs Automatically (using heuristics).

You can make the following changes by using the Manual Input option:

  • To ignore an attribute: If you do not want to use an attribute as input, then go to the Input column, click the Output (use for model build) icon, select the Ignore (do not use for model build) icon, and click OK. The attribute is then ignored. Similarly, to use an attribute that you have ignored, click the Ignore icon in the Input column and select the Output icon. The attribute is then used in the model build.

  • To change the mining type of an attribute: Go to the Mining Type column and select an option from the drop-down list:

    • Numerical

    • Categorical

    Text mining types are Text and Text Custom. Select Text Custom to create a column-level text specification.

  • To manually prepare data: By default, Automatic Data Preparation (ADP) is performed on all attributes. If you do not want Automatic Data Preparation performed for an attribute, then deselect the corresponding check box for that attribute in the Auto Prep column. If you turn off Auto Prep, then you are responsible for data preparation for that attribute.

    Note:

    If the mining type of an attribute is Text or Text Custom, then you cannot deselect Automatic Data Preparation.

8.3.1.2 Advanced Settings

In the Advanced Settings dialog box, you can edit model settings, data usage, performance settings, and algorithm settings.

To view which columns are selected by Oracle Data Miner and what mining type is assigned to each selected column, follow these steps:

Note:

You cannot view and edit data usage for an Association Model using these steps.

  1. Connect the Data Source node to the Model node.
  2. Right-click the Model node and select Run.
  3. Open the Advanced Settings dialog box in one of these ways:
    • After the model build completes, right-click the Model node and select Edit.

      The Edit dialog box opens. Click Advanced.

    • After the model build completes, right-click the Model node and select Advanced Settings.

  4. The Advanced Settings dialog box has two parts:
    • The Model Settings grid: The grid at the top lists the models built by the node.

    • The lower part of the dialog box is a tabbed display of the following:

      • Data Usage: The Data Usage tab displays information about which columns are selected for Model build, the mining type used for model building for each column, Data Type, Input, Auto Data Prep, and Rules. To view the details about the rules (Heuristics), click Show.

      • Algorithm Settings

      • Performance Settings

  5. To view which columns are used as input for the model build, select the model. The Data Usage tab lists the attributes used in the model build, and the Rules column indicates the heuristics applied to each attribute. For example, the mining type may have been changed. For details, click Show.
  6. You can change data usage information on a per-model basis, or you can change the data usage for several models at the same time.

    To change data usage for several models, select the models by clicking them while pressing the Ctrl key. Make the changes and click OK. The changes are made to the data usage for all selected models.

Note:

You can also turn Auto Data Prep off. This is not recommended. If you turn Auto Data Prep OFF, then you must ensure that the input is properly prepared for each algorithm.

8.3.2 Text

Text is available for any of the following data types: CHAR, VARCHAR2, BLOB, CLOB, NCHAR, or NVARCHAR2.

If you are connected to Oracle Database 12c Release 1 (12.1) or later, then you can specify text characteristics in the Text tab in the Edit Model Build dialog box.

If you specify text characteristics in the Text tab, then you are not required to use the Text nodes.

Note:

If you are connected to Oracle Database 11g Release 2 (11.2) or earlier, then use Text nodes. The Text tab is not available in Oracle Database 11g Release 2 and earlier.

To examine or specify text characteristics for machine learning, either double-click the build node or right-click the node and select Edit from the context menu. Click the Text tab.

The Text tab enables you to modify the following:

  • Categorical cutoff value: Enables you to control the cutoff used to determine whether a column should be considered as a Text or Categorical mining type. The cutoff value is an integer. It must be 10 or greater and less than or equal to 4000. The default value is 200.

  • Default Transform Type: Specifies the default transformation type for column-level text settings. The values are:

    • Token (Default): For Token as the transform type, the Default Settings are:

      • Languages: Specifies the languages used in the documents. The default is English. To change this value, select an option from the drop-down list. You can select more than one language.

      • Bigram: Select this option to mix NORMAL tokens with their bigrams, for example, New York. The token type is BIGRAM. (A sketch of these token types follows this list.)

      • Stemming: By default, this option is not selected. Not all languages support stemming. If the language selected is English, Dutch, French, German, Italian, or Spanish, then stemming is automatically enabled. If Stemming is enabled, then stemmed words are returned for supported languages. Otherwise the original words are returned.

        Note:

        If both Bigram and Stemming are selected, then the token type is STEM_BIGRAM. If neither Bigram nor Stemming is selected, then token type is NORMAL.

      • Stoplists: Specifies the stoplist to use. The default setting is to use the default stoplist. You can add stoplists or edit stoplists. If you select more than one language, and the selected stoplist is Default, then the default stop words for languages are added to the default stoplist from the repository. No duplicate stop words are added.
        • Click edit to view the Stoplist Details. This opens the Stoplist Details dialog box.

        • Click add to add a new stoplist. This opens the New Stoplist Wizard.

      • Tokens: Specify the following:
        • Max number of tokens across all rows (document). The default is 3000.

        • Min number of rows (document) required for a token

    • Theme: If Theme is selected, then the Default Settings are as follows:

      • Language: Specifies the languages used in the documents. The default is English. To change this value, select one from the drop-down list. You can select more than one language.

      • Stoplists: Specifies the stoplist to use. The default setting is to use the default stoplist. You can add stoplists or edit stoplists. If you select more than one language and the selected stoplist is Default, then the default stop words for languages are added to the default stoplist (from the repository). No duplicate stop words are added.

      • Themes: Specifies the maximum number of themes across all documents. The default is 3000.

    • Synonym: The Synonym tab is enabled only if a thesaurus is loaded. By default, no thesaurus is loaded. You must manually load the default thesaurus provided by Oracle Text or upload your own thesaurus.

  • Click Stoplists to open the Stoplist Editor. You can view, edit, and create stoplists.

    You can use the same stoplist for all text columns.
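The following plain-Python sketch shows roughly what the NORMAL and BIGRAM token types described above correspond to. It is an assumption-level illustration, not Oracle Text itself; the sample text is made up.

# Conceptual sketch of the token types described above (assumed behavior, not Oracle Text).
text = "new york is in new york state"

tokens = text.lower().split()                                    # NORMAL tokens
bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]   # BIGRAM tokens

# With Bigram selected, NORMAL tokens are mixed with their bigrams.
mixed = tokens + bigrams
print(mixed)
# With Stemming also selected, each word would first be reduced to its stem
# before the bigrams are formed, giving STEM_BIGRAM tokens.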

8.4 Model Nodes Properties

In the Properties pane of a Model node, you can examine and change the characteristics of the node.

You can view the properties of a Model Build node in any one of the following ways:

  • Select the node and go to View and click Properties. Click the Properties tab if necessary.

  • Right-click the node and select Go to Properties from the context menu.

In earlier releases, Properties was called Property Inspector. Properties of Model nodes have the following sections:

8.4.1 Models

The Models section displays a list of the models defined in the node. By default, one model is built for each algorithm supported by the node.

For each model, the name of the model, build information, the algorithm, and comments are listed in a grid. The Build column shows the date and time of the last successful build, or indicates that the model has not been built or did not build successfully.

You can add, delete, or view models in the list. You can also indicate which models are passed to subsequent nodes.

  • To delete a model from the list, select it and click delete.

  • To add a model, click add. The Add Model dialog box opens.

  • To view a model that was built successfully, select the model and click view.

You can tune classification models from the Properties pane.

8.4.1.1 Output Column

The Output Column in the Model Settings grid controls passing of models to subsequent nodes.

The default setting is to pass all models to subsequent nodes.

  • To ignore a model, that is, to not pass it to subsequent nodes, click the Output icon. The Output icon is replaced with the Ignore icon.

  • To cancel the ignore, click the Ignore icon again. It changes back to the Output icon.

8.4.1.2 Add Model

In the Add Model dialog box, you can add a model to a node.

To add a model to a node:

  1. In the Algorithm field, select an algorithm from the drop-down list. For example, if you add a model to a Clustering node, then the available algorithms are k-Means and O-Cluster. A default model name is displayed. You can change the default name.
  2. In the Comments field, add your comments, if any. This is an optional field.
  3. Click OK.

8.4.2 Build

The Build section displays information related to the model build. For models that have a target, such as Classification and Regression, the targets are listed. All models in a node have the same target.

The Build section displays the following:

  • Target: Displays the target. To change the target, select a new target from the drop-down list.

  • Case ID: Displays the case ID of the models defined in this node. All the models in the node have the same case ID. To change the case ID, select a different attribute from the drop-down list.

  • Transaction ID: Displayed for Association models only. To change the transaction ID, click Edit.

  • Item ID: Displayed for Association models only. To change the value, select an option from the drop-down list.

  • Item Value: Displayed for Association models only. To change the value, select an option from the drop-down list.

8.4.3 Test

The Test section is displayed for Classification and Regression models. They are the only models that can be tested.

The Test section defines how tests are done. By default, all models are tested. All models in the node are tested in the same way.

8.4.4 Details

The Details section displays the node name and comments about the node.

You can change the name of the node and edit the comments in this section. The new node name and comments must meet the requirements.

8.5 Anomaly Detection Node

Anomaly detection is an important tool for detecting, for example, fraud, network intrusions, enterprise computing service interruptions, sensor time series prognostics, and other rare events that can have great significance, but are hard to find.

The goal of anomaly detection is to identify items, events, or observations that are unusual within data that is seemingly ’normal’. This data may consist of traditional enterprise data or IoT sensor data. Anomaly detection can be used to solve problems like the following:
  • A law enforcement agency compiles data about illegal activities, but nothing about legitimate activities. How can a suspicious activity be flagged? The law enforcement data is all of one class. There are no counterexamples.
  • An insurance agency processes millions of insurance claims, knowing that a very small number are fraudulent. How can the fraudulent claims be identified? The claims data contains very few counterexamples. They are outliers.
  • An IT department encounters compute resource performance anomalies. How can such anomalies be detected along with their source causes, such as resource-contention issues and complex memory leaks? The data contains sensor output from thousands of sensors.
  • An oil and gas enterprise or utility company requires proactive maintenance of business-critical assets, such as oil rigs or smart meters, to reduce operations and maintenance costs, improve up-time of revenue-generating assets, and improve safety margins for life-critical systems.

By default, an Anomaly Detection node builds one model using the one-class SVM algorithm. All models in the node have the same case ID. There are two ways to detect anomalies:

  • Build and apply an Anomaly Detection model. (A conceptual sketch follows this list.)

  • Use an Anomaly Detection Query, one of the Predictive Query nodes.
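The first approach can be pictured with a one-class SVM in scikit-learn, as in the sketch below. This is a conceptual stand-in, not the in-database implementation used by the node; the data and the nu setting are assumptions for illustration.

# Conceptual sketch of one-class anomaly detection, using scikit-learn's OneClassSVM
# as a stand-in for the node's one-class SVM (not the Oracle implementation).
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(seed=0)
normal_cases = rng.normal(loc=0.0, scale=1.0, size=(200, 2))   # "normal" activity only
new_cases = np.array([[0.1, -0.2], [8.0, 9.0]])                # second row is an outlier

# A one-class SVM learns the boundary of the single known class; no counterexamples needed.
model = OneClassSVM(kernel="rbf", nu=0.05).fit(normal_cases)

print(model.predict(new_cases))   # 1 = fits the known class, -1 = anomaly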

An Anomaly Detection build can run in parallel. The following topics describe Anomaly Detection Nodes:

8.5.1 Create Anomaly Detection Node

An Anomaly Detection node builds one or more models that detect rare occurrences, such as fraud, and other anomalies using the One-Class SVM algorithm.

The input for a Model node is any node that generates data as an output, including Transform nodes and Data nodes.

Note:

If the data includes text columns, then prepare the text columns using a Build Text node. If you are connected to Oracle Database 12c or later, then use Automatic Data Preparation.

To create an Anomaly Detection node:

First create a workflow and then identify or create a Data Source node.
  1. In the Components pane, go to Workflow Editor. If the Components pane is not visible, then in the SQL Developer menu bar, go to View and click Components. Alternatively, press Ctrl+Shift+P to dock the Components pane.
  2. In the Workflow Editor, expand Models and click Anomaly Detection.
  3. Drag and drop the node from the Components pane to the Workflow pane.
    The node is added to the workflow. The GUI shows that the node has no data associated with it. Therefore, it cannot be run.
  4. Move to the node that provides data for the build. Right-click and click Connect. Drag the line to the Anomaly Detection node and click again.
  5. You can also specify a case ID, edit the data usage, and change the algorithm settings. To perform any of these tasks, right-click the node and select Edit.
  6. The node is now ready to build. Right-click the node and click Run.

8.5.2 Edit Anomaly Detection Node

In the Edit Anomaly Detection Node dialog box, you can specify or change the characteristics of the models to build.

To open the Edit Anomaly Detection Node dialog box, either double-click an Anomaly Detection node, or right-click an Anomaly Detection node and click Edit.

The Edit Anomaly Detection Node dialog box has the following tabs:

8.5.2.1 Build (AD)

The Build tab for Anomaly Detection lists the models to be built and the Case ID.

Specify the following:

  1. Select Case ID. Select an attribute from the Case ID list. This attribute must uniquely identify a case.

    Note:

    A case ID is not required. However, a case ID helps ensure build and test repeatability.

    A case ID is required to generate GLM diagnostics.

    If you specify a case ID, then all models in the node have the same case ID.

  2. In the Models Settings list, specify the models you want to build. You can also perform the following tasks:
    • To add a model, click add. The Add Model dialog box opens.

    • To edit a model, select the model and click edit. The Advanced Model Settings dialog box opens.

    • To delete a model, select the model and click delete.

    • To copy an existing model, select the model and click copy.

  3. To complete the node definition, click OK.
8.5.2.2 Partition

In the Partition tab, you can build partitioned models. (A conceptual sketch follows the list below.)

  • In the Maximum Number of Partitions field, set a value by clicking the arrows. This sets the partition cutoff value. The cutoff value must be greater than zero. If this option is not selected, then the native Oracle Data Miner cutoff value is used, which can be a very large value.

  • Click Advanced Settings to set and select the type of partition build.

  • To add columns for partitioning, click add.

    Note:

    Only NUMBER and VARCHAR2 columns can be selected as partition columns. Case ID and Target columns cannot be selected as partition columns.

  • To remove a partitioning column, select the columns and click delete.

  • To move a column to the top, click the move to the top icon.

  • To move a column up, click the move up icon.

  • To move a column down, click the move down icon.

  • To move a column to the bottom, click the move to the bottom icon.
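The following Python sketch illustrates the idea behind a partitioned model: one sub-model per value of the partition column, with scoring routed to the matching sub-model. It is a conceptual sketch under assumed data and a scikit-learn stand-in, not how Oracle Data Miner builds partitioned models in the database.

# Conceptual sketch of a partitioned model build (an assumption about the idea,
# not Oracle Data Miner's implementation).
import pandas as pd
from sklearn.svm import OneClassSVM

data = pd.DataFrame({
    "REGION": ["EAST", "EAST", "EAST", "WEST", "WEST", "WEST"],  # partition column
    "AMOUNT": [10.0, 12.0, 11.5, 210.0, 190.0, 205.0],
})

# One sub-model per distinct value of the partition column.
partitioned_model = {
    region: OneClassSVM(nu=0.1).fit(group[["AMOUNT"]])
    for region, group in data.groupby("REGION")
}

# Scoring routes each new row to the sub-model for its partition.
print(partitioned_model["EAST"].predict([[11.0]]))   # 1 = normal, -1 = anomaly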

8.5.2.3 Input

The Input tab specifies the input for model build.

Determine inputs automatically (using heuristics) is selected by default for all models. Oracle Data Miner decides which attributes to use for input. For example, attributes that are almost constant may not be suitable for input. Oracle Data Miner also determines mining type and specifies that auto data preparation is performed for all attributes.

Note:

For R Build nodes, Auto Data Preparation is not performed.

After the node runs, rules are displayed explaining the heuristics. Click Show for detailed information.

You can change these selections. To do so, deselect Determine inputs automatically (using heuristics).

8.5.2.4 Sampling

The settings in the Sampling tab are applied to all models in the node. In the Sampling tab, you can specify the row size.

By default, Sampling is set to OFF. To set it to ON:

  1. Click ON, and then select:
    • System Determined

    • User Specified and specify the row size

  2. Click OK.
8.5.2.5 Text

Text is available for any of the following data types: CHAR, VARCHAR2, BLOB, CLOB, NCHAR, or NVARCHAR2.

If you are connected to Oracle Database 12c Release 1 (12.1) or later, then you can specify text characteristics in the Text tab in the Edit Model Build dialog box.

If you specify text characteristics in the Text tab, then you are not required to use the Text nodes.

Note:

If you are connected to Oracle Database 11g Release 2 (11.2) or earlier, then use Text nodes. The Text tab is not available in Oracle Database 11g Release 2 and earlier.

To examine or specify text characteristics for machine learning, either double-click the build node or right-click the node and select Edit from the context menu. Click the Text tab.

The Text tab enables you to modify the following:

  • Categorical cutoff value: Enables you to control the cutoff used to determine whether a column should be considered as a Text or Categorical mining type. The cutoff value is an integer. It must be 10 or greater and less than or equal to 4000. The default value is 200.

  • Default Transform Type: Specifies the default transformation type for column-level text settings. The values are:

    • Token (Default): For Token as the transform type, the Default Settings are:

      • Languages: Specifies the languages used in the documents. The default is English. To change this value, select an option from the drop-down list. You can select more than one language.

      • Bigram: Select this option to mix NORMAL tokens with their bigrams, for example, New York. The token type is BIGRAM.

      • Stemming: By default, this option is not selected. Not all languages support stemming. If the language selected is English, Dutch, French, German, Italian, or Spanish, then stemming is automatically enabled. If Stemming is enabled, then stemmed words are returned for supported languages. Otherwise the original words are returned.

        Note:

        If both Bigram and Stemming are selected, then the token type is STEM_BIGRAM. If neither Bigram nor Stemming is selected, then token type is NORMAL.

      • Stoplists: Specifies the stoplist to use. The default setting is to use the default stoplist. You can add stoplists or edit stoplists. If you select more than one language, and the selected stoplist is Default, then the default stop words for languages are added to the default stoplist from the repository. No duplicate stop words are added.
        • Click edit to view the Stoplist Details. This opens the Stoplist Details dialog box.

        • Click add to add a new stoplist. This opens the New Stoplist Wizard.

      • Tokens: Specify the following:
        • Max number of tokens across all rows (document). The default is 3000.

        • Min number of rows (document) required for a token

    • Theme: If Theme is selected, then the Default Settings are as follows:

      • Language: Specifies the languages used in the documents. The default is English. To change this value, select one from the drop-down list. You can select more than one language.

      • Stoplists: Specifies the stoplist to use. The default setting is to use the default stoplist. You can add stoplists or edit stoplists. If you select more than one language and the selected stoplist is Default, then the default stop words for languages are added to the default stoplist (from the repository). No duplicate stop words are added.

      • Themes: Specifies the maximum number of themes across all documents. The default is 3000.

    • Synonym: The Synonym tab is enabled only if a thesaurus is loaded. By default, no thesaurus is loaded. You must manually load the default thesaurus provided by Oracle Text or upload your own thesaurus.

  • Click Stoplists to open the Stoplist Editor. You can view, edit, and create stoplists.

    You can use the same stoplist for all text columns.

8.5.3 Data for Model Build

Oracle Data Miner uses heuristic techniques on data for model build.

Oracle Data Miner uses heuristics to:

  • Determine the attributes of the input data used for model build.

  • Determine the mining type of each attribute.

8.5.4 Advanced Model Settings

The Advanced Settings dialog box lists all the models in the Model Settings section in the upper pane. You can add and delete models from the node.

To change or view advanced settings, right-click the node and select Advanced Settings.

  • To delete a model, select it and click delete.

  • To add a model, click add. The Add Model dialog box opens.

  • To modify data usage of a model, select the model in the upper pane. Make the necessary modifications in the Data Usage tab.

  • To modify the default algorithm, select the model in the upper pane. Make the necessary changes in the Algorithm Settings tab.

8.5.4.1 Add Model (AD)

In the Add Model dialog box, you can add or change a model for the node.

The algorithm is already selected for you. To add a model:

  1. In the Algorithm field, the selected algorithm is displayed. You can change this and select a different algorithm from the drop-down list.
  2. In the Name field, enter a name for the model.
  3. In the Comments field, add your comments, if any. This is an optional field.
  4. Click OK.
8.5.4.2 Data Usage

The Data Usage tab contains the data grid that lists all attributes in the data source.

The Data Usage tab is not supported for the Association node. To modify any values, to see which attributes are not used as input, or to see mining types, select View in the lower pane.

You can change data usage information for several models at the same time. For each attribute, the grid displays the following:

  • Attributes: This is the name of the attribute.

  • Data Type: This is the Oracle Database data type of the attribute.

  • Input: Indicates if the attribute is used to build the model. To change the input type, click Automatic. Then click the icon and select the new icon. For models that have a target, such as Classification and Regression models, the target is marked with a red target icon.

    • The pass icon indicates that the attribute is used to build the model.

    • The ignore icon indicates that the attribute is ignored, that is, it is not used to build the model.

  • Mining Type: This is the logical type of the attribute: Numerical (numeric data), Categorical (character data), Nested Numerical, Nested Categorical, Text, or Custom Text. If the attribute has a type that is not supported for mining, then the column is blank. Mining type is indicated by an icon. Move the cursor over the icon to see what the icon represents. To change the mining type, click Automatic, then click the type for the attribute and select a new type from the list. You can change mining types as follows:

    • Numerical can be changed to Categorical. Changing to Categorical casts the numerical value to string.

    • Categorical.

    • Nested Categorical and Nested Numerical cannot be changed.

  • Auto Prep: If Auto Prep is selected, then automatic data preparation is performed on the attribute. If Auto Prep is not selected, then no automatic data preparation is performed for the attribute. In this case, you are required to perform any data preparation, such as normalization, that may be required by the algorithm used to build the model. No data preparation is done (or required) for target attributes. The default is to perform automatic data preparation.

  • Rules: After a model runs, Rules describe the heuristics used. For details, click Show.

There are two types of reasons for not selecting an attribute as input:

  • The attribute has a data type that is not supported by the algorithm used for model build.

    For example, O-Cluster does not support nested data types such as DM_NESTED_NUMERICALS. If you use an attribute with type DM_NESTED_NUMERICALS to build an O-Cluster model, then the build fails.

  • The attribute does not provide data useful for mining. For example, an attribute that has constant or nearly constant values.

    If you include attributes of this kind, then the model has lower quality than if you exclude them.

8.5.5 Anomaly Detection Node Properties

In the Properties pane, you can examine and change the characteristics or properties of a node.

To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternatively, right-click the node and click Go to Properties. To view the properties of an Anomaly Detection node:

  • Right-click the node and select Go to Properties from the context menu.

  • If the Properties pane is closed, then go to View and click Properties.

Anomaly Detection Properties pane has the following sections:

8.5.5.1 Models (AD)

The Models section displays a list of the models defined in the node. The default is to build one model.

For each model, the name of the model, the build information, the algorithm, and comments are listed in a grid. The Build column shows the date and time of the last successful build, or indicates that the model has not been built or did not build successfully.

You can add, delete, or view models in the list. You can also indicate which models are passed to subsequent nodes.

  • To delete a model, select it and click delete.

  • To add a model, click add. The Add Model dialog box opens.

  • To view a model, click view. The appropriate model viewer opens.

  • To duplicate a model, select a model to duplicate and click duplicate.

8.5.5.1.1 Output Column (AD)

The Output Column in the Model Settings grid controls passing of the models to subsequent nodes.

The default is to pass all models to subsequent nodes.

  • To ignore a model, click the Output icon. The Output icon is replaced with the Ignore icon.

  • To cancel the ignore, click the Ignore icon again. The icon changes to the Output icon.

8.5.5.1.2 Add Model (AD)

In the Add Model dialog box, you can add or change a model for the node.

The algorithm is already selected for you. To add a model:

  1. In the Algorithm field, the selected algorithm is displayed. You can change this and select a different algorithm from the drop-down list.
  2. In the Name field, enter a name for the model.
  3. In the Comments field, add your comments, if any. This is an optional field.
  4. Click OK.
8.5.5.2 Build (AD)

The Build section displays the case ID for the models defined in this node.

All the models in the node have the same case ID. To change the case ID, select a different attribute from the list.

8.5.5.3 Partition

In the Partition tab, you can build partitioned models.

  • In the Maximum Number of Partitions field, set a value by clicking the arrows. This sets the partition cutoff value. The cutoff value must be greater than zero. If this option is not selected, then the native Oracle Data Miner cutoff value is used, which can be a very large value.

  • Click Advanced Settings to set and select the type of partition build.

  • To add columns for partitioning, click add.

    Note:

    Only NUMBER and VARCHAR2 columns can be selected as partition columns. Case ID and Target columns cannot be selected as partition columns.

  • To remove a partitioning column, select the columns and click delete.

  • To move a column to the top, click the move to the top icon.

  • To move a column up, click the move up icon.

  • To move a column down, click the move down icon.

  • To move a column to the bottom, click the move to the bottom icon.

8.5.5.4 Details

The Details section displays the node name and comments about the node.

You can change the name of the node and edit the comments in this section. The new node name and comments must meet the requirements.

8.5.6 Anomaly Detection Node Context Menu

The context menu options depend on the type of the node. It provides shortcuts to perform various tasks and to view information related to the node.

To view the context menu options, right-click the Anomaly Detection node. The following options are available in the context menu:

8.6 Association Node

The Association node defines one or more Association models. To specify data for the build, connect a Data Source node to the Association node.

All models in an Association node have the same input data.

Note:

The data for an Association model must be in transactional format.

Association models could generate a very large number of rules with low confidence and support, or they could generate no rules at all.

An Association build can run in parallel.

To learn about Association use cases, model building, and algorithm selection, see Association Algorithms.

This section contains the following topics:

8.6.1 Behavior of the Association Node

By default, an Association node builds one model using the Apriori algorithm.

The Apriori algorithm assumes the following:

  • The data is transactional.

  • The data may contain many missing values. The Apriori algorithm interprets all missing values as sparse data, and it has its own mechanisms for handling sparse data.

All models in the node have the same case ID, item ID, and item value. The case ID can consist of two columns. For example, in the data source SH.SALES, the combination of CUST_ID and TIME_ID can be the case ID.

No automatic data preparation is done for an Association node. If you select a value for Item Value that is different from the default Existence, then you might have to prepare the data.
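To make the transactional format and the resulting rules concrete, here is a small plain-Python sketch of support and confidence for one rule. The transactions and the rule are illustrative assumptions; the node itself uses the Apriori algorithm in the database.

# Conceptual sketch of market basket data and the support/confidence measures behind
# association rules (plain Python, not the node's Apriori implementation).

# Transactional format is one row per (case ID, item); here it is collapsed to item
# sets per case for brevity.
transactions = {
    "T1": {"bread", "milk"},
    "T2": {"bread", "butter"},
    "T3": {"bread", "milk", "butter"},
    "T4": {"milk"},
}

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)

# Rule: bread => milk
antecedent, consequent = {"bread"}, {"milk"}
rule_support = support(antecedent | consequent)     # 2/4 = 0.5
confidence = rule_support / support(antecedent)     # 0.5 / 0.75 = 0.67 (rounded)
print(rule_support, round(confidence, 2))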

8.6.2 Create Association Node

The data used to build an Association model must be in transactional format.

To create an Association node:

First, create a workflow and then identify or create a data source.
  1. In the Components pane, go to Workflow Editor. If the Components pane is not visible, then in the SQL Developer menu bar, go to View and click Components. Alternatively, press Ctrl+Shift+P to dock the Components pane.
  2. In the Workflow Editor, expand Models, and click Association.
  3. Drag and drop the node from the Components pane to the Workflow pane.
    The node is added to the workflow. The GUI shows that the node has no data associated with it. Therefore, it cannot be run.
  4. Move to the node that provides data for the build. Right-click the node and click Connect. Drag the line to the Association node and click again.
  5. The Edit Association Build Node window opens.
  6. For an Association node, specify the following:
    • Transaction ID: Click ID to insert one or more Transaction IDs.

    • Item ID: Select an option from the drop-down list.

    • Value: Existence (default)

  7. Click OK.
  8. After you finish the node definition, the node is ready for build. Right-click the node and click Run.

8.6.3 Edit Association Build Node

The Association Build Node editor enables you to specify or change the characteristics of the models to build.

To open the Edit Association Build Node dialog box, either double-click an Association node, or right-click an Association node and select Edit. The Edit Association Build Node dialog box comprises the following:

8.6.3.1 Build

In the Build tab, you can provide the details required for a model build.

Specify these settings in the Build tab:

  • Transaction IDs: These are a combination of attributes that uniquely identifies a transaction. To specify a transaction ID, click Edit selection. The Select Columns dialog box opens. Move one or more attributes from the Available Attributes list to the Selected Attributes list. Click OK.

  • Item ID: Identifies an item. Select an attribute from the list.

  • Item Value: Existence (default). You can select an attribute from the drop-down list. This is an optional field.

    The item value column may specify information such as the number of items (for example, three apples) or the type of the item (for example, Macintosh Apples).

    If you select an attribute from the list, then the attribute must have less than 10 distinct values. The default value for the maximum distinct count is 10. You can change the value in Model Build Preferences for Association.

    Note:

    If you specify an attribute for Item Value, then you might have to prepare the data.

You can perform the following tasks:

  • Add a model: Click add. The Add Model dialog box opens.

  • Delete a model: Select the model and click delete.

  • Edit a model: Select the model and click edit. The Advanced Settings for Association Node dialog box opens. Here, you can specify Model settings or Algorithm settings.

  • Copy an existing model: Select the model and click copy selected model.

At this point, you can click OK to finish the model definition.

8.6.3.1.1 Select Columns (AR)

In the Select Columns dialog box, you can add or remove attributes to be included in or excluded from the model build.

To select attributes:

  1. Select one or more attributes in the Available Attributes list.
  2. Use the arrows between the lists to move the selections to the Selected Attributes list.
  3. Click OK.
8.6.3.2 Partition

In the Partition tab, you can build partitioned models.

  • In the Maximum Number of Partitions field, set a value by clicking the arrows. This sets the partition cutoff value. The cutoff value must be greater than zero. If this option is not selected, then the native Oracle Data Miner cutoff value is used, which can be a very large value.

  • Click Advanced Settings to set and select the type of partition build.

  • To add columns for partitioning, click add.

    Note:

    Only NUMBER and VARCHAR2 columns can be selected as partition columns. Case ID and Target columns cannot be selected as partition columns.

  • To remove a partitioning column, select the columns and click delete.

  • To move a column to the top, click the move to the top icon.

  • To move a column up, click the move up icon.

  • To move a column down, click the move down icon.

  • To move a column to the bottom, click the move to the bottom icon.

8.6.3.2.1 Advanced Settings

In the Advanced Settings dialog box, you can select and set the type of partition build.

To set the type of partition build:
  1. In the Partition Build Type field, select any one of the following options:
    • Combination of the two

    • Partition is processed at a single slave

    • Partition is processed across slaves

  2. Click OK.
8.6.3.3 Filter

In the Filter tab, you can add items to filter. The items are sourced from the Data Source node, and not from the model.

  1. Click Enable Filters.
    In the Include section, you can include items. In the Exclude section you can exclude items from the filter.
  2. Expand Include to filter and include items in the filter rule. You can add and delete items, both in the Antecedent and Consequent section of the rule.
    • Click the green plus icon to add items to the inclusion rule. The Find Items dialog box opens.

    • Click the red cross icon to remove items from the rule.

  3. Expand Exclude to filter and exclude items from the filter rule.
    • Click the green plus icon to add items to the exclusion rule. The Find Items dialog box opens.

    • Click the red cross icon to remove items from the rule.

  4. Click Advanced Settings. In the Preprocess Input Data dialog box, you may select the option Preprocess Input Data to Extract Items. If you select this option and run the node, then an internal table is generated that contains all distinct item values along with their respective total count and support. This table is used in place of querying the underlying data, thereby significantly improving the UI interaction.
  5. Click OK.
8.6.3.3.1 Find Items

In the Find Items dialog box, you can search and add items to be included in the filter rule or excluded from the filter rule.

  1. In the Search For field, enter the name of the item to search.
  2. In the Settings section, provide additional information about the item in the following fields:
    • Sort By

    • Fetch Size

    • Sample Size

    • Use All Data

  3. Click Find.
  4. In the Items Found section, select the items that you want to add to the filter rules, and click Add. The items are now displayed in the Selected Items section.
  5. Click OK.
8.6.3.4 Aggregates

In the Aggregates dialog box, you can add items to be used for aggregation.

To include items for aggregation or exclude from aggregation:

  1. Select the items that you want to add in the Available section.
  2. Click the arrows as applicable to move the items to the Selected section.
  3. Click OK.
    This adds or removes the items to be used in the association rules.
8.6.3.5 Sampling

The settings in the Sampling tab are applied to all models in the node. In the Sampling tab, you can specify the row size.

By default, Sampling is set to OFF. To set it to ON:

  1. Click ON, and then select:
    • System Determined

    • User Specified and specify the row size

  2. Click OK.

8.6.4 Advanced Settings for Association Node

The Advanced Settings dialog box enables you to add or delete models, and modify the default algorithm settings for each model.

The upper pane of the dialog box lists all the models in the node. You can add and delete models.
  1. To open the Advanced Settings dialog box:
    • Click the edit (pencil) icon in the Edit Association Build Node dialog box.

    • Right-click the node and click Advanced Settings.

    The Advanced Settings dialog box opens.
  2. You can perform the following tasks:
    1. Delete a model.
    2. Add a model.
    3. Change algorithm settings. To change algorithm settings, select a model in the upper pane. In the Algorithm Settings tab, you can change maximum rule length, minimum confidence, and minimum support.
  3. Click OK.

Note:

It is possible for an Association model to generate a very large number of rules or no rules at all.

8.6.5 Association Node Context Menu

The context menu options depend on the type of the node. It provides shortcuts to perform various tasks and to view information related to the node.

To view the context menu options, right-click the node. The following options are available in the Association node context menu:

8.6.6 Association Build Properties

In the Properties pane, you can examine and change the characteristics or properties of a node.

To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternatively, right-click the node and click Go to Properties.

The Association Build node Properties pane has the following sections:

8.6.6.1 Models (AR)

The Models section displays a list of the models defined in the node. The default is to build one model.

For each model, the name of the model, build information, the algorithm, and comments are listed in a grid. The Build column shows the date and time of the last successful build, or indicates that the model has not been built or did not build successfully.

You can add, delete, or view models in the list. You can also indicate which models are passed to subsequent nodes.

  • To delete a model from the list, select it and click delete.

  • To add a model, click add. The Add Model dialog box opens.

  • To view a model that is built successfully, click view. The appropriate model viewer opens.

  • To make a copy of a model, select the model and click duplicate.

8.6.6.1.1 Add Model (AR)

The algorithm is already selected for you. To add a model to the list:

  1. Accept or change the model name.
  2. In the Comments field, add comments, if any. This is optional.
  3. Click OK. This adds the new model to the list. The new model has the same build characteristics as existing models. It also has the default values for advanced settings.
8.6.6.1.2 Output Column (AR)

The Output column in the Model Settings grid controls the passing of models to subsequent nodes. By default, all models are passed to subsequent nodes. You can perform the following tasks:

  • To ignore a model, click the Output icon. The icon changes to the Ignore icon.

  • To cancel the ignore, click the Ignore icon again. The icon changes to the Output icon.

8.6.6.2 Build (AR)

The Build section displays the transaction ID, item ID and item value of the models defined in the node.

All models in the node have the same transaction ID, item ID, and item value. The information displayed is:

  • Transaction IDs: Click Edit to change the transaction ID.

  • Item ID: You can select a different item ID from the drop-down list.

  • Item Value: You can select a different item value from the drop-down list.

8.6.6.3 Partition

In the Partition tab, you can build partitioned models.

  • In the Maximum Number of Partitions field, set a value by clicking the arrows. This sets the partition cutoff value. The cutoff value must be greater than zero. If this option is not selected, then the native Oracle Data Miner cutoff value is used, which can be a very large value.

  • Click Advanced Settings to set and select the type of partition build.

  • To add columns for partitioning, click add.

    Note:

    Only NUMBER and VARCHAR2 columns can be selected as partition columns. Case ID and Target columns cannot be selected as partition columns.

  • To remove a partitioning column, select the columns and click delete.

  • To move a column to the top, click the move to the top icon.

  • To move a column up, click the move up icon.

  • To move a column down, click the move down icon.

  • To move a column to the bottom, click the move to the bottom icon.

8.6.6.4 Filter

In the Filter tab, you can add items to filter. The items are sourced from the Data Source node, and not from the model.

  1. Click Enable Filters.
    In the Include section, you can include items. In the Exclude section you can exclude items from the filter.
  2. Expand Include to filter and include items in the filter rule. You can add and delete items, both in the Antecedent and Consequent section of the rule.
    • Click the green plus icon to add items to the inclusion rule. The Find Items dialog box opens.

    • Click the red cross icon to remove items from the rule.

  3. Expand Exclude to filter and exclude items from the filter rule.
    • Click the green plus icon to add items to the exclusion rule. The Find Items dialog box opens.

    • Click the red cross icon to remove items from the rule.

  4. Click Advanced Settings. In the Preprocess Input Data dialog box, you may select the option Preprocess Input Data to Extract Items. If you select this option and run the node, then an internal table is generated that contains all distinct item values along with their respective total count and support. This table is used in place of querying the underlying data, thereby significantly improving the UI interaction.
  5. Click OK.
8.6.6.5 Aggregates

In the Aggregates dialog box, you can add items to be used for aggregation.

To include items for aggregation or exclude from aggregation:

  1. Select the items that you want to add in the Available section.
  2. Click the arrows as applicable to move the items to the Selected section.
  3. Click OK.
    This adds or removes the items to be used in the association rules.
8.6.6.6 Sampling

The settings in the Sampling tab are applied to all models in the node. In the Sampling tab, you can specify the row size.

By default, Sampling is set to OFF. To set it to ON:

  1. Click ON, and then select:
    • System Determined

    • User Specified and specify the row size

  2. Click OK.
8.6.6.7 Details

The Details section displays the node name and comments about the node.

You can change the name of the node and edit the comments in this section. The new node name and comments must meet the requirements.

8.7 Classification Node

The Classification node defines one or more classification models to build and to test.

To specify data for the build, connect a Data Source node to the Classification node. The models in a Classification node all have the same target and case ID. You can only specify one target. A Classification build can run in parallel.

There are two ways to make classification predictions:

  • By building and testing a classification model. This can be done by using a classification node, and then applying the model to the new data to make classifications.

  • By using a prediction query, which is one of the predictive queries.

To learn about classification algorithms and model building, see Classification Algorithms

This section contains the following topics:

8.7.1 Default Behavior for Classification Node

The default behavior of the Classification node covers the algorithms used, the testing and tuning of models, the case ID, and so on.

  • Algorithms used: For a binary target, the Classification node builds models using four algorithms: Support Vector Machine (SVM), Naive Bayes (NB), Decision Tree (DT), and Generalized Linear Model (GLM).

    If the target is not binary, then GLM is not built by default. You can explicitly add a GLM model to the node. The models must have the same build data and same target.

    Note:

    If you do not want to create a particular model, then delete the model from the list of models. The blue check mark to the left of the model name selects models to be used in subsequent nodes. It does not select models to build.

  • Testing of models: By default, all the models are tested. The test data is created by randomly splitting the build data into a build data set and a test data set. The default ratio for the split is 60:40, that is, 60 percent build and 40 percent test. (See the sketch after this list.) Oracle Data Miner uses compression when it creates the build and test tables, when appropriate.

  • Connecting nodes: You can connect both a build Data Source node and a test Data Source node to the Build node.

  • Testing models: You can test Classification models using a Test node along with separate test data.

  • Interpreting test results

  • Tuning models: After testing a classification, you can tune each model.

  • Case ID: The case ID is optional. However, if you do not specify a case ID, then the processing will be slower.
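The following Python sketch illustrates the default 60:40 build/test split described in the Testing of models item above, using scikit-learn as a stand-in. It is not Oracle Data Miner's split mechanism or its in-database algorithms; the data, column names, and the choice of a Naive Bayes classifier are assumptions for illustration.

# Conceptual sketch of a 60:40 build/test split and a test of one model
# (scikit-learn stand-in, not Oracle Data Miner).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

data = pd.DataFrame({
    "INCOME": [20, 35, 50, 65, 80, 95, 110, 125, 140, 155],
    "BUY":    [0, 0, 0, 1, 0, 1, 1, 1, 1, 1],     # binary target
})

# 60 percent build, 40 percent test, split at random but reproducibly.
build, test = train_test_split(data, test_size=0.4, random_state=42, stratify=data["BUY"])

model = GaussianNB().fit(build[["INCOME"]], build["BUY"])
accuracy = model.score(test[["INCOME"]], test["BUY"])
print(f"test accuracy: {accuracy:.2f}")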

8.7.2 Create a Classification Node

The Classification node defines one or more classification models to build and to test.

First, create a workflow. Then, identify or create a Data Source node for the Classification node.
To create a Classification node:
  1. In the Components pane, go to Workflow Editor. If the Components pane is not visible, then in the SQL Developer menu bar, go to View and click Components. Alternatively, press Ctrl+Shift+P to dock the Components pane.
  2. In the Workflow Editor expand Models, and click Classification.
  3. Drag and drop the node from the Components pane to the Workflow pane.
    The node is added to the workflow. The GUI shows that the node has no data associated with it. Therefore, it cannot be run.
  4. Move to the node that provides data for the build. Right-click and click Connect. Drag the line to the Classification node and click again.
  5. The Edit Classification Build Node dialog box opens. You must specify a target. All models in the node have the same target. The target cannot be text.
  6. To specify a separate Data Source node for test, connect a second Data Source node to the build node. This is optional.
  7. After you finish the edit operation and connect the optional test data source, the node should be ready to build. Right-click the node and select Run from the menu.

    If you specified a test data source, when the node runs, then the connection from the build data source is labeled Build and the connection from the test data source is labeled Test.

8.7.3 Data for Model Build

Oracle Data Miner uses heuristic techniques on data for model build.

Oracle Data Miner uses heuristics to:

  • Determine the attributes of the input data used for model build.

  • Determine the mining type of each attribute.

8.7.4 Edit Classification Build Node

In the Edit Classification Build Node dialog box, you can specify or change the characteristics of the models to build.

To open the Edit Classification Build Node dialog box, either double-click a Classification Node, or right-click a Classification node and select Edit.

The Edit Classification Build Node dialog box has the following tabs:

8.7.4.1 Build (Classification)

The Build node enables you to specify or change the characteristics of the models to build.

To edit the characteristics of the models to build, follow these steps:

  1. In the Target field, select the target from the drop-down list. The list consists of attributes from the table or view specified in the Data Source node that is connected to the build node.

    You must specify a target. All models in the node have the same target.

  2. In the Case ID field, select one attribute from the drop-down list. This attribute must uniquely identify a case. If you specify a case ID, then all models in the node will have the same case ID.

    Note:

    If you do not specify a case ID, then the processing will be slower because a table must be generated.

    The case ID is required to generate GLM diagnostics.

    A case ID is required if a column in the input data is a nested column, that is, a very dense and deep column with many name-value pairs. If there is no case ID, then the sorting operations may fail.

  3. In the Model Settings section, select which models you want to build. For a Classification node with a binary target, Naive Bayes (NB), Decision Tree (DT), Support Vector Machine (SVM), and Generalized Linear Models (GLM) models are specified by default.
    • To delete a model, select the model and click delete.

    • To edit a model, select the model and click edit.

    • To add models, click add.

    • To copy an existing model, select the model to be copied and click copy.

By default, the model is tested using a test data set created by splitting the build data set. If you do not want to test the model in this way, then go to the Classification Node Test section in the Classification node Properties pane. Alternatively, you can use a Test node and a test Data Source node to test the model.
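
Each model defined in the node corresponds to an in-database machine learning model with the node's target and case ID and one algorithm per model. The following PL/SQL is a minimal sketch of one such build through the DBMS_DATA_MINING API, assuming a build table CUSTOMERS_BUILD, case ID CUST_ID, and target AFFINITY_CARD; it illustrates the mapping only and is not the exact code that Oracle Data Miner generates.

    -- Settings table: one name/value row per build setting.
    CREATE TABLE dt_settings (
      setting_name  VARCHAR2(30),
      setting_value VARCHAR2(4000));

    BEGIN
      INSERT INTO dt_settings VALUES
        (dbms_data_mining.algo_name, dbms_data_mining.algo_decision_tree);
      INSERT INTO dt_settings VALUES
        (dbms_data_mining.prep_auto, dbms_data_mining.prep_auto_on);

      -- The Classification node would issue one build per model in its list.
      DBMS_DATA_MINING.CREATE_MODEL(
        model_name          => 'CLAS_DT_EXAMPLE',
        mining_function     => DBMS_DATA_MINING.CLASSIFICATION,
        data_table_name     => 'CUSTOMERS_BUILD',
        case_id_column_name => 'CUST_ID',
        target_column_name  => 'AFFINITY_CARD',
        settings_table_name => 'DT_SETTINGS');
    END;
    /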

8.7.4.1.1 No Case ID

If a case ID is not supplied, then Oracle Data Miner creates a table for all the input data that contains a generated case ID based on the row number.

This table is used as the source to create the build and test random sample views. The generated case ID is constant for all queries. This ensures that consistent test results are generated.
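
Conceptually, the generated case ID is just the row number frozen into a table, as in the following sketch; the table and column names are illustrative and are not the names that Oracle Data Miner generates.

    -- Materialize the input once with a generated case ID so that the same row
    -- always receives the same ID in the build and test sample views.
    CREATE TABLE input_with_case_id AS
      SELECT ROWNUM AS generated_case_id, t.*
      FROM   input_data t;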

8.7.4.2 Partition

In the Partition tab, you can build partitioned models.

  • In the Maximum Number of Partitions field, set a value by clicking the arrows. This sets the partition cutoff value. The cutoff value must be greater than zero. If this option is not selected, then the native Oracle Data Miner cutoff value is used, which can be a very large value.

  • Click Advanced Settings to set and select the type of partition build.

  • To add columns for partitioning, click add.

    Note:

    Only NUMBER and VARCHAR2 columns can be selected as partition columns. Case ID and Target columns cannot be selected as partition columns.

  • To remove a partitioning column, select the columns and click delete.

  • To move a column to the top, click move to the top.

  • To move a column up, click move up.

  • To move a column down, click move down.

  • To move a column to the bottom, click move to the bottom.

8.7.4.2.1 Add Partition Column

In the Add Partition Column dialog box, you can add columns for partitioning. Partition columns are used to partition build models.

Select the columns that you want to use for partitioning in the Available Attributes list, and click the arrows to move them to the Selected Attributes list. In the Available Attributes list, only the columns with the supported data types are displayed.

8.7.4.3 Sampling

The settings in the Sampling tab are applied to all models in the node. In the Sampling tab, you can specify the row size.

By default, Sampling is set to OFF. To set it to ON:

  1. Click ON, and then select:
    • System Determined

    • User Specified and specify the row size

  2. Click OK.
8.7.4.4 Input

The Input tab specifies the input for model build.

Determine inputs automatically (using heuristics) is selected by default for all models. Oracle Data Miner decides which attributes to use for input. For example, attributes that are almost constant may not be suitable for input. Oracle Data Miner also determines mining type and specifies that auto data preparation is performed for all attributes.

Note:

For R Build nodes, Auto Data Preparation is not performed.

After the node runs, rules are displayed explaining the heuristics. Click Show for detailed information.

You can change these selections. To do so, deselect Determine inputs automatically (using heuristics).

8.7.4.5 Text

Text is available for any of the following data types: CHAR, VARCHAR2, BLOB, CLOB, NCHAR, or NVARCHAR2.

If you are connected to Oracle Database 12c Release 1 (12.1) or later, then you can specify text characteristics in the Text tab in the Edit Model Build dialog box.

If you specify text characteristics in the Text tab, then you are not required to use the Text nodes.

Note:

If you are connected to Oracle Database 11g Release 2 (11.2) or earlier, then use Text nodes. The Text tab is not available in Oracle Database 11g Release 2 and earlier.

To examine or specify text characteristics for machine learning, either double-click the build node or right-click the node and select Edit from the context menu. Click the Text tab.

The Text tab enables you to modify the following:

  • Categorical cutoff value: Enables you to control the cutoff used to determine whether a column should be considered as a Text or Categorical mining type. The cutoff value is an integer. It must be 10 or greater and less than or equal to 4000. The default value is 200.

  • Default Transform Type: Specifies the default transformation type for column-level text settings. The values are:

    • Token (Default): For Token as the transform type, the Default Settings are:

      • Languages: Specifies the languages used in the documents. The default is English. To change this value, select an option from the drop-down list. You can select more than one language.

      • Bigram: Select this option to mix the NORMAL token type with their bigram. For example, New York. The token type is BIGRAM.

      • Stemming: By default, this option is not selected. Not all languages support stemming. If the language selected is English, Dutch, French, German, Italian, or Spanish, then stemming is automatically enabled. If Stemming is enabled, then stemmed words are returned for supported languages. Otherwise the original words are returned.

        Note:

        If both Bigram and Stemming are selected, then the token type is STEM_BIGRAM. If neither Bigram nor Stemming is selected, then token type is NORMAL.

      • Stoplists: Specifies the stoplist to use. The default setting is to use the default stoplist. You can add stoplists or edit stoplists. If you select more than one language, and the selected stoplist is Default, then the default stop words for languages are added to the default stoplist from the repository. No duplicate stop words are added.
        • Click edit to view the Stoplist Details. This opens the Stoplist Details dialog box.

        • Click add to add a new stoplist. This opens the New Stoplist Wizard.

      • Tokens: Specify the following:
        • Max number of tokens across all rows (document). The default is 3000.

        • Min number of rows (document) required for a token

    • Theme: If Theme is selected, then the Default Settings are as follows:

      • Language: Specifies the languages used in the documents. The default is English. To change this value, select one from the drop-down list. You can select more than one language.

      • Stoplists: Specifies the stoplist to use. The default setting is to use the default stoplist. You can add stoplists or edit stoplists. If you select more than one language and the selected stoplist is Default, then the default stop words for languages are added to the default stoplist (from the repository). No duplicate stop words are added.

      • Themes: Specifies the maximum number of themes across all documents. The default is 3000.

    • Synonym: The Synonym tab is enabled only if a thesaurus is loaded. By default, no thesaurus is loaded. You must manually load the default thesaurus provided by Oracle Text or upload your own thesaurus.

  • Click Stoplists to open the Stoplist Editor. You can view, edit, and create stoplists.

    You can use the same stoplist for all text columns.

8.7.5 Advanced Settings for Classification Models

The Advanced Settings dialog box enables you to edit data usage and other model specifications, and to add and remove models from the node.

The Advanced Settings dialog box comprises the following settings:

  • Data usage

  • Algorithm settings

  • Performance settings

To change or view advanced settings, click edit in the Edit Classification Build Node dialog box. Alternatively, right-click the Classification Build node and click Advanced Settings.

The Advanced Settings dialog box lists all of the models in the node in the upper pane. You can add models and delete models in the upper pane of the dialog box.

In the lower pane, you can view or edit the following for the model selected in the upper pane:

8.7.5.1 Add Models

To add a model to the list, click add. The Add Model dialog box opens.

8.7.5.1.1 Add Model (Classification)

In the Add Model dialog box, you can add additional models.

To add a model:
  1. In the Algorithm field, select an algorithm.
  2. In the Name field, a default name is displayed. You can use the default or rename the model.
  3. In the Comments field, you can enter comments, if any. This is an optional field.
  4. Click OK to add the model to the node.

8.7.6 Classification Node Properties

The Classification node Properties pane enables you to view and change information about model build and test.

Specify a target before building Classification models. You can specify a case ID. If you do not specify a case ID, then the processing will be slower.

If you are unable to view Properties, then go to View and click Properties. Alternatively, right-click the node and click Go to Properties.

Classification node Properties pane has these sections:

8.7.6.1 Classification Node Models

The Classification node lists the models that are built when the node runs. By default, the Classification Build node creates three classification models, or four when the target is binary.

Each Classification Model uses a different classification algorithm:

  • Support Vector Machine (SVM)

  • Naive Bayes (NB)

  • Decision Tree (DT)

  • Generalized Linear Models (GLM): This algorithm is included by default only if the target is binary. For multi-class targets, you can still use GLM by adding a model that specifies the GLM algorithm.

The Model Settings grid lists the models that are built.

You can perform the following tasks:

  • Add: To add a model, click add. The Add Model dialog box opens.

  • Delete: To delete a model, select it and click delete.

  • Compare Test Results: If models were tested, then you can compare test results by selecting two or more models and clicking compare.

  • View: If a model built successfully, then you can view the model by selecting the model and clicking view. The Model viewer depends on the algorithm used to create the model.

  • Duplicate: To copy a model, select the model and click duplicate.

  • Tune Models: To tune models, select the model and click tune. This option is not available for partitioned models.

You can also indicate which models are, and are not, passed to subsequent nodes.

8.7.6.1.1 Classification Node Output Column

The Output column in the Model Settings grid controls the passing of models to subsequent nodes. By default, all models are passed to subsequent nodes.

  • To ignore a model, that is, to not pass it to subsequent nodes, click the Output icon. The icon changes to the Ignore icon.

  • To cancel the ignore, click the Ignore icon again. It changes to the output icon.

8.7.6.2 Classification Node Build

The Build section displays the target and the case ID. The Build node must be connected to a Data Source node.

You can perform the following tasks:

  • Target: You can select a target from the Target drop-down list.

  • Case ID: To change or select a case ID, select one attribute from the Case ID drop-down list. This attribute uniquely identifies a case. The case ID is optional. If you do not select a case ID, then the processing will be slower.

8.7.6.3 Classification Node Test

The Test section specifies the data used for test and which tests to perform.

You can set the following settings:

  • Perform Test: Select this option to test the Classification Node. The default setting is to test all models built using the test data that is created by randomly splitting the build data into two subsets. By default, the following tests are performed:

    • Performance Metrics

    • Performance Matrix

    • ROC Curve (Binary Class only)

    • Lift and Profit: Lift and profit for the top five target classes by frequency. To change the target values, click Edit. The Target Values Selection dialog box opens.

    • Generate Selected Test Results for Tuning: If you plan to tune the models, then you must test the models in the Build node, not in a Test node.

      Note:

      This option is not available for partitioned models.

  • Test Data: Select one of the following options to specify how the test data is created:

    • Use all Mining Build Data for Testing

    • Use Split Build Data for Testing

      • Split for Test (%)

      • Create Split as: Table (default)

    • Use a Test Data Source for Testing: Select this option to connect the Test Data Source to the Build node, after you connect the Build data.

Note:

Another way to test a model is to use a Test node.

8.7.6.3.1 Target Values Selection

In the Target Values Selection dialog box, you can change the number of target values by changing the frequency count.

The Target Values Selection dialog box displays the number of target values selected. The default option, Automatic, uses the top five target class values by frequency. You can change the number of target values by changing the frequency count, or select the Use Lowest Occurring option. A sample query that computes the most frequent target values follows the list below.

  • Automatic: By default, use the top five target class values by frequency.

    • Frequency Count: You can change the number of target values by changing the value in this field.

    • Use Lowest Occurring

    • Use Highest Occurring

  • Custom: Select this option to specify specific target values. Then, move the values from Available Values to Selected Values.
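
To see which values the Automatic option would select, you can count target class frequencies directly. A minimal query, assuming a build table CUSTOMERS and a target column AFFINITY_CARD (the row-limiting clause requires Oracle Database 12c or later):

    -- Top five target values by frequency, most frequent first.
    SELECT affinity_card, COUNT(*) AS cnt
    FROM   customers
    GROUP  BY affinity_card
    ORDER  BY cnt DESC
    FETCH FIRST 5 ROWS ONLY;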

8.7.6.4 Partition

In the Partition tab, you can build partitioned models.

  • In the Maximum Number of Partitions field, set a value by clicking the arrows. This sets the partition cutoff value. The cutoff value must be greater than zero. If this option is not selected, then the native Oracle Data Miner cutoff value is used, which can be a very large value.

  • Click Advanced Settings to set and select the type of partition build.

  • To add columns for partitioning, click add.

    Note:

    Only NUMBER and VARCHAR2 columns can be selected as partition columns. Case ID and Target columns cannot be selected as partition columns.

  • To remove a partitioning column, select the columns and click delete.

  • To move a column to the top, click move to the top.

  • To move a column up, click move up.

  • To move a column down, click move down.

  • To move a column to the bottom, click move to the bottom.

8.7.6.5 Details

The Details section displays the node name and comments about the node.

You can change the name of the node and edit the comments in this section. The new node name and comments must meet the requirements.

8.7.7 Classification Build Node Context Menu

The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.

To view the context menu options, right click the node. The following options are available in the context menu:

8.7.7.1 View Test Results

Select a model and then view the test results for the model.

8.7.7.2 Compare Test Results

You can compare all successfully built models in the node by comparing the test results.

8.8 Clustering Node

A Clustering node builds clustering models using the k-Means, O-Cluster, and Expectation Maximization algorithms.

There are two ways to cluster data:

  • By building a Clustering model: Use a Clustering node. Then apply the model to new data to create clusters.

  • By using a Clustering query, which is one of the predictive queries.

A Clustering build can run in parallel.

Note:

Expectation Maximization models require Oracle Database 12c Release 1 (12.1) or later.

To learn about clustering use cases, model building, and algorithm selection, see Clustering Algorithms.

This section contains the following topics:

8.8.1 Default Behavior for Clustering Node

A Clustering node builds three models using three different algorithms.

The algorithms used by the Clustering node are:

  • k-Means algorithm (KM)

  • Orthogonal Partitioning Clustering (OC)

  • Expectation Maximization (EM). For EM, Oracle Database 12c Release 1 (12.1) or later is required.

A case ID is optional.

The models all have the same build data.

Note:

If you do not want to create a model, then delete the model from the list of models. The blue check mark to the left of the model name selects models to be used in subsequent nodes, such as Apply. It does not select models to build.
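
Each model that the node builds is an ordinary in-database clustering model, so after a successful build (or apply) you can score new data with the SQL clustering functions. A minimal sketch, assuming a k-Means model named CLUS_KM_EXAMPLE and a table NEW_CUSTOMERS; both names are assumptions for this example.

    -- Assign each row to its most likely cluster and return the probability.
    SELECT cust_id,
           CLUSTER_ID(CLUS_KM_EXAMPLE USING *)          AS cluster_id,
           CLUSTER_PROBABILITY(CLUS_KM_EXAMPLE USING *) AS probability
    FROM   new_customers;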

8.8.2 Create Clustering Build Node

You create a Clustering node to build clustering models using the k-Means, O-Cluster, and Expectation Maximization algorithms.

First create a workflow. Then, identify or create a Data Source node.
To create a Clustering node and attach data to it:
  1. In the Components pane, go to Workflow Editor. If the Components pane is not visible, then in the SQL Developer menu bar, go to View and click Components. Alternatively, press Ctrl+Shift+P to dock the Components pane.
  2. In the Workflow Editor, expand Models and click Clustering.
  3. Drag and drop the node from the Components pane to the Workflow pane.
    The node is added to the workflow. The GUI shows that the node has no data associated with it. Therefore, it cannot be run.
  4. Move to the node that provides data for the build. Right-click the node and click Connect. Drag the line to the Clustering node and click again.
  5. Right-click the Clustering node and click Run. The node runs and builds the models.

8.8.3 Data for Model Build

Oracle Data Miner uses heuristic techniques on data for model build.

Oracle Data Miner uses heuristics to:

  • Determine the attributes of the input data used for model build.

  • Determine the mining type of each attribute.

8.8.4 Edit Clustering Build Node

In the Edit Clustering Build Node dialog box, you can specify or change the characteristics of the models to build.

To open the Edit Clustering Build Node dialog box, double-click a Clustering node. Alternatively, you can right-click a Clustering node and select Edit.

The Edit Clustering Build Node dialog box has the following tabs:

8.8.4.1 Build (Clustering)

The Build tab enables you to specify or change the characteristics of the models to build.

To edit the characteristics of the models to build:

  1. In the Case ID field, select an attribute from the drop-down list. This attribute must uniquely identify a case.

    Note:

    A case ID is not required. However, a case ID helps ensure build and test repeatability.

    If you specify a case ID, then all models in the node have the same case ID.

  2. In the Model Settings list, select the models you want to build. For a Clustering node, you can build models using the following algorithms:
    • k-Means (KM)

    • Orthogonal Partitioning Clustering (OC)

    • Expectation Maximization (EM). For this algorithm, Oracle Database 12c Release 1 (12.1) or later is required.

    You can perform the following tasks:

    • Delete: To delete any models, select the models and click delete.

    • Add: To add a model, click add.

    • Copy: To copy a model, select the model and click copy.

  3. Click OK.
8.8.4.1.1 Add Model (Clustering)

In the Add Model dialog box, you can add models to the Clustering node.

In the Add Model dialog box:

  1. In the Algorithm field, select an algorithm, either KM, OC or EM.
    • k-Means

    • Orthogonal Partitioning Clustering

    • Expectation Maximization. For this option, Oracle Database 12c Release 1 (12.1) or later is required.

  2. In the Name field, a default name is displayed. You can use the default name or rename the model.
  3. In the Comment field, enter comments, if any. This is an optional field.
  4. Click OK.
This adds the model to the node.
8.8.4.2 Partition

In the Partition tab, you can build partitioned models.

  • In the Maximum Number of Partitions field, set a value by clicking the arrows. This sets the partition cutoff value. The cutoff value must be greater than zero. If this option is not selected, then the native Oracle Data Miner cutoff value is used, which can be a very large value.

  • Click Advanced Settings to set and select the type of partition build.

  • To add columns for partitioning, click add.

    Note:

    Only NUMBER and VARCHAR2 columns can be selected as partition columns. Case ID and Target columns cannot be selected as partition columns.

  • To remove a partitioning column, select the columns and click delete.

  • To move a column to the top, click move to the top.

  • To move a column up, click move up.

  • To move a column down, click move down.

  • To move a column to the bottom, click move to the bottom.

8.8.4.3 Sampling

The settings in the Sampling tab are applied to all models in the node. In the Sampling tab, you can specify the row size.

By default, Sampling is set to OFF. To set it to ON:

  1. Click ON, and then select:
    • System Determined

    • User Specified and specify the row size

  2. Click OK.
8.8.4.4 Input

The Input tab specifies the input for model build.

Determine inputs automatically (using heuristics) is selected by default for all models. Oracle Data Miner decides which attributes to use for input. For example, attributes that are almost constant may not be suitable for input. Oracle Data Miner also determines mining type and specifies that auto data preparation is performed for all attributes.

Note:

For R Build nodes, Auto Data Preparation is not performed.

After the node runs, rules are displayed explaining the heuristics. Click Show for detailed information.

You can change these selections. To do so, deselect Determine inputs automatically (using heuristics).

8.8.4.5 Text

Text is available for any of the following data types: CHAR, VARCHAR2, BLOB, CLOB, NCHAR, or NVARCHAR2.

If you are connected to Oracle Database 12c Release 1 (12.1) or later, then you can specify text characteristics in the Text tab in the Edit Model Build dialog box.

If you specify text characteristics in the Text tab, then you are not required to use the Text nodes.

Note:

If you are connected to Oracle Database 11g Release 2 (11.2) or earlier, then use Text nodes. The Text tab is not available in Oracle Database 11g Release 2 and earlier.

To examine or specify text characteristics for machine learning, either double-click the build node or right-click the node and select Edit from the context menu. Click the Text tab.

The Text tab enables you to modify the following:

  • Categorical cutoff value: Enables you to control the cutoff used to determine whether a column should be considered as a Text or Categorical mining type. The cutoff value is an integer. It must be 10 or greater and less than or equal to 4000. The default value is 200.

  • Default Transform Type: Specifies the default transformation type for column-level text settings. The values are:

    • Token (Default): For Token as the transform type, the Default Settings are:

      • Languages: Specifies the languages used in the documents. The default is English. To change this value, select an option from the drop-down list. You can select more than one language.

      • Bigram: Select this option to mix the NORMAL token type with their bigram. For example, New York. The token type is BIGRAM.

      • Stemming: By default, this option is not selected. Not all languages support stemming. If the language selected is English, Dutch, French, German, Italian, or Spanish, then stemming is automatically enabled. If Stemming is enabled, then stemmed words are returned for supported languages. Otherwise the original words are returned.

        Note:

        If both Bigram and Stemming are selected, then the token type is STEM_BIGRAM. If neither Bigram nor Stemming is selected, then token type is NORMAL.

      • Stoplists: Specifies the stoplist to use. The default setting is to use the default stoplist. You can add stoplists or edit stoplists. If you select more than one language, and the selected stoplist is Default, then the default stop words for languages are added to the default stoplist from the repository. No duplicate stop words are added.
        • Click edit to view the Stoplist Details. This opens the Stoplist Details dialog box.

        • Click add to add a new stoplist. This opens the New Stoplist Wizard.

      • Tokens: Specify the following:
        • Max number of tokens across all rows (document). The default is 3000.

        • Min number of rows (document) required for a token

    • Theme: If Theme is selected, then the Default Settings are as follows:

      • Language: Specifies the languages used in the documents. The default is English. To change this value, select one from the drop-down list. You can select more than one language.

      • Stoplists: Specifies the stoplist to use. The default setting is to use the default stoplist. You can add stoplists or edit stoplists. If you select more than one language and the selected stoplist is Default, then the default stop words for languages are added to the default stoplist (from the repository). No duplicate stop words are added.

      • Themes: Specifies the maximum number of themes across all documents. The default is 3000.

    • Synonym: The Synonym tab is enabled only if a thesaurus is loaded. By default, no thesaurus is loaded. You must manually load the default thesaurus provided by Oracle Text or upload your own thesaurus.

  • Click Stoplists to open the Stoplist Editor. You can view, edit, and create stoplists.

    You can use the same stoplist for all text columns.

8.8.5 Advanced Settings for Clustering Models

In the Advanced Settings dialog box, you can review and change settings related to data usage and algorithms used in the model.

To access advanced settings, click edit in the Edit Clustering Build Node dialog box. Alternatively, right-click the node and select Advanced Settings. The Advanced Settings dialog box lists all the models in the upper pane.

You can perform the following tasks:

  • Inspect and change the data usage and algorithm

  • Add models to the node

  • Delete models from the node

In the lower pane, you can view and modify data usage and algorithm settings for the model selected in the upper pane. You can edit the following:

The settings that can be changed depend on the algorithms.

8.8.6 Clustering Build Node Properties

In the Properties pane, you can examine and change the characteristics or properties of a node.

To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternatively, right-click the node and click Go to Properties.

The Clustering Build node properties has these sections:

8.8.6.1 Models

The Models section in Properties lists the models that are built when the nodes are run.

The default is to build three Clustering models using the KM, OC, and EM algorithms.

The Model Settings grid lists the models in the node. You can perform the following tasks:

  • Search models

  • Add models

  • Delete models

  • Duplicate models

  • View models

  • Indicate which models are passed on to subsequent nodes.

8.8.6.1.1 Add Model (Clustering)

In the Add Model dialog box, you can add models to the Clustering node.

In the Add Model dialog box:

  1. In the Algorithm field, select an algorithm, either KM, OC or EM.
    • k-Means

    • Orthogonal Partitioning Clustering

    • Expectation Maximization. For this option, Oracle Database 12c Release 1 (12.1) or later is required.

  2. In the Name field, a default name is displayed. You can use the default name or rename the model.
  3. In the Comment field, enter comments, if any. This is an optional field.
  4. Click OK.
This adds the model to the node.
8.8.6.1.2 View Models

Use the View Models option to view the details of the models that are built after running the workflow.

To view models, you must select a model from the list to open the model viewer. A model must be built successfully before it can be viewed.

8.8.6.1.3 Clustering Node Output Column

The Output column in the Model Settings grid controls the passing of models to subsequent nodes.

By default, all models are passed to subsequent nodes.

  • To ignore a model, that is, to not pass it to subsequent nodes, click the Output icon. The icon changes to the Ignore icon.

  • To cancel the ignore, click the Ignore icon again. The icon changes to the Output icon.

8.8.6.2 Build

The Build section in Properties displays the Case ID of the Clustering model.

To change the case ID, select an attribute from the Case ID drop-down list.

8.8.6.3 Partition

In the Partition tab, you can build partitioned models.

  • In the Maximum Number of Partitions field, set a value by clicking the arrows. This sets the partition cutoff value. The cutoff value must be greater than zero. If this option is not selected, then the native Oracle Data Miner cutoff value is used, which can be a very large value.

  • Click Advanced Settings to set and select the type of partition build.

  • To add columns for partitioning, click add.

    Note:

    Only NUMBER and VARCHAR2 columns can be selected as partition columns. Case ID and Target columns cannot be selected as partition columns.

  • To remove a partitioning column, select the columns and click delete.

  • To move a column to the top, click move to the top.

  • To move a column up, click move up.

  • To move a column down, click move down.

  • To move a column to the bottom, click move to the bottom.

8.8.6.4 Sampling

The settings in the Sampling tab are applied to all models in the node. In the Sampling tab, you can specify the row size.

By default, Sampling is set to OFF. To set it to ON:

  1. Click ON, and then select:
    • System Determined

    • User Specified and specify the row size

  2. Click OK.
8.8.6.5 Details

The Details section displays the name of the node and any comments about it.

You can change the name and comments in the fields here:

  • Node Name

  • Node Comments

8.8.7 Clustering Build Node Context Menu

The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.

To view the context menu options, right click the node. The following options are available in the context menu:

8.9 Explicit Feature Extraction Node

The Explicit Feature Extraction node builds models using the feature extraction algorithm called Explicit Semantic Analysis (ESA).

ESA is a vectorial representation of text, which can be individual words or entire documents. The algorithm uses a document corpus as the knowledge base. In ESA, a word is represented as a column vector in the tf–idf matrix of the text corpus and a document is represented as the centroid of the vectors representing its words. Oracle Machine Learning provides a prebuilt ESA model based on Wikipedia. You can import the model to Oracle Data Miner for mining purposes.
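
For example, once an ESA model is available (such as the prebuilt Wikipedia model after import), semantic similarity between two text values can be computed directly in SQL. A minimal sketch, assuming a model named ESA_WIKI_MODEL whose text attribute is named TEXT, and an Oracle Database release in which the FEATURE_COMPARE function is available (12.2 or later):

    -- Compare the ESA representations of two short texts; for ESA models the
    -- returned score reflects how semantically similar the two inputs are.
    SELECT FEATURE_COMPARE(ESA_WIKI_MODEL
             USING 'street markets in Barcelona' AS text
             AND USING 'outdoor shopping in Spain' AS text) AS similarity
    FROM dual;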

You can use the Explicit Feature Extraction node for the following purposes:

  • Document classification

  • Calculations related to semantics

  • Information retrieval

8.9.1 Create Explicit Feature Extraction Node

You create an Explicit Feature Extraction node for purposes related to information retrieval, document classification, and other calculations related to semantics.

First create a workflow and then identify or create a Data Source node.
The input for an Explicit Feature Extraction node is any node that provides data. To create an Explicit Feature Extraction node:
  1. In the Components pane, go to Workflow Editor. If the Components pane is not visible, then in the SQL Developer menu bar, go to View and click Components. Alternatively, press Ctrl+Shift+P to dock the Components pane.
  2. In the Workflow Editor, expand Models and click Explicit Feature Extraction.
  3. Drag and drop the node from the Components pane to the Workflow pane.
    The node is added to the workflow. The GUI shows that the node has no data associated with it. Therefore, it cannot be run.
  4. Move to the node that provides data for the build. Right-click and click Connect. Drag the line to the Explicit Feature Extraction node and click again.
  5. You can also specify a case ID, edit the data usage, and change the algorithm settings. To perform any of these tasks, right-click the node and select Edit. The Edit Explicit Feature Extraction Node dialog box opens.
  6. The node is now ready to build. Right-click the node and click Run.

8.9.2 Edit Explicit Feature Extraction Node

When you create an Explicit Feature Extraction node, an ESA model with the default algorithm settings is added. You can add additional ESA models and edit them in the Edit Explicit Feature Extraction Node dialog box.

The Edit Explicit Feature Extraction Node dialog box comprises the following tabs:

Related Topics

8.9.2.1 Build

The Build tab enables you to specify or change the characteristics of the models to build.

To edit the characteristics of the model to build, follow these steps:

  1. In the Topic ID field, select an attribute for building the model.
  2. In the Model Settings list, select which models you want to build. The Explicit Feature Extraction node builds Explicit Semantic Analysis (ESA) models.
    • To delete any model, select the model and click delete.

    • To add models, click add.

    • To edit a model, click edit.

    • To copy an existing model, select the model and click copy

  3. Click OK.
8.9.2.1.1 Add Model

The Add Model dialog box allows you to add additional ESA models to the Explicit Feature Extraction node.

To add a model:

  1. In the Algorithm field, the Explicit Semantic Analysis algorithm is displayed.
  2. In the Name field, edit the name.
  3. In the Comments field, enter comments, if any.
  4. Click OK.
8.9.2.2 Partition

In the Partition tab, you can build partitioned models.

  • In the Maximum Number of Partitions field, set a value by clicking the arrows. This sets the partition cutoff value. The cutoff value must be greater than zero. If this option is not selected, then the native Oracle Data Miner cutoff value is used, which can be a very large value.

  • Click Advanced Settings to set and select the type of partition build.

  • To add columns for partitioning, click add.

    Note:

    Only NUMBER and VARCHAR2 columns can be selected as partition columns. Case ID and Target columns cannot be selected as partition columns.

  • To remove a partitioning column, select the columns and click delete.

  • To move a column to the top, click move to the top.

  • To move a column up, click move up.

  • To move a column down, click move down.

  • To move a column to the bottom, click move to the bottom.

8.9.2.2.1 Add Partitioning Columns

Partitioning columns result in building a virtual model for each unique partition. Because the virtual model uses data only from a specific partition, it can potentially predict cases more accurately than if you did not select a partition.

In addition to selecting attributes, you can specify partitioning expressions. Partitioning expressions are concatenated and the result expression is the same for all predictive functions.

  1. Select one or more attributes in the Available Attributes list to serve as partitions.
  2. Move the selected columns to the Selected Attributes list using the arrows.
  3. Click OK. The attributes are moved to the Partition list.

Optionally, you can add partitioning expressions.
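
In the underlying DBMS_DATA_MINING API, partitioning is requested through build settings rather than a dialog box. The following rows are a minimal sketch, assuming a settings table named MODEL_SETTINGS and partition columns REGION and CUST_GENDER; the setting names are the partitioned-model settings as best understood and should be verified against your database version.

    -- Settings table with the usual name/value shape.
    CREATE TABLE model_settings (
      setting_name  VARCHAR2(30),
      setting_value VARCHAR2(4000));

    -- Partition the model on two columns and cap the number of partitions.
    INSERT INTO model_settings VALUES ('ODMS_PARTITION_COLUMNS', 'REGION,CUST_GENDER');
    INSERT INTO model_settings VALUES ('ODMS_MAX_PARTITIONS',    '100');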

8.9.2.3 Sampling

The settings in the Sampling tab are applied to all models in the node. In the Sampling tab, you can specify the row size.

By default, Sampling is set to OFF. To set it to ON:

  1. Click ON, and then select:
    • System Determined

    • User Specified and specify the row size

  2. Click OK.
8.9.2.4 Input

The Input tab specifies the input for model build.

Determine inputs automatically (using heuristics) is selected by default for all models. Oracle Data Miner decides which attributes to use for input. For example, attributes that are almost constant may not be suitable for input. Oracle Data Miner also determines mining type and specifies that auto data preparation is performed for all attributes.

Note:

For R Build nodes, Auto Data Preparation is not performed.

After the node runs, rules are displayed explaining the heuristics. Click Show for detailed information.

You can change these selections. To do so, deselect Determine inputs automatically (using heuristics).

8.9.2.5 Text

Text is available for any of the following data types: CHAR, VARCHAR2, BLOB, CLOB, NCHAR, or NVARCHAR2.

If you are connected to Oracle Database 12c Release 1 (12.1) or later, then you can specify text characteristics in the Text tab in the Edit Model Build dialog box.

If you specify text characteristics in the Text tab, then you are not required to use the Text nodes.

Note:

If you are connected to Oracle Database 11g Release 2 (11.2) or earlier, then use Text nodes. The Text tab is not available in Oracle Database 11g Release 2 and earlier.

To examine or specify text characteristics for machine learning, either double-click the build node or right-click the node and select Edit from the context menu. Click the Text tab.

The Text tab enables you to modify the following:

  • Categorical cutoff value: Enables you to control the cutoff used to determine whether a column should be considered as a Text or Categorical mining type. The cutoff value is an integer. It must be 10 or greater and less than or equal to 4000. The default value is 200.

  • Default Transform Type: Specifies the default transformation type for column-level text settings. The values are:

    • Token (Default): For Token as the transform type, the Default Settings are:

      • Languages: Specifies the languages used in the documents. The default is English. To change this value, select an option from the drop-down list. You can select more than one language.

      • Bigram: Select this option to mix the NORMAL token type with their bigram. For example, New York. The token type is BIGRAM.

      • Stemming: By default, this option is not selected. Not all languages support stemming. If the language selected is English, Dutch, French, German, Italian, or Spanish, then stemming is automatically enabled. If Stemming is enabled, then stemmed words are returned for supported languages. Otherwise the original words are returned.

        Note:

        If both Bigram and Stemming are selected, then the token type is STEM_BIGRAM. If neither Bigram nor Stemming is selected, then token type is NORMAL.

      • Stoplists: Specifies the stoplist to use. The default setting is to use the default stoplist. You can add stoplists or edit stoplists. If you select more than one language, and the selected stoplist is Default, then the default stop words for languages are added to the default stoplist from the repository. No duplicate stop words are added.
        • Click edit to view the Stoplist Details. This opens the Stoplist Details dialog box.

        • Click add to add a new stoplist. This opens the New Stoplist Wizard.

      • Tokens: Specify the following:
        • Max number of tokens across all rows (document). The default is 3000.

        • Min number of rows (document) required for a token

    • Theme: If Theme is selected, then the Default Settings are as follows:

      • Language: Specifies the languages used in the documents. The default is English. To change this value, select one from the drop-down list. You can select more than one language.

      • Stoplists: Specifies the stoplist to use. The default setting is to use the default stoplist. You can add stoplists or edit stoplists. If you select more than one language and the selected stoplist is Default, then the default stop words for languages are added to the default stoplist (from the repository). No duplicate stop words are added.

      • Themes: Specifies the maximum number of themes across all documents. The default is 3000.

    • Synonym: The Synonym tab is enabled only if a thesaurus is loaded. By default, no thesaurus is loaded. You must manually load the default thesaurus provided by Oracle Text or upload your own thesaurus.

  • Click Stoplists to open the Stoplist Editor. You can view, edit, and create stoplists.

    You can use the same stoplist for all text columns.

8.9.3 Advanced Model Settings

In the Advanced Model Settings dialog box, you can edit and set algorithm settings of the selected Explicit Semantic Analysis model.

For the selected Explicit Semantic Analysis (ESA) model, you can edit the following:

  • Data Usage: Displays the attribute name, data type, mining type and other details about the attributes in the selected model. You can customize your input source here.

  • Algorithm Settings: The following are the algorithm settings for an ESA model (a settings-table sketch follows this list):

    • Top N Features: Controls the maximum number of features per attribute. It must be a positive integer. The default is 1000.

    • Minimum Items: Determines the minimum number of non-zero entries that need to be present in an input row.

    • Threshold Value: This setting thresholds very small values in the transformed build data. It must be a non-negative number. The default is 0.00000001.
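
As a rough sketch of how these settings map onto a settings table in the DBMS_DATA_MINING API, the rows below use the ESA setting names as best understood (ESAS_TOPN_FEATURES, ESAS_MIN_ITEMS, ESAS_VALUE_THRESHOLD) together with example values; treat the names, the algorithm constant, and the ESA_SETTINGS table as assumptions to verify against your database version.

    -- Example settings rows for an ESA model build (values are illustrative).
    CREATE TABLE esa_settings (
      setting_name  VARCHAR2(30),
      setting_value VARCHAR2(4000));

    INSERT INTO esa_settings VALUES ('ALGO_NAME',            'ALGO_EXPLICIT_SEMANTIC_ANALYS');
    INSERT INTO esa_settings VALUES ('ESAS_TOPN_FEATURES',   '1000');        -- Top N Features
    INSERT INTO esa_settings VALUES ('ESAS_MIN_ITEMS',       '100');         -- Minimum Items
    INSERT INTO esa_settings VALUES ('ESAS_VALUE_THRESHOLD', '0.00000001');  -- Threshold Value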

8.9.4 Explicit Feature Extraction Build Properties

In the Properties pane, you can examine and change the characteristics or properties of a node.

To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternatively, right-click the node and click Go to Properties.

The Explicit Feature Extraction Build node properties has these sections:

8.9.4.1 Models

The Models section displays a list of the models defined in the node. By default, one model is built for each algorithm supported by the node.

For each model, the name of the model, build information, the algorithm, and comments are listed in a grid. The Build column shows the date and time of the last successful build, or indicates that the model has not been built or did not build successfully.

You can add, delete, or view models in the list. You can also indicate which models are passed to subsequent nodes.

  • To delete a model from the list, select it and click delete.

  • To add a model, click add. The Add Model dialog box opens.

  • To view a model that was built successfully, select the model and click view.

You can tune classification models from the Properties pane.

8.9.4.2 Build

The Build section displays information related to the model build. For models that have a target, such as Classification and Regression, the targets are listed. All models in a node have the same target.

The Build section displays the following:

  • Target: Displays the target. To change the target, select a new target from the drop-down list.

  • Case ID: Displays the case ID of the model defined in this node. All the models in the node have the same case IDs. To edit the case IDs, select a different case ID from the drop-down list.

  • Transaction ID: Displayed for Association models only. To change the transaction ID, click Edit.

  • Item ID: Displayed for Association models only. To change the value, select an option from the drop-down list.

  • Item Value: Displayed for Association models only. To change the value, select an option from the drop-down list.

8.9.4.3 Partition

In the Partition tab, you can build partitioned models.

  • In the Maximum Number of Partitions field, set a value by clicking the arrows. This sets the partition cutoff value. The cutoff value must be greater than zero. If this option is not selected, then the native Oracle Data Miner cutoff value is used, which can be a very large value.

  • Click Advanced Settings to set and select the type of partition build.

  • To add columns for partitioning, click add.

    Note:

    Only NUMBER and VARCHAR2 columns can be selected as partition columns. Case ID and Target columns cannot be selected as partition columns.

  • To remove a partitioning column, select the columns and click delete.

  • To move a column to the top, click move to the top.

  • To move a column up, click move up.

  • To move a column down, click move down.

  • To move a column to the bottom, click move to the bottom.

8.9.4.4 Details

The Details section displays the node name and comments about the node.

You can change the name of the node and edit the comments in this section. The new node name and comments must meet the requirements.

8.9.5 Explicit Feature Extraction Context Menu

The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.

To view the context menu options, right click the node. The following options are available in the context menu:

8.10 Feature Extraction Node

A Feature Extraction node uses the Nonnegative Matrix Factorization (NMF) algorithm to build models.

There are two ways to extract features:

  • Build a feature extraction model, using a Feature Extraction node.

  • Use a Feature Extraction Query, which is one of the predictive queries.

If Oracle Data Miner is connected to Oracle Database 12c Release 1 (12.1) or later, then the Feature Extraction node can also use the PCA and SVD algorithms to build models.

Note:

Principal Components Analysis and Singular Value Decomposition models require Oracle Database 12c Release 1 (12.1) or later.

A Feature Extraction build can run in parallel. To learn more about feature extraction algorithms, see Feature Extraction Algorithms.

This section contains the following topics:

8.10.1 Default Behavior of Feature Extraction Node

By default, a Feature Extraction node builds one model using the Non-Negative Matrix Factorization (NMF) algorithm.

If you are connected to Oracle Database 12c or later, then the node builds two models by default:

  • NMF model

  • PCA model

You can add SVD models.

All models in the node use the same build data and have the same case ID, if you specify a case ID.
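
Each model in the node is a regular in-database feature extraction model, so after a build the extracted features can be queried with the SQL feature functions. A minimal sketch, assuming an NMF model named FEAT_NMF_EXAMPLE and a table CUSTOMERS; both names are assumptions for this example.

    -- For each row, return the strongest feature and its value.
    SELECT cust_id,
           FEATURE_ID(FEAT_NMF_EXAMPLE USING *)    AS top_feature,
           FEATURE_VALUE(FEAT_NMF_EXAMPLE USING *) AS feature_value
    FROM   customers;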

8.10.2 Create Feature Extraction Node

You create a Feature Extraction node to build feature extraction models. The node uses the Nonnegative Matrix Factorization (NMF) algorithm.

First create a workflow. Then, identify or create a Data Source node.
To create a Feature Extraction node:
  1. In the Components pane, go to Workflow Editor. If the Components pane is not visible, then in the SQL Developer menu bar, go to View and click Components. Alternatively, press Ctrl+Shift+P to dock the Components pane.
  2. In the Workflow Editor expand Models, and click Feature Extraction.
  3. Drag and drop the node from the Components pane to the Workflow pane.
    The node is added to the workflow. The GUI shows that the node has no data associated with it. Therefore, it cannot be run.
  4. Move to the node that provides data for the build. Right-click and click Connect. Drag the line to the Feature Extraction node and click again.
  5. You can edit the node. To edit the node, right-click the node and click Edit. The Edit Feature Extraction Build Node dialog box opens.
  6. The node is ready to build. Right-click the node and click Run.

8.10.3 Data for Model Build

Oracle Data Miner uses heuristic techniques on data for model build.

Oracle Data Miner uses heuristics to:

  • Determine the attributes of the input data used for model build.

  • Determine the mining type of each attribute.

8.10.4 Edit Feature Extraction Build Node

In the Edit Feature Extraction Build Node dialog box, you can specify or change the characteristics of the models to build.

To edit a Feature Build node, either double-click a Feature Build node, or right-click the node and select Edit. The Edit Feature Extraction Build Node dialog box opens. The same dialog box opens when you drop a Feature Build node on a workflow.

The Edit Feature Extraction Build Node dialog box has the following tabs:

8.10.4.1 Build (Feature Extraction)

In the Build tab, you can edit settings related to the Feature Extraction build node.

You can perform the following tasks:

  • Case ID: Specifying a case ID for Feature Extraction is optional. To specify one, select an attribute from the drop-down list.

  • Add Model: To add a model, click add.

  • Delete: To delete a model, select the model and click delete.

  • Copy: To copy an existing model, select the model and click duplicate.

8.10.4.1.1 Add Model (Feature Extraction)

In the Add Model dialog box, you can add additional models.

To add a model, click add.

  1. In the Algorithm field, select an algorithm. The default algorithm is NMF.
  2. In the Name field, the default name is displayed. You can accept the default name or change it.
  3. In the Comments field, enter comments, if any. This is an optional field.
  4. Click OK. The model is added to the list. The new model has the same build characteristics as existing models. The new model has the default values for advanced settings.
8.10.4.2 Partition

In the Partition tab, you can build partitioned models.

  • In the Maximum Number of Partitions field, set a value by clicking the arrows. This sets the partition cutoff value. The cutoff value must be greater than zero. If this option is not selected, then the native Oracle Data Miner cutoff value is used, which can be a very large value.

  • Click Advanced Settings to set and select the type of partition build.

  • To add columns for partitioning, click add.

    Note:

    Only NUMBER and VARCHAR2 columns can be selected as partition columns. Case ID and Target columns cannot be selected as partition columns.

  • To remove a partitioning column, select the columns and click delete.

  • To move a column to the top, click move to the top.

  • To move a column up, click move up.

  • To move a column down, click move down.

  • To move a column to the bottom, click move to the bottom.

8.10.4.3 Sampling

The settings in the Sampling tab are applied to all models in the node. In the Sampling tab, you can specify the row size.

By default, Sampling is set to OFF. To set it to ON:

  1. Click ON, and then select:
    • System Determined

    • User Specified and specify the row size

  2. Click OK.
8.10.4.4 Input

The Input tab specifies the input for model build.

Determine inputs automatically (using heuristics) is selected by default for all models. Oracle Data Miner decides which attributes to use for input. For example, attributes that are almost constant may not be suitable for input. Oracle Data Miner also determines mining type and specifies that auto data preparation is performed for all attributes.

Note:

For R Build nodes, Auto Data Preparation is not performed.

After the node runs, rules are displayed explaining the heuristics. Click Show for detailed information.

You can change these selections. To do so, deselect Determine inputs automatically (using heuristics).

8.10.4.5 Text

Text is available for any of the following data types: CHAR, VARCHAR2, BLOB, CLOB, NCHAR, or NVARCHAR2.

If you are connected to Oracle Database 12c Release 1 (12.1) or later, then you can specify text characteristics in the Text tab in the Edit Model Build dialog box.

If you specify text characteristics in the Text tab, then you are not required to use the Text nodes.

Note:

If you are connected to Oracle Database 11g Release 2 (11.2) or earlier, then use Text nodes. The Text tab is not available in Oracle Database 11g Release 2 and earlier.

To examine or specify text characteristics for machine learning, either double-click the build node or right-click the node and select Edit from the context menu. Click the Text tab.

The Text tab enables you to modify the following:

  • Categorical cutoff value: Enables you to control the cutoff used to determine whether a column should be considered as a Text or Categorical mining type. The cutoff value is an integer. It must be 10 or greater and less than or equal to 4000. The default value is 200.

  • Default Transform Type: Specifies the default transformation type for column-level text settings. The values are:

    • Token (Default): For Token as the transform type, the Default Settings are:

      • Languages: Specifies the languages used in the documents. The default is English. To change this value, select an option from the drop-down list. You can select more than one language.

      • Bigram: Select this option to mix the NORMAL token type with their bigram. For example, New York. The token type is BIGRAM.

      • Stemming: By default, this option is not selected. Not all languages support stemming. If the language selected is English, Dutch, French, German, Italian, or Spanish, then stemming is automatically enabled. If Stemming is enabled, then stemmed words are returned for supported languages. Otherwise the original words are returned.

        Note:

        If both Bigram and Stemming are selected, then the token type is STEM_BIGRAM. If neither Bigram nor Stemming is selected, then token type is NORMAL.

      • Stoplists: Specifies the stoplist to use. The default setting is to use the default stoplist. You can add stoplists or edit stoplists. If you select more than one language, and the selected stoplist is Default, then the default stop words for languages are added to the default stoplist from the repository. No duplicate stop words are added.
        • Click edit to view the Stoplist Details. This opens the Stoplist Details dialog box.

        • Click add to add a new stoplist. This opens the New Stoplist Wizard.

      • Tokens: Specify the following:
        • Max number of tokens across all rows (document). The default is 3000.

        • Min number of rows (document) required for a token

    • Theme: If Theme is selected, then the Default Settings are as follows:

      • Language: Specifies the languages used in the documents. The default is English. To change this value, select one from the drop-down list. You can select more than one language.

      • Stoplists: Specifies the stoplist to use. The default setting is to use the default stoplist. You can add stoplists or edit stoplists. If you select more than one language and the selected stoplist is Default, then the default stop words for languages are added to the default stoplist (from the repository). No duplicate stop words are added.

      • Themes: Specifies the maximum number of themes across all documents. The default is 3000.

    • Synonym: The Synonym tab is enabled only if a thesaurus is loaded. By default, no thesaurus is loaded. You must manually load the default thesaurus provided by Oracle Text or upload your own thesaurus.

  • Click Stoplists to open the Stoplist Editor. You can view, edit, and create stoplists.

    You can use the same stoplist for all text columns.

8.10.4.5.1 Stoplist Details

The Stoplist Details dialog box lists the stopwords and stopthemes for the selected Stoplist. You can also add and delete stopwords and stopthemes.

To view the stoplist details, you must select the stoplist first in the Stoplist drop-down list.
  1. To view stopwords and stopthemes:
    • Click stopwords to view the stopwords in the stoplist.
    • Click stopthemes to view the stopthemes in the stoplist.
  2. To add stopwords and stopthemes from features, click condition.
    This opens the Add Stopwords/Stopthemes from Features dialog box.
  3. To add and delete stopwords:
    • Click add to add stopwords or stopthemes to the stoplist. This opens the Add Stopwords/Stopthemes dialog box, where you can type the stopwords or stopthemes, separated by commas, to add them to the stoplist.
    • Click delete to delete the selected stopword from stoplist.
  4. Click OK.
8.10.4.5.2 Add Stopwords/Stopthemes from Features

The Add Stopwords/Stopthemes from Features dialog box allows you to select stopwords or stopthemes from the generated features to be included as new stopwords or stopthemes for the selected stoplist.

To include a stopword or stoptheme:
  1. Select the stopwords or stopthemes that you want to add.
  2. Click OK.

8.10.5 Advanced Settings for Feature Extraction

The options in Advanced Settings for Feature Extraction allow you to inspect and change the data usage and algorithm settings for each model in the node.

You can perform the following:

  • Inspect and change data usage.

  • Change algorithm settings for each model in the node.

To change or view advanced settings, click edit in the Edit Feature Extraction Build Node dialog box. Alternatively, right-click the node and select Advanced Settings. The advanced settings selection enables you to inspect and change the data usage and algorithm settings for each model in the node.

In the upper pane, all models are listed. You can perform the following tasks:

  • Delete: To delete a model, select it and click delete.

  • Add: To add a model, click add.

In the lower pane, you can view or edit the following for the model selected in the upper pane:

8.10.6 Feature Extraction Node Properties

In the Properties pane, you can examine and change the characteristics or properties of a node.

To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternatively, right-click the node and click Go to Properties.

The Feature Extraction node Properties pane has the following sections:
  • Models: Displays the details of model settings. You can edit models here.

  • Build: Displays the case ID for the models defined in this node. All the models in the node have the same case ID. To edit the case ID, select a different attribute from the Case ID list.

  • Partition: Displays the details related to partitioned models. You can add and modify partitioned models here.

  • Details: Displays the details related to the Feature Extraction Build node.

8.10.7 Feature Extraction Node Context Menu

The context menu options depend on the type of the node. The context menu provides shortcuts to perform various tasks and to view information related to the node.

To view the context menu options, right-click the node. The following options are available in the context menu:

8.11 Model Node

Model nodes rely on database resources for their definition. It may be necessary to refresh a node definition if the database resources change, for example, if the resources are deleted or re-created.

You can specify a model that was built using either of the ODM APIs. The models in a Model node must satisfy the model constraints.

The Model node takes no input. A Model node can be an input to any node that accepts models, such as the Apply node and, for some function types, the Test node. For example, if a Model node contains Classification or Regression models, then it can be input to a Test node. Test data must be prepared in the same way that the build data was prepared.

8.11.1 Create a Model Node

A Model node enables you to add models to a workflow that were not built in the workflow.

To add a model node to a workflow and add models to the model node:

  1. In the Components pane, go to Workflow Editor. If the Components pane is not visible, then in the SQL Developer menu bar, go to View and click Components. Alternatively, press Ctrl+Shift+P to dock the Components pane.
  2. In the Workflow Editor, expand Models, and click Model.
  3. Drag and drop the node from the Components pane to the Workflow pane.
    The node is added to the workflow. The GUI shows that the node has no data associated with it. Therefore, it cannot be run.
  4. The Edit Model Selection dialog box opens automatically. The models in the Model node must have the same mining function and the same target (Classification and Regression models only).

8.11.2 Edit Model Selection

In the Edit Model Selection dialog box, you can select one or more models to include in the Model node or to remove models from the Model node.

To edit the models in the node, double-click the Model node or right-click the Model node and select Edit.

Note:

All the models in a model node must satisfy the model constraints.

You can perform the following tasks:

  • Select models from the Available Compatible Models list and move them to the Selected Models list using the controls between the lists. The selected models are checked for compatibility. The models in a model node must satisfy the model constraints. The selected models are part of the model node. You can view the models using the Model node properties.

  • Include models from other schemas. To include models, select Include Models from Other Schemas.

  • Filter the Available Compatible Models list in the following ways:

    • Select a model function from the Model Function list. The options are:

      • All

      • Anomaly Detection

      • Association Rules

      • Regression

      • Clustering

      • Feature Extraction

    • Sort the models by name, function, algorithm, target, target data type, creation date, or comments. To sort, click the column header in the list of available models.

  • Add and remove models:

    • Add models by moving them from Available Compatible Models list to the Selected Models list.

    • Remove models by moving them from the Selected Models list to the Available Compatible Models list. You can also remove models using the Models tab.

8.11.2.1 Model Constraints

A Model node consists of models that are similar. The models in a Model node must satisfy the following:

  • All models must have the same function type (Classification, Regression, Clustering, Anomaly Detection, Association Rules, or Feature Extraction). You cannot include models that have different function types.

    You can add models that are built using different algorithms if the models have the same function type.

  • Classification or Regression models must have the same target attribute. The target attributes must all have the same data type.

    CHAR and VARCHAR2 are considered to be the same data type for Classification models.

  • Classification models must have the same list of target values.

8.11.3 Model Node Properties

In the Properties pane, you can examine and change the characteristics or properties of a node.

To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternatively, right-click the node and click Go to Properties.

In the Model node Properties pane, you can:

  • Add models to the Model node

  • Delete models from the Model node

  • View models in the Model node

The Properties pane for a model node source node has the following sections:

8.11.3.1 Models (Model Node)

The Models section shows the mining function that the models use and lists all the models included in the node in a grid.

You can search for models, add models to the node, and delete models. You can perform the following tasks:

  • Add Models: To add models:

    1. Click add. The Edit Model Selection dialog box opens.

    2. In the Edit Model Selection dialog box, select the models to add to the node. You can add models from other schemas too. However, any models that you add must be compatible with the models already in the node.

    3. Click OK. This adds the models to the node. You can go to the Properties pane for the Model node to view the models.

  • Delete Models: To delete a model, select it and click delete.

  • View Models: To view a model, select it and click view.

  • Refresh models: To refresh models, click refresh. If data on the server changes, then it may be necessary to refresh the node.

8.11.3.2 Details

The Details section displays the node name and comments about the node.

You can change the name of the node and edit the comments in this section. The new node name and comments must meet the requirements.

8.11.4 Model Node Context Menu

The context menu options depend on the type of the node. The context menu provides shortcuts to perform various tasks and to view information related to the node.

The following options are available in the context menu:

8.12 Model Details Node

The Model Details node extracts and provides information about the model and algorithms.

Model Details nodes are most useful for application developers. The Model Details node performs the following functions:

  • Extracts model details from a Model Build node, a Model node or any node that outputs a model.

  • Reveals information about model attributes and their treatment by the algorithm. The output depends on the type of models selected and the specific type of model details you specify.

  • The output of the Model Details node is a data flow. To enable the data to persist, use a Create Table or View node.

A Model Details node can run in parallel.

This section on the Model Details node contains the following topics:

8.12.1 Model Details Node Input and Output

The input for a Model Details node is either a Build node (any model type) or a Model node.

All models in Build nodes or Model nodes must have the same mining function type. For example, if one is a Classification model, then all of them must be Classification models.

The output for a Model Details node is a data flow based on the model detail specifications. To enable the data to persist, use a Create Table or View node.

8.12.2 Create Model Details Node

The Model Details node extracts and provides information about the model and algorithms.

To create a Model Details node, follow these steps:

  1. Identify the input node or nodes for model details. The input node must be one or more of the following:
    • Any Model Build node

    • Any Model node

    Note:

    All the models selected must have the same mining function type. For example, if one of the nodes is a Classification node, then all other nodes must build Classification models.

  2. In the Components pane, go to Workflow Editor. If the Components pane is not visible, then in the SQL Developer menu bar, go to View and click Components. Alternatively, press Ctrl+Shift+P to dock the Components pane.
  3. In the Workflow Editor expand Models and click Model Details.
  4. Drag and drop the node from the Components pane to the Workflow pane.
    The node is added to the workflow. The GUI shows that the node has no data associated with it. Therefore, it cannot be run.
  5. Move the cursor to the workflow. Right-click one of the input nodes, and select Connect. Drag the link to the Model Details node. Repeat if you must connect more nodes.
  6. The default specification for model details depends on the model. To use the default specification, right-click the Model Details node and click Run.
  7. To change the specification for the Model Details node, right-click the node and select Edit. Alternatively, you can change the specifications in the Properties pane of the node.

8.12.3 Edit Model Details Node

The Model Details Node editor enables you to view or specify the model details provided by the node.

Under the Selected Models section, you can view the models, the nodes, the algorithm and the partition keys. To open Edit Model Details Node, double-click a Model Details node. Alternatively, right-click a Model Details node and select Edit.

You can perform the following tasks:

  • Auto Setting: If this option is selected (the default), then the system determines the specification. You cannot change the output types, algorithm types, or selected models.

  • Function: Displays the function type of the input nodes connected. For example, if a Classification node is connected to Model Details, then the function is Classification. If no input nodes are connected, then it is undefined.

  • Model Type: Displays the list of algorithms available, including All. Select a model type.

  • Output: Select an output type for the Model Details of the algorithm. The options available are:

    • If you select All or O-Cluster in the Model Type field, then the available output types are:
      • Attribute Histogram

      • Centroid

      • Centroid Scoring (available only for K-Means)

      • Full Tree

      • Model Signature

      • Rules

    • If you select Expectation Maximization, then available output types are:

      • Attribute Gaussian Distribution

      • Attribute Histogram

      • Centroid

      • Component Bernoulli Distribution

      • Component Clusters

      • Component Priors

      • Components

      • Full Tree

      • Global Details

      • Model Signature

      • Projections

      • Rules

    • If you select R Extensible, then available output types are:

      • Model Signature

      • R Model Details

  • Columns: Click Columns to view the list of columns (name and data type) for the selected output type.

  • Add: To add model type or edit output type, deselect Automatic Specification. To add another model type, select the model type and click add. The Edit Model Details Node dialog box opens. You can accept the default specifications or edit them.

8.12.3.1 Edit Model Selection Details

The Edit Model Selection Details dialog box provides general information related to the mining function, model type, output type, available compatible models, and selected models in two sections.

The top pane of the Edit Model Selection Details dialog box contains general information:

  • Function: Displays the function type of the input nodes connected. For example, if a Classification node is connected to Model Details, then the function is Classification. If no input nodes are connected, then it is undefined.

  • Model Type: Displays algorithms. If there are models already selected (listed in the Selected Models grid), then the Model Type field is disabled to match the already selected models. If you move all models out of the Selected Models grid, then the Model Type field is enabled again. If the Model Type is enabled, then you can select models. The default is All Models.

  • Output Type: Displays the list of possible output types (model queries) that are available for the specified model types. The values for each algorithm selection are as follows:

    • Decision Tree (initial default): Full Tree (default), Full Tree XML, Leaf Nodes, Model Signature

    • SVM Classification: Coefficients (Default), Model Signature

    • SVM Regression: Coefficients (Default), Model Signature

    • Naive Bayes: Pair Probabilities (Default), Model Signature

    • Association Rules: Rules (Default), Global Details, Itemsets

    • Anomaly Detection: Coefficients (Default), Model Signature

    • GLM Classification: Statistics (Default), Row Diagnostics, Model Signature, Global Details

    • GLM Regression: Statistics (Default), Row Diagnostics, Model Signature, Global Details

    • KM or OC Clustering: Full Tree (Default), Rules, Attribute Histograms, Centroid, Model Signature

    • Expectation Maximization (EM): Full Tree (Default), Attribute Histograms, Centroid, Components, Global Details, Model Signature, Projections, Rules.

      EM requires Oracle Database 12c Release 1 (12.1) or later.

    • NMF: Features Transactional (Default), Model Signature

    • SVD: Features Transactional (Default), Global Details, Model Signature, Projections, Singular Values

      SVD requires Oracle Database 12c Release 1 (12.1) or later.

    • PCA: Features Transactional (Default), Eigen Values, Global Details, Model Signature, Projections

      PCA requires Oracle Database 12c Release 1 (12.1) or later.

Output values are also available for multiple model types. For example, you can select Centroid for all clustering models.

  • Columns: Click to see a list of the columns (name and data type) for the selected output type.

The lower section of the dialog box displays the following:

  • Available Compatible Models: Lists the available models, that is, models that match the algorithm selection. For each model, the grid displays the model name, the input node for the model, and the algorithm used to build the model.

  • Selected Models: Lists the selected models. The grid, for each model, displays the model name, the input node for the model, and the algorithm used to build the model.

8.12.4 Model Details Automatic Specification

The Automatic Specification setting determines how specifications change automatically.

  • By default, Automatic Specification is set to ON or selected. If Automatic Specification is set to ON, then it results in the following behavior:

    • When the first input node is connected to a Model Details node, the input node is searched for models in a default order of priority. For the first model type found, all of the node's matching models are added to the Model Details specification along with the default output type.

    • On subsequent connections, the models that match the type in the Model Details node are automatically added. A message is displayed telling you that models are being added automatically.

    • When an input node is disconnected, all model specifications provided by that node are automatically removed from the Model Details node.

    • When an input node is edited, any models added are automatically added to the Model Details node if the added model matches the model type contained in the node. If models are deleted from an input node, then they are deleted from the Model Details node.

    • When a parent node is edited so that all models are removed, the Model Details node specification is set to undefined. When a new model is added to the parent node, the specification remains undefined, because it would be unpredictable which model and output type should be selected by default, given that many parent nodes may be connected to the Model Details node.

    • When an input node is edited and the model is changed so that it is no longer consistent with its specification in the model details node, the model specification is removed.

  • If Automatic Specification is Off or deselected, then it results in the following behavior:

    • Models are not added automatically.

    • You must edit the Model Details node.

    • Validations are performed as usual, so models that are now inconsistent or missing are marked as invalid. Also, if models are missing and a node is added that contains a match with that model, then it is made valid and associated with the new node.

    • You must manually fix or remove invalid model references.

8.12.4.1 Default Model and Output Type Selection

The specification that is automatically added depends on the mining function of the model.

The default selections for each mining function are as follows:

  • Classification

    • Decision Tree: Full Tree

    • GLM: Statistics

    • NB: Probabilities

    • SVM (linear kernel only): Coefficients

  • Clustering

    • KM: Full Tree

    • OC: Full Tree

    • EM: Full Tree

  • Regression

    • GLM: Statistics

    • SVM (linear kernel only): Coefficients

  • Anomaly Detection

    • SVM (linear kernel only): Coefficients

  • Association

    • Apriori: Rules

  • Feature Extraction

    • NMF, SVD, or PCA: Features Transactional

8.12.5 Model Details Node Properties

In the Properties pane, you can examine and change the characteristics or properties of a node.

To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternatively, right-click the node and click Go to Properties.

Model Details node Properties has the following sections:

8.12.5.1 Models (Model Details)

The Models section lists the models that you want to save details about.

You can add and remove models from the list.

8.12.5.2 Output (Model Details)

The Output tab lists the columns produced by the Model Details node.

For each column, the alias (if any) and the data type are displayed.

8.12.5.3 Cache (Model Details)

You can generate a cache to optimize the viewing of results. If you generate a cache, then you can specify the sampling size.

By default, no cache is generated. The default sampling size is 2000 rows.

8.12.5.4 Details

The Details section displays the node name and comments about the node.

You can change the name of the node and edit the comments in this section. The new node name and comments must meet the requirements.

8.12.6 Model Details Node Context Menu

The context menu options depend on the type of the node. The context menu provides shortcuts to perform various tasks and to view information related to the node.

To view the context menu options, right-click the node. The following options are available in the context menu:

8.12.6.1 View Data (Model Details)

After a model is built and run successfully, you can view the data contained in the model using the View Data option.

To view the complete Model Details output, right-click the node and select View Data.

The output is displayed in the following tabs:

  • Data: The data that constitutes the model details. What the data represents depends on the model. For example, the data could represent a tree or rules. You can sort and filter the columns of this tab.

  • Columns: Data Type and Mining Type of the columns in the output.

  • SQL: SQL used to generate the model details.

8.12.7 Model Details Per Model

The exact data displayed in a Model Details node depends on the particular models.

All models that can be applied (scored) can have model signature as output.

8.13 R Build Node

The R Build node enables you to register R models. It builds R models and generates R model test results for the Classification and Regression mining functions. The R Build node supports the Classification, Regression, Clustering, and Feature Extraction mining functions only.

You must have Oracle Machine Learning for R installed on the host to build R models.

Note:

The R Model is visible only when Oracle SQL Developer is connected to Oracle Database 12.2 or later.

8.13.1 Create R Build Node

Create an R Build node to register R models.

Identify an input node. The input node can be any node that provides data as input. Depending on the mining function, the R Build node can also accept a test data source node. This is available only for the Classification and Regression mining functions.
To create an R Build node:
  1. In the Components pane, go to Workflow Editor. If the Components pane is not visible, then in the SQL Developer menu bar, go to View and click Components. Alternatively, press Ctrl+Shift+P to dock the Components pane.
  2. In the Workflow Editor, expand Models, and click R Extensible.
  3. Drag and drop the node from the Components pane to the Workflow pane.
    The node is added to the workflow. The GUI shows that the node has no data associated with it. Therefore, it cannot be run.
  4. Right-click the input node and click Connect. Drag the line to the R Build node. You can also create an additional connection from a Test node for Classification and Regression mining function.
    This connects the input node to the R Build node.

8.13.2 Edit R Build Node

The Edit R Build Node dialog box allows you to edit settings related to the R Model.

The dialog box comprises the following tabs:

8.13.2.1 Build

The Build tab enables you to specify or change the characteristics of the models to build.

To edit the characteristics of the model to build, follow these steps:
  1. The Function field displays the supported mining functions, which are Classification, Regression, Clustering, and Feature Extraction.
  2. The Target field is enabled for Classification and Regression models only.
  3. In the Case ID field, select a Case ID.

    Note:

    A case ID is not required. However, if you do not specify a case ID, then the processing will be slower.
  4. Additionally, you can perform the following tasks in the Build tab:
    • Add Model: To add models, click add

    • Delete Model: To delete any model, select the model and click delete

    • Edit Model: To edit a model, click edit

    • Duplicate Model: To copy an existing model, select the model and click copy

  5. Click OK.
8.13.2.1.1 Add Model (R Build Node)

You must provide R functions that are compatible with the Oracle Machine Learning extensible framework. Otherwise, runtime errors may result.

Note:

The required R functions must be registered using the rqScriptCreate procedure in Oracle Machine Learning for R.
To add a model to the R Build node, provide the following details:
  1. Name: This is the name of the model.
  2. Build Function: Lists all registered R functions. Select the correct R function to be used for the build process. Click Edit to open the Build Function dialog box.
  3. Score Function: Lists all registered R functions. Select the correct R function to be used for scoring. Click Edit to open the Score Function dialog box.
    The scoring function is optional. If you do not provide a scoring function, then scoring results will not be available, and the nodes that depend on scoring will not recognize the model as valid.
  4. Model Details Function: This is an optional function. The Model Details function generates the output in the R Node model viewer in the Details tab. The Model Details node displays the data only if the model details function is provided. Click Edit to open the Model Details Function dialog box.
  5. Click OK.
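
As a hedged illustration of the registration requirement mentioned in the note, the following sketch registers a hypothetical R function from an Oracle R Enterprise client session. The ore.scriptCreate client function is assumed to be available and is broadly equivalent to calling the rqScriptCreate procedure from SQL; the connection details and the function name my_glm_build are placeholders.

    library(ORE)    # Oracle R Enterprise / OML4R client package (assumed installed)
    ore.connect(user = "dmuser", password = "password",
                sid = "orcl", host = "localhost")    # placeholder connection details

    # Register a hypothetical build function so that it appears in the
    # Build Function list of the Add Model dialog box. The function body
    # is only a sketch; see the Build Function topic that follows.
    ore.scriptCreate("my_glm_build",
                     function(dat) glm(AFFINITY_CARD ~ ., data = dat,
                                       family = binomial()))
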
8.13.2.1.1.1 Build Function

In the Build Function dialog box, you can select any registered R function to be used for the build function.

  1. The Build Function field displays the applicable R build function. You can select another function from the drop-down list.
  2. The Function Definition field displays the code of the selected function. You can verify the function here. You can specify algorithm settings to be passed on to the build function.
  3. Click Settings. This opens the Build Settings dialog box where you can specify values for parameters used in the build function.
  4. Click OK.
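
For illustration only, here is a minimal sketch of what a registered build function might look like. The function name my_glm_build, the target column AFFINITY_CARD, and the family argument are hypothetical, and the function must be registered as described earlier before it appears in the drop-down list.

    # Hypothetical build function: receives the build data as a data frame
    # and returns an R model object. AFFINITY_CARD is an assumed target
    # column; extra arguments (here, family) can be supplied through the
    # Build Settings dialog box.
    my_glm_build <- function(dat, family = "binomial") {
      glm(AFFINITY_CARD ~ ., data = dat, family = family)
    }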

8.13.2.1.1.2 Build Settings

The Build Settings dialog box allows you to specify the required settings with names, values, and data types. The names must match the argument names in the R function. The data types can be either NUMBER or STRING.

  1. Select Specify Row Weight Column and select an option from the drop-down list. This option is enabled only for Generalized Linear Models (GLM), that is, for Classification and Regression models.
  2. In the Settings section:
    • Click add to add a setting.

    • Select a setting and click delete to delete the selected setting.

  3. Click OK.
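
For example (a sketch under the same assumptions as the earlier hypothetical build function), a NUMBER setting named maxit and a STRING setting named family entered in this dialog box would be passed to a build function declared with matching argument names:

    # The Build Settings entries "maxit" (NUMBER) and "family" (STRING)
    # match these argument names and override the defaults at build time.
    my_glm_build <- function(dat, maxit = 25, family = "binomial") {
      glm(AFFINITY_CARD ~ ., data = dat, family = family,
          control = glm.control(maxit = maxit))
    }
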
8.13.2.1.1.3 Score Function

In the Score Function dialog box, you can select a registered R function to be used for scoring.

  1. The Score Function field displays the applicable R score function. You can select another function from the drop-down list.

    Note:

    If the scoring function is not specified, then the R model will not be available for Test and Apply operations.
  2. The Function Definition field displays the code of the selected function. You can verify the function here.
  3. In the Weight Function field, select an applicable R Weight function from the drop-down list. This is required for Prediction Details.
  4. The Function Definition field displays the details of the selected R weight function.
  5. Click OK.
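
A minimal sketch of a corresponding score function follows, assuming the framework passes the model object and a data frame of rows to score and expects a data frame in return; the names are hypothetical.

    # Hypothetical score function: receives the model returned by the build
    # function and the rows to score, and returns a data frame of predictions.
    my_glm_score <- function(mod, dat) {
      data.frame(PRED = as.numeric(predict(mod, newdata = dat, type = "response")))
    }
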
8.13.2.1.1.4 Model Details Function

In the Model Details Function dialog box, you can select a registered R function.

  1. In the Model Details Function field, select the R function as applicable. If you do not specify the Model Details function, then the Details tab in the Model Viewer will not be available.
  2. The Function Definition section displays the code of the selected R function. You can verify the function here. The selected model detail function generates a data frame that is persisted to a view, after the model is built.
  3. In the Output Column section, you must specify the output signature of the function. The output signature should match the data frame object generated by the function. For example, if you select an R function that produces an output with two columns, ATTRIBUTE and COEFFICIENTS, then the column data types can be either NUMBER or VARCHAR2. Internally, Oracle Data Miner constructs a SELECT statement from the specified name-value pairs, which is passed to the R model details function using the ODM extensible framework.
  4. Click OK.
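
Continuing the ATTRIBUTE and COEFFICIENTS example from step 3, a hedged sketch of a model details function for a GLM-style model could look like the following; the declared output signature would then be ATTRIBUTE (VARCHAR2) and COEFFICIENTS (NUMBER).

    # Hypothetical model details function: returns a data frame whose columns
    # match the output signature declared in the Output Column section.
    my_glm_details <- function(mod) {
      cf <- coef(mod)
      data.frame(ATTRIBUTE    = names(cf),
                 COEFFICIENTS = as.numeric(cf),
                 stringsAsFactors = FALSE)
    }
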
8.13.2.2 Partition

In the Partition tab, you can build partitioned models.

  • In the Maximum Number of Partitions field, set a value by clicking the arrows. This sets the partition cutoff value. The cutoff value must be greater than zero. If this option is not selected, then the native Oracle Data Miner cutoff value is used, which can be a very large value.

  • Click Advanced Settings to set and select the type of partition build.

  • To add columns for partitioning, click add.

    Note:

    Only NUMBER and VARCHAR2 columns can be selected as partition columns. Case ID and Target columns cannot be selected as partition columns.

  • To remove a partitioning column, select the columns and click delete.

  • To move a column to the top, click move to the top.

  • To move a column up, click move up.

  • To move a column down, click move down.

  • To move a column to the bottom, click move to the bottom.

8.13.2.2.1 Add Partitioning Columns

Partitioning columns result in building a virtual model for each unique partition. Because the virtual model uses data only from a specific partition, it can potentially predict cases more accurately than if you did not select a partition.

In addition to selecting attributes, you can specify partitioning expressions. Partitioning expressions are concatenated and the result expression is the same for all predictive functions.

  1. Select one or more attributes in the Available Attributes list to serve as partitions.
  2. Move the selected columns to the Selected Attributes list using the arrows.
  3. Click OK. The attributes are moved to the Partition list.

Optionally, you can add partitioning expressions.
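
Conceptually, a partitioned model behaves as if a separate model were built for each distinct value of the partition column. The following R sketch illustrates the idea only, not what Oracle Data Miner runs internally; the columns REGION and AFFINITY_CARD are hypothetical.

    # Conceptual illustration only: build one model per unique value of the
    # hypothetical partition column REGION.
    build_partitioned <- function(dat) {
      lapply(split(dat, dat$REGION), function(part) {
        glm(AFFINITY_CARD ~ . - REGION, data = part, family = binomial())
      })
    }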

8.13.2.3 Input

The Input tab specifies the input for model build.

Determine inputs automatically (using heuristics) is selected by default for all models. Oracle Data Miner decides which attributes to use for input. For example, attributes that are almost constant may not be suitable for input. Oracle Data Miner also determines mining type and specifies that auto data preparation is performed for all attributes.

Note:

For R Build nodes, Auto Data Preparation is not performed.

After the node runs, rules are displayed explaining the heuristics. Click Show for detailed information.

You can change these selections. To do so, deselect Determine inputs automatically (using heuristics).

8.13.2.4 Sampling

The settings in the Sampling tab are applied to all models in the node. In the Sampling tab, you can specify the row size.

By default, Sampling is set to OFF. To set it to ON:

  1. Click ON, and then select:
    • System Determined

    • User Specified and specify the row size

  2. Click OK.
8.13.2.5 Text

Text is available for any of the following data types: CHAR, VARCHAR2, BLOB, CLOB, NCHAR, or NVARCHAR2.

If you are connected to Oracle Database 12c Release 1 (12.1) and later, then you can specify text characteristics in the Text tab in the Edit Model Build dialog box.

If you specify text characteristics in the Text tab, then you are not required to use the Text nodes.

Note:

If you are connected to Oracle Database 11g Release 2 (11.2) or earlier, then use Text nodes. The Text tab is not available in Oracle Database 11g Release 2 and earlier.

To examine or specify text characteristics for machine learning, either double-click the build node or right-click the node and select Edit from the context menu. Click the Text tab.

The Text tab enables you to modify the following:

  • Categorical cutoff value: Enables you to control the cutoff used to determine whether a column should be considered as a Text or Categorical mining type. The cutoff value is an integer. It must be 10 or greater and less than or equal to 4000. The default value is 200.

  • Default Transform Type: Specifies the default transformation type for column-level text settings. The values are:

    • Token (Default): For Token as the transform type, the Default Settings are:

      • Languages: Specifies the languages used in the documents. The default is English. To change this value, select an option from the drop-down list. You can select more than one language.

      • Bigram: Select this option to mix the NORMAL token type with their bigram. For example, New York. The token type is BIGRAM.

      • Stemming: By default, this option is not selected. Not all languages support stemming. If the language selected is English, Dutch, French, German, Italian, or Spanish, then stemming is automatically enabled. If Stemming is enabled, then stemmed words are returned for supported languages. Otherwise the original words are returned.

        Note:

        If both Bigram and Stemming are selected, then the token type is STEM_BIGRAM. If neither Bigram nor Stemming is selected, then token type is NORMAL.

      • Stoplists: Specifies the stoplist to use. The default setting is to use the default stoplist. You can add stoplists or edit stoplists. If you select more than one language, and the selected stoplist is Default, then the default stop words for languages are added to the default stoplist from the repository. No duplicate stop words are added.
        • Click edit to view the Stoplist Details. This opens the Stoplist Details dialog box.

        • Click add to add a new stoplist. This opens the New Stoplist Wizard.

      • Tokens: Specify the following:
        • Max number of tokens across all rows (document). The default is 3000.

        • Min number of rows (document) required for a token

    • Theme: If Theme is selected, then the Default Settings are as follows:

      • Language: Specifies the languages used in the documents. The default is English. To change this value, select one from the drop-down list. You can select more than one language.

      • Stoplists: Specifies the stoplist to use. The default setting is to use the default stoplist. You can add stoplists or edit stoplists. If you select more than one language and the selected stoplist is Default, then the default stop words for languages are added to the default stoplist (from the repository). No duplicate stop words are added.

      • Themes: Specifies the maximum number of themes across all documents. The default is 3000.

    • Synonym: The Synonym tab is enabled only if a thesaurus is loaded. By default, no thesaurus is loaded. You must manually load the default thesaurus provided by Oracle Text or upload your own thesaurus.

  • Click Stoplists to open the Stoplist Editor. You can view, edit, and create stoplists.

    You can use the same stoplist for all text columns.

8.13.3 Advanced Settings (R Build Node)

The Advanced Settings dialog box allows you to view and edit model settings related to data usage, Extensible settings, and configuration of the previously defined R functions such as build function, scoring function, and model details function.

You can perform the following tasks:

  • Add Model: Click add to add a model.

  • Delete Model: Select a model and click The red color cross sign that indicates the option to delete..

8.13.4 R Build Node Properties

In the Properties pane, you can examine and change the characteristics or properties of a node.

To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternatively, right-click the node and click Go to Properties.

The R Build node Properties pane has the following sections:

8.13.5 R Build Node Context Menu

The context menu options depend on the type of the node. The context menu provides shortcuts to perform various tasks and to view information related to the node.

The following options are available in the context menu:

8.14 Regression Node

The Regression node defines one or more Regression models to build and to test.

To specify data for the build, connect a Data Source node to the Regression node. You can also connect a second data source to the Regression build node to specify test data. You can only specify one target. A Regression build can run in parallel.

The models in a Regression Node all have the same target and case ID.

There are two ways to make regression predictions:

  • By building and testing a Regression model: Use a Regression node, and then apply the model to new data to make predictions.

  • By using a Prediction Query, which is one of the predictive queries.

To learn about regression use cases, model building, and algorithm selection, see Regression Algorithms.

This section consists of the following topics:

8.14.1 Default Behavior for Regression Node

By default, the Regression node builds two models.

The models are built using the following algorithms:

  • Generalized Linear Model (GLM)

  • Support Vector Machine (SVM)

The models have the same build data and the same target.

By default, the models are all tested. The test data is created by randomly splitting the build data into a build data set and a test data set. The default ratio for the split is 60 percent build and 40 percent test. When possible, Oracle Data Miner uses compression when creating the test and build data sets.
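
The following R sketch shows the idea of such a random 60/40 split; it is illustrative only, since Oracle Data Miner performs the split inside the database.

    # Illustrative 60/40 random split of a build data set into build and
    # test subsets.
    split_build_test <- function(dat, build_frac = 0.6, seed = 42) {
      set.seed(seed)
      idx <- sample(seq_len(nrow(dat)), size = floor(build_frac * nrow(dat)))
      list(build = dat[idx, , drop = FALSE],
           test  = dat[-idx, , drop = FALSE])
    }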

You can instead use all the build data as test data.

To use separate test data, connect a test data source to the Build node or use a Test node.

After you test models, you can view test results.

You can compare test results for two or more Regression models using the Compare Test Results selection of the context menu.

The case ID is optional. However, if you do not specify a case ID, then the processing will be slower.

8.14.2 Create a Regression Node

By default, a Regression node builds two models, one each using the Generalized Linear Model (GLM) and Support Vector Machine (SVM) algorithms.

Before creating a Regression node, first, create a workflow. Then, identify or create a data source.
To create a Regression node and attach data to it:
  1. In the Components pane, go to Workflow Editor. If the Components pane is not visible, then in the SQL Developer menu bar, go to View and click Components. Alternatively, press Ctrl+Shift+P to dock the Components pane.
  2. In the Workflow Editor, expand Models, and click Regression.
  3. Drag and drop the node from the Components pane to the Workflow pane.
    The node is added to the workflow. The GUI shows that the node has no data associated with it. Therefore, it cannot be run.
  4. Move to the node that provides data for the build. Right-click, and click Connect. Drag the line to the Regression node and click again.
  5. The Edit Regression Build Node dialog box opens. You must specify a target (all models in the node have the same target). A target cannot be text.
  6. To specify a separate Data Source node for test, connect a second Data Source node to the Build node. This is optional.
  7. After you finish editing the node, and connecting the optional test Data Source node, the node should be ready to build. Right-click the node and click Run.

    If you specified a test Data Source node, then when the node runs, the connection from the build data source is labeled Build and the connection from the test data source is labeled Test.

8.14.3 Data for Model Build

Oracle Data Miner uses heuristic techniques on data for model build.

Oracle Data Miner uses heuristics to:

  • Determine the attributes of the input data used for model build.

  • Determine the mining type of each attribute.

8.14.4 Edit Regression Build Node

In the Edit Regression Build Node dialog box, you can edit settings related to the model build, model partition, sampling, inputs, text settings, and so on.

To open the Edit Regression Build Node dialog box, double-click a Regression Build node, or right-click a Regression Build node and select Edit.

The Edit Regression Build Node dialog box contains the following tabs:

8.14.4.1 Build

The Build tab enables you to specify or change the characteristics of the models to build.

To edit the characteristics of the model to build, follow these steps:

  1. In the Target field, select a target from the drop-down list. The list consists of attributes from the table or view specified in the Data Source node that is connected to the build node.

    You must specify a target. All models in the node have the same target.

  2. In the Case ID field, select one attribute from the drop-down list. This attribute must uniquely identify a case.

    Note:

    A case ID is not required. However, if you do not specify a case ID, then the processing will be slower.

    A case ID is required to generate GLM diagnostics.

    If you specify a case ID, then all models in the node have the same case ID.

  3. In the Model Settings list, select which models you want to build. You can build Support Vector Machine (SVM) and Generalized Linear Model (GLM) models. You can perform the following tasks:
    • To delete any model, select the model and click delete.

    • To add models, click add.

    • To edit a model, click edit.

    • To copy an existing model, select the model and click copy.

  4. Click OK.

The default is to test the model using a test data set created by splitting the build data set. If you do not want to test the model in this way, then go to the Test section in the Regression node Properties pane. You can also use a Test Node and a test Data Source node to test the model instead.

8.14.4.1.1 Add Model (Regression)

In the Add Model dialog box, you can add a model to the node, and select an algorithm for it.

To add a model to the node:

  1. In the Algorithm field, select an algorithm.
  2. In the Name field, a default name is displayed. You can use the default or rename the model.
  3. In the Comment field, add comments if any. This is an optional field.
  4. Click OK. The new model is added to the node.
8.14.4.2 Partition

In the Partition tab, you can build partitioned models.

  • In the Maximum Number of Partitions field, set a value by clicking the arrows. This sets the partition cutoff value. The cutoff value must be greater than zero. If this option is not selected, then the native Oracle Data Miner cutoff value is used, which can be a very large value.

  • Click Advanced Settings to set and select the type of partition build.

  • To add columns for partitioning, click add.

    Note:

    Only NUMBER and VARCHAR2 columns can be selected as partition columns. Case ID and Target columns cannot be selected as partition columns.

  • To remove a partitioning column, select the columns and click delete.

  • To move a column to the top, click move to the top.

  • To move a column up, click move up.

  • To move a column down, click move down.

  • To move a column to the bottom, click move to the bottom.

8.14.4.3 Sampling

The settings in the Sampling tab are applied to all models in the node. In the Sampling tab, you can specify the row size.

By default, Sampling is set to OFF. To set it to ON:

  1. Click ON, and then select:
    • System Determined

    • User Specified and specify the row size

  2. Click OK.
8.14.4.4 Input

The Input tab specifies the input for model build.

Determine inputs automatically (using heuristics) is selected by default for all models. Oracle Data Miner decides which attributes to use for input. For example, attributes that are almost constant may not be suitable for input. Oracle Data Miner also determines mining type and specifies that auto data preparation is performed for all attributes.

Note:

For R Build nodes, Auto Data Preparation is not performed.

After the node runs, rules are displayed explaining the heuristics. Click Show for detailed information.

You can change these selections. To do so, deselect Determine inputs automatically (using heuristics).

8.14.4.5 Text

Text is available for any of the following data types: CHAR, VARCHAR2, BLOB, CLOB, NCHAR, or NVARCHAR2.

If you are connected to Oracle Database 12c Release 1 (12.1) and later, then you can specify text characteristics in the Text tab in the Edit Model Build dialog box.

If you specify text characteristics in the Text tab, then you are not required to use the Text nodes.

Note:

If you are connected to Oracle Database 11g Release 2 (11.2) or earlier, then use Text nodes. The Text tab is not available in Oracle Database 11g Release 2 and earlier.

To examine or specify text characteristics for machine learning, either double-click the build node or right-click the node and select Edit from the context menu. Click the Text tab.

The Text tab enables you to modify the following:

  • Categorical cutoff value: Enables you to control the cutoff used to determine whether a column should be considered as a Text or Categorical mining type. The cutoff value is an integer. It must be 10 or greater and less than or equal to 4000. The default value is 200.

  • Default Transform Type: Specifies the default transformation type for column-level text settings. The values are:

    • Token (Default): For Token as the transform type, the Default Settings are:

      • Languages: Specifies the languages used in the documents. The default is English. To change this value, select an option from the drop-down list. You can select more than one language.

      • Bigram: Select this option to mix the NORMAL token type with their bigram. For example, New York. The token type is BIGRAM.

      • Stemming: By default, this option is not selected. Not all languages support stemming. If the language selected is English, Dutch, French, German, Italian, or Spanish, then stemming is automatically enabled. If Stemming is enabled, then stemmed words are returned for supported languages. Otherwise the original words are returned.

        Note:

        If both Bigram and Stemming are selected, then the token type is STEM_BIGRAM. If neither Bigram nor Stemming is selected, then token type is NORMAL.

      • Stoplists: Specifies the stoplist to use. The default setting is to use the default stoplist. You can add stoplists or edit stoplists. If you select more than one language, and the selected stoplist is Default, then the default stop words for languages are added to the default stoplist from the repository. No duplicate stop words are added.
        • Click edit to view the Stoplist Details. This opens the Stoplist Details dialog box.

        • Click add to add a new stoplist. This opens the New Stoplist Wizard.

      • Tokens: Specify the following:
        • Max number of tokens across all rows (document). The default is 3000.

        • Min number of rows (document) required for a token

    • Theme: If Theme is selected, then the Default Settings are as follows:

      • Language: Specifies the languages used in the documents. The default is English. To change this value, select one from the drop-down list. You can select more than one language.

      • Stoplists: Specifies the stoplist to use. The default setting is to use the default stoplist. You can add stoplists or edit stoplists. If you select more than one language and the selected stoplist is Default, then the default stop words for languages are added to the default stoplist (from the repository). No duplicate stop words are added.

      • Themes: Specifies the maximum number of themes across all documents. The default is 3000.

    • Synonym: The Synonym tab is enabled only if a thesaurus is loaded. By default, no thesaurus is loaded. You must manually load the default thesaurus provided by Oracle Text or upload your own thesaurus.

  • Click Stoplists to open the Stoplist Editor. You can view, edit, and create stoplists.

    You can use the same stoplist for all text columns.

8.14.5 Advanced Settings for Regression Models

In the Advanced Settings dialog box, you can add models, delete models, review settings, and change settings related to the model and algorithm.

The Advanced Settings dialog box enables you to:

  • Inspect and change data usage and algorithm settings for each model in the node

  • Add and delete models

To change or view Advanced Settings, click edit in the Edit Regression Build Node dialog box. Alternatively, right-click the node and select Advanced Settings.

The upper panes lists all the models in the node. You can perform the following functions:

  • Delete: To delete a model, select the model and click delete

  • Add: To add a model, click add. The Add Model dialog box opens.

In the lower pane, you can view and modify data usage and algorithm settings for the model selected in the upper pane. You can edit the following:

8.14.6 Regression Node Properties

In the Properties pane, you can examine and change the characteristics or properties of a node.

To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternatively, right-click the node and click Go to Properties.

Before building Regression models, ensure the following:

  • Specify a Target.

  • Specify a case ID. This is optional. However, if you do not specify a case ID, then the processing will be slower.

This section contains the following topics:

8.14.6.1 Models (Regression)

The Model section lists the models that are built.

By default, two Regression models are built using two different algorithms (SVM and GLM).

You can perform the following tasks:

  • Delete: To delete a model, select the model and click delete.

  • Add: To add a model, click add

  • Compare Test Results: If models were tested, then you can compare test results. Select two or more models and click compare

  • View Models: If a model built successfully, then you can view the model. Select the model and click view. The corresponding viewer opens.

  • Indicate Model Status: Indicates whether models are passed to subsequent nodes.

8.14.6.1.1 Output Column

The Output column in the Model Settings grid controls the passing of models to subsequent nodes. By default, all models are passed to subsequent nodes.

  • To ignore a model, that is, to not pass it to subsequent nodes, click the Output icon. The icon changes to the Ignore icon.

  • To cancel the ignore, click the Ignore icon again. It changes to the Output icon.

8.14.6.2 Build (Regression)

The Build section displays information related to the selected target and the Case ID.

The Build section displays the following:

  • Target: The Build node must be connected to a Data Source node. You then select the target from the target list. To change the target, select a different target from the drop-down list.

  • Case ID: Select an attribute from the drop-down list. This attribute must uniquely identify a case. The case ID is optional. If no case ID is selected, then None is displayed. However, if no case ID is specified, then the processing will be slower.

8.14.6.3 Partition

In the Partition tab, you can build partitioned models.

  • In the Maximum Number of Partitions field, set a value by clicking the arrows. This sets the partition cutoff value. The cutoff value must be greater than zero. If this option is not selected, then the native Oracle Data Miner cutoff value is used, which can be a very large value.

  • Click Advanced Settings to set and select the type of partition build.

  • To add columns for partitioning, click add.

    Note:

    Only NUMBER and VARCHAR2 columns can be selected as partition columns. Case ID and Target columns cannot be selected as partition columns.

  • To remove a partitioning column, select the columns and click delete.

  • To move a column to the top, click move to the top.

  • To move a column up, click move up.

  • To move a column down, click move down.

  • To move a column to the bottom, click move to the bottom.

8.14.6.4 Details

The Details section displays the node name and comments about the node.

You can change the name of the node and edit the comments in this section. The new node name and comments must meet the requirements.

8.14.6.5 Test (Regression)

The Test section specifies the data used for testing and the tests performed.

By default, all models are tested. The test data is created by randomly splitting the build data.

The following settings are available in the Test section:

  • Perform Test: By default, all models are tested. The test data is created by randomly splitting the build data. The default test results are:

    • Performance Metrics

    • Residuals

      You can deselect both.

  • Test Data: Test data is created in one of the following ways:

    • Use all of the Mining Build Data for Testing

    • Use Split Build Data for Testing: Specify the Split for Test (%) value and the Create Split as option, which is View by default. The split creates a view that is not parallel.

    • Use a Test Data Source for Testing: Select this option to provide a separate test Data Source and connect the test data source to the build node after you connect the build data. Alternatively, you can test a model by using a Test node.

8.14.7 Regression Node Context Menu

The context menu options depend on the type of the node. The context menu provides shortcuts to perform various tasks and to view information related to the node.

To view the context menu options, right-click the node. The following options are available in the context menu:

8.15 Advanced Settings Overview

The Advanced Settings dialog box enables you to edit data usage and other model specifications, and to add and remove models from the node.

You can open the Advanced Settings dialog box in one of these ways:

  • Right-click any model node and click Advanced Settings from the context menu.

  • Double-click the node to open the editor. Then click edit.

The dialog box has two panes: an upper pane and a lower pane.

In the upper pane of the Advanced Settings, you can delete and add models. You can also select models in the upper pane to change data usage. In the lower pane of the Advanced Settings, which has one, two, or three tabs, you can edit model specifications.

8.15.1 Upper Pane of Advanced Settings

The upper pane of Advanced Settings lists all of the models in the node.

The Model Settings grid provides the following information about each model:

  • Model Name

  • Algorithm

  • Date of Last Build

  • Auto

  • Data Usage

  • Column Excluded By...

To view the input and mining type for attributes, select the model in the upper pane and deselect Auto. If Auto is selected (the default), then the system automatically determines the attributes used to build the model.

Oracle Data Miner does not necessarily select all attributes to use in building a model. For example, if most of the values of an attribute are the same, then the attribute is not selected.

To see which attributes are selected, deselect Auto. Select a model. The lower pane indicates the selected attributes with a check mark in the Input column.

If Auto is not selected, then you can override the system's choices in the Data Usage tab. If Auto is not selected, then you can also view input and mining type. This enables you to see which attributes are used for model build, and to change them, if necessary.

The Model Settings grid enables you to delete or add models to the node.

  • Delete: To delete a model, select the model and click delete.

  • Add: To add a model to the node, click add. The Add Model dialog box for the node opens. In the Add Model dialog box, select an algorithm, either accept the default name or specify a different name, and add optional comments.

8.15.2 Lower Pane of Advanced Settings

The lower pane of Advanced Settings displays information related to data usage, algorithm settings, and performance settings.

Select a model in the upper pane. The related information is displayed in the following tabs:

  • Data Usage: For all models except Association

  • Algorithm Settings: For all models

  • Performance Settings: For Classification models only

These tabs display the specification used to build the selected model. You can change the specification.

8.15.2.1 Data Usage

The Data Usage tab contains the data grid that lists all attributes in the data source.

The Data Usage tab is not supported for the Association node. To modify any values, to see which attributes are not used as input, or to see mining types, select View in the lower pane.

You can change data usage information for several models at the same time. For each attribute, the grid displays the following:

  • Attributes: This is the name of the attribute.

  • Data Type: This is the Oracle Database data type of the attribute.

  • Input: Indicates if the attribute is used to build the model. To change the input type, click Automatic. Then click the icon and select the new icon. For models that have a target, such as Classification and Regression models, the target is marked with a red target icon.

    • The pass icon indicates that the attribute is used to build the model.

    • The ignore icon indicates that the attribute is ignored, that is, it is not used to build the model.

  • Mining Type: This is the logical type of the attribute: Numerical (numeric data), Categorical (character data), Nested Numerical, Nested Categorical, Text, or Custom Text. If the attribute has a type that is not supported for mining, then the column is blank. Mining type is indicated by an icon. Move the cursor over the icon to see what the icon represents. To change the mining type, click Automatic, click the type for the attribute, and then select a new type from the list. You can change mining types as follows:

    • Numerical can be changed to Categorical. Changing to Categorical casts the numerical value to string.

    • Categorical.

    • Nested Categorical and Nested Numerical cannot be changed.

  • Auto Prep: If Auto Prep is selected, then automatic data preparation is performed on the attribute. If Auto Prep is not selected, then no automatic data preparation is performed for the attribute; in this case, you must perform any data preparation, such as normalization, that the algorithm used to build the model requires. No data preparation is done (or required) for target attributes. The default is to perform automatic data preparation.

  • Rules: After a model runs, Rules describe the heuristics used. For details, click Show.

There are two types of reasons for not selecting an attribute as input:

  • The attribute has a data type that is not supported by the algorithm used for model build.

    For example, O-Cluster does not support nested data types such as DM_NESTED_NUMERICALS. If you use an attribute with type DM_NESTED_NUMERICALS to build an O-Cluster model, then the build fails.

  • The attribute does not provide data useful for mining, for example, an attribute that has constant or nearly constant values.

    If you include attributes of this kind, then the model has lower quality than if you exclude them.

8.15.2.2 Algorithm Settings

The Algorithm Settings section displays the values of algorithm settings.

The settings are determined by the algorithm used to build the model.

8.15.2.3 Performance Settings

The performance settings are available for Classification models only.

The Performance Settings tab defines the performance objective for Classification model build. To view or change performance settings for a model, select the model in the upper pane. Weights are listed in the Weights grid. Select one of these settings:

  • Balanced: (default) Attempts to achieve the best overall accuracy across all the target class values. This is done in different ways depending on the algorithm selected. Generally, it requires the model build process to be biased using weight values that provide extra weight to target values that occur less frequently.

  • Natural: Enables the model to build without any bias, so that the model uses its natural view of the data to build an accurate model. In this case, rare target class values are probably not predicted as frequently as they would be by a model built using the Balanced option.

  • Custom: Enables you to enter a set of weights for each target value. One way to start defining custom weights is to click Balanced or Natural, just above the Weights grid. Either option generates weights similar to those that would result in Balanced or Natural performance. You can then change these weights to different values.

To save the values, click OK.
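
The effect of these settings can be sketched outside Oracle Data Miner. The following is a minimal illustration, using scikit-learn rather than the Oracle Machine Learning in-database API, of how Natural, Balanced, and Custom performance settings correspond to unweighted, automatically weighted, and user-specified class weights; the target values LOW, MEDIUM, and HIGH are hypothetical.

```python
# Illustrative sketch only: Oracle Data Miner applies these settings through the
# Advanced Settings dialog box, not through code. scikit-learn is used here as a
# generic stand-in to show the idea of class weights.
from sklearn.linear_model import LogisticRegression

# "Natural": no bias; rare target values are predicted less frequently.
natural_model = LogisticRegression(class_weight=None)

# "Balanced": extra weight is given to target values that occur less frequently.
balanced_model = LogisticRegression(class_weight="balanced")

# "Custom": one weight per target value, analogous to editing the Weights grid.
# The target values below are hypothetical.
custom_model = LogisticRegression(class_weight={"LOW": 1.0, "MEDIUM": 2.0, "HIGH": 5.0})
```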


8.16 Machine Learning Functions

Machine Learning functions represent a class of machine learning problems that can be solved using machine learning algorithms.

When creating a machine learning model, you must first specify the machine learning function and then choose an appropriate algorithm to implement the function if one is not provided by default.

Oracle Machine Learning supports these machine learning functions:

8.16.1 Classification

Classification is a machine learning function that assigns items in a collection to target categories or classes, that is, items are classified according to target categories.

The goal of classification is to accurately predict the target class for each case in the data. For example, a Classification model could be used to identify loan applicants as low, medium, or high credit risks.

The target categories for a classification are discrete and not ordered. The simplest type of classification problem is binary classification. In binary classification, the target attribute has only two possible values: for example, high credit rating or low credit rating. Multiclass targets have more than two values: for example, low, medium, high, or unknown credit rating.

The following topics describe classification:

8.16.1.1 Building Classification Models

A Classification model is built from historical data for which the classifications are known.

To build (train) a Classification model, a classification algorithm finds relationships between the values of the predictors and the values of the target. Different classification algorithms use different techniques for finding relationships. These relationships are summarized in a model. The model can then be applied to a different data set in which the class assignments are unknown.

Algorithm settings control model build. Settings depend on the algorithm.

Use a Build Node to build one or more Classification models.

Classification models are tested by default.
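
The build-and-test cycle described above can be sketched generically. The following example uses scikit-learn, not Oracle Data Miner; the file name and column names (AGE, INCOME, RISK) are hypothetical stand-ins for the loan-applicant example.

```python
# A minimal, generic sketch of building and testing a classification model.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = pd.read_csv("loan_applicants.csv")   # hypothetical historical data with known classes
X = data[["AGE", "INCOME"]]                 # predictors
y = data["RISK"]                            # known target: for example LOW, MEDIUM, HIGH

# Hold out part of the data for testing, as a Classification node does by default.
X_build, X_test, y_build, y_test = train_test_split(X, y, test_size=0.4, random_state=0)

model = DecisionTreeClassifier().fit(X_build, y_build)   # build (train) the model
print("Test accuracy:", model.score(X_test, y_test))     # test the model on held-out data
```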

8.16.1.2 Comparing Classification Models

You can compare Classification models by comparing the test metrics of the respective models.

8.16.1.3 Applying Classification Models

Scoring or applying a Classification model results in class assignments and the probability that the assignment is the correct one.

For example, a model that classifies customers as low, medium, or high value would also predict the probability that the classification is correct.

Use an Apply node to score a Classification model, that is to apply the model to new data.


8.16.1.4 Classification Algorithms

The Decision Tree, Naive Bayes, Generalized Linear Model (GLM), and Support Vector Machine (SVM) algorithms are used for classification.

  • Decision Tree algorithm automatically generates rules, which are conditional statements that reveal the logic used to build the tree.

  • Naive Bayes algorithm uses Bayes' Theorem, a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data.

  • Generalized Linear Models (GLM) algorithm is a popular statistical technique for linear modeling. Oracle Machine Learning implements GLM for binary classification and for regression.

    GLM provides extensive coefficient statistics and model statistics, and row diagnostics. GLM also supports confidence bounds, which are the upper and lower boundaries of an interval in which the predicted value is likely to lie.

  • Support Vector Machine (SVM) algorithm is a powerful, state-of-the-art algorithm based on linear and non-linear regression. Oracle Machine Learning implements SVM for binary and multiclass classification.


8.16.2 Regression

Regression is a machine learning function that predicts a number. Profit, sales, mortgage rates, house values, square footage, temperature, or distance could all be predicted using regression techniques.

For example, a Regression model could be used to predict the value of a house based on location, number of rooms, lot size, and other factors.

This section on Regression contains the following topics:

Regression models are tested by default.


8.16.2.1 Building Regression Models

Use a Build Node to build one or more Regression models.

Algorithm settings control the model build. Settings depend on the algorithm.

A Regression task begins with a data set in which the target values are known. For example, a Regression model that predicts house values could be developed based on observed data for many houses over a period of time. In addition to the value, the data might track the age of the house, square footage, number of rooms, taxes, school district, proximity to shopping centers, and so on. House value would be the target, the other attributes would be the predictors, and the data for each house would constitute a case.

In the model build (training) process, a regression algorithm estimates the value of the target as a function of the predictors for each case in the build data. These relationships between predictors and target are summarized in a model, which can then be applied to a different data set in which the target values are unknown.
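
As a rough sketch of the same process outside Oracle Data Miner, the following scikit-learn example estimates house value as a function of several predictors and then applies the model to new cases; the file and column names are hypothetical.

```python
# Hedged sketch of a regression build and apply, using scikit-learn.
import pandas as pd
from sklearn.linear_model import LinearRegression

houses = pd.read_csv("houses.csv")                                    # hypothetical build data
predictors = houses[["SQUARE_FOOTAGE", "NUM_ROOMS", "AGE", "TAXES"]]
target = houses["HOUSE_VALUE"]                                        # known target values

model = LinearRegression().fit(predictors, target)   # estimate target as a function of predictors

# Apply the model to cases in which the target value is unknown.
new_houses = pd.read_csv("new_houses.csv")
predicted_values = model.predict(new_houses[["SQUARE_FOOTAGE", "NUM_ROOMS", "AGE", "TAXES"]])
print(predicted_values[:5])
```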

8.16.2.2 Applying Regression Models

Scoring, or applying, a Regression model results in a predicted target value for each case.

For example, a model that predicts house values returns a predicted value for each house in the apply data.

Use an Apply node to score a Regression model, that is to apply the model to new data.


8.16.2.3 Regression Algorithms

Oracle Machine Learning supports Generalized Linear Models (GLM) and Support Vector Machines (SVM) for Regression.

  • Generalized Linear Models (GLM) algorithm is a popular statistical technique for linear modeling. Oracle Machine Learning implements GLM for binary classification and for regression.

    GLM provides extensive coefficient statistics and model statistics, and row diagnostics. GLM also supports confidence bounds.

  • Support Vector Machines (SVM) algorithm is a powerful, state-of-the-art algorithm based on linear and non-linear regression.

    SVM regression supports two kernels: the Gaussian Kernel for non-linear regression, and the Linear Kernel for linear regression. SVM also supports active learning.
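
The two kernels mentioned above can be illustrated, under the assumption that scikit-learn's SVR is an acceptable generic stand-in for the in-database SVM implementation:

```python
# Conceptual sketch of the two SVM regression kernels, using scikit-learn's SVR.
from sklearn.svm import SVR

linear_svr = SVR(kernel="linear")   # Linear Kernel for linear regression
gaussian_svr = SVR(kernel="rbf")    # Gaussian (RBF) Kernel for non-linear regression
# Either model is then fit to build data with known target values, for example:
# linear_svr.fit(predictors, target)
```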

8.16.3 Anomaly Detection

Anomaly Detection (AD) identifies cases that are unusual within data that is apparently homogeneous.

Standard classification algorithms require the presence of both positive and negative examples (counterexamples) for a target class. One-Class Support Vector Machine (SVM) classification requires only the presence of examples of a single target class.

The model learns to discriminate between the known examples of the positive class and the unknown set of negative counterexamples. The goal is to estimate a function that is:
  • Positive, if an example belongs to the set, and

  • Negative or zero, if the example belongs to the complement of the set

Note:

Solving a one-class classification problem can be difficult. The accuracy of one-class classifiers cannot usually match the accuracy of standard classifiers built with meaningful counterexamples.

This section about Anomaly Detection models contains the following topics:

8.16.3.1 Building Anomaly Detection Models

Oracle Machine Learning uses SVM as the one-class classifier for Anomaly Detection (AD).

When SVM is used for Anomaly Detection, it has the Classification machine learning function but no target.

To build an AD model, use an Anomaly Detection node connected to an appropriate data source.

8.16.3.2 Applying Anomaly Detection Models

Oracle Machine Learning uses Support Vector Machine (SVM) as the one-class classifier for Anomaly Detection (AD). When a one-class SVM model is applied, it produces a prediction and a probability for each case in the scoring data.

  • If the prediction is 1, then the case is considered typical.

  • If the prediction is 0, then the case is considered anomalous.

This behavior reflects the fact that the model is trained with normal data.
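
A generic sketch of this build-and-apply behavior, using scikit-learn's one-class SVM rather than the Oracle Machine Learning implementation, follows. Note that scikit-learn labels typical cases +1 and anomalous cases -1, whereas Oracle Machine Learning reports 1 (typical) and 0 (anomalous) as described above; the training data here is synthetic.

```python
# Train a one-class SVM on normal data only, then score new cases.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
normal_data = rng.normal(loc=0.0, scale=1.0, size=(500, 3))   # examples of the single (normal) class

model = OneClassSVM(kernel="rbf", nu=0.05).fit(normal_data)

new_cases = np.array([[0.1, -0.2, 0.3],    # close to the training distribution: typical
                      [8.0, 9.0, -7.5]])   # far from the training distribution: anomalous
print(model.predict(new_cases))            # for example: [ 1 -1 ]
```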

8.16.4 Clustering

Clustering finds natural groupings of data objects, that is, objects that are similar in some sense to one another.

The members of a cluster are more like each other than they are like members of other clusters. The goal of clustering analysis is to find high-quality clusters such that the inter-cluster similarity is low and the intra-cluster similarity is high.

The following topics discuss clustering:

8.16.4.1 Using Clusters

You can use Clustering to segment data, to explore data, and to detect anomalies.

Like Classification, Clustering is used to segment data. Unlike Classification, Clustering models segment data into groups that were not previously defined. Classification models segment data by assigning it to previously defined classes, which are specified in a target. Clustering models do not use a target.

Clustering is useful for exploring data. If there are many cases and no obvious groupings, then you can use clustering algorithms to find natural groupings. Clustering can also serve as a useful data preprocessing step to identify homogeneous groups on which to build supervised models.

Clustering can also be used for anomaly detection. After the data has been segmented into clusters, you might find that some cases do not fit well into any clusters. These cases are anomalies or outliers.

Clusters are not necessarily disjoint; an item can be in several clusters.
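
The uses described above, segmentation and anomaly detection, can be sketched with scikit-learn's k-Means, one of the algorithms listed for the Clustering node; the data here is synthetic and the 99th-percentile cutoff is an arbitrary illustrative choice.

```python
# Segment cases into clusters, then flag cases that fit no cluster well.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.random((1000, 4))                      # hypothetical customer attributes

kmeans = KMeans(n_clusters=5, random_state=0).fit(X)
segments = kmeans.labels_                      # cluster (segment) assigned to each case
distances = kmeans.transform(X).min(axis=1)    # distance to the nearest cluster center

# Cases far from every cluster center are candidate anomalies (outliers).
outliers = X[distances > np.percentile(distances, 99)]
print(segments[:10], len(outliers))
```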

8.16.4.2 Calculating Clusters

Oracle Machine Learning performs hierarchical clustering.

The leaf clusters are the final clusters generated by the algorithm. Clusters higher up in the hierarchy are intermediate clusters.

8.16.4.3 Algorithms for Clustering

Oracle Machine Learning supports these algorithms for clustering:

8.16.5 Association

Association rules express relationships between items that occur together in the same transaction.

Association rules are often used to analyze sales transactions. For example, it might be noted that customers who buy cereal at the grocery store often buy milk at the same time. In fact, association analysis might find that 85 percent of the checkout sessions that include cereal also include milk.

This application of association modeling is called market-basket analysis. It is valuable for direct marketing, sales promotions, and for discovering business trends. Market-basket analysis can also be used effectively for store layout, catalog design, and cross-sell.

Association modeling has important applications in other domains as well. For example, in e-commerce applications, association rules may be used for web page personalization. An association model might find that a user who visits pages A and B is 70 percent likely to also visit page C in the same session. Based on this rule, a dynamic link could be created for users who are likely to be interested in page C.

Association modeling analyzes data that consists of transactions.

8.16.5.1 Transactions

In transactional data, a collection of items is associated with each case. A case consists of a transaction such as a market-basket or web session.

The collection of items in the transaction is an attribute of the transaction. Other attributes might be the date, time, location, or user ID associated with the transaction. However, in most cases, only a tiny subset of all possible items is present in a given transaction. The items in the market-basket represent only a small fraction of the items available for sale in the store. Association is transaction-based.

When an item is not present in a collection, it may have a null value or it may be missing. Many of the items may be missing or null, because many of the items that could be in the collection are probably not present in any individual transaction.
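
The statistics behind an association rule such as "cereal implies milk" are simple to compute directly. The following self-contained sketch, with hypothetical transaction contents, shows the support and confidence calculations that association algorithms perform at scale.

```python
# Compute support and confidence for the rule "cereal -> milk".
transactions = [
    {"cereal", "milk", "bread"},
    {"cereal", "milk"},
    {"cereal", "eggs"},
    {"milk", "bread"},
]

with_cereal = [t for t in transactions if "cereal" in t]
with_both = [t for t in with_cereal if "milk" in t]

support = len(with_both) / len(transactions)    # fraction of all transactions containing cereal and milk
confidence = len(with_both) / len(with_cereal)  # fraction of cereal transactions that also contain milk
print(f"support={support:.2f}, confidence={confidence:.2f}")
```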

8.16.6 Feature Extraction and Selection

The Feature Extraction machine learning function combines attributes into a new reduced set of features. The Feature Selection machine learning function selects the most relevant attributes.

Sometimes too much information can reduce the effectiveness of machine learning. Some of the attribute columns assembled for building and testing a model may not contribute meaningful information to the model. Some may actually detract from the quality and accuracy of the model.

Irrelevant attributes add noise to the data and affect model accuracy. They also increase the size of the model and the time and system resources needed for model building and scoring.

8.16.6.1 Feature Selection

Feature Selection ranks the existing attributes according to their predictive significance.

Finding the most significant predictors is the goal of some machine learning projects. For example, a model might seek to find the principal characteristics of clients who pose a high credit risk.

Attribute Importance is also useful as a preprocessing step in classification modeling. Decision Tree and Generalized Linear Models benefit from this type of preprocessing. Oracle Machine Learning implements Feature Selection for optimization within both of these algorithms.

Oracle Data Miner provides the Attribute Importance setting in the Filter Columns node transformation to identify important features using the Oracle Machine Learning importance function.
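
As a rough stand-in for the Attribute Importance computation, the following scikit-learn sketch ranks predictors by a mutual-information score; the file and column names are hypothetical, and the predictor columns are assumed to be numeric.

```python
# Rank attributes by their predictive significance for a classification target.
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

data = pd.read_csv("credit_risk.csv")        # hypothetical build data
X = data.drop(columns=["RISK"])              # predictors (assumed numeric here)
y = data["RISK"]                             # target

scores = mutual_info_classif(X, y, random_state=0)
ranking = pd.Series(scores, index=X.columns).sort_values(ascending=False)
print(ranking)                               # most significant predictors first
```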

8.16.6.2 Feature Extraction

Feature Extraction is an attribute reduction process.

Unlike Feature Selection, which ranks the existing attributes according to their predictive significance, Feature Extraction actually transforms the attributes. The transformed attributes, or features, are linear combinations of the original attributes.

The Feature Extraction process results in a much smaller and richer set of attributes. The maximum number of features may be user-specified or determined by the algorithm. By default, it is determined by the algorithm.
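
A minimal sketch of this reduction follows, using scikit-learn's PCA (one of the feature extraction algorithms Oracle Data Miner supports) on synthetic data; the choice of five features stands in for a user-specified maximum.

```python
# Transform 20 original attributes into 5 extracted features.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.random((200, 20))            # 200 cases, 20 original attributes

pca = PCA(n_components=5)            # user-specified maximum number of features
features = pca.fit_transform(X)      # each feature is a linear combination of the original attributes
print(features.shape)                # (200, 5): a much smaller set of attributes
```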

Oracle Machine Learning supports these algorithms for Feature Extraction: