8 Model Nodes
Model nodes specify the models to build and the models to add to the workflow.
The Models section of the Components pane contains the Model nodes. The following topics describe Model nodes:
- Types of Models
Lists the types of Model nodes supported by Oracle Data Miner.
- Automatic Data Preparation (ADP)
Automatic Data Preparation (ADP) transforms the build data according to the requirements of the algorithm, embeds the transformation instructions in the model, and uses the instructions to transform the test or scoring data when the model is applied.
- Data Used for Model Building
Oracle Data Miner does not necessarily use all the columns in a Data Source when it builds models.
- Model Nodes Properties
The Properties of the Model node allow you to examine and change characteristics of the node.
- Anomaly Detection Node
An Anomaly Detection node builds one or more models that detect rare occurrences, such as fraud, using the One-Class SVM algorithm.
- Association Node
The Association node defines one or more Association models. To specify data for the build, connect a Data Source node to the Association node.
- Classification Node
The Classification node defines one or more classification models to build and to test.
- Clustering Node
A Clustering node builds clustering models using the k-Means, O-Cluster, and Expectation Maximization algorithms.
- Explicit Feature Extraction Node
The Explicit Feature Extraction node builds models using the feature extraction algorithm called Explicit Semantic Analysis (ESA).
- Feature Extraction Node
A Feature Extraction node uses the Nonnegative Matrix Factorization (NMF) algorithm to build models.
- Model Node
Model nodes rely on database resources for their definition. It may be necessary to refresh a node definition if the database resources change.
- Model Details Node
The Model Details node extracts and provides information about the model and algorithms.
- R Build Node
The R Build node allows you to register R models. It builds R models and generates R model test results for the Classification and Regression mining functions. R Build nodes support the Classification, Regression, Clustering, and Feature Extraction mining functions only.
- Regression Node
The Regression node defines one or more Regression models to build and to test.
- Advanced Settings Overview
The Advanced Settings dialog box enables you to edit data usage and other model specifications, and to add and remove models from the node.
- Mining Functions
Mining functions represent a class of mining problems that can be solved using data mining algorithms.
Types of Models
Lists the types of Model nodes supported by Oracle Data Miner.
The types of models available are:
-
Anomaly Detection Node: Builds Anomaly Detection models using a one-class Support Vector Machine (SVM).
-
Association Node: Builds models for market basket analysis.
-
Classification Node: Builds and tests classification models with the same target, case ID, cost, and split settings, where relevant. The models use the classification algorithms: Support Vector Machine (SVM), Naive Bayes (NB), Decision Tree (DT), and Generalized Linear Model (GLM).
-
Clustering Node: Builds clustering models using the clustering algorithms: k-Means, O-Cluster, and Expectation Maximization (EM). EM requires Oracle Database 12c Release 1 (12.1) or later.
-
Explicit Feature Extraction Node: Builds feature extraction models using the Explicit Semantic Analysis algorithm.
-
Feature Extraction Node: Builds feature extraction models using the feature extraction algorithms: nonnegative matrix factorization, principal components analysis (PCA), and singular value decomposition (SVD). PCA and SVD require Oracle Database 12c Release 1 (12.1) or later.
-
Model Node: Adds models to a workflow that were not built in the current workflow. This node has no input data.
-
Model Details Node: Extracts model details from a model build node, a Model node, or any node that produces a model.
-
Regression Node: Builds and tests a collection of Regression models with the same target, case ID, cost, and split settings, where relevant. The models use the regression algorithms: SVM and GLM.
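The list above can be read as a lookup from node type to algorithms. The following Python sketch is only an illustration of that reading: the table structure and the function name are hypothetical, the version tuples encode only the minimums stated in this list, and the Association node is left out because this list does not name its algorithm.

```python
from typing import Dict, List, Tuple

# Algorithms named in this section for each build node type. Each entry pairs
# an algorithm with the minimum database version the text states for it;
# (0, 0) means that no minimum is stated in this section.
NODE_ALGORITHMS: Dict[str, List[Tuple[str, Tuple[int, int]]]] = {
    "Anomaly Detection": [("One-Class SVM", (0, 0))],
    "Classification": [
        ("SVM", (0, 0)),
        ("Naive Bayes", (0, 0)),
        ("Decision Tree", (0, 0)),
        ("GLM", (0, 0)),
    ],
    "Clustering": [
        ("k-Means", (0, 0)),
        ("O-Cluster", (0, 0)),
        ("Expectation Maximization", (12, 1)),
    ],
    "Explicit Feature Extraction": [("Explicit Semantic Analysis", (0, 0))],
    "Feature Extraction": [("NMF", (0, 0)), ("PCA", (12, 1)), ("SVD", (12, 1))],
    "Regression": [("SVM", (0, 0)), ("GLM", (0, 0))],
}


def available_algorithms(node_type: str, db_version: Tuple[int, int]) -> List[str]:
    """Return the algorithms of a node type whose stated minimum version is met."""
    return [
        name
        for name, minimum in NODE_ALGORITHMS[node_type]
        if db_version >= minimum
    ]
```

For example, with a database version of (11, 2) the sketch returns only k-Means and O-Cluster for a Clustering node.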
Automatic Data Preparation (ADP)
Automatic Data Preparation (ADP) transforms the build data according to the requirements of the algorithm, embeds the transformation instructions in the model, and uses the instructions to transform the test or scoring data when the model is applied.
Data used for building a model must be properly prepared. Different algorithms have different input requirements. For example, Naive Bayes requires binned data.
If you are connected to Oracle Database 12c or later, then ADP prepares text data.
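The idea of embedding transformation instructions in the model can be illustrated with a small Python sketch. It is not how Oracle Data Miner implements ADP: the ToyModel class and its min-max scaling are hypothetical stand-ins that show only the principle, which is that the parameters learned from the build data are stored with the model and reused unchanged on scoring data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyModel:
    """Hypothetical model that stores its own preparation steps."""

    # Per-column (minimum, range) learned from the build data.
    scaling: Dict[str, Tuple[float, float]] = field(default_factory=dict)

    def build(self, rows: List[Dict[str, float]]) -> None:
        """Learn the scaling parameters from the build data and keep them."""
        for column in rows[0]:
            values = [row[column] for row in rows]
            low, high = min(values), max(values)
            # Fall back to a range of 1.0 so a constant column never divides by zero.
            self.scaling[column] = (low, (high - low) or 1.0)

    def prepare(self, row: Dict[str, float]) -> Dict[str, float]:
        """Apply the stored build-time parameters to a test or scoring row."""
        return {
            column: (row[column] - low) / span
            for column, (low, span) in self.scaling.items()
        }
```

Because prepare uses the stored parameters, a scoring value outside the build range (for example, 80 when the build data ranged from 20 to 60) is scaled consistently rather than being rescaled against the scoring data.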
- Numerical Data Preparation
Automatic Data Preparation prepares numerical data for different algorithms in different ways.
- Manual Data Preparation
For manual data preparation, you must understand the requirements of each algorithm and carry out the transformations in order to prepare the test data or scoring data.
Numerical Data Preparation
Automatic Data Preparation prepares numerical data for different algorithms in different ways.
Here are some examples of how ADP prepares numerical data:
-
For algorithms that require binned data (such as Naive Bayes), ADP performs supervised binning. Supervised binning is a special binning approach that takes into account the target to find good cut-points in the predictor.
-
For algorithms that require normalized data (such as Support Vector Machines), the numerical data is normalized.
-
For algorithms that can handle untransformed data (such as Decision Tree), you can use the numerical data to find splitters in the tree with an approach similar to supervised binning.
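To make the idea of supervised binning concrete, the following Python sketch picks a single cut-point in a numerical predictor by looking at the target. It is a simplified illustration, not the ADP algorithm: the entropy criterion and the function names are assumptions, and a real implementation would typically find several cut-points.

```python
import math
from collections import Counter
from typing import Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy, in bits, of a list of target labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def best_cut_point(values: Sequence[float], targets: Sequence[str]) -> float:
    """Return the cut that minimizes the weighted entropy of the two bins."""
    pairs = sorted(zip(values, targets))
    best_cut, best_score = pairs[0][0], float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary exists between equal predictor values
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if score < best_score:
            # Place the cut halfway between the two neighboring values.
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_cut
```

Unlike equal-width binning, the cut lands where the target changes, which is what makes the binning supervised.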
Manual Data Preparation
For manual data preparation, you must understand the requirements of each algorithm and carry out the transformations in order to prepare the test data or scoring data.
Perform manual binning when the bins carry business meaning, such as recoding a numeric column of ages into ranges like YOUTH, ADULT, and so on. Otherwise, automatic data preparation is recommended.
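A manual recoding of ages might look like the following Python sketch. The boundaries and the SENIOR label are hypothetical; they stand in for whatever ranges your business defines.

```python
from bisect import bisect_right
from typing import List

# Hypothetical business ranges: below 18 is YOUTH, 18 through 64 is ADULT,
# and 65 or older is SENIOR. Replace them with your own definitions.
AGE_BOUNDARIES: List[int] = [18, 65]
AGE_LABELS: List[str] = ["YOUTH", "ADULT", "SENIOR"]


def recode_age(age: int) -> str:
    """Map a numeric age to the label of the range that contains it."""
    return AGE_LABELS[bisect_right(AGE_BOUNDARIES, age)]
```

The same recoding must be applied to the test and scoring data, because a manually prepared column is not transformed by ADP.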
Data Used for Model Building
Oracle Data Miner does not necessarily use all the columns in a Data Source when it builds models.
Model nodes use a set of heuristics to determine whether to exclude columns from the model building process or change the mining type from numerical to categorical only.
-
There are several reasons for not using a particular column for model building. If a column does not contain useful information, then it is usually not used.
The exact list of attributes used as input to build the model depends on the algorithm used to build the model. If an algorithm does not support a data type, then Oracle Data Miner does not use attributes with that data type as input.
For models that have targets, such as Classification models, the target cannot be text.
-
The same mining types are used for all models.
If you are connected to Oracle Database 12c Release 1 (12.1) or later, then specify the characteristics of Text attributes when you edit the Build node.
- Viewing and Changing Data Usage
You can view and change data usage in the Input tab of the Build Editor and in the Advanced Settings dialog box.
- Text
Text is available for any of the following data types: CHAR, VARCHAR2, BLOB, CLOB, NCHAR, or NVARCHAR2.
Viewing and Changing Data Usage
You can view and change data usage in the Input tab of the Build Editor and in the Advanced Settings dialog box.
- Input Tab of Build Editor
In the Input tab, the setting Determine inputs automatically (using heuristics) controls the automatic selection of attributes to be used as inputs, and the automatic selection of mining types.
- Advanced Settings
In the Advanced Settings dialog box, you can edit settings related to model settings, data usage, performance settings, and algorithm settings.
Input Tab of Build Editor
In the Input tab, the setting Determine inputs automatically (using heuristics) controls the automatic selection of attributes to be used as inputs, and the automatic selection of mining types.
To edit a Build node:
- Double-click the node or right-click the node and select Edit.
- Click the Input tab. In the Input tab, the field Determine inputs automatically (using heuristics) is selected by default for all models. Oracle Data Miner determines which attributes to use for input and characteristics of the attributes. Oracle Data Miner also determines the mining type, and specifies that auto data preparation is performed for all attributes. After the model is run, Oracle Data Miner generates rules describing the changes that it made, such as excluding an attribute or changing the mining type. To see detailed information about the heuristics, click Show.
Note:
You cannot view and edit data usage for an Association model using these steps.
- Automatic Input
When Automatic Input is selected, Oracle Data Miner does not use attributes that do not provide useful information. For example, attributes that are almost constant may not be suitable for input.
- Manual Input
To specify inputs manually, deselect Determine Inputs Automatically (using heuristics).
Automatic Input
When Automatic Input is selected, Oracle Data Miner does not use attributes that do not provide useful information. For example, attributes that are almost constant may not be suitable for input.
After the node runs, rules describe the heuristics used. Click Show to see detailed information.
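The following Python sketch shows one way a nearly-constant check could work. The actual heuristics used by Oracle Data Miner are not described here, so the 95 percent threshold, the function names, and the rule text are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Sequence


def is_nearly_constant(values: Sequence[object], threshold: float = 0.95) -> bool:
    """Return True if one value accounts for at least `threshold` of the rows.

    The 0.95 default is an assumed value, not a documented setting.
    """
    if not values:
        return True
    most_common_count = Counter(values).most_common(1)[0][1]
    return most_common_count / len(values) >= threshold


def select_inputs(columns: Dict[str, Sequence[object]]) -> Dict[str, str]:
    """Return a rule for each column that explains whether it is used as input."""
    return {
        name: "ignored: nearly constant" if is_nearly_constant(values) else "input"
        for name, values in columns.items()
    }
```

The returned dictionary plays the role of the rules that are shown after the node runs: one explanation per attribute.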
Manual Input
To specify inputs manually, deselect Determine Inputs Automatically (using heuristics).
You can make the following changes by using the Manual Input option:
-
To ignore an attribute: If you do not want to use an attribute as input, then go to the Input column and click the output icon. Select the ignore icon and click OK. The attribute is ignored and is not used in the model build. Similarly, to use an attribute that you have ignored, click the ignore icon in the Input column and select the output icon. The attribute is then used in the model build.
-
To change mining type of an attribute: Go to the Mining Type column and select an option from the drop-down list:
-
Numerical
-
Categorical
Text mining types are Text and Text Custom. Select Text Custom to create a column-level text specification.
-
To manually prepare data: By default, Automatic Data Preparation (ADP) is performed on all attributes. If you do not want Automatic Data Preparation performed for an attribute, then deselect the corresponding check box for that attribute in the Auto Prep column. If you turn off Auto Prep, then you are responsible for data preparation for that attribute.
Note:
If the mining type of an attribute is Text or Text Custom, then you cannot deselect Automatic Data Preparation.
Advanced Settings
In the Advanced Settings dialog box, you can edit settings related to model settings, data usage, performance settings, and algorithm settings.
To view which columns are selected by Oracle Data Miner and what mining type is assigned to each selected column, follow these steps:
Note:
You cannot view and edit data usage for an Association Model using these steps.
Note:
You can also turn Auto Data Prep off. This is not recommended. If you turn Auto Data Prep off, then you must ensure that the input is properly prepared for each algorithm.
Text
Text is available for any of the following data types: CHAR, VARCHAR2, BLOB, CLOB, NCHAR, or NVARCHAR2.
If you are connected to Oracle Database 12c Release 1 (12.1) or later, then you can specify text characteristics in the Text tab in the Edit Model Build dialog box.
If you specify text characteristics in the Text tab, then you are not required to use the Text nodes.
Note:
If you are connected to Oracle Database 11g Release 2 (11.2) or earlier, then use Text nodes. The Text tab is not available in Oracle Database 11g Release 2 and earlier.
To examine or specify text characteristics for data mining, either double-click the build node or right-click the node and select Edit from the context menu. Click the Text tab.
The Text tab enables you to modify the following:
- Categorical cutoff value: Controls the cutoff used to determine whether a column is treated as a Text or a Categorical mining type. The cutoff value is an integer from 10 through 4000. The default value is 200.
- Default Transform Type: Specifies the default transformation type for column-level text settings. The values are:
  - Token (Default): For Token as the transform type, the default settings are:
    - Languages: Specifies the languages used in the documents. The default is English. To change this value, select an option from the drop-down list. You can select more than one language.
    - Bigram: Select this option to mix NORMAL tokens with their bigrams, for example, New York. The token type is BIGRAM.
    - Stemming: By default, this option is not selected. Not all languages support stemming. If the language selected is English, Dutch, French, German, Italian, or Spanish, then stemming is automatically enabled. If Stemming is enabled, then stemmed words are returned for supported languages. Otherwise, the original words are returned.
      Note: If both Bigram and Stemming are selected, then the token type is STEM_BIGRAM. If neither Bigram nor Stemming is selected, then the token type is NORMAL.
    - Stoplists: Specifies the stoplist to use. The default setting is to use the default stoplist. You can add stoplists or edit stoplists. If you select more than one language, and the selected stoplist is Default, then the default stop words for the languages are added to the default stoplist from the repository. No duplicate stop words are added.
      - To view the stoplist details, click the corresponding icon. This opens the Stoplist Details dialog box.
      - To add a new stoplist, click the corresponding icon. This opens the New Stoplist Wizard.
    - Tokens: Specify the following:
      - Max number of tokens across all rows (document). The default is 3000.
      - Min number of rows (document) required for a token.
  - Theme: If Theme is selected, then the default settings are as follows:
    - Language: Specifies the languages used in the documents. The default is English. To change this value, select an option from the drop-down list. You can select more than one language.
    - Stoplists: Specifies the stoplist to use. The default setting is to use the default stoplist. You can add stoplists or edit stoplists. If you select more than one language, and the selected stoplist is Default, then the default stop words for the languages are added to the default stoplist from the repository. No duplicate stop words are added.
    - Themes: Specifies the maximum number of themes across all documents. The default is 3000.
  - Synonym: The Synonym tab is enabled only if a thesaurus is loaded. By default, no thesaurus is loaded. You must manually load the default thesaurus provided by Oracle Text or upload your own thesaurus.
- Click Stoplists to open the Stoplist Editor. You can view, edit, and create stoplists.
You can use the same stoplist for all text columns.
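The Token settings described above can be summarized in a short Python sketch. The TokenSettings class is hypothetical and is not part of any Oracle API. It encodes only the stated cutoff range, the defaults, and the token type combinations. The STEM value for stemming without bigrams is an assumption, because the text above does not name the token type for that case.

```python
from dataclasses import dataclass


@dataclass
class TokenSettings:
    """Hypothetical container for the Token transform settings of the Text tab."""

    categorical_cutoff: int = 200  # documented default
    bigram: bool = False
    stemming: bool = False
    max_tokens: int = 3000  # documented default for tokens across all rows

    def __post_init__(self) -> None:
        # The cutoff must be an integer from 10 through 4000.
        if not 10 <= self.categorical_cutoff <= 4000:
            raise ValueError("categorical cutoff must be between 10 and 4000")

    @property
    def token_type(self) -> str:
        """Derive the token type from the Bigram and Stemming options."""
        if self.bigram and self.stemming:
            return "STEM_BIGRAM"
        if self.bigram:
            return "BIGRAM"
        if self.stemming:
            return "STEM"  # assumption: this case is not stated in the text
        return "NORMAL"
```

With the defaults, the token type is NORMAL. Selecting both options yields STEM_BIGRAM.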
Model Nodes Properties
The Properties of the Model node allow you to examine and change characteristics of the node.
You can view the properties of a Model Build node in any one of the following ways:
-
Select the node and go to View and click Properties. Click the Properties tab if necessary.
-
Right-click the node and select Go to Properties from the context menu.
In earlier releases, Properties was called Property Inspector. Properties of Model nodes have the following sections:
- Models
The Models section displays a list of the models defined in the node. By default, one model is built for each algorithm supported by the node.
- Build
The Build section displays information related to the model build. For models that have a target, such as Classification and Regression, the targets are listed. All models in a node have the same target.
- Test
The Test section is displayed for Classification and Regression models. They are the only models that can be tested.
- Details
The Details section displays the node name and comments about the node.
Models
The Models section displays a list of the models defined in the node. By default, one model is built for each algorithm supported by the node.
For each model, the name of the model, build information, the algorithm, and comments are listed in a grid. The Build column shows the time and date of the last successful build, or indicates that the model is not built or did not build successfully.
You can add, delete, or view models in the list. You can also indicate which models are passed to subsequent nodes.
-
To delete a model from the list, select it and click the delete icon.
-
To add a model, click the add icon. The Add Model dialog box opens.
-
To view a model that was built successfully, select the model and click the view icon.
You can tune classification models from the Properties pane.
- Output Column
The Output Column in the Model Settings grid controls passing of models to subsequent nodes.
- Add Model
In the Add Model dialog box, you can add a model to a node.
Output Column
The Output Column in the Model Settings grid controls passing of models to subsequent nodes.
The default setting is to pass all models to subsequent nodes.
-
To ignore a model, that is, to not pass it to subsequent nodes, click the Output icon. The Output icon is replaced with the Ignore icon.
-
To cancel the ignore, click the Ignore icon again. It changes back to the Output icon.
Add Model
In the Add Model dialog box, you can add a model to a node.
To add a model to a node:
- In the Algorithm field, select an algorithm from the drop-down list. For example, if you add a model to a Clustering node, then the available algorithms are k-Means and O-Cluster. A default model name is displayed. You can change the default name.
- In the Comments field, add your comments, if any. This is an optional field.
- Click OK.
Build
The Build section displays information related to the model build. For models that have a target, such as Classification and Regression, the targets are listed. All models in a node have the same target.
The Build section displays the following:
-
Target: Displays the target. To change the target, select a new target from the drop-down list.
-
Case ID: Displays the case ID of the models defined in this node. All the models in the node have the same case ID. To change the case ID, select a different case ID from the drop-down list.
-
Transaction ID: Displayed for Association models only. To change the transaction ID, click Edit.
-
Item ID: Displayed for Association models only. To change the value, select an option from the drop-down list.
-
Item Value: Displayed for Association models only. To change the value, select an option from the drop-down list.
Test
The Test section is displayed for Classification and Regression models. They are the only models that can be tested.
The Test section defines how tests are done. By default, all models are tested. All models in the node are tested in the same way.
Details
The Details section displays the node name and comments about the node.
You can change the name of the node and edit the comments in this section. The new node name and comments must meet the requirements.
Anomaly Detection Node
An Anomaly Detection node builds one or more models that detect rare occurrences, such as fraud, using the One-Class SVM algorithm.
By default, an Anomaly Detection node builds one model using the one-class SVM algorithm. All models in the node have the same case ID.
There are two ways to detect anomalies:
-
Build and apply an Anomaly Detection model.
-
Use an Anomaly Detection Query, one of the Predictive Query nodes.
An Anomaly Detection build can run in parallel. The following topics describe Anomaly Detection Nodes:
- Create Anomaly Detection Node
An Anomaly Detection node builds one or more models that detect rare occurrences, such as fraud, and other anomalies using the One-Class SVM algorithm.
- Edit Anomaly Detection Node
In the Edit Anomaly Detection Node dialog box, you can specify or change the characteristics of the models to build.
- Data for Model Build
Oracle Data Miner uses heuristic techniques on data for model build.
- Advanced Model Settings
The Advanced Settings dialog box lists all the models in the Model Settings section in the upper pane. You can add and delete models from the node.
- Anomaly Detection Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
- Anomaly Detection Node Context Menu
The context menu options depend on the type of the node. The menu provides shortcuts to perform various tasks and view information related to the node.
Create Anomaly Detection Node
An Anomaly Detection node builds one or more models that detect rare occurrences, such as fraud, and other anomalies using the One-Class SVM algorithm.
The input for a Model node is any node that generates data as an output, including Transform nodes and Data nodes.
Note:
If the data includes text columns, then prepare the text columns using a Build Text node. If you are connected to Oracle Database 12c or later, then use Automatic Data Preparation.
To create an Anomaly Detection node:
Edit Anomaly Detection Node
In the Edit Anomaly Detection Node dialog box, you can specify or change the characteristics of the models to build.
To open the Edit Anomaly Detection Node dialog box, either double-click an Anomaly Detection node, or right-click an Anomaly Detection node and click Edit.
The Edit Anomaly Detection Node dialog box has the following tabs:
- Build (AD)
The Build tab for Anomaly Detection lists the models to be built and the Case ID.
- Partition
In the Partition tab, you can build partitioned models.
- Input
The Input tab specifies the input for model build.
- Sampling
The settings in the Sampling tab are applied to all models in the node. In the Sampling tab, you can specify the row size.
- Text
Text is available for any of the following data types: CHAR, VARCHAR2, BLOB, CLOB, NCHAR, or NVARCHAR2.
Build (AD)
The Build tab for Anomaly Detection lists the models to be built and the Case ID.
Specify the following:
Partition
In the Partition tab, you can build partitioned models.
-
In the Maximum Number of Partitions field, set a value by clicking the arrows. This sets the partition cutoff value. The cutoff value must be greater than zero. If this option is not selected, then the native Oracle Data Miner cutoff value is used, which can be a very large value.
-
Click Advanced Settings to set and select the type of partition build.
-
To add columns for partitioning, click the add icon.
Note:
Only NUMBER and VARCHAR2 columns can be selected as partition columns. Case ID and Target columns cannot be selected as partition columns.
-
To remove a partitioning column, select the column and click the remove icon.
-
To move a column to the top, click the move to top icon.
-
To move a column up, click the move up icon.
-
To move a column down, click the move down icon.
-
To move a column to the bottom, click the move to bottom icon.
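The restrictions on partition columns can be expressed as a simple check. The following Python sketch is illustrative only: the function name, its parameters, and the message wording are hypothetical, and it encodes just the rules stated in this topic.

```python
from typing import Dict, List, Optional

# Only these data types can be selected as partition columns.
ALLOWED_PARTITION_TYPES = {"NUMBER", "VARCHAR2"}


def validate_partition_columns(
    column_types: Dict[str, str],
    partition_columns: List[str],
    case_id: Optional[str] = None,
    target: Optional[str] = None,
    max_partitions: Optional[int] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the selection is valid."""
    errors: List[str] = []
    # When a maximum number of partitions is set, it must be greater than zero.
    if max_partitions is not None and max_partitions <= 0:
        errors.append("maximum number of partitions must be greater than zero")
    for name in partition_columns:
        if name in (case_id, target):
            errors.append(f"{name}: case ID and target columns cannot be partition columns")
        elif column_types.get(name) not in ALLOWED_PARTITION_TYPES:
            errors.append(f"{name}: only NUMBER and VARCHAR2 columns can be partition columns")
    return errors
```

Passing max_partitions=None corresponds to leaving the option unselected, in which case no cutoff is checked by this sketch.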
Input
The Input tab specifies the input for model build.
Determine inputs automatically (using heuristics) is selected by default for all models. Oracle Data Miner decides which attributes to use for input. For example, attributes that are almost constant may not be suitable for input. Oracle Data Miner also determines mining type and specifies that auto data preparation is performed for all attributes.
Note:
For R Build nodes, Auto Data Preparation is not performed.
After the node runs, rules are displayed explaining the heuristics. Click Show for detailed information.
You can change these selections. To do so, deselect Determine inputs automatically (using heuristics).
Sampling
The settings in the Sampling tab are applied to all models in the node. In the Sampling tab, you can specify the row size.
By default, Sampling is set to OFF.
To set it to ON:
Text
Text is available for any of the following data types: CHAR, VARCHAR2, BLOB, CLOB, NCHAR, or NVARCHAR2.
If you are connected to Oracle Database 12c Release 1 (12.1) or later, then you can specify text characteristics in the Text tab in the Edit Model Build dialog box.
If you specify text characteristics in the Text tab, then you are not required to use the Text nodes.
Note:
If you are connected to Oracle Database 11g Release 2 (11.2) or earlier, then use Text nodes. The Text tab is not available in Oracle Database 11g Release 2 and earlier.
To examine or specify text characteristics for data mining, either double-click the build node or right-click the node and select Edit from the context menu. Click the Text tab.
The Text tab enables you to modify the following:
- Categorical cutoff value: Controls the cutoff used to determine whether a column is treated as a Text or a Categorical mining type. The cutoff value is an integer from 10 through 4000. The default value is 200.
- Default Transform Type: Specifies the default transformation type for column-level text settings. The values are:
  - Token (Default): For Token as the transform type, the default settings are:
    - Languages: Specifies the languages used in the documents. The default is English. To change this value, select an option from the drop-down list. You can select more than one language.
    - Bigram: Select this option to mix NORMAL tokens with their bigrams, for example, New York. The token type is BIGRAM.
    - Stemming: By default, this option is not selected. Not all languages support stemming. If the language selected is English, Dutch, French, German, Italian, or Spanish, then stemming is automatically enabled. If Stemming is enabled, then stemmed words are returned for supported languages. Otherwise, the original words are returned.
      Note: If both Bigram and Stemming are selected, then the token type is STEM_BIGRAM. If neither Bigram nor Stemming is selected, then the token type is NORMAL.
    - Stoplists: Specifies the stoplist to use. The default setting is to use the default stoplist. You can add stoplists or edit stoplists. If you select more than one language, and the selected stoplist is Default, then the default stop words for the languages are added to the default stoplist from the repository. No duplicate stop words are added.
      - To view the stoplist details, click the corresponding icon. This opens the Stoplist Details dialog box.
      - To add a new stoplist, click the corresponding icon. This opens the New Stoplist Wizard.
    - Tokens: Specify the following:
      - Max number of tokens across all rows (document). The default is 3000.
      - Min number of rows (document) required for a token.
  - Theme: If Theme is selected, then the default settings are as follows:
    - Language: Specifies the languages used in the documents. The default is English. To change this value, select an option from the drop-down list. You can select more than one language.
    - Stoplists: Specifies the stoplist to use. The default setting is to use the default stoplist. You can add stoplists or edit stoplists. If you select more than one language, and the selected stoplist is Default, then the default stop words for the languages are added to the default stoplist from the repository. No duplicate stop words are added.
    - Themes: Specifies the maximum number of themes across all documents. The default is 3000.
  - Synonym: The Synonym tab is enabled only if a thesaurus is loaded. By default, no thesaurus is loaded. You must manually load the default thesaurus provided by Oracle Text or upload your own thesaurus.
- Click Stoplists to open the Stoplist Editor. You can view, edit, and create stoplists.
You can use the same stoplist for all text columns.
Data for Model Build
Oracle Data Miner uses heuristic techniques on data for model build.
Oracle Data Miner uses heuristics to:
-
Determine the attributes of the input data used for model build.
-
Determine the mining type of each attribute.
Advanced Model Settings
The Advanced Settings dialog box lists all the models in the Model Settings section in the upper pane. You can add and delete models from the node.
To change or view advanced settings, right-click the node and select Advanced Settings.
-
To delete a model, select it and click the delete icon.
-
To add a model, click the add icon. The Add Model dialog box opens.
-
To modify data usage of a model, select the model in the upper pane. Make the necessary modifications in the Data Usage tab.
-
To modify the default algorithm, select the model in the upper pane. Make the necessary changes in the Algorithm Settings tab.
- Add Model (AD)
In the Add Model dialog box, you can add or change a model for the node.
- Data Usage
The Data Usage tab contains the data grid that lists all attributes in the data source.
Add Model (AD)
In the Add Model dialog box, you can add or change a model for the node.
The algorithm is already selected for you. To add a model:
- In the Algorithm field, the selected algorithm is displayed. You can change this and select a different algorithm from the drop-down list.
- In the Name field, enter a name for the model.
- In the Comments field, add your comments, if any. This is an optional field.
- Click OK.
Data Usage
The Data Usage tab contains the data grid that lists all attributes in the data source.
The Data Usage tab is not supported for the Association node. To modify any values, to see which attributes are not used as input, or to see mining types, select View in the lower pane.
You can change data usage information for several models at the same time. For each attribute, the grid displays the following:
-
Attributes: This is the name of the attribute.
-
Data Type: This is the Oracle Database data type of the attribute.
-
Input: Indicates if the attribute is used to build the model. To change the input type, click Automatic. Then click the icon and select the new icon. For models that have a target, such as Classification and Regression models, the target is marked with a red target icon.
-
The output icon indicates that the attribute is used to build the model.
-
The ignore icon indicates that the attribute is ignored, that is, it is not used to build the model.
-
Mining Type: This is the logical type of the attribute: Numerical (numeric data), Categorical (character data), Nested Numerical, Nested Categorical, Text, or Text Custom. If the attribute has a type that is not supported for mining, then the column is blank. Mining type is indicated by an icon. Move the cursor over the icon to see what the icon represents. To change the mining type, click Automatic and then click the type for the attribute. Select a new type from the list. You can change mining types as follows:
-
Numerical can be changed to Categorical. Changing to Categorical casts the numerical value to string.
-
Categorical.
-
Nested Categorical and Nested Numerical cannot be changed.
-
Auto Prep: If Auto Prep is selected, then automatic data preparation is performed on the attribute. If Auto Prep is not selected, then no automatic data preparation is performed for the attribute. In this case, you are required to perform any data preparation, such as normalization, that may be required by the algorithm used to build the model. No data preparation is done (or required) for target attributes. The default is to perform automatic data preparation.
-
Rules: After a model runs, Rules describe the heuristics used. For details, click Show.
There are two types of reasons for not selecting an attribute as input:
-
The attribute has a data type that is not supported by the algorithm used for model build.
For example, O-Cluster does not support nested data types such as DM_NESTED_NUMERICALS. If you use an attribute with type DM_NESTED_NUMERICALS to build an O-Cluster model, then the build fails.
-
The attribute does not provide data useful for mining. For example, an attribute that has constant or nearly constant values.
If you include attributes of this kind, then the model has lower quality than if you exclude them.
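The second reason above can be illustrated with a small check for constant or nearly constant attributes. This is only a sketch: the 0.95 threshold and the function names are hypothetical, and the heuristics that Oracle Data Miner actually applies are not described in this section.

```python
from collections import Counter


def is_nearly_constant(values, threshold=0.95):
    """Return True if one value makes up at least `threshold` of the non-missing values.

    The threshold is an assumption chosen for illustration only.
    """
    present = [v for v in values if v is not None]
    if not present:
        return True  # only missing values: nothing useful to mine
    most_common_count = Counter(present).most_common(1)[0][1]
    return most_common_count / len(present) >= threshold


def suggest_inputs(columns, threshold=0.95):
    """Split column names into (use, ignore) lists using the check above."""
    use, ignore = [], []
    for name, values in columns.items():
        if is_nearly_constant(values, threshold):
            ignore.append(name)
        else:
            use.append(name)
    return use, ignore


# Made-up sample data: COUNTRY is almost always "US", AGE varies.
columns = {
    "COUNTRY": ["US"] * 99 + ["CA"],
    "AGE": [23, 35, 41, 35, 52] * 20,
}
use, ignore = suggest_inputs(columns)
print("use:", use)
print("ignore:", ignore)
```

In this made-up data, COUNTRY is flagged because a single value covers 99 percent of the rows, while AGE is kept.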
Parent topic: Advanced Model Settings
Anomaly Detection Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternately, right-click the node and click Go to Properties.
To view the properties of an Anomaly Detection node:
-
Right-click the node and select Go to Properties from the context menu.
-
If the Properties pane is closed, then go to View and click Properties.
The Anomaly Detection Properties pane has the following sections:
- Models (AD)
The Models section displays a list of the models defined in the node. The default is to build one model.
- Build (AD)
The Build section displays the case ID for the models defined in this node.
- Partition
In the Partition tab, you can build partitioned models.
- Details
The Details section displays the node name and comments about the node.
Parent topic: Anomaly Detection Node
Models (AD)
The Models section displays a list of the models defined in the node. The default is to build one model.
For each model, the name of the model, the build information, the algorithm, and comments are listed in a grid. The Build column shows the time and date of the last successful build, or indicates that the model is not built or did not build successfully.
You can add, delete, or view models in the list. You can also indicate which models are passed to subsequent nodes.
-
To delete a model, select it and click .
-
To add a model, click . The Add Model dialog box opens.
-
To view a model, click . The appropriate model viewer opens.
-
To duplicate a model, select a model to duplicate and click .
- Output Column (AD)
The Output Column in the Model Settings grid controls passing of the models to subsequent nodes.
- Add Model (AD)
In the Add Model dialog box, you can add or change a model for the node.
Parent topic: Anomaly Detection Node Properties
Output Column (AD)
The Output Column in the Model Settings grid controls passing of the models to subsequent nodes.
The default is to pass all models to subsequent nodes.
-
To ignore a model, click . The Output icon is replaced with the Ignore icon.
-
To cancel the ignore, click the Ignore icon again. The icon changes to the Output icon.
Parent topic: Models (AD)
Add Model (AD)
In the Add Model dialog box, you can add or change a model for the node.
The algorithm is already selected for you. To add a model:
- In the Algorithm field, the selected algorithm is displayed. You can change this and select a different algorithm from the drop-down list.
- In the Name field, enter a name for the model.
- In the Comments field, add your comments, if any. This is an optional field.
- Click OK.
Parent topic: Models (AD)
Build (AD)
The Build section displays the case ID for the models defined in this node.
All the models in the node have the same case ID. To change the case ID, select a different attribute from the list.
Parent topic: Anomaly Detection Node Properties
Partition
In the Partition tab, you can build partitioned models.
-
In the Maximum Number of Partitions field, set a value by clicking the arrows. This sets the partition cutoff value. The cutoff value must be greater than zero. If this option is not selected, then the native Oracle Data Miner cutoff value is used, which can be a very large value.
-
Click Advanced Settings to set and select the type of partition build.
-
To add columns for partitioning, click .
Note:
Only NUMBER and VARCHAR2 columns can be selected as partition columns. Case ID and Target columns cannot be selected as partition columns.
-
To remove a partitioning column, select the columns and click .
-
To move a column to the top, click .
-
To move a column up, click
-
To move a column down, click
-
To move a column to the bottom, click
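The partition rules above (a cutoff greater than zero, only NUMBER and VARCHAR2 columns, and no case ID or target columns) can be expressed as a small validation sketch. The function name, the dictionary of column types, and the sample columns are hypothetical; this is not how Oracle Data Miner itself performs the check.

```python
ALLOWED_TYPES = {"NUMBER", "VARCHAR2"}


def validate_partition_settings(columns, partition_columns,
                                case_id=None, target=None, max_partitions=None):
    """Return a list of problems; an empty list means the rules are satisfied.

    `columns` maps column names to Oracle data type names (illustrative input).
    """
    problems = []
    if max_partitions is not None and max_partitions <= 0:
        problems.append("Maximum Number of Partitions must be greater than zero")
    for name in partition_columns:
        data_type = columns.get(name)
        if data_type is None:
            problems.append(f"{name}: unknown column")
        elif data_type not in ALLOWED_TYPES:
            problems.append(f"{name}: type {data_type} cannot be a partition column")
        if name in (case_id, target):
            problems.append(f"{name}: case ID and target columns cannot be partition columns")
    return problems


# Made-up column list for illustration.
columns = {"CUST_ID": "NUMBER", "REGION": "VARCHAR2", "NOTES": "CLOB", "BUY": "VARCHAR2"}

ok = validate_partition_settings(columns, ["REGION"],
                                 case_id="CUST_ID", target="BUY", max_partitions=100)
bad = validate_partition_settings(columns, ["NOTES", "BUY"],
                                  case_id="CUST_ID", target="BUY", max_partitions=0)
print(ok)
for problem in bad:
    print(problem)
```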
Parent topic: Anomaly Detection Node Properties
Details
The Details section displays the node name and comments about the node.
You can change the name of the node and edit the comments in this section. The new node name and comments must meet the requirements.
Parent topic: Anomaly Detection Node Properties
Anomaly Detection Node Context Menu
The context menu options depend on the type of the node. The menu provides shortcuts to perform various tasks and view information related to the node.
To view the context menu options, right-click the Anomaly Detection node. The following options are available in the context menu:
-
Edit. Opens the Edit Anomaly Detection Node dialog box.
-
Advanced Settings. Opens the Advanced Model Settings.
-
View Models. Opens the Anomaly Detection Model Viewer for the selected model.
-
Performance Settings. This opens the Edit Selected Node Settings dialog box, where you can set Parallel Settings and In-Memory settings for the node.
-
Show Runtime Errors. Displayed only if there is an error.
-
Show Validation Errors. Displayed only if there are validation errors.
Parent topic: Anomaly Detection Node
Association Node
The Association node defines one or more Association models. To specify data for the build, connect a Data Source node to the Association node.
All models in an Association node have the same input data.
Note:
The data for an Association model must be in transactional format.
Association models could generate a very large number of rules with low confidence and support, or they could generate no rules at all.
An Association build can run in parallel.
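Support and confidence are not defined in this section. The following sketch uses the standard association-rule definitions to show why a model can yield many weak rules or none at all, depending on the thresholds applied. The function names, the sample transactions, and the threshold values are hypothetical, and the code is not Oracle Data Miner's implementation.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for items in transactions if itemset <= set(items))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of (antecedent and consequent) divided by support of antecedent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


def keep_rule(transactions, antecedent, consequent, min_support, min_confidence):
    """True if the rule antecedent -> consequent meets both thresholds."""
    both = set(antecedent) | set(consequent)
    return (support(transactions, both) >= min_support
            and confidence(transactions, antecedent, consequent) >= min_confidence)


# Made-up transactions for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
print(rule_support, rule_confidence)

# Loose thresholds keep the rule; strict thresholds reject it.
print(keep_rule(transactions, {"bread"}, {"milk"}, 0.1, 0.1))
print(keep_rule(transactions, {"bread"}, {"milk"}, 0.9, 0.9))
```

With very low thresholds almost every candidate rule passes, and with very high thresholds none may pass, which matches the two outcomes described above.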
This section contains the following topics:
- Behavior of the Association Node
By default, an Association node builds one model using the Apriori algorithm.
- Create Association Node
The data used to build an Association model must be in transactional format.
- Edit Association Build Node
The Association Build Node editor enables you to specify or change the characteristics of the models to build.
- Advanced Settings for Association Node
The Advanced Settings dialog box enables you to add or delete models, and modify the default algorithm settings for each model.
- Association Node Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.
- Association Build Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
Parent topic: Model Nodes
Behavior of the Association Node
By default, an Association node builds one model using the Apriori algorithm.
The Apriori algorithm assumes the following:
-
The data is transactional.
-
The data has many missing values. The Apriori algorithm interprets all missing values as sparse data, and it has its own mechanisms for handling sparse data.
All models in the node have the same case ID, item ID, and item value. The case ID can be two columns. For example, for the data source SH.SALES, CUST_ID and TIME_ID combined can be the case ID.
No automatic data preparation is done for an Association node. If you select a value for Item Value that is different from the default Existence, then you might have to prepare the data.
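To make the idea of a two-column case ID concrete, the sketch below groups rows into transactions keyed by the pair (CUST_ID, TIME_ID). The rows are made up for illustration, PROD_ID is assumed to be the item ID column, and the function name is hypothetical. Items are collected into a set, which corresponds to recording only the existence of an item in a transaction.

```python
from collections import defaultdict


def group_transactions(rows, id_columns, item_column):
    """Group transactional rows into {composite case ID: set of items}."""
    transactions = defaultdict(set)
    for row in rows:
        key = tuple(row[column] for column in id_columns)
        transactions[key].add(row[item_column])
    return dict(transactions)


# Made-up rows in transactional format: one row per purchased item.
rows = [
    {"CUST_ID": 1, "TIME_ID": "2001-01-05", "PROD_ID": 13},
    {"CUST_ID": 1, "TIME_ID": "2001-01-05", "PROD_ID": 20},
    {"CUST_ID": 1, "TIME_ID": "2001-02-11", "PROD_ID": 13},
    {"CUST_ID": 2, "TIME_ID": "2001-01-05", "PROD_ID": 40},
]

transactions = group_transactions(rows, ("CUST_ID", "TIME_ID"), "PROD_ID")
for case_id, items in transactions.items():
    print(case_id, sorted(items))
```

Using CUST_ID alone would merge the two purchases of customer 1 into one transaction, which is why both columns are combined in this example.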
Parent topic: Association Node
Create Association Node
The data used to build an Association model must be in transactional format.
To create an Association node:
Parent topic: Association Node
Edit Association Build Node
The Association Build Node editor enables you to specify or change the characteristics of the models to build.
To open the Edit Association Build Node dialog box, either double-click an Association node, or right-click an Association node and select Edit. The Edit Association Build Node dialog box comprises the following:
- Build
In the Build tab, you can provide the details required for a model build.
- Partition
In the Partition tab, you can build partitioned models.
- Filter
In the Filter tab, you can add items to filter. The items are sourced from the Data Source node, and not from the model.
- Aggregates
In the Aggregates dialog box, you can add items to be used for aggregation.
- Sampling
The settings in the Sampling tab are applied to all models in the node. In the Sampling tab, you can specify the row size.
Parent topic: Association Node
Build
In the Build tab, you can provide the details required for a model build.
Specify these settings in the Build tab:
-
Transaction IDs: These are a combination of attributes that uniquely identifies a transaction. To specify a transaction ID, click . The Select Columns dialog box opens. Move one or more attributes from the Available Attributes list to the Selected Attributes list. Click OK.
-
Item ID: Identifies an item. Select an attribute from the list.
-
Item Value: Existence (default). You can select an attribute from the drop-down list. This is an optional field. The item value column may specify information such as the number of items (for example, three apples) or the type of the item (for example, Macintosh Apples).
If you select an attribute from the list, then the attribute must have less than 10 distinct values. The default value for the maximum distinct count is 10. You can change the value in Model Build Preferences for Association.
Note:
If you specify an attribute for Item Value, then you might have to prepare the data.
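The distinct-value rule for Item Value can be sketched as a simple check. The strict comparison follows the wording "less than 10 distinct values" above. The function name is hypothetical, and ignoring missing values when counting is an assumption, not something this section states.

```python
def can_be_item_value(values, max_distinct=10):
    """Return True if the attribute has fewer than `max_distinct` distinct values.

    Missing values (None) are not counted; that choice is an assumption.
    """
    distinct = {v for v in values if v is not None}
    return len(distinct) < max_distinct


# Made-up attribute values for illustration.
sizes = ["small", "large", None] * 5   # 2 distinct values
quantities = list(range(10))           # 10 distinct values

print(can_be_item_value(sizes))
print(can_be_item_value(quantities))
```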
You can perform the following tasks:
-
Add a model: Click . The Add Model dialog box opens.
-
Delete a model: Select the model and click .
-
Edit a model: Select the model and click . The Advanced Settings for Association Node dialog box opens. Here, you can specify Model settings or Algorithm settings.
-
Copy an existing model: Select the model and click .
At this point, you can click OK to finish the model definition.
- Select Columns (AR)
In the Select Columns dialog box, you can add or remove attributes to be included in or excluded from model build.
Parent topic: Edit Association Build Node
Select Columns (AR)
In the Select Columns dialog box, you can add or remove attributes to be included in or excluded from model build.
To select attributes:
- Select one or more attributes in the Available Attributes list.
- Use the arrows between the lists to move the selections to the Selected Attributes list.
- Click OK.
Parent topic: Build
Partition
In the Partition tab, you can build partitioned models.
-
In the Maximum Number of Partitions field, set a value by clicking the arrows. This sets the partition cutoff value. The cutoff value must be greater than zero. If this option is not selected, then the native Oracle Data Miner cutoff value is used, which can be a very large value.
-
Click Advanced Settings to set and select the type of partition build.
-
To add columns for partitioning, click .
Note:
Only NUMBER and VARCHAR2 columns can be selected as partition columns. Case ID and Target columns cannot be selected as partition columns.
-
To remove a partitioning column, select the columns and click .
-
To move a column to the top, click .
-
To move a column up, click
-
To move a column down, click
-
To move a column to the bottom, click
- Advanced Settings
In the Advanced Settings dialog box, you can select and set the type of partition build.
Parent topic: Edit Association Build Node
Advanced Settings
In the Advanced Settings dialog box, you can select and set the type of partition build.
Parent topic: Partition
Filter
In the Filter tab, you can add items to filter. The items are sourced from the Data Source node, and not from the model.
- Find Items
In the Find Items dialog box, you can search and add items to be included in the filter rule or excluded from the filter rule.
Parent topic: Edit Association Build Node
Find Items
In the Find Items dialog box, you can search and add items to be included in the filter rule or excluded from the filter rule.
Parent topic: Filter
Aggregates
In the Aggregates dialog box, you can add items to be used for aggregation.
To include items for aggregation or exclude from aggregation:
Parent topic: Edit Association Build Node
Sampling
The settings in the Sampling tab are applied to all models in the node. In the Sampling tab, you can specify the row size.
By default, Sampling is set to OFF.
To set it to ON:
Parent topic: Edit Association Build Node
Advanced Settings for Association Node
The Advanced Settings dialog box enables you to add or delete models, and modify the default algorithm settings for each model.
Note:
It is possible for an Association model to generate a very large number of rules or no rules at all.
Parent topic: Association Node
Association Node Context Menu
The context menu options depend on the type of the node. The menu provides shortcuts to perform various tasks and view information related to the node.
To view the context menu options, right-click the node. The following options are available in the Association node context menu:
-
Edit. Opens the Edit Association Build Node dialog box.
-
Advanced Settings. Opens the Algorithm Settings dialog box.
-
View Models. Opens the AR Model Viewer for the selected model.
-
Performance Settings. This opens the Edit Selected Node Settings dialog box, where you can set Parallel Settings and In-Memory settings for the node.
-
Show Runtime Errors. Displayed only if there is an error.
-
Show Validation Errors. Displayed only if there are validation errors.
Parent topic: Association Node
Association Build Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternately, right-click the node and click Go to Properties.
The Association Build node Properties pane has the following sections:
- Models (AR)
The Models section displays a list of the models defined in the node. The default is to build one model.
- Build (AR)
The Build section displays the transaction ID, item ID and item value of the models defined in the node.
- Partition
In the Partition tab, you can build partitioned models.
- Filter
In the Filter tab, you can add items to filter. The items are sourced from the Data Source node, and not from the model.
- Aggregates
In the Aggregates dialog box, you can add items to be used for aggregation.
- Sampling
The settings in the Sampling tab are applied to all models in the node. In the Sampling tab, you can specify the row size.
- Details
The Details section displays the node name and comments about the node.
Parent topic: Association Node
Models (AR)
The Models section displays a list of the models defined in the node. The default is to build one model.
For each model, the name of the model, build information, the algorithm, and comments are listed in a grid. The Build column shows the time and date of the last successful build, or indicates that the model is not built or did not build successfully.
You can add, delete, or view models in the list. You can also indicate which models are passed to subsequent nodes.
-
To delete a model from the list, select it and click .
-
To add a model, click . The Add Model dialog box opens.
-
To view a model that is built successfully, click . The appropriate model view opens.
-
To make a copy of a model, select the model and click .
Parent topic: Association Build Properties
Add Model (AR)
The algorithm is already selected for you. To add a model to the list:
- Accept or change the model name.
- In the Comments field, add comments, if any. This is optional.
- Click OK. This adds the new model to the list. The new model has the same build characteristics as existing models. It also has the default values for advanced settings.
Parent topic: Models (AR)
Output Column (AR)
The Output column in the Model Settings grid controls the passing of models to subsequent nodes. By default, all models are passed to subsequent nodes. You can perform the following tasks:
-
To ignore a model, click . The icon changes to .
-
To cancel an ignored model, click the ignore icon again. The icon changes to the Output icon.
Parent topic: Models (AR)
Build (AR)
The Build section displays the transaction ID, item ID and item value of the models defined in the node.
All models in the node have the same transaction ID, item ID, and item value. The following information is displayed:
-
Transaction IDs: Click Edit to change the transaction ID.
-
Item ID: You can select a different item ID from the drop-down list.
-
Item Value: You can select a different item value from the drop-down list.
Parent topic: Association Build Properties
Partition
In the Partition tab, you can build partitioned models.
-
In the Maximum Number of Partitions field, set a value by clicking the arrows. This sets the partition cutoff value. The cutoff value must be greater than zero. If this option is not selected, then the native Oracle Data Miner cutoff value is used, which can be a very large value.
-
Click Advanced Settings to set and select the type of partition build.
-
To add columns for partitioning, click .
Note:
Only NUMBER and VARCHAR2 columns can be selected as partition columns. Case ID and Target columns cannot be selected as partition columns.
-
To remove a partitioning column, select the columns and click .
-
To move a column to the top, click .
-
To move a column up, click
-
To move a column down, click
-
To move a column to the bottom, click
Parent topic: Association Build Properties
Filter
In the Filter tab, you can add items to filter. The items are sourced from the Data Source node, and not from the model.
Parent topic: Association Build Properties
Aggregates
In the Aggregates dialog box, you can add items to be used for aggregation.
To include items for aggregation or exclude from aggregation:
Parent topic: Association Build Properties
Sampling
The settings in the Sampling tab are applied to all models in the node. In the Sampling tab, you can specify the row size.
By default, Sampling is set to OFF.
To set it to ON:
Parent topic: Association Build Properties
Details
The Details section displays the node name and comments about the node.
You can change the name of the node and edit the comments in this section. The new node name and comments must meet the requirements.
Parent topic: Association Build Properties
Classification Node
The Classification node defines one or more classification models to build and to test.
To specify data for the build, connect a Data Source node to the Classification node. The models in a Classification node all have the same target and case ID. You can only specify one target. A Classification build can run in parallel.
There are two ways to make classification predictions:
-
By building and testing a classification model. This can be done by using a classification node, and then applying the model to the new data to make classifications.
-
By using a prediction query, which is one of the predictive queries.
This section contains the following topics:
- Default Behavior for Classification Node
The default behavior of Classification node is based on certain algorithms, testing and tuning of models, Case ID and so on.
- Create a Classification Node
The Classification node defines one or more classification models to build and to test.
- Data for Model Build
Oracle Data Miner uses heuristic techniques on data for model build.
- Edit Classification Build Node
In the Edit Classification Build Node dialog box, you can specify or change the characteristics of the models to build.
- Advanced Settings for Classification Models
The Advanced Settings dialog box enables you to edit data usage and other model specifications, add and remove models from the node.
- Classification Node Properties
The Classification node properties enable you to view and change information about model build and test.
- Classification Build Node Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.
Parent topic: Model Nodes
Default Behavior for Classification Node
The default behavior of Classification node is based on certain algorithms, testing and tuning of models, Case ID and so on.
-
Algorithms used: For a binary target, the Classification node builds models using the following four algorithms:
If the target is not binary, then GLM is not built by default. You can explicitly add a GLM model to the node. The models must have the same build data and same target.
Note:
If you do not want to create a particular model, then delete the model from the list of models. The blue check mark to the left side of the model name selects models to be used in subsequent nodes. It does not select models to build.
-
Testing of models: By default, the models are all tested. The test data is created by randomly splitting the build data into a build data set and a test data set. The default ratio for the split is 60:40, that is, 60 percent build and 40 percent test. When appropriate, Oracle Data Miner uses compression when it creates the build and test tables.
-
Connecting nodes: You can connect both a build Data Source node and a test Data Source node to the Build node.
-
Testing models: You can test Classification models using a Test node along with separate test data.
-
Interpreting test results
-
Tuning models: After testing a classification, you can tune each model.
-
Case ID: The case ID is optional. However, if you do not specify a case ID, then the processing will be slower.
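The default 60:40 random split described above can be sketched as follows. This only illustrates the idea of a random build/test split. The function name and the seed are hypothetical, and the code does not reflect how Oracle Data Miner creates its build and test tables.

```python
import random


def split_build_test(rows, build_fraction=0.6, seed=42):
    """Randomly split rows into (build, test) using the given build fraction.

    A fixed seed makes the split repeatable; the seed value is arbitrary.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * build_fraction)
    return shuffled[:cut], shuffled[cut:]


# 1000 made-up row identifiers.
build, test = split_build_test(range(1000))
print(len(build), len(test))
```

Every row lands in exactly one of the two sets, so the test set contains only rows that were not used for the build.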
Parent topic: Classification Node
Create a Classification Node
The Classification node defines one or more classification models to build and to test.
Parent topic: Classification Node
Data for Model Build
Oracle Data Miner uses heuristic techniques on data for model build.
Oracle Data Miner uses heuristics to:
-
Determine the attributes of the input data used for model build.
-
Determine the mining type of each attribute.
Parent topic: Classification Node
Edit Classification Build Node
In the Edit Classification Build Node dialog box, you can specify or change the characteristics of the models to build.
To open the Edit Classification Build Node dialog box, either double-click a Classification Node, or right-click a Classification node and select Edit.
The Edit Classification Build Node dialog box has the following tabs:
- Build (Classification)
The Build node enables you to specify or change the characteristics of the models to build.
- Partition
In the Partition tab, you can build partitioned models.
- Sampling
The settings in the Sampling tab are applied to all models in the node. In the Sampling tab, you can specify the row size.
- Input
The Input tab specifies the input for model build.
- Text
Text is available for any of the following data types: CHAR, VARCHAR2, BLOB, CLOB, NCHAR, or NVARCHAR2.
Parent topic: Classification Node
Build (Classification)
The Build node enables you to specify or change the characteristics of the models to build.
To edit the characteristics of the models to build, follow these steps:
By default, the model is tested using a test data set created by splitting the build data set. If you do not want to test the model in this way, then go to the Classification Test Node section in Classification node Properties pane. You can instead use a Test Node, and a test data source node to test the model.
- No Case ID
If a case ID is not supplied, then Oracle Data Miner creates a table for all the input data that contains a generated case ID using the row number.
Parent topic: Edit Classification Build Node
No Case ID
If a case ID is not supplied, then Oracle Data Miner creates a table for all the input data that contains a generated case ID using the row number.
This table is used as the source to create the build and test random sample views. The generated case ID is constant for all queries. This ensures that consistent test results are generated.
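The value of a constant generated case ID can be shown with a short sketch: each row receives its row number as an ID, and a row's assignment to build or test then depends only on that ID, so repeated queries select the same rows. Hashing the ID with CRC32 is an assumption made for this illustration. The text above states only that the generated case ID is constant and that the build and test views are random samples. All names are hypothetical.

```python
import zlib


def with_generated_case_id(rows):
    """Return copies of the rows with a CASE_ID equal to the row number (from 1)."""
    return [{"CASE_ID": number, **row} for number, row in enumerate(rows, start=1)]


def is_build_row(case_id, build_percent=60):
    """Assign a row to the build sample based only on its case ID.

    The CRC32 bucket scheme is illustrative, not Oracle Data Miner's method.
    """
    bucket = zlib.crc32(str(case_id).encode("ascii")) % 100
    return bucket < build_percent


# Made-up input rows without a case ID.
table = with_generated_case_id([{"AGE": 23}, {"AGE": 35}, {"AGE": 41}, {"AGE": 52}])

first_query = [row["CASE_ID"] for row in table if is_build_row(row["CASE_ID"])]
second_query = [row["CASE_ID"] for row in table if is_build_row(row["CASE_ID"])]
print(first_query == second_query)
```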
Parent topic: Build (Classification)
Partition
In the Partition tab, you can build partitioned models.
-
In the Maximum Number of Partitions field, set a value by clicking the arrows. This sets the partition cutoff value. The cutoff value must be greater than zero. If this option is not selected, then the native Oracle Data Miner cutoff value is used, which can be a very large value.
-
Click Advanced Settings to set and select the type of partition build.
-
To add columns for partitioning, click .
Note:
Only NUMBER and VARCHAR2 columns can be selected as partition columns. Case ID and Target columns cannot be selected as partition columns.
-
To remove a partitioning column, select the columns and click .
-
To move a column to the top, click .
-
To move a column up, click
-
To move a column down, click
-
To move a column to the bottom, click
- Add Partition Column
In the Add Partition Column dialog box, you can add columns for partitioning. Partition columns are used to partition build models.
Parent topic: Edit Classification Build Node
Add Partition Column
In the Add Partition Column dialog box, you can add columns for partitioning. Partition columns are used to partition build models.
Select the columns that you want to partition in the Available Attributes list, and click the arrows to move them to the Selected Attributes list. In the Available Attributes list, only the columns with the supported data types are displayed.
Parent topic: Partition
Sampling
The settings in the Sampling tab are applied to all models in the node. In the Sampling tab, you can specify the row size.
By default, Sampling is set to OFF.
To set it to ON:
Parent topic: Edit Classification Build Node
Input
The Input tab specifies the input for model build.
Determine inputs automatically (using heuristics) is selected by default for all models. Oracle Data Miner decides which attributes to use for input. For example, attributes that are almost constant may not be suitable for input. Oracle Data Miner also determines mining type and specifies that auto data preparation is performed for all attributes.
Note:
For R Build nodes, Auto Data Preparation is not performed.
After the node runs, rules are displayed explaining the heuristics. Click Show for detailed information.
You can change these selections. To do so, deselect Determine inputs automatically (using heuristics).
Parent topic: Edit Classification Build Node
Text
Text is available for any of the following data types: CHAR, VARCHAR2, BLOB, CLOB, NCHAR, or NVARCHAR2.
If you are connected to Oracle Database 12c Release 1 (12.1) and later, then you can specify text characteristics in the Text tab in the Edit Model Build dialog box.
If you specify text characteristics in the Text tab, then you are not required to use the Text nodes.
Note:
If you are connected to Oracle Database 11g Release 2 (11.2) or earlier, then use Text nodes. The Text tab is not available in Oracle Database 11g Release 2 and earlier.
To examine or specify text characteristics for data mining, either double-click the build node or right-click the node and select Edit from the context menu. Click the Text tab.
The Text tab enables you to modify the following:
-
Categorical cutoff value: Enables you to control the cutoff used to determine whether a column should be considered as a Text or Categorical mining type. The cutoff value is an integer. It must be 10 or greater and less than or equal to 4000. The default value is 200.
-
Default Transform Type: Specifies the default transformation type for column-level text settings. The values are:
-
Token (Default): For Token as the transform type, the Default Settings are:
-
Languages: Specifies the languages used in the documents. The default is English. To change this value, select an option from the drop-down list. You can select more than one language.
-
Bigram: Select this option to mix the NORMAL token type with their bigram. For example, New York. The token type is BIGRAM.
-
Stemming: By default, this option is not selected. Not all languages support stemming. If the language selected is English, Dutch, French, German, Italian, or Spanish, then stemming is automatically enabled. If Stemming is enabled, then stemmed words are returned for supported languages. Otherwise the original words are returned.
Note:
If both Bigram and Stemming are selected, then the token type is STEM_BIGRAM. If neither Bigram nor Stemming is selected, then token type is NORMAL.
-
Stoplists: Specifies the stoplist to use. The default setting is to use the default stoplist. You can add stoplists or edit stoplists. If you select more than one language, and the selected stoplist is Default, then the default stop words for languages are added to the default stoplist from the repository. No duplicate stop words are added.
-
Click to view the Stoplist Details. This opens the Stoplist Details dialog box.
-
Click to add a new stoplist. This opens the New Stoplist Wizard.
-
Tokens: Specify the following:
-
Max number of tokens across all rows (document). The default is 3000.
-
Min number of rows (document) required for a token
-
Theme: If Theme is selected, then the Default Settings are as follows:
-
Language: Specifies the languages used in the documents. The default is English. To change this value, select one from the drop-down list. You can select more than one language.
-
Stoplists: Specifies the stoplist to use. The default setting is to use the default stoplist. You can add stoplists or edit stoplists. If you select more than one language and the selected stoplist is Default, then the default stop words for languages are added to the default stoplist (from the repository). No duplicate stop words are added.
-
Themes: Specifies the maximum number of themes across all documents. The default is 3000.
-
Synonym: The Synonym tab is enabled only if a thesaurus is loaded. By default, no thesaurus is loaded. You must manually load the default thesaurus provided by Oracle Text or upload your own thesaurus.
Click Stoplists to open the Stoplist Editor. You can view, edit, and create stoplists.
You can use the same stoplist for all text columns.
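Two of the rules in the Text tab lend themselves to a short sketch: the allowed range of the categorical cutoff value, and how the Bigram and Stemming options combine into a token type. The function names are hypothetical. The token type for Stemming without Bigram is not stated in this section, so the value STEM used below is an assumption.

```python
def validate_categorical_cutoff(value=200):
    """Return the cutoff if it is an integer from 10 to 4000; otherwise raise ValueError."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError("the cutoff value must be an integer")
    if not 10 <= value <= 4000:
        raise ValueError("the cutoff value must be between 10 and 4000")
    return value


def token_type(bigram, stemming):
    """Map the Bigram and Stemming options to a token type name."""
    if bigram and stemming:
        return "STEM_BIGRAM"
    if bigram:
        return "BIGRAM"
    if stemming:
        return "STEM"  # assumption: this case is not described in the text above
    return "NORMAL"


print(validate_categorical_cutoff())  # uses the default of 200
print(token_type(bigram=True, stemming=True))
print(token_type(bigram=False, stemming=False))
```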
Parent topic: Edit Classification Build Node
Advanced Settings for Classification Models
The Advanced Settings dialog box enables you to edit data usage and other model specifications, add and remove models from the node.
The Advanced Settings dialog box comprises the following settings:
-
Data usage
-
Algorithm settings
-
Performance settings
To change or view advanced settings, click in the Edit Classification Build Node dialog box. Alternately, right-click the Classification Build node and click Advanced Settings.
The Advanced Settings dialog box lists all of the models in the node in the upper pane. You can add models and delete models in the upper pane of the dialog box.
In the lower pane, you can view or edit the data usage, algorithm settings, and performance settings for the model selected in the upper pane.
The settings that can be changed depend on the algorithm.
Parent topic: Classification Node
Add Models
To add a model to the list, click . The Add Model dialog box opens.
- Add Model (Classification)
In the Add Model dialog box, you can add additional models.
Parent topic: Advanced Settings for Classification Models
Add Model (Classification)
In the Add Model dialog box, you can add additional models.
- In the Algorithm field, select an algorithm.
- In the Name field, a default name is displayed. You can use the default or rename the model.
- In the Comments field, you can enter comments, if any. This is an optional field.
- Click OK to add the model to the node.
Parent topic: Add Models
Classification Node Properties
The Classification node properties enable you to view and change information about model build and test.
Specify a target before building Classification models. You can specify a case ID. If you do not specify a case ID, then the processing will be slower.
If you are unable to view Properties, then go to View and click Properties. Alternatively, right-click the node and click Go to Properties.
The Classification node Properties pane has these sections:
- Classification Node Models
The Classification node lists the models that are built when the node runs. By default, the Classification Build node creates three classification models. - Classification Node Build
The Build section displays the target and the case ID. The Build node must be connected to a Data Source node. - Classification Node Test
The Test section specifies the data used for test and which tests to perform. - Partition
In the Partition tab, you can build partitioned models. - Details
The Details section displays the node name and comments about the node.
Parent topic: Classification Node
Classification Node Models
The Classification node lists the models that are built when the node runs. By default, the Classification Build node creates three classification models.
Each Classification Model uses a different classification algorithm:
-
Support Vector Machine (SVM)
-
Naive Bayes (NB)
-
Decision Tree (DT)
-
Generalized Linear Models (GLM). This algorithm is used as default, only if the target is binary. For multi-class targets, you can also specify the GLM algorithm if you add a model.
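The rule for the default algorithms can be summarized in a few lines. This is an illustrative Python sketch only: the function name is hypothetical, and treating "binary" as "exactly two distinct target values" is an assumption.

```python
def default_classification_algorithms(target_values):
    """Return the algorithms a default build would include (illustrative only)."""
    algorithms = ["SVM", "NB", "DT"]
    # GLM joins the defaults only when the target is binary,
    # taken here to mean exactly two distinct values.
    if len(set(target_values)) == 2:
        algorithms.append("GLM")
    return algorithms

binary_defaults = default_classification_algorithms(["yes", "no", "yes"])
multiclass_defaults = default_classification_algorithms(["low", "mid", "high"])
```

For a multi-class target, GLM does not appear in the defaults but can still be added as an extra model.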
The Model Settings grid lists the models that are built.
You can perform the following tasks:
-
Add: To add a model, click . The Add Model dialog box opens.
-
Delete: To delete a model, select it and click .
-
Compare Test Results: If models were tested, then you can compare test results by selecting two or more models and clicking .
-
View: If a model built successfully, then you can view the model by selecting the model and clicking . The Model viewer depends on the algorithm used to create the model.
-
Duplicate: To copy a model, select the model and click .
-
Tune Models: To tune models, select the model and click . This option is not available for partitioned models.
You can also indicate which models are passed to subsequent nodes.
Parent topic: Classification Node Properties
Classification Node Output Column
The Output column in the Model Settings grid controls the passing of models to subsequent nodes. By default, all models are passed to subsequent nodes.
-
To ignore a model, that is, to not pass it to subsequent nodes, click . The icon changes to the Ignore icon.
-
To cancel the ignore, click the Ignore icon again. It changes to the output icon.
Parent topic: Classification Node Models
Classification Node Build
The Build section displays the target and the case ID. The Build node must be connected to a Data Source node.
You can perform the following tasks:
-
Target: You can select a target from the Target drop-down list.
-
Case ID: To change or select a case ID, select one attribute from the Case ID drop-down list. This attribute uniquely identifies a case. Case ID is an optional field. If you do not select a case ID, then the processing will be slower.
Parent topic: Classification Node Properties
Classification Node Test
The Test section specifies the data used for test and which tests to perform.
You can set the following settings:
-
Perform Test: Select this option to test the Classification Node. The default setting is to test all models built using the test data that is created by randomly splitting the build data into two subsets. By default, the following tests are performed:
-
Performance Metrics
-
Performance Matrix
-
ROC Curve (Binary Class only)
-
Lift and Profit: Lift and profit for the top 5 target classes by frequency. Click Edit. The Target Values Selection dialog box opens.
-
Generate Selected Test Results for Tuning: If you plan to tune the models, then you must test the models in the Build node, not in a Test node.
Note:
This option is not available for partitioned models.
-
-
Test Data: Select any one of the following options, by which Test Data is created:
-
Use all Mining Build Data for Testing
-
Use Split Build Data for Testing
-
Split for Test (%)
-
Create Split as: Table (default)
-
-
Use a Test Data Source for Testing: Select this option to connect the Test Data Source to the Build node, after you connect the Build data.
-
Note:
Another way to test a model is to use a Test node.
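The Use Split Build Data for Testing option amounts to a random partition of the build rows by percentage. Here is a minimal Python sketch of that idea. The function name, the `seed` parameter, and the 40 percent figure in the usage line are assumptions for illustration and are not documented defaults.

```python
import random

def split_build_data(rows, test_percent, seed=None):
    """Randomly split rows into (build, test) subsets by percentage."""
    if not 0 < test_percent < 100:
        raise ValueError("test_percent must be between 0 and 100 (exclusive)")
    rng = random.Random(seed)   # a fixed seed makes the split repeatable
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_percent / 100)
    # The first n_test shuffled rows become the test subset.
    return shuffled[n_test:], shuffled[:n_test]

build_rows, test_rows = split_build_data(range(10), test_percent=40, seed=0)
```

Every row lands in exactly one subset, so the models are tested on data they were not built on.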
- Target Values Selection
In the Target Values Selection dialog box, you can change the number of target values by changing the frequency count.
Parent topic: Classification Node Properties
Target Values Selection
In the Target Values Selection dialog box, you can change the number of target values by changing the frequency count.
The Target Values Selection dialog box displays the number of target values selected. The default option Automatic is to use the top five target class values by frequency. You can change the number of target values by changing the frequency count. You can also select the option Use Lowest Occurring.
-
Automatic: By default, use the top five target class values by frequency.
-
Frequency Count: You can change the number of target values by changing the value in this field.
-
Use Lowest Occurring
-
Use Highest Occurring
-
-
Custom: Select this option to specify specific target values. Then, move the values from Available Values to Selected Values.
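The Automatic and Frequency Count options both reduce to ranking target values by how often they occur. The sketch below shows one way to express that in Python. The function name and the alphabetical tie-breaking are assumptions, not documented behavior.

```python
from collections import Counter

def select_target_values(targets, frequency_count=5, use_lowest=False):
    """Pick target values by frequency.

    The default of five highest-occurring values mirrors the Automatic option.
    """
    counts = Counter(targets)
    sign = 1 if use_lowest else -1
    # Sort by frequency (ascending or descending), then by value for determinism.
    ordered = sorted(counts.items(), key=lambda kv: (sign * kv[1], kv[0]))
    return [value for value, _ in ordered[:frequency_count]]

targets = ["a"] * 5 + ["b"] * 3 + ["c"]
highest = select_target_values(targets, frequency_count=2)
lowest = select_target_values(targets, frequency_count=2, use_lowest=True)
```

The Custom option has no equivalent here because it replaces the ranking with an explicit list of values.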
Parent topic: Classification Node Test
Partition
In the Partition tab, you can build partitioned models.
-
In the Maximum Number of Partitions field, set a value by clicking the arrows. This sets the partition cutoff value. The cutoff value must be greater than zero. If this option is not selected, then the native Oracle Data Miner cutoff value is used, which can be a very large value.
-
Click Advanced Settings to set and select the type of partition build.
-
To add columns for partitioning, click .
Note:
Only NUMBER and VARCHAR2 columns can be selected as partition columns. Case ID and Target columns cannot be selected as partition columns.
-
To remove a partitioning column, select the columns and click .
-
To move a column to the top, click .
-
To move a column up, click
-
To move a column down, click
-
To move a column to the bottom, click
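The constraints stated in this section (a cutoff greater than zero, partition columns limited to NUMBER and VARCHAR2, and no Case ID or Target columns) can be expressed as a small check. This is a hypothetical Python sketch. The function name, the dictionary of column types, and the wording of the messages are illustrative and are not part of Oracle Data Miner.

```python
ALLOWED_PARTITION_TYPES = {"NUMBER", "VARCHAR2"}

def validate_partition_columns(columns, partition_cols, case_id=None,
                               target=None, max_partitions=None):
    """Return a list of problems; an empty list means the selection is valid.

    `columns` maps column name -> data type name (hypothetical input format).
    """
    problems = []
    if max_partitions is not None and max_partitions <= 0:
        problems.append("Maximum Number of Partitions must be greater than zero")
    for name in partition_cols:
        if name not in columns:
            problems.append(f"{name}: unknown column")
            continue
        if columns[name] not in ALLOWED_PARTITION_TYPES:
            problems.append(f"{name}: type {columns[name]} cannot be a partition column")
        if name in (case_id, target):
            problems.append(f"{name}: Case ID and Target columns cannot be partition columns")
    return problems

columns = {"CUST_ID": "NUMBER", "REGION": "VARCHAR2",
           "JOINED": "DATE", "CHURN": "VARCHAR2"}
ok = validate_partition_columns(columns, ["REGION"],
                                case_id="CUST_ID", target="CHURN")
bad = validate_partition_columns(columns, ["JOINED", "CHURN"],
                                 case_id="CUST_ID", target="CHURN",
                                 max_partitions=0)
```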
Parent topic: Classification Node Properties
Details
The Details section displays the node name and comments about the node.
You can change the name of the node and edit the comments in this section. The new node name and comments must meet the requirements.
Parent topic: Classification Node Properties
Classification Build Node Context Menu
The context menu options depend on the type of the node. The context menu provides shortcuts to perform various tasks and view information related to the node.
To view the context menu options, right-click the node. The following options are available in the context menu:
-
Edit. Opens the Edit Classification Build Node dialog box.
-
Advanced Settings. Opens the Advanced Settings for Classification Models dialog box.
-
Performance Settings. This opens the Edit Selected Node Settings dialog box, where you can set Parallel Settings and In-Memory settings for the node.
-
Show Runtime Errors. Displayed only if there is an error.
-
Show Validation Errors. Displayed only if there are validation errors.
Parent topic: Classification Node
View Test Results
Select a model and then view the test results for the model.
Parent topic: Classification Build Node Context Menu
Compare Test Results
You can compare all successfully built models in the node by comparing the test results.
Parent topic: Classification Build Node Context Menu
Clustering Node
A Clustering node builds clustering models using the k-Means, O-Cluster, and Expectation Maximization algorithms.
There are two ways to cluster data:
-
By building a Clustering model: Use a Clustering node. Then apply the model to new data to create clusters.
-
By using a Clustering query, which is one of the predictive queries.
A Clustering build can run in parallel.
Note:
Expectation Maximization models require Oracle Database 12c Release 1 (12.1) or later.
This section contains the following topics:
- Default Behavior for Clustering Node
A Clustering node builds three models using three different algorithms. - Create Clustering Build Node
You create a Clustering node to build clustering models using the k-Means, O-Cluster, and Expectation Maximization algorithms. - Data for Model Build
Oracle Data Miner uses heuristic techniques on data for model build. - Edit Clustering Build Node
In the Edit Clustering Build Node dialog box, you can specify or change the characteristics of the models to build. - Advanced Settings for Clustering Models
In the Advanced Settings dialog box, you can review and change settings related to data usage and algorithms used in the model. - Clustering Build Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node. - Clustering Build Node Context Menu
The context menu options depend on the type of the node. The context menu provides shortcuts to perform various tasks and view information related to the node.
Parent topic: Model Nodes
Default Behavior for Clustering Node
A Clustering node builds three models using three different algorithms.
The algorithms used by the Clustering node are:
-
k-Means algorithm (KM)
-
Orthogonal Partitioning Clustering (OC)
-
Expectation Maximization (EM). For EM, Oracle Database 12c Release 1 (12.1) or later is required.
A case ID is optional.
The models all have the same build data.
Note:
If you do not want to create a model, then delete the model from the list of models. The blue check mark to the left of the model name selects models to be used in subsequent nodes, such as Apply. It does not select models to build.
Parent topic: Clustering Node
Create Clustering Build Node
You create a Clustering node to build clustering models using the k-Means, O-Cluster, and Expectation Maximization algorithms.
Parent topic: Clustering Node
Data for Model Build
Oracle Data Miner uses heuristic techniques on data for model build.
Oracle Data Miner uses heuristics to:
-
Determine the attributes of the input data used for model build.
-
Determine the mining type of each attribute.
Parent topic: Clustering Node
Edit Clustering Build Node
In the Edit Clustering Build Node dialog box, you can specify or change the characteristics of the models to build.
To open the Edit Clustering Build Node dialog box, double-click a Clustering node. Alternately, you can right-click a Clustering node and select Edit.
The Edit Clustering Build Node dialog box has the following tabs:
- Build (Clustering)
The Build tab enables you to specify or change the characteristics of the models to build. - Partition
In the Partition tab, you can build partitioned models. - Sampling
The settings in the Sampling tab are applied to all models in the node. In the Sampling tab, you can specify the row size. - Input
The Input tab specifies the input for model build. - Text
Text is available for any of the following data types: CHAR, VARCHAR2, BLOB, CLOB, NCHAR, or NVARCHAR2.
Parent topic: Clustering Node
Build (Clustering)
The Build tab enables you to specify or change the characteristics of the models to build.
To edit the characteristics of the models to build:
- Add Model (Clustering)
In the Add Model dialog box, you can add models to the Clustering node.
Parent topic: Edit Clustering Build Node
Add Model (Clustering)
In the Add Model dialog box, you can add models to the Clustering node.
In the Add Model dialog box:
Parent topic: Build (Clustering)
Partition
In the Partition tab, you can build partitioned models.
-
In the Maximum Number of Partitions field, set a value by clicking the arrows. This sets the partition cutoff value. The cutoff value must be greater than zero. If this option is not selected, then the native Oracle Data Miner cutoff value is used, which can be a very large value.
-
Click Advanced Settings to set and select the type of partition build.
-
To add columns for partitioning, click .
Note:
Only NUMBER and VARCHAR2 columns can be selected as partition columns. Case ID and Target columns cannot be selected as partition columns.
-
To remove a partitioning column, select the columns and click .
-
To move a column to the top, click .
-
To move a column up, click
-
To move a column down, click
-
To move a column to the bottom, click
Parent topic: Edit Clustering Build Node
Sampling
The settings in the Sampling tab are applied to all models in the node. In the Sampling tab, you can specify the row size.
By default, Sampling is set to OFF.
To set it to ON:
Parent topic: Edit Clustering Build Node
Input
The Input tab specifies the input for model build.
Determine inputs automatically (using heuristics) is selected by default for all models. Oracle Data Miner decides which attributes to use for input. For example, attributes that are almost constant may not be suitable for input. Oracle Data Miner also determines mining type and specifies that auto data preparation is performed for all attributes.
Note:
For R Build nodes, Auto Data Preparation is not performed.
After the node runs, rules are displayed explaining the heuristics. Click Show for detailed information.
You can change these selections. To do so, deselect Determine inputs automatically (using heuristics).
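As an illustration of the kind of heuristic described above, the following Python sketch excludes attributes that are almost constant and records a rule explaining each exclusion. It is not Oracle Data Miner's actual logic. The function name, the input format, and the 0.95 threshold are assumptions chosen for the example.

```python
from collections import Counter

def suggest_inputs(table, max_dominant_share=0.95):
    """Split columns into selected inputs and exclusion rules.

    `table` maps column name -> list of values (hypothetical format).
    A column is excluded when a single value covers more than
    `max_dominant_share` of its rows.
    """
    selected, rules = [], {}
    for name, values in table.items():
        if values:
            top_count = Counter(values).most_common(1)[0][1]
            share = top_count / len(values)
        else:
            share = 1.0  # an empty column carries no information
        if share > max_dominant_share:
            rules[name] = f"excluded: one value covers {share:.0%} of rows"
        else:
            selected.append(name)
    return selected, rules

table = {"AGE": list(range(100)), "COUNTRY": ["US"] * 99 + ["CA"]}
selected, rules = suggest_inputs(table)
```

The `rules` dictionary plays the role of the explanations that are displayed after the node runs.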
Parent topic: Edit Clustering Build Node
Text
Text is available for any of the following data types: CHAR, VARCHAR2, BLOB, CLOB, NCHAR, or NVARCHAR2.
If you are connected to Oracle Database 12c Release 1 (12.1) or later, then you can specify text characteristics in the Text tab in the Edit Model Build dialog box.
If you specify text characteristics in the Text tab, then you are not required to use the Text nodes.
Note:
If you are connected to Oracle Database 11g Release 2 (11.2) or earlier, then use Text nodes. The Text tab is not available in Oracle Database 11g Release 2 and earlier.
To examine or specify text characteristics for data mining, either double-click the build node or right-click the node and select Edit from the context menu. Click the Text tab.
The Text tab enables you to modify the following:
- Categorical cutoff value: Enables you to control the cutoff used to determine whether a column should be considered as a Text or Categorical mining type. The cutoff value is an integer. It must be 10 or greater and less than or equal to 4000. The default value is 200.
- Default Transform Type: Specifies the default transformation type for column-level text settings. The values are:
  - Token (Default): For Token as the transform type, the Default Settings are:
    - Languages: Specifies the languages used in the documents. The default is English. To change this value, select an option from the drop-down list. You can select more than one language.
    - Bigram: Select this option to mix the NORMAL token type with their bigrams, for example, New York. The token type is BIGRAM.
    - Stemming: By default, this option is not selected. Not all languages support stemming. If the language selected is English, Dutch, French, German, Italian, or Spanish, then stemming is automatically enabled. If Stemming is enabled, then stemmed words are returned for supported languages. Otherwise, the original words are returned.
      Note:
      If both Bigram and Stemming are selected, then the token type is STEM_BIGRAM. If neither Bigram nor Stemming is selected, then the token type is NORMAL.
    - Stoplists: Specifies the stoplist to use. The default setting is to use the default stoplist. You can add stoplists or edit stoplists. If you select more than one language, and the selected stoplist is Default, then the default stop words for languages are added to the default stoplist from the repository. No duplicate stop words are added.
      - Click to view the Stoplist Details. This opens the Stoplist Details dialog box.
      - Click to add a new stoplist. This opens the New Stoplist Wizard.
    - Tokens: Specify the following:
      - Max number of tokens across all rows (document). The default is 3000.
      - Min number of rows (document) required for a token
  - Theme: If Theme is selected, then the Default Settings are as follows:
    - Language: Specifies the languages used in the documents. The default is English. To change this value, select one from the drop-down list. You can select more than one language.
    - Stoplists: Specifies the stoplist to use. The default setting is to use the default stoplist. You can add stoplists or edit stoplists. If you select more than one language and the selected stoplist is Default, then the default stop words for languages are added to the default stoplist (from the repository). No duplicate stop words are added.
    - Themes: Specifies the maximum number of themes across all documents. The default is 3000.
  - Synonym: The Synonym tab is enabled only if a thesaurus is loaded. By default, no thesaurus is loaded. You must manually load the default thesaurus provided by Oracle Text or upload your own thesaurus.
Click Stoplists to open the Stoplist Editor. You can view, edit, and create stoplists.
You can use the same stoplist for all text columns.
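The range check on the categorical cutoff and the way the Bigram and Stemming options combine into a token type can be captured in a small settings object. The following Python sketch is illustrative only. The class name `TextSettings` is hypothetical, and the `STEM` value returned when only Stemming is selected is an assumption, because the text above names only NORMAL, BIGRAM, and STEM_BIGRAM.

```python
from dataclasses import dataclass

@dataclass
class TextSettings:
    categorical_cutoff: int = 200   # documented default
    bigram: bool = False
    stemming: bool = False

    def __post_init__(self):
        # The cutoff must be an integer from 10 through 4000.
        if not 10 <= self.categorical_cutoff <= 4000:
            raise ValueError("categorical cutoff must be between 10 and 4000")

    @property
    def token_type(self):
        if self.bigram and self.stemming:
            return "STEM_BIGRAM"
        if self.bigram:
            return "BIGRAM"
        if self.stemming:
            return "STEM"           # assumption: not stated in this section
        return "NORMAL"

default_type = TextSettings().token_type
combined_type = TextSettings(bigram=True, stemming=True).token_type
```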
Parent topic: Edit Clustering Build Node
Advanced Settings for Clustering Models
In the Advanced Settings dialog box, you can review and change settings related to data usage and algorithms used in the model.
To access advanced settings, click in the Edit Clustering Build Node dialog box. Alternatively, right-click the node and select Advanced Settings. The Advanced Settings dialog box lists all the models in the upper pane.
You can perform the following tasks:
-
Inspect and change the data usage and algorithm
-
Add models to the node
-
Delete models from the node
In the lower pane, you can view and modify data usage and algorithm settings for the model selected in the upper pane.
The settings that can be changed depend on the algorithm.
Parent topic: Clustering Node
Clustering Build Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternately, right-click the node and click Go to Properties.
The Clustering Build node Properties pane has these sections:
- Models
The Models section in Properties lists the models that are built when the nodes are run. - Build
The Build section in Properties displays the Case ID of the Clustering model. - Partition
In the Partition tab, you can build partitioned models. - Sampling
The settings in the Sampling tab are applied to all models in the node. In the Sampling tab, you can specify the row size. - Details
The Details section displays the name of the node and any comments about it.
Parent topic: Clustering Node
Models
The Models section in Properties lists the models that are built when the nodes are run.
The default is to build three Clustering models using the KM, OC, and EM algorithms.
The Model Settings grid lists the models in the node. You can perform the following tasks:
-
Search models
-
Add models
-
Delete models
-
Duplicate models
-
View models
-
Indicate which models are passed on to subsequent nodes.
- Add Model (Clustering)
In the Add Model dialog box, you can add models to the Clustering node. - View Models
Use the View Models option to view the details of the models that are built after running the workflow. - Clustering Node Output Column
The Output column in the Model Settings grid controls the passing of models to subsequent nodes.
Parent topic: Clustering Build Node Properties
Add Model (Clustering)
In the Add Model dialog box, you can add models to the Clustering node.
In the Add Model dialog box:
Parent topic: Models
View Models
Use the View Models option to view the details of the models that are built after running the workflow.
To view models, you must select a model from the list to open the model viewer. A model must be built successfully before it can be viewed.
Parent topic: Models
Clustering Node Output Column
The Output column in the Model Settings grid controls the passing of models to subsequent nodes.
By default, all models are passed to subsequent nodes.
-
To ignore a model, that is, to not pass it to subsequent nodes, click . The Output icon changes to the Ignore icon.
-
To cancel the ignore, click the Ignore icon again. The icon changes to the Output icon.
Parent topic: Models
Build
The Build section in Properties displays the Case ID of the Clustering model.
To change the case ID, select an attribute from the Case ID drop-down list.
Parent topic: Clustering Build Node Properties
Partition
In the Partition tab, you can build partitioned models.
-
In the Maximum Number of Partitions field, set a value by clicking the arrows. This sets the partition cutoff value. The cutoff value must be greater than zero. If this option is not selected, then the native Oracle Data Miner cutoff value is used, which can be a very large value.
-
Click Advanced Settings to set and select the type of partition build.
-
To add columns for partitioning, click .
Note:
Only NUMBER and VARCHAR2 columns can be selected as partition columns. Case ID and Target columns cannot be selected as partition columns.
-
To remove a partitioning column, select the columns and click .
-
To move a column to the top, click .
-
To move a column up, click
-
To move a column down, click
-
To move a column to the bottom, click
Parent topic: Clustering Build Node Properties
Sampling
The settings in the Sampling tab are applied to all models in the node. In the Sampling tab, you can specify the row size.
By default, Sampling is set to OFF.
To set it to ON:
Parent topic: Clustering Build Node Properties
Details
The Details section displays the name of the node and any comments about it.
You can change the name and comments in the fields here:
-
Node Name
-
Node Comments
Parent topic: Clustering Build Node Properties
Clustering Build Node Context Menu
The context menu options depend on the type of the node. The context menu provides shortcuts to perform various tasks and view information related to the node.
To view the context menu options, right-click the node. The following options are available in the context menu:
-
Edit. Opens the Edit Clustering Build Node dialog box.
-
Advanced Settings. Opens the Advanced Settings for Clustering Models dialog box.
-
View Models. Opens the appropriate viewer (KM Model Viewer or OC Model Viewer) for the selected model.
-
Performance Settings. This opens the Edit Selected Node Settings dialog box, where you can set Parallel Settings and In-Memory settings for the node.
-
Show Event Log. Displayed only if the running of the node fails.
Parent topic: Clustering Node
Explicit Feature Extraction Node
The Explicit Feature Extraction node is built using the feature extraction algorithm called Explicit Semantic Analysis (ESA).
ESA is a vectorial representation of text, which can be individual words or entire documents. The algorithm uses a document corpus as the knowledge base. In ESA, a word is represented as a column vector in the tf–idf matrix of the text corpus and a document is represented as the centroid of the vectors representing its words. Oracle Data Mining provides a prebuilt ESA model based on Wikipedia. You can import the model to Oracle Data Miner for mining purposes.
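To make the representation concrete, the sketch below builds a tiny tf-idf matrix in plain Python, reads each word's column as its vector, and represents a document as the centroid of its word vectors. It is a toy illustration of the idea only, not the Oracle implementation and not the prebuilt Wikipedia model. The function names, the whitespace tokenizer, and the particular tf-idf weighting (term frequency times log of N over document frequency) are assumptions.

```python
import math
from collections import Counter

def tfidf_word_vectors(corpus):
    """Return {word: [tf-idf weight in each corpus document]}.

    Each corpus document acts as one concept (one vector dimension).
    Assumes every corpus document contains at least one token.
    """
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = {}
    for word, df in doc_freq.items():
        idf = math.log(n_docs / df)
        vectors[word] = [tokens.count(word) / len(tokens) * idf
                         for tokens in tokenized]
    return vectors

def document_vector(text, vectors):
    """Centroid of the vectors of the words in `text` that the corpus knows."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        return None
    return [sum(v[i] for v in known) / len(known) for i in range(len(known[0]))]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

corpus = ["cats purr and sleep", "dogs bark and fetch", "stocks rise and fall"]
vectors = tfidf_word_vectors(corpus)
pets_a = document_vector("cats sleep", vectors)
pets_b = document_vector("cats purr", vectors)
finance = document_vector("stocks fall", vectors)
```

Comparing `cosine(pets_a, pets_b)` with `cosine(pets_a, finance)` shows how semantic relatedness falls out of the shared concept dimensions, which is the basis for the classification and retrieval uses of ESA.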
You can use the Explicit Feature Extraction node for the following purposes:
-
Document classification
-
Calculations related to semantics
-
Information retrieval
- Create Explicit Feature Extraction Node
You create an Explicit Feature Extraction node for purposes related to information retrieval, document classification, and other calculations related to semantics. - Edit Explicit Feature Extraction Node
When you create an Explicit Feature Extraction node, an ESA model with the default algorithm settings is added. You can add additional ESA models and edit them in the Edit Explicit Feature Extraction Node dialog box. - Advanced Model Settings
In the Advanced Model Settings dialog box, you can edit and set algorithm settings of the selected Explicit Semantic Analysis model. - Explicit Feature Extraction Build Properties
In the Properties pane, you can examine and change the characteristics or properties of a node. - Explicit Feature Extraction Context Menu
The context menu options depend on the type of the node. The context menu provides shortcuts to perform various tasks and view information related to the node.
Parent topic: Model Nodes
Create Explicit Feature Extraction Node
You create an Explicit Feature Extraction node for purposes related to information retrieval, document classification, and other calculations related to semantics.
Parent topic: Explicit Feature Extraction Node
Edit Explicit Feature Extraction Node
When you create an Explicit Feature Extraction node, an ESA model with the default algorithm settings is added. You can add additional ESA models and edit them in the Edit Explicit Feature Extraction Node dialog box.
The Edit Explicit Feature Extraction Node dialog box comprises the following tabs:
- Build
The Build tab enables you to specify or change the characteristics of the models to build. - Partition
In the Partition tab, you can build partitioned models. - Sampling
The settings in the Sampling tab are applied to all models in the node. In the Sampling tab, you can specify the row size. - Input
The Input tab specifies the input for model build. - Text
Text is available for any of the following data types: CHAR, VARCHAR2, BLOB, CLOB, NCHAR, or NVARCHAR2.
Parent topic: Explicit Feature Extraction Node
Build
The Build tab enables you to specify or change the characteristics of the models to build.
To edit the characteristics of the model to build, follow these steps:
- Add Model
The Add Model dialog box allows you to add additional ESA models to the Explicit Feature Extraction node.
Parent topic: Edit Explicit Feature Extraction Node
Add Model
The Add Model dialog box allows you to add additional ESA models to the Explicit Feature Extraction node.
To add a model:
- In the Algorithm field, the Explicit Semantic Analysis algorithm is displayed.
- In the Name field, edit the name.
- In the Comments field, enter comments, if any.
- Click OK.
Parent topic: Build
Partition
In the Partition tab, you can build partitioned models.
-
In the Maximum Number of Partitions field, set a value by clicking the arrows. This sets the partition cutoff value. The cutoff value must be greater than zero. If this option is not selected, then the native Oracle Data Miner cutoff value is used, which can be a very large value.
-
Click Advanced Settings to set and select the type of partition build.
-
To add columns for partitioning, click .
Note:
Only NUMBER and VARCHAR2 columns can be selected as partition columns. Case ID and Target columns cannot be selected as partition columns.
-
To remove a partitioning column, select the columns and click .
-
To move a column to the top, click .
-
To move a column up, click
-
To move a column down, click
-
To move a column to the bottom, click
- Add Partitioning Columns
Partitioning columns result in building a virtual model for each unique partition. Because the virtual model uses data only from a specific partition, it can potentially predict cases more accurately than if you did not select a partition.
Parent topic: Edit Explicit Feature Extraction Node
Add Partitioning Columns
Partitioning columns result in building a virtual model for each unique partition. Because the virtual model uses data only from a specific partition, it can potentially predict cases more accurately than if you did not select a partition.
In addition to selecting attributes, you can specify partitioning expressions. Partitioning expressions are concatenated and the result expression is the same for all predictive functions.
- Select one or more attributes in the Available Attributes list to serve as partitions.
- Move the selected columns to the Selected Attributes list using the arrows.
- Click OK. The attributes are moved to the Partition list.
Optionally, you can add partitioning expressions.
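The idea of one virtual model per unique partition can be sketched as a group-by followed by a separate build for each group. The Python below is a conceptual illustration, not how the database builds partitioned models. The function names, the row format, and the choice to raise an error when the partition count exceeds the cutoff are assumptions.

```python
from collections import defaultdict

def build_partitioned_model(rows, partition_cols, build_fn, max_partitions=None):
    """Build one sub-model per unique combination of partition column values.

    `rows` is a list of dicts; `build_fn` turns a list of rows into a model.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in partition_cols)
        groups[key].append(row)
    # Assumption: exceeding the cutoff is treated as an error in this sketch.
    if max_partitions is not None and len(groups) > max_partitions:
        raise ValueError(f"{len(groups)} partitions exceed the cutoff of {max_partitions}")
    # Each sub-model sees only the rows of its own partition.
    return {key: build_fn(part) for key, part in groups.items()}

def mean_spend(part):
    """Stand-in for a real model build: the average SPEND in the partition."""
    return sum(r["SPEND"] for r in part) / len(part)

rows = [
    {"REGION": "EU", "SPEND": 10},
    {"REGION": "EU", "SPEND": 30},
    {"REGION": "US", "SPEND": 50},
]
models = build_partitioned_model(rows, ["REGION"], mean_spend)
```

Because each sub-model is fitted only to its own partition, it can reflect patterns that a single model over all rows would average away.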
Parent topic: Partition
Sampling
The settings in the Sampling tab are applied to all models in the node. In the Sampling tab, you can specify the row size.
By default, Sampling is set to OFF.
To set it to ON:
Parent topic: Edit Explicit Feature Extraction Node
Input
The Input tab specifies the input for model build.
Determine inputs automatically (using heuristics) is selected by default for all models. Oracle Data Miner decides which attributes to use for input. For example, attributes that are almost constant may not be suitable for input. Oracle Data Miner also determines mining type and specifies that auto data preparation is performed for all attributes.
Note:
For R Build nodes, Auto Data Preparation is not performed.
After the node runs, rules are displayed explaining the heuristics. Click Show for detailed information.
You can change these selections. To do so, deselect Determine inputs automatically (using heuristics).
Parent topic: Edit Explicit Feature Extraction Node
Text
Text is available for any of the following data types: CHAR, VARCHAR2, BLOB, CLOB, NCHAR, or NVARCHAR2.
If you are connected to Oracle Database 12c Release 1 (12.1) or later, then you can specify text characteristics in the Text tab in the Edit Model Build dialog box.
If you specify text characteristics in the Text tab, then you are not required to use the Text nodes.
Note:
If you are connected to Oracle Database 11g Release 2 (11.2) or earlier, then use Text nodes. The Text tab is not available in Oracle Database 11g Release 2 and earlier.
To examine or specify text characteristics for data mining, either double-click the build node or right-click the node and select Edit from the context menu. Click the Text tab.
The Text tab enables you to modify the following:
-
Categorical cutoff value: Enables you to control the cutoff used to determine whether a column should be considered as a Text or Categorical mining type. The cutoff value is an integer. It must be 10 or greater and less than or equal to 4000. The default value is
200.
-
Default Transform Type: Specifies the default transformation type for column-level text settings. The values are:
-
Token (Default): For Token as the transform type, the Default Settings are:
-
Languages: Specifies the languages used in the documents. The default is
English.
To change this value, select an option from the drop-down list. You can select more than one language. -
Bigram: Select this option to mix the
NORMAL
token type with their bigram. For example, New York. The token type isBIGRAM.
-
Stemming: By default, this option is not selected. Not all languages support stemming. If the language selected is English, Dutch, French, German, Italian, or Spanish, then stemming is automatically enabled. If Stemming is enabled, then stemmed words are returned for supported languages. Otherwise the original words are returned.
Note:
If both Bigram and Stemming are selected, then the token type is STEM_BIGRAM. If neither Bigram nor Stemming is selected, then the token type is NORMAL.
-
Stoplists: Specifies the stoplist to use. The default setting is to use the default stoplist. You can add stoplists or edit stoplists. If you select more than one language, and the selected stoplist is Default, then the default stop words for languages are added to the default stoplist from the repository. No duplicate stop words are added.
-
Click to view the Stoplist Details. This opens the Stoplist Details dialog box.
-
Click to add a new stoplist. This opens the New Stoplist Wizard.
-
-
Tokens: Specify the following:
-
Max number of tokens across all rows (document). The default is 3000.
-
Min number of rows (document) required for a token
-
-
-
Theme: If Theme is selected, then the Default Settings are as follows:
-
Language: Specifies the languages used in the documents. The default is English. To change this value, select one from the drop-down list. You can select more than one language.
-
Stoplists: Specifies the stoplist to use. The default setting is to use the default stoplist. You can add stoplists or edit stoplists. If you select more than one language and the selected stoplist is Default, then the default stop words for languages are added to the default stoplist (from the repository). No duplicate stop words are added.
-
Themes: Specifies the maximum number of themes across all documents. The default is 3000.
-
-
Synonym: The Synonym tab is enabled only if a thesaurus is loaded. By default, no thesaurus is loaded. You must manually load the default thesaurus provided by Oracle Text or upload your own thesaurus.
-
-
Click Stoplists to open the Stoplist Editor. You can view, edit, and create stoplists.
You can use the same stoplist for all text columns.
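The way the Bigram and Stemming options combine into a token type can be summarized in a short sketch. The function name `resolve_token_type` is hypothetical and is not part of Oracle Data Miner. The `STEM` result for stemming alone is an assumption, because the description above names only NORMAL, BIGRAM, and STEM_BIGRAM.

```python
def resolve_token_type(bigram: bool, stemming: bool) -> str:
    """Return the token type implied by the Bigram and Stemming options.

    Hypothetical helper for illustration; not an Oracle Data Miner API.
    """
    if bigram and stemming:
        return "STEM_BIGRAM"
    if bigram:
        return "BIGRAM"
    if stemming:
        # Assumption: the text does not name the stemming-only token type.
        return "STEM"
    return "NORMAL"


if __name__ == "__main__":
    # Print the token type for every combination of the two options.
    for bigram in (False, True):
        for stemming in (False, True):
            print(bigram, stemming, resolve_token_type(bigram, stemming))
```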
Parent topic: Edit Explicit Feature Extraction Node
Advanced Model Settings
In the Advanced Model Settings dialog box, you can edit and set algorithm settings of the selected Explicit Semantic Analysis model.
An Explicit Semantic Analysis (ESA) model has only three algorithm settings. The Advanced Model Settings dialog box displays the following:
-
Data Usage: Displays the attribute name, data type, mining type and other details about the attributes in the selected model. You can customize your input source here.
-
Algorithm Settings: The following are the algorithm settings for an ESA model:
-
Top N Features: Controls the maximum number of features per attribute. It must be a positive integer. The default is 1000.
-
Minimum Items: Determines the minimum number of non-zero entries that need to be present in an input row.
-
Threshold Value: This setting thresholds very small values in the transformed build data. It must be a non-negative number. The default is 0.00000001.
-
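The constraints on the ESA algorithm settings can be checked with a small sketch such as the following. The class `EsaSettings` and its field names are hypothetical and do not correspond to an Oracle Data Miner API. Only the two constraints stated above are checked. The default for Minimum Items is not given in the text, so it is left unset.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class EsaSettings:
    """Hypothetical container for the three ESA algorithm settings."""

    top_n_features: int = 1000            # default stated in the text
    threshold_value: float = 0.00000001   # default stated in the text
    minimum_items: Optional[int] = None   # default not stated in the text

    def problems(self) -> List[str]:
        """Return a message for each stated constraint that is violated."""
        found = []
        is_int = isinstance(self.top_n_features, int) and not isinstance(
            self.top_n_features, bool
        )
        if not is_int or self.top_n_features <= 0:
            found.append("Top N Features must be a positive integer")
        if self.threshold_value < 0:
            found.append("Threshold Value must be a non-negative number")
        return found


if __name__ == "__main__":
    print(EsaSettings().problems())
    print(EsaSettings(top_n_features=0, threshold_value=-1.0).problems())
```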
Parent topic: Explicit Feature Extraction Node
Explicit Feature Extraction Build Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternately, right-click the node and click Go to Properties.
The Explicit Feature Extraction Build node properties have these sections:
- Models
The Models section displays a list of the models defined in the node. By default, one model is built for each algorithm supported by the node. - Build
The Build section displays information related to the model build. For models that have a target, such as Classification and Regression, the targets are listed. All models in a node have the same target. - Partition
In the Partition tab, you can build partitioned models. - Details
The Details section displays the node name and comments about the node.
Parent topic: Explicit Feature Extraction Node
Models
The Models section displays a list of the models defined in the node. By default, one model is built for each algorithm supported by the node.
For each model, the name of the model, build information, the algorithm, and comments are listed in a grid. The Build column shows the date and time of the last successful build, or indicates that the model is not built or did not build successfully.
You can add, delete, or view models in the list. You can also indicate which models are passed to subsequent nodes.
-
To delete a model from the list, select it and click .
-
To add a model, click . The Add Model dialog box opens.
-
To view a model that was built successfully, select the model and click .
You can tune classification models from the Properties pane.
Parent topic: Explicit Feature Extraction Build Properties
Build
The Build section displays information related to the model build. For models that have a target, such as Classification and Regression, the targets are listed. All models in a node have the same target.
The Build section displays the following:
-
Target: Displays the target. To change the target, select a new target from the drop-down list.
-
Case ID: Displays the case ID of the models defined in this node. All the models in the node have the same case ID. To edit the case ID, select a different case ID from the drop-down list.
-
Transaction ID: Displayed for Association models only. To change the transaction ID, click Edit.
-
Item ID: Displayed for Association models only. To change the value, select an option from the drop-down list.
-
Item Value: Displayed for Association models only. To change the value, select an option from the drop-down list.
Parent topic: Explicit Feature Extraction Build Properties
Partition
In the Partition tab, you can build partitioned models.
-
In the Maximum Number of Partitions field, set a value by clicking the arrows. This sets the partition cutoff value. The cutoff value must be greater than zero. If this option is not selected, then the native Oracle Data Miner cutoff value is used, which can be a very large value.
-
Click Advanced Settings to set and select the type of partition build.
-
To add columns for partitioning, click .
Note:
Only NUMBER and VARCHAR2 columns can be selected as partition columns. Case ID and Target columns cannot be selected as partition columns.
-
To remove a partitioning column, select the columns and click .
-
To move a column to the top, click .
-
To move a column up, click
-
To move a column down, click
-
To move a column to the bottom, click
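The partitioning rules above can be expressed as a short sketch. The function names and the sample column names are hypothetical and are used only for illustration. The sketch assumes that each data type is given as a bare type name such as NUMBER or VARCHAR2.

```python
from typing import Dict, List, Optional

# Data types that the note above allows for partition columns.
ALLOWED_PARTITION_TYPES = {"NUMBER", "VARCHAR2"}


def eligible_partition_columns(
    columns: Dict[str, str],
    case_id: Optional[str] = None,
    target: Optional[str] = None,
) -> List[str]:
    """Return the columns that may be selected as partition columns.

    columns maps a column name to its data type (hypothetical input shape).
    """
    excluded = {case_id, target}
    return [
        name
        for name, data_type in columns.items()
        if data_type.upper() in ALLOWED_PARTITION_TYPES and name not in excluded
    ]


def check_max_partitions(value: int) -> int:
    """Reject a partition cutoff that is not greater than zero."""
    if value <= 0:
        raise ValueError("Maximum Number of Partitions must be greater than zero")
    return value


if __name__ == "__main__":
    sample = {
        "CUST_ID": "NUMBER",
        "REGION": "VARCHAR2",
        "NOTES": "CLOB",
        "AFFINITY_CARD": "NUMBER",
    }
    print(eligible_partition_columns(sample, case_id="CUST_ID", target="AFFINITY_CARD"))
```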
Parent topic: Explicit Feature Extraction Build Properties
Details
The Details section displays the node name and comments about the node.
You can change the name of the node and edit the comments in this section. The new node name and comments must meet the requirements.
Parent topic: Explicit Feature Extraction Build Properties
Explicit Feature Extraction Context Menu
The context menu options depend on the type of the node. The menu provides shortcuts for performing various tasks and viewing information related to the node.
To view the context menu options, right-click the node. The following options are available in the context menu:
-
Edit. Opens the Edit Explicit Feature Extraction Node dialog box.
-
Advanced Settings. Opens the Advanced Model Settings dialog box.
-
View Models. Opens the ESA Model Viewer.
-
Performance Settings. This opens the Edit Selected Node Settings dialog box, where you can set Parallel Settings and In-Memory settings for the node.
-
Copy Image to Clipboard
-
Save Image as. Opens the Publish Diagram dialog box.
Parent topic: Explicit Feature Extraction Node
Feature Extraction Node
A Feature Extraction node uses the Nonnegative Matrix Factorization (NMF) algorithm to build models.
There are two ways to extract features:
-
Build a feature extraction model, using a Feature Extraction node.
-
Use a Feature Extraction Query, which is one of the predictive queries.
If Oracle Data Miner is connected to Oracle Database 12c Release 1 (12.1) or later, then the Feature Extraction node can also use the Principal Components Analysis (PCA) and Singular Value Decomposition (SVD) algorithms to build models.
Note:
Principal Components Analysis and Singular Value Decomposition models require Oracle Database 12c Release 1 (12.1) and later.
A Feature Extraction Build can run in parallel.
This section contains the following topics:
- Default Behavior of Feature Extraction Node
By default, a Feature Extraction node builds one model using the Non-Negative Matrix Factorization (NMF) algorithm. - Create Feature Extraction Node
You create a Feature Extraction node to build feature extraction models. The node uses the Nonnegative Matrix Factorization (NMF) algorithm. - Data for Model Build
Oracle Data Miner uses heuristic techniques on data for model build. - Edit Feature Extraction Build Node
In the Edit Feature Extraction Build Node dialog box, you can specify or change the characteristics of the models to build. - Advanced Settings for Feature Extraction
The options in Advanced Settings for Feature Extraction allows you to inspect and change the data usage and algorithm settings for each model in the node. - Feature Extraction Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node. - Feature Extraction Node Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.
Parent topic: Model Nodes
Default Behavior of Feature Extraction Node
By default, a Feature Extraction node builds one model using the Nonnegative Matrix Factorization (NMF) algorithm.
If you are connected to Oracle Database 12c or later, then the node builds two models by default:
-
NMF model
-
PCA model
You can add SVD models.
All models in the node use the same build data and have the same case ID, if you specify a case ID.
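The default behavior described above can be summarized in a minimal sketch. The function names are hypothetical, and representing the database release as an integer major version (12 for Oracle Database 12c) is an assumption made for illustration.

```python
from typing import List


def default_feature_extraction_models(db_major_version: int) -> List[str]:
    """Return the algorithms for which a model is built by default."""
    if db_major_version >= 12:
        return ["NMF", "PCA"]
    return ["NMF"]


def available_feature_extraction_algorithms(db_major_version: int) -> List[str]:
    """Return the algorithms that can be used in the node, including SVD."""
    if db_major_version >= 12:
        return ["NMF", "PCA", "SVD"]
    return ["NMF"]


if __name__ == "__main__":
    for version in (11, 12):
        print(version, default_feature_extraction_models(version))
```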
Parent topic: Feature Extraction Node
Create Feature Extraction Node
You create a Feature Extraction node to build feature extraction models. The node uses the Nonnegative Matrix Factorization (NMF) algorithm.
Parent topic: Feature Extraction Node
Data for Model Build
Oracle Data Miner uses heuristic techniques on data for model build.
Oracle Data Miner uses heuristics to:
-
Determine the attributes of the input data used for model build.
-
Determine the mining type of each attribute.
Parent topic: Feature Extraction Node
Edit Feature Extraction Build Node
In the Edit Feature Extraction Build Node dialog box, you can specify or change the characteristics of the models to build.
To edit a Feature Extraction Build node, either double-click the node or right-click it and select Edit. The Edit Feature Extraction Build Node dialog box opens. The same dialog box opens when you drop a Feature Extraction node on a workflow.
The Edit Feature Extraction Build Node dialog box has the following tabs:
- Build (Feature Extraction)
In the Build tab, you can edit settings related to the Feature Extraction build node. - Partition
In the Partition tab, you can build partitioned models. - Sampling
The settings in the Sampling tab are applied to all models in the node. In the Sampling tab, you can specify the row size. - Input
The Input tab specifies the input for model build. - Text
Text is available for any of the following data types: CHAR, VARCHAR2, BLOB, CLOB, NCHAR, or NVARCHAR2.
Parent topic: Feature Extraction Node
Build (Feature Extraction)
In the Build tab, you can edit settings related to the Feature Extraction build node.
You can perform the following tasks:
-
Case ID: Specifying a case ID for Feature Extraction is optional. To specify one, select an attribute from the drop-down list.
-
Add Model: To add a model, click
-
Delete: To delete a model, select the model and click .
-
Copy: To copy an existing model, select the model and click .
- Add Model (Feature Extraction)
In the Add Model dialog box, you can add additional models.
Parent topic: Edit Feature Extraction Build Node
Add Model (Feature Extraction)
In the Add Model dialog box, you can add additional models.
To add a model, click .
- In the Algorithm field, select an algorithm. The default algorithm is NMF.
- In the Name field, the default name is displayed. You can accept the default name or change it.
- In the Comments field, enter comments, if any. This is an optional field.
- Click OK. The model is added to the list. The new model has the same build characteristics as existing models. The new model has the default values for advanced settings.
Parent topic: Build (Feature Extraction)
Partition
In the Partition tab, you can build partitioned models.
-
In the Maximum Number of Partitions field, set a value by clicking the arrows. This sets the partition cutoff value. The cutoff value must be greater than zero. If this option is not selected, then the native Oracle Data Miner cutoff value is used, which can be a very large value.
-
Click Advanced Settings to set and select the type of partition build.
-
To add columns for partitioning, click .
Note:
Only NUMBER and VARCHAR2 columns can be selected as partition columns. Case ID and Target columns cannot be selected as partition columns.
-
To remove a partitioning column, select the columns and click .
-
To move a column to the top, click .
-
To move a column up, click
-
To move a column down, click
-
To move a column to the bottom, click
Parent topic: Edit Feature Extraction Build Node
Sampling
The settings in the Sampling tab are applied to all models in the node. In the Sampling tab, you can specify the row size.
By default, Sampling is set to OFF.
To set it to ON:
Parent topic: Edit Feature Extraction Build Node
Input
The Input tab specifies the input for model build.
Determine inputs automatically (using heuristics) is selected by default for all models. Oracle Data Miner decides which attributes to use for input. For example, attributes that are almost constant may not be suitable for input. Oracle Data Miner also determines mining type and specifies that auto data preparation is performed for all attributes.
Note:
For R Build nodes, Auto Data Preparation is not performed.
After the node runs, rules are displayed explaining the heuristics. Click Show for detailed information.
You can change these selections. To do so, deselect Determine inputs automatically (using heuristics).
Parent topic: Edit Feature Extraction Build Node
Text
Text is available for any of the following data types: CHAR, VARCHAR2, BLOB, CLOB, NCHAR, or NVARCHAR2.
If you are connected to Oracle Database 12c Release 1 (12.1) and later, then you can specify text characteristics in the Text tab in the Edit Model Build dialog box.
If you specify text characteristics in the Text tab, then you are not required to use the Text nodes.
Note:
If you are connected to Oracle Database 11g Release 2 (11.2) or earlier, then use Text nodes. The Text tab is not available in Oracle Database 11g Release 2 and earlier.
To examine or specify text characteristics for data mining, either double-click the build node or right-click the node and select Edit from the context menu. Click the Text tab.
The Text tab enables you to modify the following:
-
Categorical cutoff value: Enables you to control the cutoff used to determine whether a column should be considered as a Text or Categorical mining type. The cutoff value is an integer. It must be 10 or greater and less than or equal to 4000. The default value is 200.
-
Default Transform Type: Specifies the default transformation type for column-level text settings. The values are:
-
Token (Default): For Token as the transform type, the Default Settings are:
-
Languages: Specifies the languages used in the documents. The default is English. To change this value, select an option from the drop-down list. You can select more than one language.
-
Bigram: Select this option to mix the NORMAL token type with its bigrams. For example, New York. The token type is BIGRAM.
-
Stemming: By default, this option is not selected. Not all languages support stemming. If the language selected is English, Dutch, French, German, Italian, or Spanish, then stemming is automatically enabled. If Stemming is enabled, then stemmed words are returned for supported languages. Otherwise the original words are returned.
Note:
If both Bigram and Stemming are selected, then the token type is STEM_BIGRAM. If neither Bigram nor Stemming is selected, then the token type is NORMAL.
-
Stoplists: Specifies the stoplist to use. The default setting is to use the default stoplist. You can add stoplists or edit stoplists. If you select more than one language, and the selected stoplist is Default, then the default stop words for languages are added to the default stoplist from the repository. No duplicate stop words are added.
-
Click to view the Stoplist Details. This opens the Stoplist Details dialog box.
-
Click to add a new stoplist. This opens the New Stoplist Wizard.
-
-
Tokens: Specify the following:
-
Max number of tokens across all rows (document). The default is 3000.
-
Min number of rows (document) required for a token
-
-
-
Theme: If Theme is selected, then the Default Settings are as follows:
-
Language: Specifies the languages used in the documents. The default is English. To change this value, select one from the drop-down list. You can select more than one language.
-
Stoplists: Specifies the stoplist to use. The default setting is to use the default stoplist. You can add stoplists or edit stoplists. If you select more than one language and the selected stoplist is Default, then the default stop words for languages are added to the default stoplist (from the repository). No duplicate stop words are added.
-
Themes: Specifies the maximum number of themes across all documents. The default is 3000.
-
-
Synonym: The Synonym tab is enabled only if a thesaurus is loaded. By default, no thesaurus is loaded. You must manually load the default thesaurus provided by Oracle Text or upload your own thesaurus.
-
-
Click Stoplists to open the Stoplist Editor. You can view, edit, and create stoplists.
You can use the same stoplist for all text columns.
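Two of the rules in the Text tab lend themselves to a brief sketch: the valid range of the categorical cutoff value, and the way default stop words for several languages are added to the default stoplist without duplicates. The function names and the sample stop words are hypothetical. Treating stop words as case-sensitive exact matches is an assumption, because the text does not say how duplicates are detected.

```python
from typing import Dict, Iterable, List

MIN_CUTOFF = 10        # lower bound stated in the text
MAX_CUTOFF = 4000      # upper bound stated in the text
DEFAULT_CUTOFF = 200   # default stated in the text


def check_categorical_cutoff(value: int = DEFAULT_CUTOFF) -> int:
    """Return the cutoff if it is an integer from 10 through 4000."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("cutoff must be an integer")
    if not MIN_CUTOFF <= value <= MAX_CUTOFF:
        raise ValueError("cutoff must be between 10 and 4000")
    return value


def merge_stop_words(
    default_stoplist: Iterable[str],
    language_stop_words: Dict[str, Iterable[str]],
) -> List[str]:
    """Add per-language stop words to the default stoplist, skipping duplicates."""
    merged = list(default_stoplist)
    seen = set(merged)
    for words in language_stop_words.values():
        for word in words:
            if word not in seen:
                seen.add(word)
                merged.append(word)
    return merged


if __name__ == "__main__":
    extra = {"French": ["le", "a"], "German": ["der", "le"]}
    print(merge_stop_words(["the", "a"], extra))
```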
- Stoplist Details
The Stoplist Details dialog box lists the stopwords and stopthemes for the selected Stoplist. You can also add and delete stopwords and stopthemes. - Add Stopwords Stopthemes from Features
The Add Stopwords/Stopthemes from Features dialog box allows you to select stopwords or stopthemes from the generated features to be included as new stopwords or stopthemes for the selected stoplist.
Parent topic: Edit Feature Extraction Build Node
Stoplist Details
The Stoplist Details dialog box lists the stopwords and stopthemes for the selected Stoplist. You can also add and delete stopwords and stopthemes.
Parent topic: Text
Add Stopwords Stopthemes from Features
The Add Stopwords/Stopthemes from Features dialog box allows you to select stopwords or stopthemes from the generated features to be included as new stopwords or stopthemes for the selected stoplist.
- Select the stopwords or stopthemes that you want to add.
- Click OK.
Parent topic: Text
Advanced Settings for Feature Extraction
The options in Advanced Settings for Feature Extraction allow you to inspect and change the data usage and algorithm settings for each model in the node.
You can perform the following:
-
Inspect and change data usage.
-
Change algorithm settings for each model in the node.
To change or view advanced settings, open them from the Edit Feature Extraction Build Node dialog box. Alternately, right-click the node and select Advanced Settings.
In the upper pane, all models are listed. You can perform the following tasks:
-
Delete: To delete a model, select it and click .
-
Add: To add a model, click .
In the lower pane, you can view or edit the following for the model selected in the upper pane:
-
The settings depend on the algorithm:
PCA and SVD are available if Oracle Data Miner is connected to Oracle Database 12c Release 1 (12.1) or later.
Parent topic: Feature Extraction Node
Feature Extraction Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternately, right-click the node and click Go to Properties.
-
Models: Displays the details of model settings. You can edit models here.
-
Build: Displays the case ID for the models defined in this node. All the models in the node have the same case ID. To edit the case ID, select a different attribute from the Case ID list.
-
Partition: Displays the details related to partitioned models. You can add and modify partitioned models here.
-
Details: Displays the details related to the Feature Extraction Build node.
Parent topic: Feature Extraction Node
Feature Extraction Node Context Menu
The context menu options depend on the type of the node. The menu provides shortcuts for performing various tasks and viewing information related to the node.
To view the context menu options, right-click the node. The following options are available in the context menu:
-
Edit: Opens the Edit Feature Extraction Build Node dialog box.
-
Advanced Settings: Opens the Advanced Settings for Feature Extraction dialog box.
-
View Models: Opens the NMF Model Viewer for the selected model.
-
Performance Settings. This opens the Edit Selected Node Settings dialog box, where you can set Parallel Settings and In-Memory settings for the node.
-
Show Runtime Errors. Displayed only if there is an error.
-
Show Validation Errors. Displayed only if there are validation errors.
Parent topic: Feature Extraction Node
Model Node
Model nodes rely on database resources for their definition. It may be necessary to refresh a node definition if the database resources change.
For example, a refresh may be necessary if the resources are deleted or re-created. You can specify a model that was built using either of the ODM APIs. The models in a Model node must satisfy the model constraints.
The Model node takes no input. A Model node can be an input to any node that accepts models, such as the Apply and Test nodes, at least for some function types. For example, if a model node contains Classification or Regression models, then it can be input to a test node. Test data must be prepared in the same way that the build data was prepared.
- Create a Model Node
A Model node enables you to add models to a workflow that were not built in the workflow. - Edit Model Selection
In the Edit Model Selection dialog box, you can select one or more models to include in the Model node or to remove models from the Model node. - Model Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node. - Model Node Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.
Parent topic: Model Nodes
Create a Model Node
A Model node enables you to add models to a workflow that were not built in the workflow.
To add a model node to a workflow and add models to the model node:
Parent topic: Model Node
Edit Model Selection
In the Edit Model Selection dialog box, you can select one or more models to include in the Model node or to remove models from the Model node.
To edit the models in the node, double-click the Model node or right-click the Model node and select Edit.
Note:
All the models in a model node must satisfy the model constraints.
You can perform the following tasks:
-
Select models from the Available Compatible Models list and move them to the Selected Models list using the controls between the lists. The selected models are checked for compatibility and become part of the Model node. You can view the models using the Model node properties.
-
Include models from other schemas. To include models, select Include Models from Other Schemas.
-
Filter the Available Compatible Models list in the following ways:
-
Select a model function from the Model Function list. The options are:
-
All
-
Anomaly Detection
-
Association Rules
-
Regression
-
Clustering
-
Feature Extraction
-
-
Sort the models by name, function, algorithm, target, target data type, creation date, or comments. To sort, click the column header in the list of available models.
-
-
Add and remove models:
-
Add models by moving them from Available Compatible Models list to the Selected Models list.
-
Remove models by moving them from the Selected Models list to the Available Compatible Models list. You can also remove models using the Models tab.
-
Parent topic: Model Node
Model Constraints
A Model node consists of models that are similar. The models in a Model node must satisfy the following:
-
All models must have the same function type (Classification, Regression, Clustering, Anomaly Detection, Association Rules, or Feature Extraction). You cannot include models that have different function types.
You can add models that are built using different algorithms if the models have the same function type.
-
Classification or Regression models must have the same target attribute. The target attributes must all have the same data type. CHAR and VARCHAR2 are considered to be the same data type for Classification models.
-
Classification models must have the same list of target values.
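The model constraints above amount to a compatibility check across all models in the node. The following is a minimal sketch of such a check. The `ModelInfo` class and the `constraint_violations` function are hypothetical and are not how Oracle Data Miner implements the check. Comparing target values without regard to order is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class ModelInfo:
    """Hypothetical description of a model considered for a Model node."""

    name: str
    function: str                        # for example, "Classification"
    target: Optional[str] = None
    target_type: Optional[str] = None
    target_values: Tuple[str, ...] = ()


def _normalized_type(model: ModelInfo) -> str:
    """Treat CHAR and VARCHAR2 as the same type for Classification models."""
    data_type = (model.target_type or "").upper()
    if model.function == "Classification" and data_type in {"CHAR", "VARCHAR2"}:
        return "VARCHAR2"
    return data_type


def constraint_violations(models: Sequence[ModelInfo]) -> List[str]:
    """Compare every model with the first one and list the violations."""
    problems: List[str] = []
    if not models:
        return problems
    first = models[0]
    for model in models[1:]:
        if model.function != first.function:
            problems.append(f"{model.name}: function type differs from {first.name}")
            continue
        if model.function in {"Classification", "Regression"}:
            if model.target != first.target:
                problems.append(f"{model.name}: target attribute differs")
            if _normalized_type(model) != _normalized_type(first):
                problems.append(f"{model.name}: target data type differs")
        if model.function == "Classification":
            # Assumption: the order of the target values does not matter.
            if set(model.target_values) != set(first.target_values):
                problems.append(f"{model.name}: target values differ")
    return problems


if __name__ == "__main__":
    a = ModelInfo("CLAS_SVM", "Classification", "BUY", "CHAR", ("0", "1"))
    b = ModelInfo("CLAS_NB", "Classification", "BUY", "VARCHAR2", ("1", "0"))
    print(constraint_violations([a, b]))
```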
Parent topic: Edit Model Selection
Model Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternately, right-click the node and click Go to Properties.
In the Model node Properties pane, you can:
-
Add models to the Model node
-
Delete models from the Model node
-
View models in the Model node
The Properties pane for a Model node has the following sections:
- Models (Model Node)
The Models section shows the mining function that the models use and lists all the models included in the node in a grid. - Details
The Details section displays the node name and comments about the node.
Parent topic: Model Node
Models (Model Node)
The Models section shows the mining function that the models use and lists all the models included in the node in a grid.
You can search for models, add models to the node, and delete models. You can perform the following tasks:
-
Add Models: To add models:
-
Click . The Edit Model Selection dialog box opens.
-
In the Edit Model Selection dialog box, select the models to add to the node. You can add models from other schemas too. However, any models that you add must be compatible with the models already in the node.
-
Click OK. This adds the models to the node. You can go to the Properties pane for the Model node to view the models.
-
-
Delete Models: To delete a model, select it and click .
-
View Models: To view a model, select it and click .
-
Refresh models: To refresh models, click . If data on the server changes, then it may be necessary to refresh the node.
Parent topic: Model Node Properties
Details
The Details section displays the node name and comments about the node.
You can change the name of the node and edit the comments in this section. The new node name and comments must meet the requirements.
Parent topic: Model Node Properties
Model Node Context Menu
The context menu options depend on the type of the node. The menu provides shortcuts for performing various tasks and viewing information related to the node.
The following options are available in the context menu:
-
Edit. Opens the Edit Model Selection dialog box.
-
Run. Validates that the models specified in the node exist.
-
Show Runtime Errors. Displayed only if there is an error.
-
Show Validation Errors. Displayed only if there are validation errors.
Parent topic: Model Node
Model Details Node
The Model Details node extracts and provides information about the model and algorithms.
Model Details nodes are most useful for application developers. The Model Details node performs the following functions:
-
Extracts model details from a Model Build node, a Model node or any node that outputs a model.
-
Reveals information about model attributes and their treatment by the algorithm. The output depends on the type of models selected and the specific type of model details you specify.
-
The output of the Model Details node is a data flow. To enable the data to persist, use a Create Table or View node.
A Model Details node can run in parallel.
This section on the Model Details node contains the following topics:
- Model Details Node Input and Output
The input for a Model Details node is either a Build node (any model type) or a Model node. - Create Model Details Node
The Model Details node extracts and provides information about the model and algorithms. - Edit Model Details Node
The Model Details Node editor enables you to view or specify the model details provided by the node. - Model Details Automatic Specification
The Automatic Specification setting determines how specifications change automatically. - Model Details Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node. - Model Details Node Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node. - Model Details Per Model
The exact data displayed in a Model Details node depends on the particular models.
Parent topic: Model Nodes
Model Details Node Input and Output
The input for a Model Details node is either a Build node (any model type) or a Model node.
All models in Build nodes or Model nodes must have the same mining function type. For example, if one is a Classification model, then all of them must be Classification models.
The output for a Model Details node is a data flow based on the model detail specifications. To enable the data to persist, use a Create Table or View node.
Parent topic: Model Details Node
Create Model Details Node
The Model Details node extracts and provides information about the model and algorithms.
To create a Model Details node, follow these steps:
Parent topic: Model Details Node
Edit Model Details Node
The Model Details Node editor enables you to view or specify the model details provided by the node.
To open the Edit Model Details Node dialog box, double-click a Model Details node. Alternately, right-click a Model Details node and select Edit. In the Selected Models section, you can view the models, the nodes, the algorithm, and the partition keys.
You can perform the following tasks:
-
Auto Setting: If this option is selected (the default), then the system determines the specification. You cannot change the output types, algorithm types, or selected models.
-
Function: Displays the function type of the input nodes connected. For example, if a Classification node is connected to Model Details, then the function is Classification. If no input nodes are connected, then it is undefined.
-
Model Type: Displays the list of algorithms available, including All. Select a model type.
-
Output: Select an output type for the Model Details of the algorithm. The options available are:
-
If you select All or O-Clusters in the Model Type field, then the available output types are:
-
Attribute Histogram
-
Centroid
-
Centroid Scoring (available only for K-Means)
-
Full Tree
-
Model Signature
-
Rules
-
-
If you select Expectation Maximization, then available output types are:
-
Attribute Gaussian Distribution
-
Attribute Histogram
-
Centroid
-
Component Bernoulli Distribution
-
Component Clusters
-
Component Priors
-
Components
-
Full Tree
-
Global Details
-
Model Signature
-
Projections
-
Rules
-
-
If you select R Extensible, then available output types are:
-
Model Signature
-
R Model Details
-
-
Column: Click Columns to view the list of the columns (name and data type) for the selected output type.
-
Add: To add a model type or edit an output type, deselect Automatic Specification. To add another model type, select the model type and click the add icon. The Edit Model Details Node dialog box opens. You can accept the default specifications or edit them.
- Edit Model Selection Details
The Edit Model Selection Details provide generic information related to the mining function, model type, output type, available compatible models and selected models in two sections.
Edit Model Selection Details
The Edit Model Selection Details provide generic information related to the mining function, model type, output type, available compatible models and selected models in two sections.
The top pane of the Edit Model Selection Details dialog box contains general information:
-
Function: Displays the function type of the input nodes connected. For example, if a Classification node is connected to Model Details, then the function is Classification. If no input nodes are connected, then it is undefined.
-
Model Type: Displays algorithms. If there are models already selected (listed in the Selected Models grid), then the Model Type field is disabled to match the already selected models. If you move all models out of the Selected Models grid, then the Model Type field is enabled again. If the Model Type is enabled, then you can select models. The default is All Models.
-
Output Type: Displays the list of possible output types (model queries) that are available for the specified model types. The values for each algorithm selection are as follows:
-
Decision Tree (initial default): Full Tree (default), Full Tree XML, Leaf Nodes, Model Signature
-
SVM Classification: Coefficients (Default), Model Signature
-
SVM Regression: Coefficients (Default), Model Signature
-
Naive Bayes: Pair Probabilities (Default), Model Signature
-
Association Rules: Rules (Default), Global Details, Itemsets
-
Anomaly Detection: Coefficients (Default), Model Signature
-
GLM Classification: Statistics (Default), Row Diagnostics, Model Signature, Global Details
-
GLM Regression: Statistics (Default), Row Diagnostics, Model Signature, Global Details
-
KM or OC Clustering: Full Tree (Default), Rules, Attribute Histograms, Centroid, Model Signature
-
Expectation Maximization (EM): Full Tree (Default), Attribute Histograms, Centroid, Components, Global Details, Model Signature, Projections, Rules
EM requires Oracle Database 12c Release 1 (12.1) or later.
-
NMF: Features Transactional (Default), Model Signature
-
SVD: Features Transactional (Default), Global Details, Model Signature, Projections, Singular Values
SVD requires Oracle Database 12c Release 1 (12.1) or later.
-
PCA: Features Transactional (Default), Eigen Values, Global Details, Model Signature, Projections
PCA requires Oracle Database 12c Release 1 (12.1) or later.
-
Output values are also available for multiple model types. For example, you can select Centroid for all clustering models.
-
Columns: Click to see a list of the columns (name and data type) for the selected output type.
The lower section of the dialog box displays the following:
-
Available Compatible Models: Lists the available models, that is, models that match the algorithm selection. The grid, for each model, displays the model Name, the input node for the model, and the algorithm used to build the model.
-
Selected Models: Lists the selected models. The grid, for each model, displays the model name, the input node for the model, and the algorithm used to build the model.
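The output types listed for each model type can be read as a lookup table. The following is a minimal sketch that transcribes that list into a dictionary and shows how the default and the output types shared by several model types could be derived. The function names are hypothetical, and the EM entry assumes that Centroid and Components are separate output types, which is consistent with the statement that Centroid can be selected for all clustering models.

```python
# Output types per model type, transcribed from the list in this topic.
# The first entry in each list is the default for that model type.
OUTPUT_TYPES = {
    "Decision Tree": ["Full Tree", "Full Tree XML", "Leaf Nodes", "Model Signature"],
    "SVM Classification": ["Coefficients", "Model Signature"],
    "SVM Regression": ["Coefficients", "Model Signature"],
    "Naive Bayes": ["Pair Probabilities", "Model Signature"],
    "Association Rules": ["Rules", "Global Details", "Itemsets"],
    "Anomaly Detection": ["Coefficients", "Model Signature"],
    "GLM Classification": ["Statistics", "Row Diagnostics", "Model Signature", "Global Details"],
    "GLM Regression": ["Statistics", "Row Diagnostics", "Model Signature", "Global Details"],
    "KM or OC Clustering": ["Full Tree", "Rules", "Attribute Histograms", "Centroid",
                            "Model Signature"],
    # Assumption: Centroid and Components are two separate EM output types.
    "Expectation Maximization": ["Full Tree", "Attribute Histograms", "Centroid", "Components",
                                 "Global Details", "Model Signature", "Projections", "Rules"],
    "NMF": ["Features Transactional", "Model Signature"],
    "SVD": ["Features Transactional", "Global Details", "Model Signature", "Projections",
            "Singular Values"],
    "PCA": ["Features Transactional", "Eigen Values", "Global Details", "Model Signature",
            "Projections"],
}


def default_output_type(model_type):
    """Return the default output type for one model type."""
    return OUTPUT_TYPES[model_type][0]


def shared_output_types(model_types):
    """Return the output types offered by every listed model type."""
    first, *rest = model_types
    return [o for o in OUTPUT_TYPES[first] if all(o in OUTPUT_TYPES[t] for t in rest)]


print(default_output_type("Naive Bayes"))
print(shared_output_types(["KM or OC Clustering", "Expectation Maximization"]))
```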
Model Details Automatic Specification
The Automatic Specification setting determines how specifications change automatically.
-
By default, Automatic Specification is set to ON (selected). If Automatic Specification is set to ON, then it results in the following behavior:
-
When the first input node is connected to a Model Details node, the input node is searched for models in a default order of priority. For the first model type found, all of the node's matching models are added to the Model Details specification along with the default output type.
-
On subsequent connections, the models that match the type in the Model Details node are automatically added. A message is displayed telling you that models are being added automatically.
-
When an input node is disconnected, all model specifications provided by that node are automatically removed from the Model Details node.
-
When an input node is edited, any models added are automatically added to the Model Details node if the added model matches the model type contained in the node. If models are deleted from an input node, then they are deleted from the Model Details node.
-
When a parent node is edited so that all models are removed, the model node is set to undefined. When a new model is added to the parent node, the model node remains undefined. Because many parent nodes may be connected to a model node, the model and output type to select by default cannot be predicted reliably.
-
When an input node is edited and the model is changed so that it is no longer consistent with its specification in the model details node, the model specification is removed.
-
-
If Automatic Specification is set to OFF (deselected), then it results in the following behavior:
-
Models are not added automatically.
-
You must edit the Model Details node.
-
Validations are performed as usual, so models that are now inconsistent or missing are marked as invalid. Also, if models are missing and a node is added that contains a match for a missing model, then that model is made valid and associated with the new node.
-
You must manually fix or remove invalid model references.
-
- Default Model and Output Type Selection
The specification that is automatically added depends on the mining function of the model.
Default Model and Output Type Selection
The specification that is automatically added depends on the mining function of the model.
The default output type for each mining function and algorithm is as follows:
-
Classification
-
Decision Tree: Full Tree
-
GLM: Statistics
-
NB: Probabilities
-
SVM (linear kernel only): Coefficients
-
-
Clustering
-
KM: Full Tree
-
OC: Full Tree
-
EM: Full Tree
-
-
Regression
-
GLM: Statistics
-
SVM (linear kernel only): Coefficients
-
-
Anomaly Detection
-
SVM (linear kernel only): Coefficients
-
-
Association
-
Apriori: Rules
-
-
Feature Extraction
-
NMF, SVD, or PCA: Features Transactional
-
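The default selection described in this topic can be sketched as a priority lookup. The sketch below is illustrative only: it assumes that the order in which the algorithms are listed is the search priority, and it represents models as plain dictionaries with hypothetical names. Neither assumption is confirmed by this guide.

```python
# (algorithm, default output type) per mining function, in the listed order.
# Assumption: the listed order is also the default order of priority.
DEFAULT_SPEC = {
    "Classification": [("Decision Tree", "Full Tree"), ("GLM", "Statistics"),
                       ("NB", "Probabilities"), ("SVM", "Coefficients")],
    "Clustering": [("KM", "Full Tree"), ("OC", "Full Tree"), ("EM", "Full Tree")],
    "Regression": [("GLM", "Statistics"), ("SVM", "Coefficients")],
    "Anomaly Detection": [("SVM", "Coefficients")],
    "Association": [("Apriori", "Rules")],
    "Feature Extraction": [("NMF", "Features Transactional"),
                           ("SVD", "Features Transactional"),
                           ("PCA", "Features Transactional")],
}


def auto_specify(mining_function, node_models):
    """Pick the first algorithm (by priority) that the input node provides.

    node_models is a list of dicts with "name" and "algorithm" keys and an
    optional "kernel" key. SVM models qualify only with a linear kernel.
    Returns (algorithm, output_type, model_names) or None if nothing matches.
    """
    for algorithm, output_type in DEFAULT_SPEC.get(mining_function, []):
        matches = [
            m["name"] for m in node_models
            if m["algorithm"] == algorithm
            and (algorithm != "SVM" or m.get("kernel") == "linear")
        ]
        if matches:
            return algorithm, output_type, matches
    return None


node_models = [
    {"name": "CLAS_SVM_1", "algorithm": "SVM", "kernel": "gaussian"},
    {"name": "CLAS_NB_1", "algorithm": "NB"},
    {"name": "CLAS_GLM_1", "algorithm": "GLM"},
]
print(auto_specify("Classification", node_models))
```

In this example the GLM model is selected because GLM precedes NB in the assumed priority order, and the SVM model is skipped because its kernel is not linear.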
Model Details Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternately, right-click the node and click Go to Properties.
Model Details node Properties has the following sections:
- Models (Model Details)
The Models section lists the models that you want to save details about.
- Output (Model Details)
The Output tab lists the columns produced by the Model Details node.
- Cache (Model Details)
You can generate cache. If you generate cache, then you can specify the sampling size.
- Details
The Details section displays the node name and comments about the node.
Models (Model Details)
The Models section lists the models that you want to save details about.
You can add and remove models from the list.
Output (Model Details)
The Output tab lists the columns produced by the Model Details node.
For each column, the alias (if any) and the data type are displayed.
Cache (Model Details)
You can generate cache. If you generate cache, then you can specify the sampling size.
By default, the cache that optimizes the viewing of results is not generated. The default sampling size is 2000 rows.
Details
The Details section displays the node name and comments about the node.
You can change the name of the node and edit the comments in this section. The new node name and comments must meet the requirements.
Model Details Node Context Menu
The context menu options depend on the type of the node. The context menu provides shortcuts to perform various tasks and view information related to the node.
To view the context menu options, right-click the node. The following options are available in the context menu:
-
Edit. Opens the Edit Model Details Node dialog box.
-
Performance Settings. This opens the Edit Selected Node Settings dialog box, where you can set Parallel Settings and In-Memory settings for the node.
-
Show Runtime Errors. Displayed only if there is an error.
-
Show Validation Errors. Displayed only if there are validation errors.
- View Data (Model Details)
After a model is built and run successfully, you can view the data contained in the model using the View Data option.
View Data (Model Details)
After a model is built and run successfully, you can view the data contained in the model using the View Data option.
To view the complete Model Details output, right-click the node and select View Data.
The output is displayed in the following tabs:
-
Data: The data that constitutes the model details. What the data represents depends on the model. For example, the data could represent a tree or rules. You can sort and filter the columns of this tab.
-
Columns: Data Type and Mining Type of the columns in the output.
-
SQL: SQL used to generate the model details.
Model Details Per Model
The exact data displayed in a Model Details node depends on the particular models.
All models that can be applied (scored) can have model signature as output.
R Build Node
The R Build node allows you to register R models. It builds R models and generates R model test results for the Classification and Regression mining functions. R Build nodes support the Classification, Regression, Clustering, and Feature Extraction mining functions only.
You must have Oracle R Enterprise installed in the host to build R models.
Note:
The R Model is visible only when Oracle SQL Developer is connected to Oracle Database 12.2 or later.
- Create R Build Node
Create an R Build node to register R models.
- Edit R Build Node
The Edit R Build Node dialog box allows you to edit settings related to the R Model.
- Advanced Settings (R Build Node)
The Advanced Settings dialog box allows you to view and edit model settings related to data usage, Extensible settings, and configuration of the previously defined R functions such as build function, scoring function, and model details function.
- R Build Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
- R Build Node Context Menu
The context menu options depend on the type of the node. The context menu provides shortcuts to perform various tasks and view information related to the node.
Create R Build Node
Create an R Build node to register R models.
Edit R Build Node
The Edit R Build Node dialog box allows you to edit settings related to the R Model.
The dialog box comprises the following tabs:
- Build
The Build tab enables you to specify or change the characteristics of the models to build.
- Partition
In the Partition tab, you can build partitioned models.
- Input
The Input tab specifies the input for model build.
- Sampling
The settings in the Sampling tab are applied to all models in the node. In the Sampling tab, you can specify the row size.
- Text
Text is available for any of the following data types: CHAR, VARCHAR2, BLOB, CLOB, NCHAR, or NVARCHAR2.
Build
The Build tab enables you to specify or change the characteristics of the models to build.
- Add Model (R Build Node)
You must provide R functions that are compatible with the Oracle Data Mining extensible framework. Otherwise, runtime errors may result.
Add Model (R Build Node)
You must provide R functions that are compatible with the Oracle Data Mining extensible framework. Otherwise, runtime errors may result.
Note:
The required R functions must be registered using the script rqScriptCreate in Oracle R Enterprise. For more information about the procedure, see Oracle R Enterprise User’s Guide.
- Build Function
In the Build Function dialog box, you can select any registered R function to be used for the build function.
- Build Settings
The Build Settings dialog box allows you to specify the required settings with names, values, and data types. The names must match the argument names in the R function. The data types can be either NUMBER or STRING.
- Score Function
In the Score Function dialog box, you can select a registered R function to be used for scoring.
- Model Details Function
In the Model Details Function dialog box, you can select a registered R function.
Build Function
In the Build Function dialog box, you can select any registered R function to be used for the build function.
- The Build Function field displays the applicable R build function. You can select another function from the drop-down list.
- The Function Definition field displays the code of the selected function. You can verify the function here. You can specify algorithm settings to be passed on to the build function.
- Click Settings. This opens the Build Settings dialog box where you can specify values for parameters used in the build function.
- Click OK.
Build Settings
The Build Settings dialog box allows you to specify the required settings with names, values, and data types. The names must match the argument names in the R function. The data types can be either NUMBER or STRING.
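The two constraints on build settings (names must match the R function's argument names, and data types are limited to NUMBER or STRING) can be checked before a build is attempted. The following is a minimal sketch of such a check. The function name, the argument names, and the sample settings are hypothetical, and Oracle Data Miner itself may validate settings differently.

```python
ALLOWED_TYPES = {"NUMBER", "STRING"}


def validate_build_settings(settings, function_args):
    """Check (name, value, data_type) settings against R function arguments.

    Returns a list of problem descriptions; an empty list means the
    settings satisfy both constraints.
    """
    problems = []
    for name, value, data_type in settings:
        if name not in function_args:
            problems.append(f"{name}: not an argument of the R function")
        if data_type not in ALLOWED_TYPES:
            problems.append(f"{name}: data type must be NUMBER or STRING")
        elif data_type == "NUMBER" and (
            isinstance(value, bool) or not isinstance(value, (int, float))
        ):
            problems.append(f"{name}: value is not numeric")
    return problems


# Hypothetical argument names of a registered R build function.
function_args = ["dat", "formula", "family"]
settings = [
    ("formula", "AGE ~ .", "STRING"),
    ("maxit", 25, "NUMBER"),  # not an argument of the function above
]
print(validate_build_settings(settings, function_args))
```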
Score Function
In the Score Function dialog box, you can select a registered R function to be used for scoring.
Model Details Function
In the Model Details Function dialog box, you can select a registered R function.
- In the Model Details Function field, select the R function as applicable. If you do not specify the Model Details function, then the Details tab in the Model Viewer will not be available.
- The Function Definition section displays the code of the selected R function. You can verify the function here. The selected model detail function generates a data frame that is persisted to a view, after the model is built.
- In the Output Column section, you must specify the output signature of the function. The output signature of the function should match the data frame object generated by the function. For example, if you select an R function that produces an output with two columns, ATTRIBUTE and COEFFICIENTS, then the data type of each column can be either NUMBER or VARCHAR2. Internally, Oracle Data Miner constructs a SELECT statement from the specified name-value pairs to be passed to the R Model Details function using the ODM extensible framework.
- Click OK.
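To make the idea of an output signature expressed as a SELECT statement more concrete, the following sketch builds such a statement from column name and data type pairs. The exact statement that Oracle Data Miner generates is not shown in this guide, so the shape used here (a placeholder value per column, selected from DUAL) and the helper name `details_select` are assumptions for illustration only.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_$#]*")


def details_select(columns):
    """Build a SELECT statement describing the output columns.

    columns is a list of (name, data_type) pairs, where data_type is
    NUMBER or VARCHAR2. The placeholder values only convey the types.
    """
    parts = []
    for name, data_type in columns:
        if not IDENTIFIER.fullmatch(name):
            raise ValueError(f"Invalid column name: {name!r}")
        if data_type == "NUMBER":
            parts.append(f"1 {name}")
        elif data_type == "VARCHAR2":
            parts.append(f"CAST('a' AS VARCHAR2(4000)) {name}")
        else:
            raise ValueError(f"Unsupported data type: {data_type}")
    return "SELECT " + ", ".join(parts) + " FROM DUAL"


print(details_select([("ATTRIBUTE", "VARCHAR2"), ("COEFFICIENTS", "NUMBER")]))
```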
Partition
In the Partition tab, you can build partitioned models.
-
In the Maximum Number of Partitions field, set a value by clicking the arrows. This sets the partition cutoff value. The cutoff value must be greater than zero. If this option is not selected, then the native Oracle Data Miner cutoff value is used, which can be a very large value.
-
Click Advanced Settings to set and select the type of partition build.
-
To add columns for partitioning, click the add icon.
Note:
Only NUMBER and VARCHAR2 columns can be selected as partition columns. Case ID and Target columns cannot be selected as partition columns.
-
To remove a partitioning column, select the column and click the remove icon.
-
To reorder the partitioning columns, select a column and use the arrow icons to move it to the top, up, down, or to the bottom.
- Add Partitioning Columns
Partitioning columns result in building a virtual model for each unique partition. Because the virtual model uses data only from a specific partition, it can potentially predict cases more accurately than if you did not select a partition.
Add Partitioning Columns
Partitioning columns result in building a virtual model for each unique partition. Because the virtual model uses data only from a specific partition, it can potentially predict cases more accurately than if you did not select a partition.
In addition to selecting attributes, you can specify partitioning expressions. Partitioning expressions are concatenated and the result expression is the same for all predictive functions.
- Select one or more attributes in the Available Attributes list to serve as partitions.
- Move the selected columns to the Selected Attributes list using the arrows.
- Click OK. The attributes are moved to the Partition list.
Optionally, you can add partitioning expressions.
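The effect of partitioning columns can be pictured with a small sketch that builds one simple model per unique combination of partition values. The per-partition "model" here is just the mean of the target, chosen only to keep the example short. The column names and data are hypothetical, and the database builds real partitioned models differently.

```python
from collections import defaultdict
from statistics import mean


def build_partitioned_means(rows, partition_cols, target):
    """Build one mean-of-target 'model' per unique partition key."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in partition_cols)
        groups[key].append(row[target])
    # Each virtual model sees only the rows of its own partition.
    return {key: mean(values) for key, values in groups.items()}


rows = [
    {"REGION": "E", "SALES": 10},
    {"REGION": "E", "SALES": 20},
    {"REGION": "W", "SALES": 30},
]
partition_models = build_partitioned_means(rows, ["REGION"], "SALES")
print(partition_models)
```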
Input
The Input tab specifies the input for model build.
Determine inputs automatically (using heuristics) is selected by default for all models. Oracle Data Miner decides which attributes to use for input. For example, attributes that are almost constant may not be suitable for input. Oracle Data Miner also determines mining type and specifies that auto data preparation is performed for all attributes.
Note:
For R Build nodes, Auto Data Preparation is not performed.
After the node runs, rules are displayed explaining the heuristics. Click Show for detailed information.
You can change these selections. To do so, deselect Determine inputs automatically (using heuristics).
Sampling
The settings in the Sampling tab are applied to all models in the node. In the Sampling tab, you can specify the row size.
By default, Sampling is set to OFF.
To set it to ON:
Text
Text is available for any of the following data types: CHAR, VARCHAR2, BLOB, CLOB, NCHAR, or NVARCHAR2.
If you are connected to Oracle Database 12c Release 1 (12.1) and later, then you can specify text characteristics in the Text tab in the Edit Model Build dialog box.
If you specify text characteristics in the Text tab, then you are not required to use the Text nodes.
Note:
If you are connected to Oracle Database 11g Release 2 (11.2) or earlier, then use Text nodes. The Text tab is not available in Oracle Database 11g Release 2 and earlier.
To examine or specify text characteristics for data mining, either double-click the build node or right-click the node and select Edit from the context menu. Click the Text tab.
The Text tab enables you to modify the following:
-
Categorical cutoff value: Enables you to control the cutoff used to determine whether a column should be considered as a Text or Categorical mining type. The cutoff value is an integer. It must be 10 or greater and less than or equal to 4000. The default value is 200.
-
Default Transform Type: Specifies the default transformation type for column-level text settings. The values are:
-
Token (Default): For Token as the transform type, the Default Settings are:
-
Languages: Specifies the languages used in the documents. The default is English. To change this value, select an option from the drop-down list. You can select more than one language.
-
Bigram: Select this option to mix the NORMAL token type with their bigram. For example, New York. The token type is BIGRAM.
-
Stemming: By default, this option is not selected. Not all languages support stemming. If the language selected is English, Dutch, French, German, Italian, or Spanish, then stemming is automatically enabled. If Stemming is enabled, then stemmed words are returned for supported languages. Otherwise, the original words are returned.
Note:
If both Bigram and Stemming are selected, then the token type is STEM_BIGRAM. If neither Bigram nor Stemming is selected, then the token type is NORMAL.
-
Stoplists: Specifies the stoplist to use. The default setting is to use the default stoplist. You can add stoplists or edit stoplists. If you select more than one language, and the selected stoplist is Default, then the default stop words for languages are added to the default stoplist from the repository. No duplicate stop words are added.
-
Click to view the Stoplist Details. This opens the Stoplist Details dialog box.
-
Click to add a new stoplist. This opens the New Stoplist Wizard.
-
Tokens: Specify the following:
-
Max number of tokens across all rows (document). The default is 3000.
-
Min number of rows (document) required for a token
-
Theme: If Theme is selected, then the Default Settings are as follows:
-
Language: Specifies the languages used in the documents. The default is English. To change this value, select one from the drop-down list. You can select more than one language.
-
Stoplists: Specifies the stoplist to use. The default setting is to use the default stoplist. You can add stoplists or edit stoplists. If you select more than one language and the selected stoplist is Default, then the default stop words for languages are added to the default stoplist (from the repository). No duplicate stop words are added.
-
Themes: Specifies the maximum number of themes across all documents. The default is 3000.
-
Synonym: The Synonym tab is enabled only if a thesaurus is loaded. By default, no thesaurus is loaded. You must manually load the default thesaurus provided by Oracle Text or upload your own thesaurus.
-
Click Stoplists to open the Stoplist Editor. You can view, edit, and create stoplists.
You can use the same stoplist for all text columns.
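The rules for the categorical cutoff value and for the token type that results from the Bigram and Stemming options can be summarized in a short sketch. The function names are hypothetical. The text above does not state the token type when only Stemming is selected, so the value STEM used for that case is an assumption.

```python
def validate_categorical_cutoff(value=200):
    """Return the cutoff if it is an integer from 10 to 4000 inclusive."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("The cutoff value must be an integer")
    if not 10 <= value <= 4000:
        raise ValueError("The cutoff value must be between 10 and 4000")
    return value


def token_type(bigram, stemming):
    """Map the Bigram and Stemming options to a token type."""
    if bigram and stemming:
        return "STEM_BIGRAM"
    if bigram:
        return "BIGRAM"
    if stemming:
        return "STEM"  # assumption: this case is not stated in the text
    return "NORMAL"


print(validate_categorical_cutoff())
print(token_type(bigram=True, stemming=True))
```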
Advanced Settings (R Build Node)
The Advanced Settings dialog box allows you to view and edit model settings related to data usage, Extensible settings, and configuration of the previously defined R functions such as build function, scoring function, and model details function.
You can perform the following tasks:
-
Add Model: Click to add a model.
-
Delete Model: Select a model and click the delete icon.
R Build Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternately, right-click the node and click Go to Properties.
R Build Node Context Menu
The context menu options depend on the type of the node. The context menu provides shortcuts to perform various tasks and view information related to the node.
The following options are available in the context menu:
-
Edit. This opens the Edit R Build Node dialog box.
-
Performance Settings. This opens the Edit Selected Node Settings dialog box, where you can set Parallel Settings and In-Memory settings for the node.
Regression Node
The Regression node defines one or more Regression models to build and to test.
To specify data for the build, connect a Data Source node to the Regression node. You can also connect a second data source to the Regression build node to specify test data. You can only specify one target. A Regression build can run in parallel.
The models in a Regression Node all have the same target and case ID.
There are two ways to make regression predictions:
-
By building and testing a Regression model: Use a Regression node, and then apply the model to new data to make predictions.
-
By using a Prediction Query, which is one of the predictive queries.
This section consists of the following topics:
- Default Behavior for Regression Node
By default, the Regression node builds two models.
- Create a Regression Node
By default, a Regression node builds two models, one each using the Generalized Linear Model (GLM) and Support Vector Machine (SVM) algorithms.
- Data for Model Build
Oracle Data Miner uses heuristic techniques on data for model build.
- Edit Regression Build Node
In the Edit Regression Build Node dialog box, you can edit settings related to the model build, model partition, sampling, inputs, text settings, and so on.
- Advanced Settings for Regression Models
In the Advanced Settings dialog box, you can add models, delete models, review settings, and change settings related to the model and algorithm.
- Regression Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
- Regression Node Context Menu
The context menu options depend on the type of the node. The context menu provides shortcuts to perform various tasks and view information related to the node.
Default Behavior for Regression Node
By default, the Regression node builds two models.
The models are built using the following algorithms:
-
Generalized Linear Model (GLM)
-
Support Vector Machine (SVM)
The models have the same build data and the same target.
By default, the models are all tested. The test data is created by randomly splitting the build data into a build data set and a test data set. The default ratio for the split is 60 percent build and 40 percent test. When possible, Oracle Data Miner uses compression when creating the test and build data sets.
You can instead use all the build data as test data.
To use separate test data, connect a test data source to the Build node or use a Test node.
After you test models, you can view test results.
You can compare test results for two or more Regression models using the Compare Test Results selection of the context menu.
The case ID is optional. However, if you do not specify a case ID, then the processing will be slower.
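The default 60 percent build and 40 percent test split described in this topic can be illustrated with a minimal sketch. Oracle Data Miner performs the split in the database; the function below is a hypothetical stand-in that only shows the idea of a random split with a configurable ratio.

```python
import random


def split_build_test(rows, build_fraction=0.6, seed=0):
    """Randomly split rows into a build set and a test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatable results
    cut = round(len(shuffled) * build_fraction)
    return shuffled[:cut], shuffled[cut:]


build_rows, test_rows = split_build_test(range(10))
print(len(build_rows), len(test_rows))
```

Passing `build_fraction=1.0` leaves the test set empty, which is not the same as the option of using all the build data as test data; that option reuses the build rows for testing.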
Create a Regression Node
By default, a Regression node builds two models, one each using the Generalized Linear Model (GLM) and Support Vector Machine (SVM) algorithms.
Data for Model Build
Oracle Data Miner uses heuristic techniques on data for model build.
Oracle Data Miner uses heuristics to:
-
Determine the attributes of the input data used for model build.
-
Determine the mining type of each attribute.
Edit Regression Build Node
In the Edit Regression Build Node dialog box, you can edit settings related to the model build, model partition, sampling, inputs, text settings and so on.
To open the Edit Regression Build Node dialog box, double-click a Regression Build node, or right-click a Regression Build node and select Edit.
The Edit Regression Build Node dialog box contains the following tabs:
- Build
The Build tab enables you to specify or change the characteristics of the models to build.
- Partition
In the Partition tab, you can build partitioned models.
- Sampling
The settings in the Sampling tab are applied to all models in the node. In the Sampling tab, you can specify the row size.
- Input
The Input tab specifies the input for model build.
- Text
Text is available for any of the following data types: CHAR, VARCHAR2, BLOB, CLOB, NCHAR, or NVARCHAR2.
Build
The Build tab enables you to specify or change the characteristics of the models to build.
To edit the characteristics of the model to build, follow these steps:
The default is to test the model using a test data set created by splitting the build data set. If you do not want to test the model in this way, then go to the Test section in the Regression node Properties pane. You can also use a Test Node and a test Data Source node to test the model instead.
- Add Model (Regression)
In the Add Model dialog box, you can add a model to the node, and select an algorithm for it.
Add Model (Regression)
In the Add Model dialog box, you can add a model to the node, and select an algorithm for it.
To add a model to the node:
- In the Algorithm field, select an algorithm.
- In the Name field, a default name is displayed. You can use the default or rename the model.
- In the Comment field, add comments if any. This is an optional field.
- Click OK. The new model is added to the node.
Partition
In the Partition tab, you can build partitioned models.
-
In the Maximum Number of Partitions field, set a value by clicking the arrows. This sets the partition cutoff value. The cutoff value must be greater than zero. If this option is not selected, then the native Oracle Data Miner cutoff value is used, which can be a very large value.
-
Click Advanced Settings to set and select the type of partition build.
-
To add columns for partitioning, click the add icon.
Note:
Only NUMBER and VARCHAR2 columns can be selected as partition columns. Case ID and Target columns cannot be selected as partition columns.
-
To remove a partitioning column, select the column and click the remove icon.
-
To reorder the partitioning columns, select a column and use the arrow icons to move it to the top, up, down, or to the bottom.
Sampling
The settings in the Sampling tab are applied to all models in the node. In the Sampling tab, you can specify the row size.
By default, Sampling is set to OFF.
To set it to ON:
Input
The Input tab specifies the input for model build.
Determine inputs automatically (using heuristics) is selected by default for all models. Oracle Data Miner decides which attributes to use for input. For example, attributes that are almost constant may not be suitable for input. Oracle Data Miner also determines mining type and specifies that auto data preparation is performed for all attributes.
Note:
For R Build nodes, Auto Data Preparation is not performed.
After the node runs, rules are displayed explaining the heuristics. Click Show for detailed information.
You can change these selections. To do so, deselect Determine inputs automatically (using heuristics).
Text
Text is available for any of the following data types: CHAR, VARCHAR2, BLOB, CLOB, NCHAR, or NVARCHAR2.
If you are connected to Oracle Database 12c Release 1 (12.1) and later, then you can specify text characteristics in the Text tab in the Edit Model Build dialog box.
If you specify text characteristics in the Text tab, then you are not required to use the Text nodes.
Note:
If you are connected to Oracle Database 11g Release 2 (11.2) or earlier, then use Text nodes. The Text tab is not available in Oracle Database 11g Release 2 and earlier.
To examine or specify text characteristics for data mining, either double-click the build node or right-click the node and select Edit from the context menu. Click the Text tab.
The Text tab enables you to modify the following:
-
Categorical cutoff value: Enables you to control the cutoff used to determine whether a column should be considered as a Text or Categorical mining type. The cutoff value is an integer. It must be 10 or greater and less than or equal to 4000. The default value is 200.
-
Default Transform Type: Specifies the default transformation type for column-level text settings. The values are:
-
Token (Default): For Token as the transform type, the Default Settings are:
-
Languages: Specifies the languages used in the documents. The default is English. To change this value, select an option from the drop-down list. You can select more than one language.
-
Bigram: Select this option to mix the NORMAL token type with their bigram. For example, New York. The token type is BIGRAM.
-
Stemming: By default, this option is not selected. Not all languages support stemming. If the language selected is English, Dutch, French, German, Italian, or Spanish, then stemming is automatically enabled. If Stemming is enabled, then stemmed words are returned for supported languages. Otherwise, the original words are returned.
Note:
If both Bigram and Stemming are selected, then the token type is STEM_BIGRAM. If neither Bigram nor Stemming is selected, then the token type is NORMAL.
-
Stoplists: Specifies the stoplist to use. The default setting is to use the default stoplist. You can add stoplists or edit stoplists. If you select more than one language, and the selected stoplist is Default, then the default stop words for languages are added to the default stoplist from the repository. No duplicate stop words are added.
-
Click to view the Stoplist Details. This opens the Stoplist Details dialog box.
-
Click to add a new stoplist. This opens the New Stoplist Wizard.
-
Tokens: Specify the following:
-
Max number of tokens across all rows (document). The default is 3000.
-
Min number of rows (document) required for a token
-
Theme: If Theme is selected, then the Default Settings are as follows:
-
Language: Specifies the languages used in the documents. The default is English. To change this value, select one from the drop-down list. You can select more than one language.
-
Stoplists: Specifies the stoplist to use. The default setting is to use the default stoplist. You can add stoplists or edit stoplists. If you select more than one language and the selected stoplist is Default, then the default stop words for languages are added to the default stoplist (from the repository). No duplicate stop words are added.
-
Themes: Specifies the maximum number of themes across all documents. The default is
3000.
-
-
Synonym: The Synonym tab is enabled only if a thesaurus is loaded. By default, no thesaurus is loaded. You must manually load the default thesaurus provided by Oracle Text or upload your own thesaurus.
-
-
Click Stoplists to open the Stoplist Editor. You can view, edit, and create stoplists.
You can use the same stoplist for all text columns.
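The rules for the categorical cutoff and for the Token transform's token type can be summarized in a short Python sketch. This is an illustration only, not part of Oracle Data Miner: the function names are hypothetical, and the `STEM` value returned when only Stemming is selected is an assumption, because the text above names only `NORMAL`, `BIGRAM`, and `STEM_BIGRAM`.

```python
CUTOFF_MIN, CUTOFF_MAX, CUTOFF_DEFAULT = 10, 4000, 200


def validate_categorical_cutoff(value: int = CUTOFF_DEFAULT) -> int:
    """Check the documented range: an integer from 10 to 4000 inclusive."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("cutoff must be an integer")
    if not CUTOFF_MIN <= value <= CUTOFF_MAX:
        raise ValueError(f"cutoff must be between {CUTOFF_MIN} and {CUTOFF_MAX}")
    return value


def token_type(bigram: bool, stemming: bool) -> str:
    """Map the Bigram and Stemming check boxes to a token type name."""
    if bigram and stemming:
        return "STEM_BIGRAM"
    if bigram:
        return "BIGRAM"
    if stemming:
        # Assumption: the documentation does not name the stemming-only type.
        return "STEM"
    return "NORMAL"
```

For example, `token_type(True, True)` yields `STEM_BIGRAM`, and `validate_categorical_cutoff(9)` raises a `ValueError` because 9 is below the minimum.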
Parent topic: Edit Regression Build Node
Advanced Settings for Regression Models
In the Advanced Settings dialog box, you can add models, delete models, review settings, and change settings related to the model and algorithm.
The Advanced Settings dialog box enables you to:
-
Inspect and change data usage and algorithm settings for each model in the node
-
Add and delete models
To change or view Advanced Settings, click the Advanced Settings icon in the Edit Regression Build Node dialog box. Alternately, right-click the node and select Advanced Settings.
The upper pane lists all the models in the node. You can perform the following functions:
- Delete: To delete a model, select the model and click the delete icon.
- Add: To add a model, click the add icon. The Add Model dialog box opens.
In the lower pane, you can view and modify data usage and algorithm settings for the model selected in the upper pane. The settings that can be changed depend on the algorithm.
Parent topic: Regression Node
Regression Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternately, right-click the node and click Go to Properties.
Before building Regression models, ensure the following:
-
Specify a Target.
-
Specify a case ID. This is optional. However, if you do not specify a case ID, then the processing will be slower.
This section contains the following topics:
- Models (Regression)
The Model section lists the models that are built. - Build (Regression)
The Build section displays information related to the selected target and the Case ID. - Partition
In the Partition tab, you can build partitioned models. - Details
The Details section displays the node name and comments about the node. - Test (Regression)
The Test section specifies the data used for testing and the tests performed.
Parent topic: Regression Node
Models (Regression)
The Model section lists the models that are built.
By default, two Regression models are built using two different algorithms (GLM and SVM). You can add further models to the node.
You can perform the following tasks:
- Delete: To delete a model, select the model and click the delete icon.
- Add: To add a model, click the add icon.
- Compare Test Results: If models were tested, then you can compare test results. Select two or more models and click the corresponding icon.
- View Models: If a model built successfully, then you can view the model. Select the model and click the corresponding icon. The corresponding viewer opens.
- Indicate Model Status: Indicates whether models are passed to subsequent nodes.
- Output Column
The Output column in the Model Settings grid controls the passing of models to subsequent nodes. By default, all models are passed to subsequent nodes.
Parent topic: Regression Node Properties
Output Column
The Output column in the Model Settings grid controls the passing of models to subsequent nodes. By default, all models are passed to subsequent nodes.
- To ignore a model, that is, to not pass it to subsequent nodes, click the Output icon. The icon changes to the Ignore icon.
- To cancel the ignore, click the Ignore icon again. It changes to the Output icon.
Parent topic: Models (Regression)
Build (Regression)
The Build section displays information related to the selected target and the Case ID.
The Build section displays the following:
- Target: The Build node must be connected to a Data Source node. You then select the target from the target list. To change the target, select a different target from the drop-down list.
- Case ID: Select an attribute from the drop-down list. This attribute must uniquely identify a case. The case ID is optional. If no case ID is selected, then None is displayed. However, if no case ID is specified, then the processing will be slower.
Parent topic: Regression Node Properties
Partition
In the Partition tab, you can build partitioned models.
- In the Maximum Number of Partitions field, set a value by clicking the arrows. This sets the partition cutoff value. The cutoff value must be greater than zero. If this option is not selected, then the native Oracle Data Miner cutoff value is used, which can be a very large value.
- Click Advanced Settings to set and select the type of partition build.
- To add columns for partitioning, click the add icon.
  Note:
  Only NUMBER and VARCHAR2 columns can be selected as partition columns. Case ID and Target columns cannot be selected as partition columns.
- To remove a partitioning column, select the column and click the remove icon.
- To move a column to the top, up, down, or to the bottom of the list, select the column and click the corresponding icon.
Parent topic: Regression Node Properties
Details
The Details section displays the node name and comments about the node.
You can change the name of the node and edit the comments in this section. The new node name and comments must meet the requirements.
Parent topic: Regression Node Properties
Test (Regression)
The Test section specifies the data used for testing and the tests performed.
By default, all models that are built using test data are tested. The test data is created by randomly splitting the build data.
The following settings are available in the Test section:
- Perform Test: By default, all models that are built using test data are tested. The test data is created by randomly splitting the build data. The default test results are:
  - Performance Metrics
  - Residuals
  You can deselect both.
- Test Data: Test data is created in one of the following ways:
  - Use all of the Mining Build Data for Testing
  - Use Split Build Data for Testing
    - Split for Test (%)
    - Create Split as: View (default). The split creates a view that is not parallel.
  - Use a Test Data Source for Testing: Select this option to provide a separate test Data Source and connect the test data source to the build node after you connect the build data. Alternately, you can test a model by using a Test node.
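To illustrate what a random split of build data into build and test portions looks like, here is a minimal Python sketch. It is not how Oracle Data Miner implements the split (the product creates a database view); the function name `split_build_data`, the `seed` parameter, and the 40 percent value in the usage note are all hypothetical.

```python
import random


def split_build_data(rows, test_percent, seed=0):
    """Randomly split rows into (build, test), with about test_percent % for test."""
    if not 0 < test_percent < 100:
        raise ValueError("test_percent must be between 0 and 100, exclusive")
    rng = random.Random(seed)          # fixed seed makes the split repeatable
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_percent / 100)
    return shuffled[n_test:], shuffled[:n_test]
```

Calling `split_build_data(list(range(100)), 40)` returns 60 build rows and 40 test rows, and every original row appears in exactly one of the two lists.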
Parent topic: Regression Node Properties
Regression Node Context Menu
The context menu options depend on the type of the node. The context menu provides shortcuts to perform various tasks and to view information related to the node.
To view the context menu options, right-click the node. The following options are available in the context menu:
- Edit: Opens the Edit Regression Build Node dialog box.
- Advanced Settings: Opens the Advanced Settings for Regression Models dialog box.
- Performance Settings: Opens the Edit Selected Node Settings dialog box, where you can set Parallel Settings and In-Memory settings for the node.
- Show Runtime Errors: Displayed only if there is an error.
- Show Validation Errors: Displayed only if there are validation errors.
Parent topic: Regression Node
Advanced Settings Overview
The Advanced Settings dialog box enables you to edit data usage and other model specifications, and to add models to and remove models from the node.
You can open the Advanced Settings dialog box in one of these ways:
- Right-click any model node and click Advanced Settings from the context menu.
- Double-click the node to open the editor. Then click the Advanced Settings icon.
In the upper pane of the Advanced Settings, you can delete and add models. You can also select models in the upper pane to change data usage. In the lower pane of the Advanced Settings, which has one, two, or three tabs, you can edit model specifications.
- Upper Pane of Advanced Settings
The upper pane of Advanced Settings lists all of the models in the node. - Lower Pane of Advanced Settings
The lower pane of Advanced Settings displays information related to data usage, algorithm settings, and performance settings.
Parent topic: Model Nodes
Upper Pane of Advanced Settings
The upper pane of Advanced Settings lists all of the models in the node.
The Model Settings grid provides the following information about each model:
-
Model Name
-
Algorithm
-
Date of Last Build
-
Auto
-
Data Usage
-
Column Excluded By...
To view the input and mining type for attributes, select the model in the upper pane and deselect Auto. If Auto is selected (the default), then the system automatically determines the attributes used to build the model.
Oracle Data Miner does not necessarily select all attributes to use in building a model. For example, if most of the values of an attribute are the same, then the attribute is not selected.
To see which attributes are selected, deselect Auto. Select a model. The lower pane indicates the selected attributes with a check mark in the Input column.
If Auto is not selected, then you can override the system's choices in the Data Usage tab. If Auto is not selected, then you can also view input and mining type. This enables you to see which attributes are used for model build, and to change them, if necessary.
The Model Settings grid enables you to delete or add models to the node.
- Delete: To delete a model, select the model and click the delete icon.
- Add: To add a model to the node, click the add icon. The Add Model dialog box for the node opens. In the Add Model dialog box, select an algorithm, either accept the default name or specify a different name, and add optional comments.
Parent topic: Advanced Settings Overview
Lower Pane of Advanced Settings
The lower pane of Advanced Settings displays information related to data usage, algorithm settings, and performance settings.
Select a model in the upper pane. The related information is displayed in the following tabs:
-
Data Usage: For all models except Association
-
Algorithm Settings: For all models
-
Performance Settings: For Classification models only
These tabs display the specification used to build the selected model. You can change the specification.
- Data Usage
The Data Usage tab contains the data grid that lists all attributes in the data source. - Algorithm Settings
The Algorithm Settings section displays the values of algorithm settings. - Performance Settings
The performance settings are available for Classification models only.
Parent topic: Advanced Settings Overview
Data Usage
The Data Usage tab contains the data grid that lists all attributes in the data source.
The Data Usage tab is not supported for the Association node. To modify any values, to see which attributes are not used as input, or to see mining types, select View in the lower pane.
You can change data usage information for several models at the same time. For each attribute, the grid displays the following:
- Attributes: This is the name of the attribute.
- Data Type: This is the Oracle Database data type of the attribute.
- Input: Indicates if the attribute is used to build the model. To change the input type, click Automatic. Then click the icon and select the new icon. For models that have a target, such as Classification and Regression models, the target is marked with a red target icon.
  - A check mark indicates that the attribute is used to build the model.
  - A different icon indicates that the attribute is ignored, that is, it is not used to build the model.
- Mining Type: This is the logical type of the attribute: Numerical (numeric data), Categorical (character data), Nested Numerical, Nested Categorical, Text, or Custom Text. If the attribute has a type that is not supported for mining, then the column is blank. Mining type is indicated by an icon. Move the cursor over the icon to see what the icon represents. To change the mining type, click Automatic and then click the type for the attribute. Select a new type from the list. You can change mining types as follows:
  - Numerical can be changed to Categorical. Changing to Categorical casts the numerical value to string.
  - Categorical.
  - Nested Categorical and Nested Numerical cannot be changed.
- Auto Prep: If Auto Prep is selected, then automatic data preparation is performed on the attribute. If Auto Prep is not selected, then no automatic data preparation is performed for the attribute. In this case, you are required to perform any data preparation, such as normalization, that may be required by the algorithm used to build the model. No data preparation is done (or required) for target attributes. The default is to perform automatic data preparation.
- Rules: After a model runs, Rules describe the heuristics used. For details, click Show.
There are two reasons for not selecting an attribute as input:
- The attribute has a data type that is not supported by the algorithm used for model build. For example, O-Cluster does not support nested data types such as DM_NESTED_NUMERICALS. If you use an attribute with type DM_NESTED_NUMERICALS to build an O-Cluster model, then the build fails.
- The attribute does not provide data useful for mining. For example, an attribute that has constant or nearly constant values. If you include attributes of this kind, then the model has lower quality than if you exclude them.
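As a rough illustration of the second reason, the following Python sketch flags attributes whose values are constant or nearly constant. It is not the heuristic that Oracle Data Miner actually applies: the function names and the 0.95 threshold are assumptions chosen for the example.

```python
from collections import Counter


def is_nearly_constant(values, threshold=0.95):
    """True when the most frequent value covers at least `threshold` of the rows."""
    values = list(values)
    if not values:
        return True                      # an empty column carries no information
    top_count = Counter(values).most_common(1)[0][1]
    return top_count / len(values) >= threshold


def select_inputs(columns, threshold=0.95):
    """Return the names of columns that are not constant or nearly constant."""
    return [name for name, vals in columns.items()
            if not is_nearly_constant(vals, threshold)]
```

Given a `country` column in which 99 of 100 rows hold the same value and an `age` column with 100 distinct values, `select_inputs` keeps only `age`.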
Parent topic: Lower Pane of Advanced Settings
Algorithm Settings
The Algorithm Settings section displays the values of algorithm settings.
The settings are determined by the algorithm used to build the model.
Parent topic: Lower Pane of Advanced Settings
Performance Settings
The performance settings are available for Classification models only.
The Performance Settings tab defines the performance objective for Classification model build. To view or change performance settings for a model, select the model in the upper pane. Weights are listed in the Weights grid. Select one of these settings:
- Balanced: (default) Attempts to achieve the best overall accuracy across all the target class values. This is done in different ways depending on the algorithm selected. Generally, it requires the model build process to be biased using weight values that provide extra weight to target values that occur less frequently.
- Natural: Enables the model to build without any bias, so that the model uses its natural view of the data to build an accurate model. In this case, rare target class values are probably not going to be predicted as frequently as they would be by a model built using the Balanced option.
- Custom: Enables you to enter a set of weights for each target value. One way to get started defining custom weights is to click Balanced or Natural, just above the Weights grid. Either of these options generates weights similar to those that would result in either Balanced or Natural performance. You can then change these weights to different values.
To save the values, click OK.
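The idea of giving extra weight to target values that occur less frequently can be sketched with inverse-frequency weights. This Python example shows one common weighting scheme and is only an assumption about what balanced weights might look like; the documentation states that the actual approach depends on the algorithm. The function names are hypothetical.

```python
from collections import Counter


def balanced_weights(targets):
    """Inverse-frequency weights: rarer target values receive larger weights."""
    counts = Counter(targets)
    n_rows, n_classes = len(targets), len(counts)
    return {cls: n_rows / (n_classes * count) for cls, count in counts.items()}


def natural_weights(targets):
    """No bias: every target value receives the same weight."""
    return {cls: 1.0 for cls in set(targets)}
```

With 90 rows labeled `no` and 10 labeled `yes`, `balanced_weights` assigns the rare `yes` class a weight of 5.0 and the frequent `no` class a weight below 1.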
Parent topic: Lower Pane of Advanced Settings
Mining Functions
Mining functions represent a class of mining problems that can be solved using data mining algorithms.
When creating a data mining model, you must first specify the mining function and then choose an appropriate algorithm to implement the function if one is not provided by default.
Oracle Data Mining supports these mining functions:
- Classification
Classification is a data mining function that assigns items in a collection to target categories or classes, that is, items are classified according to target categories. - Regression
Regression is a data mining function that predicts a number. Profit, sales, mortgage rates, house values, square footage, temperature, or distance could all be predicted using regression techniques. - Anomaly Detection
Anomaly Detection (AD) identifies cases that are unusual within data that is apparently homogeneous. - Clustering
Clustering finds natural groupings of data objects, that is, objects that are similar in some sense to one another. - Association
Association rules express the relationships between items that take place at the same time. - Feature Extraction and Selection
The Feature Extraction mining function combines attributes into a new reduced set of features. The Feature Selection mining function selects the most relevant attributes.
Parent topic: Model Nodes
Classification
Classification is a data mining function that assigns items in a collection to target categories or classes, that is, items are classified according to target categories.
The goal of classification is to accurately predict the target class for each case in the data. For example, a Classification model could be used to identify loan applicants as low, medium, or high credit risks.
The target categories for a classification are discrete and not ordered. The simplest type of classification problem is binary classification. In binary classification, the target attribute has only two possible values: for example, high credit rating or low credit rating. Multiclass targets have more than two values: for example, low, medium, high, or unknown credit rating.
The following topics describe classification:
- Building Classification Models
A Classification model is built from historical data for which the classifications are known. - Comparing Classification Models
You can compare Classification models by comparing the test metrics of the respective models. - Applying Classification Models
Scoring or applying a Classification model results in class assignments and the probability that the assignment is the correct one. - Classification Algorithms
The Decision Tree, Naive Bayes, Generalized Linear Models, and Support Vector Machine algorithms are used for classification.
Parent topic: Mining Functions
Building Classification Models
A Classification model is built from historical data for which the classifications are known.
To build (train) a Classification model, a classification algorithm finds relationships between the values of the predictors and the values of the target. Different classification algorithms use different techniques for finding relationships. These relationships are summarized in a model. The model can then be applied to a different data set in which the class assignments are unknown.
Algorithm settings control model build. Settings depend on the algorithm.
Use a Build Node to build one or more Classification models.
Classification models are tested by default.
Parent topic: Classification
Comparing Classification Models
You can compare Classification models by comparing the test metrics of the respective models.
Parent topic: Classification
Applying Classification Models
Scoring or applying a Classification model results in class assignments and the probability that the assignment is the correct one.
For example, a model that classifies customers as low, medium, or high value would also predict the probability that the classification is correct.
Use an Apply node to score a Classification model, that is, to apply the model to new data.
Parent topic: Classification
Classification Algorithms
The Decision Tree, Naive Bayes, Generalized Linear Models, and Support Vector Machine algorithms are used for classification.
- Decision Tree algorithm automatically generates rules, which are conditional statements that reveal the logic used to build the tree.
- Naive Bayes algorithm uses Bayes' Theorem, a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data.
- Generalized Linear Models (GLM) algorithm is a popular statistical technique for linear modeling. Oracle Data Mining implements GLM for binary classification and for regression. GLM provides extensive coefficient statistics and model statistics, and row diagnostics. GLM also supports confidence bounds, which are the upper and lower boundaries of an interval in which the predicted value is likely to lie.
- Support Vector Machine (SVM) algorithm is a powerful, state-of-the-art algorithm based on linear and non-linear regression. Oracle Data Mining implements SVM for binary and multiclass classification.
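The frequency counting behind Naive Bayes can be shown with a small Python sketch that builds a model from historical rows and then returns a class assignment with its probability. This is a teaching example, not Oracle Data Mining's implementation: the function names are hypothetical, and the add-one smoothing is an assumption made to avoid zero probabilities.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, targets):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(targets)
    value_counts = defaultdict(Counter)   # (class, attribute index) -> value counts
    distinct = defaultdict(set)           # attribute index -> distinct values seen
    for row, target in zip(rows, targets):
        for i, value in enumerate(row):
            value_counts[(target, i)][value] += 1
            distinct[i].add(value)
    return class_counts, value_counts, distinct


def predict(model, row):
    """Return (predicted class, probability that the assignment is correct)."""
    class_counts, value_counts, distinct = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / total             # prior P(class)
        for i, value in enumerate(row):
            # Add-one smoothing (an assumption) for P(value | class).
            score *= (value_counts[(cls, i)][value] + 1) / (n_cls + len(distinct[i]))
        scores[cls] = score
    best = max(scores, key=scores.get)
    return best, scores[best] / sum(scores.values())
```

For a tiny data set of home ownership and income against credit risk, `predict` returns both the most likely class and a normalized probability between 0 and 1.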
Parent topic: Classification
Regression
Regression is a data mining function that predicts a number. Profit, sales, mortgage rates, house values, square footage, temperature, or distance could all be predicted using regression techniques.
For example, a Regression model could be used to predict the value of a house based on location, number of rooms, lot size, and other factors.
Regression models are tested by default.
This section on Regression contains the following topics:
- Building Regression Models
Use a Build Node to build one or more Regression models. - Applying Regression Models
Scoring, or applying, a Regression model results in a predicted numeric value for each case. - Regression Algorithms
Oracle Data Mining supports Generalized Linear Models (GLM) and Support Vector Machines (SVM) for Regression.
Parent topic: Mining Functions
Building Regression Models
Use a Build Node to build one or more Regression models.
Algorithm settings control the model build. Settings depend on the algorithm.
A Regression task begins with a data set in which the target values are known. For example, a Regression model that predicts house values could be developed based on observed data for many houses over a period of time. In addition to the value, the data might track the age of the house, square footage, number of rooms, taxes, school district, proximity to shopping centers, and so on. House value would be the target, the other attributes would be the predictors, and the data for each house would constitute a case.
In the model build (training) process, a regression algorithm estimates the value of the target as a function of the predictors for each case in the build data. These relationships between predictors and target are summarized in a model, which can then be applied to a different data set in which the target values are unknown.
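The build and apply steps can be illustrated with ordinary least squares on a single predictor. This Python sketch is a stand-in for the idea of estimating the target as a function of the predictors; it is not the GLM or SVM implementation in Oracle Data Mining, and the function names and house data are hypothetical.

```python
def build_linear_model(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def apply_linear_model(model, xs):
    """Score new cases whose target values are unknown."""
    slope, intercept = model
    return [intercept + slope * x for x in xs]
```

Building on square footage values 1000, 1500, and 2000 with house values 200000, 300000, and 400000 gives a slope of 200 per square foot, so applying the model to a 1200 square foot house predicts 240000.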
Parent topic: Regression
Applying Regression Models
Scoring, or applying, a Regression model results in a predicted numeric value for each case.
For example, a model that predicts house values returns a predicted value for each case. With GLM, confidence bounds for the predicted value are also available.
Use an Apply node to score a Regression model, that is, to apply the model to new data.
Parent topic: Regression
Regression Algorithms
Oracle Data Mining supports Generalized Linear Models (GLM) and Support Vector Machines (SVM) for Regression.
-
Generalized Linear Models (GLM) algorithm is a popular statistical technique for linear modeling. Oracle Data Mining implements GLM for binary classification and for regression.
GLM provides extensive coefficient statistics and model statistics, and row diagnostics. GLM also supports confidence bounds.
-
Support Vector Machines (SVM) algorithm is a powerful, state-of-the-art algorithm based on linear and non-linear regression.
SVM regression supports two kernels: the Gaussian Kernel for non-linear regression, and the Linear Kernel for linear regression. SVM also supports active learning.
Parent topic: Regression
Anomaly Detection
Anomaly Detection (AD) identifies cases that are unusual within data that is apparently homogeneous.
Standard classification algorithms require the presence of both positive and negative examples (counterexamples) for a target class. One-Class Support Vector Machine (SVM) classification requires only the presence of examples of a single target class. The goal is to estimate a function that is:
- Positive, if an example belongs to a set, and
- Negative or zero, if the example belongs to the complement of the set
Note:
Solving a one-class classification problem can be difficult. The accuracy of one-class classifiers cannot usually match the accuracy of standard classifiers built with meaningful counterexamples.
This section about Anomaly Detection models contains the following topics:
- Building Anomaly Detection Models
Oracle Data Mining uses SVM as the one-class classifier for Anomaly Detection (AD). - Applying Anomaly Detection Models
Oracle Data Mining uses Support Vector Machine (SVM) as the one-class classifier for Anomaly Detection (AD). When a one-class SVM model is applied, it produces a prediction and a probability for each case in the scoring data.
Parent topic: Mining Functions
Building Anomaly Detection Models
Oracle Data Mining uses SVM as the one-class classifier for Anomaly Detection (AD).
When SVM is used for Anomaly Detection, it has the Classification mining function but no target.
To build an AD model, use an Anomaly Detection node connected to an appropriate data source.
Parent topic: Anomaly Detection
Applying Anomaly Detection Models
Oracle Data Mining uses Support Vector Machine (SVM) as the one-class classifier for Anomaly Detection (AD). When a one-class SVM model is applied, it produces a prediction and a probability for each case in the scoring data.
- If the prediction is 1, then the case is considered typical.
- If the prediction is 0, then the case is considered anomalous.
This behavior reflects the fact that the model is trained with normal data.
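The mapping from a one-class decision function to the predictions 1 and 0 can be sketched in Python. The classifier below is a deliberately simple distance-to-centroid stand-in, not a one-class SVM and not Oracle's implementation; the function names and the 0.95 quantile used to set the radius are assumptions.

```python
import math


def fit_one_class(points, quantile=0.95):
    """Learn a centroid and radius from normal (typical) training data only."""
    dim = len(points[0])
    center = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    dists = sorted(math.dist(p, center) for p in points)
    radius = dists[min(len(dists) - 1, int(quantile * len(dists)))]
    return center, radius


def decision(model, point):
    """Positive inside the learned region, negative or zero outside it."""
    center, radius = model
    return radius - math.dist(point, center)


def predict(model, point):
    """1 means typical, 0 means anomalous."""
    return 1 if decision(model, point) > 0 else 0
```

Trained on the four corners of the unit square, the sketch predicts 1 for the point (0.5, 0.5) and 0 for the distant point (10, 10).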
Parent topic: Anomaly Detection
Clustering
Clustering finds natural groupings of data objects, that is, objects that are similar in some sense to one another.
The members of a cluster are more like each other than they are like members of other clusters. The goal of clustering analysis is to find high-quality clusters such that the inter-cluster similarity is low and the intra-cluster similarity is high.
The following topics discuss clustering:
- Using Clusters
You can use Clustering to segment data, to explore data, and also for anomaly detection. - Calculating Clusters
Oracle Data Mining performs hierarchical clustering. - Algorithms for Clustering
Oracle Data Mining supports these algorithms for clustering:
Parent topic: Mining Functions
Using Clusters
You can use Clustering to segment data, to explore data, and also for anomaly detection.
Like Classification, Clustering can be used to segment data. Unlike Classification, Clustering models segment data into groups that were not previously defined. Classification models segment data by assigning it to previously defined classes, which are specified in a target. Clustering models do not use a target.
Clustering is useful for exploring data. If there are many cases and no obvious groupings, then you can use clustering algorithms to find natural groupings. Clustering can also serve as a useful data preprocessing step to identify homogeneous groups on which to build supervised models.
Clustering can also be used for anomaly detection. After the data has been segmented into clusters, you might find that some cases do not fit well into any clusters. These cases are anomalies or outliers.
Clusters are not necessarily disjoint; an item can be in several clusters.
Parent topic: Clustering
Calculating Clusters
Oracle Data Mining performs hierarchical clustering.
The leaf clusters are the final clusters generated by the algorithm. Clusters higher up in the hierarchy are intermediate clusters.
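The distinction between leaf clusters and intermediate clusters can be shown with a small tree structure in Python. The `Cluster` class and `leaf_clusters` function are hypothetical names used only for this illustration; they do not correspond to any Oracle Data Mining API.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    cluster_id: int
    children: list = field(default_factory=list)


def leaf_clusters(root):
    """Return the IDs of the final (leaf) clusters in the hierarchy."""
    if not root.children:
        return [root.cluster_id]          # no children: this is a final cluster
    leaves = []
    for child in root.children:           # intermediate cluster: descend further
        leaves.extend(leaf_clusters(child))
    return leaves
```

For a hierarchy in which cluster 1 splits into clusters 2 and 3, and cluster 2 splits into clusters 4 and 5, the leaf clusters are 4, 5, and 3, while clusters 1 and 2 are intermediate.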
Parent topic: Clustering
Algorithms for Clustering
Oracle Data Mining supports these algorithms for clustering:
- k-Means
- O-Cluster
- Expectation Maximization. Requires Oracle Database 12c Release 1 (12.1) or later.
Parent topic: Clustering
Association
Association rules express the relationships between items that take place at the same time.
Association rules are often used to analyze sales transactions. For example, it might be noted that customers who buy cereal at the grocery store often buy milk at the same time. In fact, association analysis might find that 85 percent of the checkout sessions that include cereal also include milk.
This application of association modeling is called market-basket analysis. It is valuable for direct marketing, sales promotions, and for discovering business trends. Market-basket analysis can also be used effectively for store layout, catalog design, and cross-sell.
Association modeling has important applications in other domains as well. For example, in e-commerce applications, association rules may be used for web page personalization. An association model might find that a user who visits pages A and B is 70 percent likely to also visit page C in the same session. Based on this rule, a dynamic link could be created for users who are likely to be interested in page C.
Association modeling analyzes data that consists of transactions.
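A statement such as "85 percent of the checkout sessions that include cereal also include milk" is the confidence of the rule cereal implies milk. The following Python sketch computes support and confidence over a list of transactions; the function names and the sample baskets are hypothetical and the code is independent of Oracle Data Mining.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)
```

With five baskets, four of which contain cereal and three of which contain both cereal and milk, the confidence of the rule cereal implies milk is 0.75.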
- Transactions
In transactional data, a collection of items is associated with each case. A case consists of a transaction such as a market-basket or web session.
Parent topic: Mining Functions
Transactions
In transactional data, a collection of items is associated with each case. A case consists of a transaction such as a market-basket or web session.
The collection of items in the transaction is an attribute of the transaction. Other attributes might be the date, time, location, or user ID associated with the transaction. However, in most cases, only a tiny subset of all possible items is present in a given transaction. The items in the market-basket represent only a small fraction of the items available for sale in the store. Association is transaction-based.
When an item is not present in a collection, it may have a null value or it may be missing. Many of the items may be missing or null, because many of the items that could be in the collection are probably not present in any individual transaction.
Parent topic: Association
Feature Extraction and Selection
The Feature Extraction mining function combines attributes into a new reduced set of features. The Feature Selection mining function selects the most relevant attributes.
Sometimes too much information can reduce the effectiveness of data mining. Some columns of data attributes assembled for building and testing a model may not contribute meaningful information to the model. Some may actually detract from the quality and accuracy of the model.
Irrelevant attributes add noise to the data and affect model accuracy. Irrelevant attributes also increase the size of the model and the time and system resources needed for model building and scoring.
- Feature Selection
Feature Selection ranks the existing attributes according to their predictive significance.
- Feature Extraction
Feature Extraction is an attribute reduction process.
Feature Selection
Feature Selection ranks the existing attributes according to their predictive significance.
Finding the most significant predictors is the goal of some data mining projects. For example, a model might seek to find the principal characteristics of clients who pose a high credit risk.
Attribute Importance is also useful as a preprocessing step in classification modeling. Decision Tree and Generalized Linear Models benefit from this type of preprocessing. Oracle Data Mining implements Feature Selection for optimization within both of these algorithms.
Oracle Data Miner provides the Attribute Importance setting in the Filter Columns node transformation to identify important features using the Oracle Data Mining importance function.
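The idea of ranking attributes by predictive significance can be sketched with a simple stand-in measure. The example below scores each attribute by its mutual information with the target and sorts the attributes by that score. This is an assumption made for illustration only: it is not the Oracle Data Mining importance function, and the `clients` records and attribute names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information, in bits, between two equal-length lists of categories."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    total = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        total += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return total


# Hypothetical client records; "high_risk" is the target to predict.
clients = [
    {"late_payments": "yes", "owns_home": "no",  "fav_color": "red",  "high_risk": "yes"},
    {"late_payments": "yes", "owns_home": "no",  "fav_color": "blue", "high_risk": "yes"},
    {"late_payments": "yes", "owns_home": "no",  "fav_color": "red",  "high_risk": "yes"},
    {"late_payments": "yes", "owns_home": "yes", "fav_color": "blue", "high_risk": "yes"},
    {"late_payments": "no",  "owns_home": "yes", "fav_color": "red",  "high_risk": "no"},
    {"late_payments": "no",  "owns_home": "yes", "fav_color": "blue", "high_risk": "no"},
    {"late_payments": "no",  "owns_home": "yes", "fav_color": "red",  "high_risk": "no"},
    {"late_payments": "no",  "owns_home": "no",  "fav_color": "blue", "high_risk": "no"},
]

target = [c["high_risk"] for c in clients]
attributes = ["late_payments", "owns_home", "fav_color"]

# Score every attribute against the target, then rank from most to least informative.
scores = {a: mutual_information([c[a] for c in clients], target) for a in attributes}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(f"{name}: {scores[name]:.3f}")
```

In this toy data, `late_payments` ranks first, `owns_home` carries some signal, and `fav_color` scores zero, so it is the kind of irrelevant attribute a filtering step would drop before model building.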
Feature Extraction
Feature Extraction is an attribute reduction process.
Unlike Feature Selection, which ranks the existing attributes according to their predictive significance, Feature Extraction actually transforms the attributes. The transformed attributes, or features, are linear combinations of the original attributes.
The Feature Extraction process results in a much smaller and richer set of attributes. The maximum number of features may be user-specified or determined by the algorithm. By default, it is determined by the algorithm.
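The statement that features are linear combinations of the original attributes can be illustrated with a small sketch. Each feature is a weighted sum of the attribute values in a row. The `weights` below are hypothetical and fixed by hand; in practice a feature extraction algorithm would learn them from the build data.

```python
def extract_features(row, weights):
    """Map one row of original attributes to a smaller list of features.

    Each feature is a weighted sum (linear combination) of the attributes.
    """
    return [
        sum(w * x for w, x in zip(feature_weights, row))
        for feature_weights in weights
    ]


# Hypothetical case with four original numeric attributes.
row = [2.0, 4.0, 1.0, 3.0]

# Hypothetical weights: two features, each with one weight per original attribute.
weights = [
    [0.5, 0.5, 0.0, 0.0],   # feature 1: average of the first two attributes
    [0.0, 0.0, 1.0, -1.0],  # feature 2: difference of the last two attributes
]

features = extract_features(row, weights)
print(features)  # [3.0, -2.0]
```

Four attributes are reduced to two features here. The number of rows in `weights` plays the role of the maximum number of features described above.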
Oracle Data Mining supports these algorithms for Feature Extraction:
- Singular Value Decomposition and Principal Components Analysis. Requires Oracle Database 12c Release 1 (12.1) or later.