Algorithms

This appendix covers the following topics:

Overview of Algorithms

When you create a model, you select the algorithm that it uses. The selected algorithm uses the selected model features to predict a specific target measure. The output of each model varies depending on the algorithm and the model features selected. The following descriptions provide a general overview of the algorithm choices for each model type.

Feature Significance Model Algorithms

When creating a Feature Significance model, you can choose either of the following algorithms:

Random Forest Algorithm

For a detailed explanation of the Random Forest algorithm, see "Random Forest" in Oracle Data Mining Concepts and "Random Forests" on the Apache Spark website.

You can set the following parameters for a random forest algorithm:

For Random Forest algorithm examples, see "Random Forests" on the Apache Spark website.

Chi-Square Algorithm

The Chi-Square test is used to test whether two events are independent. To determine feature significance, it tests whether the occurrence of a specific term (input feature) and the occurrence of a specific class (output variable) are independent. The null hypothesis is that the input and the output are completely independent. The algorithm generates a p-value: the probability of seeing an association at least as strong as the one observed if the null hypothesis were true. The lower the p-value, the less plausible the null hypothesis, which indicates a relationship between the input and the output variables. Higher p-values are consistent with the null hypothesis (no relationship).

You must specify the acceptable minimum importance for the model, expressed as a p-value threshold. Features with a p-value above this threshold are not considered significant to the quality or yield result. The system ranks all features with a p-value below the threshold, with the lowest p-value ranked highest.
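As an illustration of the test described above, the following is a minimal sketch of a Chi-Square test of independence on a 2x2 table of counts, written in plain Python. It is not the product's or Apache Spark's implementation. The function name, the example counts, and the "feature present / absent" and "fail / pass" labels are hypothetical, and no continuity correction is applied.

```python
import math


def chi_square_independence_2x2(table):
    """Chi-Square test of independence for a 2x2 contingency table.

    table[i][j] is the number of records with feature state i and class j.
    Returns (statistic, p_value). Assumes every row and column total is > 0.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if feature and class were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    # A 2x2 table has 1 degree of freedom, for which the chi-square tail
    # probability reduces to erfc(sqrt(statistic / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts: rows = feature present / absent, columns = fail / pass.
dependent = [[30, 10], [10, 30]]      # feature and class move together
independent = [[20, 20], [20, 20]]    # no association at all

stat_dep, p_dep = chi_square_independence_2x2(dependent)
stat_ind, p_ind = chi_square_independence_2x2(independent)

print(stat_dep, p_dep)
print(stat_ind, p_ind)
```

In this sketch the first table produces a large statistic and a very small p-value, so the null hypothesis of independence is implausible. The second table matches the expected counts exactly, so its p-value is 1.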

Features are ranked as follows:

Minimum Importance (p-value)                         Strength of Relationship
0 to 0.01                                            Very Strong
0.01 to 0.05                                         Strong
0.05 to Minimum Importance (default value = 0.1)     Weak
> Minimum Importance                                 No relationship

For more details about the Chi-Square algorithm, see "Hypothesis testing" on the Apache Spark website.
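The ranking bands listed above can be sketched as a small lookup, shown here in plain Python. This is an illustrative sketch only: the function names and the sample feature names are hypothetical. The table does not say which band a p-value of exactly 0.01, 0.05, or the minimum importance belongs to, so assigning each boundary value to the stronger band is an assumption.

```python
def relationship_strength(p_value, minimum_importance=0.1):
    """Map a p-value to the strength-of-relationship label from the table.

    Assumption: a p-value exactly on a boundary goes to the stronger band.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value > minimum_importance:
        return "No relationship"
    if p_value <= 0.01:
        return "Very Strong"
    if p_value <= 0.05:
        return "Strong"
    return "Weak"


def rank_features(p_values, minimum_importance=0.1):
    """Return (feature, p_value, strength) tuples, lowest p-value first.

    Features whose p-value exceeds the minimum importance are dropped.
    """
    kept = [(name, p) for name, p in p_values.items() if p <= minimum_importance]
    kept.sort(key=lambda item: item[1])
    return [(name, p, relationship_strength(p, minimum_importance)) for name, p in kept]


# Hypothetical p-values for four input features.
sample = {"temperature": 0.03, "pressure": 0.002, "humidity": 0.08, "operator": 0.4}
ranked = rank_features(sample)

for name, p, strength in ranked:
    print(name, p, strength)
```

With the default minimum importance of 0.1, the hypothetical "operator" feature is excluded from the ranking, and the remaining features are listed from the lowest p-value to the highest.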

Insight Model Algorithms

When creating an Insight model, you can choose either of the following algorithms:

Apriori Algorithm

The Apriori algorithm generates association rules. Its parameters set the minimum level of predictability that is acceptable for this algorithm and data set. For a detailed explanation of the Apriori algorithm, see "Apriori" in Oracle Data Mining Concepts. Enter acceptable values for the following three parameters:

Decision Tree Algorithm

For a detailed explanation of the Decision Tree algorithm, see "Decision Tree" in Oracle Data Mining Concepts.

You can set the following parameters for a decision tree algorithm:

Predictions Model Algorithms

When creating a Predictions model, you can choose either of the following algorithms:

Decision Tree Algorithm

For a detailed explanation of the Decision Tree algorithm, see "Decision Tree" in Oracle Data Mining Concepts.

You can set the following parameters for a decision tree algorithm:

SVM Algorithm

For a detailed explanation of the SVM algorithm, see "Support Vector Machines" in Oracle Data Mining Concepts.

Specify the following SVM parameters: