How Do I Choose a Predictive Model Algorithm?
Oracle Analytics provides algorithms for any of your machine learning modeling needs: numeric prediction, multi-classifier, binary classifier, and clustering.
Oracle's machine learning functionality is for advanced data analysts who have an idea of what they're looking for in their data, are familiar with the practice of predictive analytics, and understand the differences between algorithms.
Note:
If you're using data sourced from Oracle Autonomous AI Lakehouse, you can use the AutoML capability to train a predictive model quickly and easily, without needing machine learning skills. See Train a Predictive Model Using AutoML in Autonomous Data Warehouse.
Normally users want to create multiple prediction models, compare them, and choose the one that's most likely to give results that satisfy their criteria and requirements. These criteria can vary. For example, sometimes users choose models that have better overall accuracy, sometimes users choose models that have the fewest type I (false positive) and type II (false negative) errors, and sometimes users choose models that return results faster with an acceptable level of accuracy, even if the results aren't ideal.
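
To make these criteria concrete, the following minimal Python sketch counts type I and type II errors and computes overall accuracy for two sets of binary predictions. It isn't part of Oracle Analytics; the function name `classification_metrics` and the sample labels are hypothetical and exist only for illustration.

```python
def classification_metrics(actual, predicted, positive=1):
    """Return accuracy plus type I and type II error counts for binary labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # correctly predicted positive
        elif p == positive:
            fp += 1          # type I error (false positive)
        elif a == positive:
            fn += 1          # type II error (false negative)
        else:
            tn += 1          # correctly predicted negative
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "type_i_errors": fp,
        "type_ii_errors": fn,
    }


# Hypothetical labels, invented for this example only.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
model_a = [1, 0, 1, 0, 0, 1, 1, 0]
model_b = [1, 1, 1, 1, 0, 1, 1, 0]

metrics_a = classification_metrics(actual, model_a)
metrics_b = classification_metrics(actual, model_b)
```

In this sample, both sets of predictions have the same accuracy but a different mix of false positives and false negatives, which is why accuracy alone might not be enough to choose between models.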
Oracle Analytics contains multiple machine learning algorithms for each kind of prediction or classification. With these algorithms, users can create more than one model, use different fine-tuned parameters, or use different input training datasets, and then choose the best model by comparing and weighing the models against their own criteria. To determine the best model, users can apply each model and visualize the results of the calculations to determine accuracy, or they can open and explore the related datasets that Oracle Analytics generated as output from the model.
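
One way to think about weighing models against your own criteria is sketched below in Python. This is an illustration under stated assumptions, not an Oracle Analytics feature: the `Candidate` class, the `choose_model` function, the model names, and every number are hypothetical placeholders rather than measured results.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    accuracy: float          # fraction of correct predictions
    false_positives: int     # type I errors
    false_negatives: int     # type II errors
    scoring_seconds: float   # time taken to return results


def choose_model(candidates, min_accuracy=0.0,
                 max_scoring_seconds=float("inf"), rank_by="accuracy"):
    """Filter candidates by hard limits, then rank the rest by one criterion."""
    eligible = [c for c in candidates
                if c.accuracy >= min_accuracy
                and c.scoring_seconds <= max_scoring_seconds]
    if not eligible:
        return None
    if rank_by == "accuracy":
        return max(eligible, key=lambda c: c.accuracy)
    if rank_by == "errors":
        return min(eligible, key=lambda c: c.false_positives + c.false_negatives)
    if rank_by == "speed":
        return min(eligible, key=lambda c: c.scoring_seconds)
    raise ValueError(f"unknown rank_by value: {rank_by!r}")


# Placeholder figures, invented for this example only.
candidates = [
    Candidate("model_a", 0.91, 12, 6, 4.0),
    Candidate("model_b", 0.86, 15, 13, 0.5),
    Candidate("model_c", 0.93, 8, 6, 9.0),
]

most_accurate = choose_model(candidates)
fastest_acceptable = choose_model(candidates, min_accuracy=0.85, rank_by="speed")
best_within_time = choose_model(candidates, max_scoring_seconds=5.0)
```

Each call applies a different requirement, so each can select a different model from the same set of candidates.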
Consult this table to learn about the provided algorithms:
| Name | Type | Category | Function | Description |
|---|---|---|---|---|
| CART | Classification, Regression | Binary Classifier, Multi-Classifier, Numerical | - | Uses decision trees to predict both discrete and continuous values. Use with large datasets. |
| Elastic Net Linear Regression | Regression | Numerical | ElasticNet | Advanced regression model. Provides additional information (regularization), performs variable selection, and linearly combines the penalties of the Lasso and Ridge regression methods. Use with a large number of attributes to avoid collinearity (where multiple attributes are perfectly correlated) and overfitting. |
| Hierarchical | Clustering | Clustering | AgglomerativeClustering | Builds a hierarchy of clusters using distance metrics and either a bottom-up approach (each observation starts as its own cluster and clusters are then merged) or a top-down approach (all observations start as one cluster). Use when the dataset isn't large and the number of clusters isn't known beforehand. |
| K-Means | Clustering | Clustering | k-means | Iteratively partitions records into k clusters where each observation belongs to the cluster with the nearest mean. Use for clustering metric columns when you have a set expectation of the number of clusters needed. Works well with large datasets. Results are different with each run. |
| Linear Regression | Regression | Numerical | Ordinary Least Squares, Ridge, Lasso | Linear approach for modeling the relationship between the target variable and other attributes in the dataset. Use to predict numeric values when the attributes aren't perfectly correlated. |
| Logistic Regression | Regression | Binary Classifier | LogisticRegressionCV | Use to predict the value of a categorical dependent variable. The dependent variable is a binary variable that contains data coded as 1 or 0. |
| Naive Bayes | Classification | Binary Classifier, Multi-Classifier | GaussianNB | Probabilistic classification based on Bayes' theorem that assumes no dependence between features. Use when there are a high number of input dimensions. |
| Neural Network | Classification | Binary Classifier, Multi-Classifier | MLPClassifier | Iterative classification algorithm that learns by comparing its classification result with the actual value and feeding the comparison back to the network to adjust it for further iterations. Use for text analysis. |
| Random Forest | Classification | Binary Classifier, Multi-Classifier, Numerical | - | An ensemble learning method that constructs multiple decision trees and outputs the value that collectively represents all the decision trees. Use to predict numeric and categorical variables. |
| SVM | Classification | Binary Classifier, Multi-Classifier | LinearSVC, SVC | Classifies records by mapping them in space and constructing hyperplanes that can be used for classification. New records (scoring data) are mapped into the space and are predicted to belong to a category, which is based on the side of the hyperplane where they fall. |
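
The K-Means description in the table can be illustrated with a short Python sketch. It assumes the textbook approach of random initial centroids followed by repeated assign-and-update steps; the source doesn't describe how Oracle Analytics implements the algorithm, so the function `k_means`, its parameters, and the sample points are hypothetical.

```python
import random


def k_means(points, k, seed=None, max_iter=100):
    """Partition points (a list of equal-length tuples) into k clusters."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # random start: results depend on it
    assignments = None
    for _ in range(max_iter):
        # Assign each point to the cluster with the nearest mean.
        new_assignments = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_assignments == assignments:
            break                       # no point changed cluster: converged
        assignments = new_assignments
        # Move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:                 # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignments


# Sample points invented for this example only.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centroids, assignments = k_means(points, k=2, seed=7)
```

Because the starting centroids are chosen at random, running the sketch without a fixed `seed` can produce different cluster numbering, and on less clearly separated data different groupings, which is consistent with the table's remark that results differ with each run.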