Create and Use Oracle Analytics Predictive Models

Oracle Analytics predictive models use several embedded machine learning algorithms to mine your data sets, predict a target value, or identify classes of records. Use the data flow editor to create, train, and apply predictive models to your data.

What Are Oracle Analytics Predictive Models?

An Oracle Analytics predictive model applies a specific algorithm to a data set to predict values, predict classes, or identify groups in the data.

You can also use Oracle machine learning models to predict data. See How Can I Use Oracle Machine Learning Models in Oracle Analytics?

Oracle Analytics includes algorithms to help you train predictive models for various purposes. Examples of algorithms are classification and regression trees (CART), logistic regression, and k-means.

You use the data flow editor to first train a model on a training data set. After the predictive model has been trained, you apply it to the data sets that you want to predict.

You can make a trained model available to other users who can apply it against their data to predict values. In some cases, certain users train models, and other users apply the models.

Note: If you're not sure what to look for in your data, you can start by using Explain, which uses machine learning to identify trends and patterns. Then you can use the data flow editor to create and train predictive models to drill into the trends and patterns that Explain found. See What is Explain?

You use the data flow editor to train a model:
  • First, you create a data flow and add the data set that you want to use to train the model. This training data set contains the data that you want to predict (for example, a value like sales or age, or a variable like credit risk bucket).
  • If needed, you can use the data flow editor to edit the data set by adding columns, selecting columns, joining, and so on.
  • After you've confirmed that the data is what you want to train the model on, you add a training step to the data flow and choose a classification (binary or multi), regression, or cluster algorithm to train a model. Then name the resulting model, save the data flow, and run it to train and create the model.
  • Examine the properties in the machine learning objects to determine the quality of the model. If needed, you can iterate the training process until the model reaches the quality you want.

Use the finished model to score unknown, or unlabeled, data to generate a data set within a data flow or to add a prediction visualization to a project.
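The train-then-apply pattern can be illustrated outside Oracle Analytics with a minimal Python sketch. It is not how the data flow editor works internally. The column names (ad_spend, sales), the sample rows, and both function names are hypothetical, and a one-column least-squares fit stands in for whichever algorithm you choose in the training step.

```python
from statistics import mean

def train_linear_model(rows, feature, target):
    """Fit target = intercept + slope * feature by ordinary least squares."""
    xs = [row[feature] for row in rows]
    ys = [row[target] for row in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"feature": feature, "slope": slope, "intercept": y_bar - slope * x_bar}

def apply_model(model, rows, output_column="predicted"):
    """Return copies of the rows with a prediction column added."""
    return [
        {**row, output_column: model["intercept"] + model["slope"] * row[model["feature"]]}
        for row in rows
    ]

# Hypothetical training data: the target column (sales) is known.
training_rows = [
    {"ad_spend": 1.0, "sales": 5.0},
    {"ad_spend": 2.0, "sales": 7.0},
    {"ad_spend": 3.0, "sales": 9.0},
]

# Hypothetical unlabeled data: the target column is missing and must be predicted.
new_rows = [{"ad_spend": 4.0}, {"ad_spend": 5.0}]

model = train_linear_model(training_rows, "ad_spend", "sales")  # train once
scored_rows = apply_model(model, new_rows)                      # apply to new data
```

The trained model is a small object that is separate from the training data, which is why one user can train it and another user can apply it to a different data set.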

Example

Suppose you want to create and train a multi-classification model to predict which patients have a high risk of developing heart disease.

  1. Supply a training data set containing attributes of individual patients, such as age, gender, and whether they've ever experienced chest pain, and metrics such as blood pressure, fasting blood sugar, cholesterol, and maximum heart rate. The training data set also contains a column named "Likelihood" that is assigned one of the following values: absent, less likely, likely, highly likely, or present.
  2. Choose the CART (Decision Tree) algorithm because it ignores redundant columns that don't add value for prediction, and identifies and uses only the columns that are helpful to predict the target. When you add the algorithm to the data flow, you choose the Likelihood column as the target to train the model on. The algorithm uses machine learning to choose the driver columns that it needs to make predictions and to output the predictions and related data sets.
  3. Inspect the results and fine-tune the training model, and then apply the model to a larger data set to predict which patients have a high probability of having or developing heart disease.
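To see why a decision tree can ignore columns that don't help, consider the following minimal sketch. It ranks columns by Gini impurity, the split criterion classically associated with CART. The source doesn't state which criterion Oracle Analytics uses, so treat that choice as an assumption. The patient rows are invented for illustration, and the function names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of labels: 0.0 when all labels are identical."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

def split_impurity(rows, column, target):
    """Weighted Gini impurity of the target after grouping rows by one column."""
    groups = {}
    for row in rows:
        groups.setdefault(row[column], []).append(row[target])
    return sum(len(group) / len(rows) * gini(group) for group in groups.values())

def rank_columns(rows, columns, target):
    """Order candidate columns from most to least useful for a single split."""
    return sorted(columns, key=lambda column: split_impurity(rows, column, target))

# Invented toy rows: chest_pain separates the target perfectly, gender not at all.
patients = [
    {"chest_pain": "yes", "gender": "F", "likelihood": "likely"},
    {"chest_pain": "yes", "gender": "M", "likelihood": "likely"},
    {"chest_pain": "no", "gender": "F", "likelihood": "absent"},
    {"chest_pain": "no", "gender": "M", "likelihood": "absent"},
]

ranking = rank_columns(patients, ["gender", "chest_pain"], "likelihood")
```

In this toy data, splitting on chest_pain leaves no impurity, while splitting on gender leaves the target as mixed as before. A tree built this way would never split on gender.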

How Do I Choose a Training Model Algorithm?

Oracle Analytics provides algorithms for any of your machine learning modeling needs: numeric prediction, multi-classifier, binary classifier, and clustering.

Oracle's machine learning functionality is for advanced data analysts who have an idea of what they're looking for in their data, are familiar with the practice of predictive analytics, and understand the differences between algorithms.

Users normally create multiple prediction models, compare them, and choose the one that's most likely to give results that satisfy their criteria and requirements. These criteria vary. For example, some users choose the model with the best overall accuracy, some choose the model with the fewest type I (false positive) and type II (false negative) errors, and some choose a model that returns results faster with an acceptable level of accuracy, even if the results aren't ideal.
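The difference between these criteria is easier to see with numbers. The following sketch counts type I and type II errors for two hypothetical binary classifiers. The label lists and function names are invented for illustration and aren't output from Oracle Analytics.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1  # fp = type I error
        else:
            counts["fn" if a == positive else "tn"] += 1  # fn = type II error
    return counts

def accuracy(counts):
    """Share of all predictions that were correct."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())

# Invented labels: 1 = positive class, 0 = negative class.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
model_a = [1, 1, 0, 0, 0, 0, 0, 1]  # predictions from hypothetical model A
model_b = [1, 1, 1, 1, 1, 0, 0, 0]  # predictions from hypothetical model B

counts_a = confusion_counts(actual, model_a)
counts_b = confusion_counts(actual, model_b)
```

Both models are correct on six of eight rows, so overall accuracy can't separate them. Model A makes one error of each type, while model B makes two type I errors and no type II errors. Which model is better depends on which kind of error is more costly for your use case.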

Oracle Analytics contains multiple machine learning algorithms for each kind of prediction or classification. With these algorithms, users can create more than one model, use different fine-tuned parameters, or use different input training data sets, and then choose the best model by comparing and weighing the models against their own criteria. To determine the best model, users can apply each model and visualize the results of the calculations to determine accuracy, or they can open and explore the related data sets that Oracle Analytics generates when it trains the model. See What Are Related Data Sets?

Consult this table to learn about the provided algorithms:

| Name | Type | Category | Function | Description |
| --- | --- | --- | --- | --- |
| CART | Classification, Regression | Binary Classifier, Multi-Classifier, Numerical | - | Uses decision trees to predict both discrete and continuous values. Use with large data sets. |
| Elastic Net Linear Regression | Regression | Numerical | ElasticNet | Advanced regression model. Provides additional information (regularization), performs variable selection, and linearly combines the penalties of the Lasso and Ridge regression methods. Use with a large number of attributes to avoid collinearity (where multiple attributes are perfectly correlated) and overfitting. |
| Hierarchical | Clustering | Clustering | AgglomerativeClustering | Builds a hierarchy of clusters using distance metrics and either a bottom-up approach (each observation starts as its own cluster, and clusters are then merged) or a top-down approach (all observations start as one cluster). Use when the data set isn't large and the number of clusters isn't known beforehand. |
| K-Means | Clustering | Clustering | k-means | Iteratively partitions records into k clusters where each observation belongs to the cluster with the nearest mean. Use for clustering metric columns when you have a set expectation of the number of clusters needed. Works well with large data sets. Results are different with each run. |
| Linear Regression | Regression | Numerical | Ordinary Least Squares, Ridge, Lasso | Linear approach for modeling the relationship between the target variable and other attributes in the data set. Use to predict numeric values when the attributes aren't perfectly correlated. |
| Logistic Regression | Regression | Binary Classifier | LogisticRegressionCV | Use to predict the value of a categorical dependent variable. The dependent variable is a binary variable that contains data coded to 1 or 0. |
| Naive Bayes | Classification | Binary Classifier, Multi-Classifier | GaussianNB | Probabilistic classification based on Bayes' theorem that assumes no dependence between features. Use when there are a high number of input dimensions. |
| Neural Network | Classification | Binary Classifier, Multi-Classifier | MLPClassifier | Iterative classification algorithm that learns by comparing its classification result with the actual value and feeding the difference back to the network to modify the algorithm for further iterations. Use for text analysis. |
| Random Forest | Classification | Binary Classifier, Multi-Classifier, Numerical | - | An ensemble learning method that constructs multiple decision trees and outputs the value that collectively represents all the decision trees. Use to predict numeric and categorical variables. |
| SVM | Classification | Binary Classifier, Multi-Classifier | LinearSVC, SVC | Classifies records by mapping them in space and constructing hyperplanes that can be used for classification. New records (scoring data) are mapped into the space and are predicted to belong to a category, which is based on the side of the hyperplane where they fall. |
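As an illustration of one table entry, the k-means description (iteratively assign each record to the cluster with the nearest mean, then recompute the means) can be sketched in a few lines of Python. This is a teaching sketch on one-dimensional values, not the implementation that Oracle Analytics runs. The k_means function, its seed parameter, and the sample values are assumptions made for the example.

```python
import random

def k_means(values, k, seed=None, max_iterations=100):
    """Cluster one-dimensional values into k groups; returns the sorted cluster means."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)  # random starting means
    for _ in range(max_iterations):
        # Assignment step: each value joins the cluster with the nearest mean.
        clusters = [[] for _ in range(k)]
        for value in values:
            nearest = min(range(k), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # Update step: recompute each mean (keep the old one if a cluster is empty).
        new_centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:  # converged: assignments no longer change
            break
        centers = new_centers
    return sorted(centers)

# Invented sample values with two clearly separated groups.
values = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
centers = k_means(values, k=2, seed=7)
```

The starting means are chosen at random, which is one reason results can differ between runs. On well-separated data like this sample, every starting choice converges to the same two means, but on overlapping data different starting points can produce different clusters.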

Typical Workflow to Create and Use Oracle Analytics Predictive Models

Here are the common tasks for creating predictive models, applying them to data sets, and using them in projects.

| Task | Description | More Information |
| --- | --- | --- |
| Train a model using sample data | Use one of the supplied algorithms to train a model to predict trends and patterns in your sample data. | Create and Train a Predictive Model |
| Evaluate a model | Use related data sets to evaluate the effectiveness of your model, and iteratively refine the model until you're satisfied with it. | Inspect a Training Model |
| Apply a model to your data using a data flow | Apply a trained predictive model to your data to generate a data set that includes the predicted trends and patterns. | Apply a Predictive or Oracle Machine Learning Model to a Data Set |
| Apply a predictive model to your project data | Use a scenario to add a predictive model to your project. | Add a Predictive Model to a Project |

Create and Train a Predictive Model

Based on the problem that needs to be solved, an advanced data analyst chooses an appropriate algorithm to train a predictive model and then evaluates the model's results.

Arriving at an accurate model is an iterative process, and an advanced data analyst can try different models, compare their results, and fine-tune parameters based on trial and error. A data analyst can use the finalized, accurate predictive model to predict trends in other data sets, or add the model to projects.
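The trial-and-error loop described here amounts to training with several candidate settings and keeping the one that scores best on data the model hasn't seen. The following sketch shows the idea with a tiny k-nearest-neighbors classifier. That algorithm isn't in the Oracle Analytics list and is used only because it fits in a few lines. The data points, labels, and function names are invented for illustration.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Majority label among the k training points closest to x."""
    neighbors = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def holdout_accuracy(train, holdout, k):
    """Share of held-out points that the model labels correctly."""
    hits = sum(knn_predict(train, x, k) == label for x, label in holdout)
    return hits / len(holdout)

# Invented training data; the point at 4.0 is deliberately noisy.
train = [(1.0, "low"), (2.0, "low"), (3.0, "low"), (4.0, "high"),
         (5.0, "low"), (7.0, "high"), (8.0, "high"), (9.0, "high")]

# Invented held-out data that the model never sees during training.
holdout = [(4.2, "low"), (3.8, "low"), (7.5, "high"), (1.5, "low")]

# Try several settings, score each on the held-out data, keep the best.
results = {k: holdout_accuracy(train, holdout, k) for k in (1, 3, 5)}
best_k = max(results, key=results.get)
```

With k set to 1, the single noisy training point misleads the model on two of the four held-out rows. Larger settings smooth that out. Tuning the settings of a model in the data flow editor follows the same pattern: change a setting, rerun the data flow, and compare the quality metrics.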

Oracle Analytics provides algorithms for numeric prediction, multi-classification, binary classification, and clustering. For information about how to choose an algorithm, see How Do I Choose a Training Model Algorithm?

  1. In the Home page, click Create and select Data Flow.
  2. Select the data set that you want to use to train the model. Click Add.
    Typically you'll select a data set that was prepared specifically for training the model and contains a sample of the data that you want to predict. The accuracy of a model depends on how representative the training data is.
  3. In the data flow editor, click Add a step (+).
    After adding a data set, you can either use all columns in the data set to build the model or select only the relevant columns. Choosing the relevant columns requires an understanding of the data set. Ignore columns that you know won't influence the outcome behavior or that contain redundant information. You can choose only relevant columns by adding the Select Columns step. If you're not sure about the relevant columns, then use all columns.
  4. Navigate to the bottom of the list and click the train model type that you want to apply to the data set.
  5. Select an algorithm and click OK.
  6. If you're working with a supervised model like prediction or classification, then click Target and select the column that you're trying to predict. For example, if you're creating a model to predict a person's income, then select the Income column.
    If you're working with an unsupervised model like clustering, then no target column is required.
  7. Change the default settings for your model to fine-tune it and improve the accuracy of the predicted outcome. The settings that are available depend on the model you're working with.
  8. Click the Save Model step and provide a name and description. This will be the name of the generated predictive model.
  9. Click Save, enter a name and description of the data flow, and click OK to save the data flow.
  10. Click Run Data Flow to create the predictive model based on the input data set and model settings that you provided.