What Are Oracle Analytics Predictive Models?

An Oracle Analytics predictive model applies a specific algorithm to a dataset to predict values, predict classes, or identify groups in the data.

You can also use Oracle Machine Learning models to generate predictions from your data.

Oracle Analytics includes algorithms to help you train predictive models for various purposes. Examples include classification and regression trees (CART), logistic regression, and k-means.
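To make "identifying groups in the data" concrete, here is a minimal, generic sketch of k-means (Lloyd's algorithm) in plain Python. It is not Oracle Analytics code and does not describe how Oracle Analytics implements k-means. The function name `kmeans`, the first-k initialization, the fixed iteration count, and the sample points are all illustrative assumptions.

```python
# Generic k-means (Lloyd's algorithm) sketch; NOT the Oracle Analytics implementation.
from math import dist


def kmeans(points, k, iterations=10):
    # Assumption: initialize centroids with the first k points (simple and deterministic).
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of the points assigned to it.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    # Final labels: index of the nearest centroid for every point.
    labels = [min(range(k), key=lambda i: dist(p, centroids[i])) for p in points]
    return centroids, labels


# Illustrative 2-D points forming two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Unlike a classification or regression model, k-means needs no target column: it only groups rows whose feature values are close to each other.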

You use the data flow editor to first train a model on a training dataset. After the predictive model is trained, you apply it to the datasets for which you want to generate predictions.
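The train-then-apply separation can be sketched conceptually in plain Python. This is only an illustration: `train_model` and `apply_model` are hypothetical names, not Oracle Analytics APIs, and the simple nearest-centroid classifier stands in for whichever algorithm you actually choose. Training runs once and produces a model object; applying that model to new, unlabeled rows needs only the model, not the original training data.

```python
# Conceptual train/apply sketch; hypothetical names, NOT an Oracle Analytics API.
from math import dist
from statistics import mean


def train_model(rows, targets):
    """Train: compute one centroid (mean feature vector) per target class."""
    by_class = {}
    for row, target in zip(rows, targets):
        by_class.setdefault(target, []).append(row)
    return {
        label: tuple(mean(col) for col in zip(*members))
        for label, members in by_class.items()
    }


def apply_model(model, rows):
    """Apply: predict the class whose centroid is nearest to each unlabeled row."""
    return [min(model, key=lambda label: dist(row, model[label])) for row in rows]


# Training data has known targets; the new rows do not.
training_rows = [(1, 1), (2, 1), (8, 9), (9, 8)]
training_targets = ["low", "low", "high", "high"]
model = train_model(training_rows, training_targets)
print(apply_model(model, [(1, 2), (10, 10)]))
```

Because the trained model is a standalone object, one user can train it and other users can apply it to their own data, which mirrors the sharing workflow described next.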

You can make a trained model available to other users who can apply it against their data to predict values. In some cases, certain users train models, and other users apply the models.

Note:

If you're not sure what to look for in your data, you can start by using Explain, which uses machine learning to identify trends and patterns. Then you can use the data flow editor to create and train predictive models to drill into the trends and patterns that Explain found.

You use the data flow editor to train a model:

First, you create a data flow and add the dataset that you want to use to train the model. This training dataset contains the data that you want the model to predict (for example, a value like sales or age, or a variable like credit risk bucket).
If needed, you can use the data flow editor to prepare the dataset, for example by adding columns, selecting columns, or joining datasets.
After you've confirmed that the data is what you want to train the model on, you add a training step to the data flow and choose a classification (binary or multi-classification), regression, or clustering algorithm. Then you name the resulting model, save the data flow, and run it to train and create the model.
Examine the properties of the resulting machine learning objects to determine the quality of the model. If needed, you can repeat the training process until the model reaches the quality you want.
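As a generic illustration of one way to judge a classification model's quality, the sketch below compares predicted labels with actual labels on data that the model wasn't trained on, and reports accuracy plus a confusion count. The function `evaluate`, the sample labels, and the 0.9 target are hypothetical; the quality metrics that Oracle Analytics actually reports for a model may differ.

```python
# Generic quality check for a classifier; hypothetical, NOT Oracle Analytics output.
from collections import Counter


def evaluate(actual, predicted):
    """Return overall accuracy and a confusion count keyed by (actual, predicted)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and the same length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    confusion = Counter(zip(actual, predicted))
    return correct / len(actual), confusion


# Held-out rows: true labels versus what the model predicted for them.
actual = ["absent", "present", "present", "absent"]
predicted = ["absent", "present", "absent", "absent"]
accuracy, confusion = evaluate(actual, predicted)

target_accuracy = 0.9  # assumption: the quality bar you decide on
print(accuracy, confusion[("present", "absent")])
if accuracy < target_accuracy:
    # Below the bar: adjust the data or parameters and train again.
    print("iterate: retrain the model")
```

Counting errors by (actual, predicted) pair shows not only how often the model is wrong but which classes it confuses, which helps decide what to change in the next training iteration.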

Use the finished model to score unlabeled data, either to generate a dataset within a data flow or to add a prediction visualization to a workbook.

Example

Suppose you want to create and train a multi-classification model to predict which patients have a high risk of developing heart disease.

Supply a training dataset containing attributes of individual patients, such as age, gender, and whether they've ever experienced chest pain, and metrics such as blood pressure, fasting blood sugar, cholesterol, and maximum heart rate. The training dataset also contains a column named "Likelihood" that has one of the following values: absent, less likely, likely, highly likely, or present.
Choose the CART (Decision Tree) algorithm because it ignores redundant columns that don't add predictive value, and identifies and uses only the columns that help predict the target. When you add the algorithm to the data flow, you choose Likelihood as the target column to train the model on. The algorithm then selects the driver columns that it needs to make predictions and output related datasets.
Inspect the results, fine-tune the model, and then apply it to a larger dataset to predict which patients have a high probability of having or developing heart disease.
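To show why a decision tree can ignore columns that don't help predict the target, the following sketch picks a single split by Gini impurity, the impurity measure commonly used by CART for classification. It is a simplified assumption (one split, numeric thresholds, no tree growth or pruning), not the Oracle Analytics implementation, and the patient records are made-up toy values with only two of the five Likelihood classes.

```python
# Single-split CART-style selection by Gini impurity; simplified, NOT Oracle Analytics code.
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, columns):
    """Return (gain, column, threshold) for the split that most reduces impurity.

    rows is a list of dicts of numeric features. A column whose splits never
    reduce impurity is never selected (the result stays (0.0, None, None)).
    """
    parent = gini(labels)
    best = (0.0, None, None)
    for col in columns:
        for threshold in sorted({row[col] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[col] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[col] > threshold]
            if not left or not right:
                continue  # a split must put rows on both sides
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            gain = parent - weighted
            if gain > best[0]:
                best = (gain, col, threshold)
    return best


# Hypothetical toy records; clinic_id is identical in every row (redundant).
rows = [
    {"chest_pain": 0, "cholesterol": 180, "clinic_id": 1},
    {"chest_pain": 0, "cholesterol": 250, "clinic_id": 1},
    {"chest_pain": 1, "cholesterol": 190, "clinic_id": 1},
    {"chest_pain": 1, "cholesterol": 260, "clinic_id": 1},
]
labels = ["absent", "absent", "present", "present"]
print(best_split(rows, labels, ["chest_pain", "cholesterol", "clinic_id"]))
```

Here `clinic_id` has the same value in every row, so it can't separate the classes and is never chosen, while `chest_pain` separates the toy classes perfectly and wins. The `gini` function works unchanged for all five Likelihood values.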