An Oracle Analytics predictive model applies a specific algorithm to a data set to predict values, predict classes, or identify groups in the data.
Oracle Analytics includes algorithms to help you train predictive models for various purposes. Examples of algorithms are classification and regression trees (CART), logistic regression, and k-means.
You use the data flow editor to first train a model on a training data set. After the predictive model has been trained, you apply it to the data sets for which you want predictions.
You can make a trained model available to other users who can apply it against their data to predict values. In some cases, certain users train models, and other users apply the models.
Note: If you're not sure what to look for in your data, you can start by using Explain, which uses machine learning to identify trends and patterns. Then you can use the data flow editor to create and train predictive models to drill into the trends and patterns that Explain found. See What is Explain?
- First, you create a data flow and add the data set that you want to use to train the model. This training data set contains the data that you want to predict (for example, a value like sales or age, or a variable like credit risk bucket).
- If needed, you can use the data flow editor to edit the data set by adding columns, selecting columns, joining, and so on.
- After you've confirmed that the data is what you want to train the model on, you add a training step to the data flow and choose a classification (binary or multi-class), regression, or cluster algorithm to train a model. Then name the resulting model, save the data flow, and run it to train and create the model.
- Examine the properties in the machine learning objects to determine the quality of the model. If needed, you can iterate the training process until the model reaches the quality you want.
Use the finished model to score unknown (unlabeled) data, either to generate a data set within a data flow or to add a prediction visualization to a project.
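The train, check, and score sequence described above can be pictured outside the product with a small plain-Python sketch. This is not Oracle Analytics code or its API: in the product you do all of this in the data flow editor. The nearest-centroid rule used here is a stand-in chosen for brevity, not one of the algorithms named above, and the `income` and `risk` columns, the sample rows, and the function names are all hypothetical.

```python
from statistics import mean

def train(rows, feature, target):
    """Learn one centroid (mean feature value) per class from labeled rows."""
    by_class = {}
    for row in rows:
        by_class.setdefault(row[target], []).append(row[feature])
    return {label: mean(values) for label, values in by_class.items()}

def score(model, rows, feature):
    """Return copies of the rows with a 'prediction' column added.

    Each row gets the class whose centroid is nearest to its feature value.
    """
    scored = []
    for row in rows:
        label = min(model, key=lambda c: abs(model[c] - row[feature]))
        scored.append({**row, "prediction": label})
    return scored

def accuracy(model, rows, feature, target):
    """Share of labeled rows that the model predicts correctly."""
    scored = score(model, rows, feature)
    return sum(r["prediction"] == r[target] for r in scored) / len(rows)

# Hypothetical training data: the target column (risk) is already known.
training = [
    {"income": 20, "risk": "high"},
    {"income": 25, "risk": "high"},
    {"income": 30, "risk": "high"},
    {"income": 70, "risk": "low"},
    {"income": 80, "risk": "low"},
    {"income": 90, "risk": "low"},
]
# Labeled rows held back from training, used only to judge model quality.
holdout = [{"income": 22, "risk": "high"}, {"income": 85, "risk": "low"}]
# Unlabeled rows: the data you actually want predictions for.
unlabeled = [{"income": 28}, {"income": 75}]

model = train(training, "income", "risk")               # train the model
quality = accuracy(model, holdout, "income", "risk")    # inspect its quality
predictions = score(model, unlabeled, "income")         # score unlabeled data

print(quality)  # 1.0
print([p["prediction"] for p in predictions])  # ['high', 'low']
```

If the quality figure were too low, you would go back, adjust the training data or the algorithm, and train again, which mirrors the iteration step in the list above.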
Suppose you want to create and train a multi-classification model to predict which patients have a high risk of developing heart disease.
- Supply a training data set containing attributes on individual patients like age, gender, and whether they've ever experienced chest pain, and metrics like blood pressure, fasting blood sugar, cholesterol, and maximum heart rate. The training data set also contains a column named "Likelihood" that is assigned one of the following values: absent, less likely, likely, highly likely, or present.
- Choose the CART (Decision Tree) algorithm because it ignores redundant columns that don't add value for prediction, and identifies and uses only the columns that are helpful to predict the target. When you add the algorithm to the data flow, you choose the Likelihood column as the target that the model is trained to predict. The algorithm uses machine learning to choose the driver columns that it needs to perform and output predictions and related data sets.
- Inspect the results and fine-tune the training model, and then apply the model to a larger data set to predict which patients have a high probability of having or developing heart disease.
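To see why a decision tree can ignore redundant columns, it helps to look at how a CART-style split is usually chosen: each candidate column is rated by how much it reduces Gini impurity in the target, and a column that reduces nothing is never picked. The sketch below is a simplified illustration in plain Python, not the Oracle Analytics implementation. It rates a single split per column and does not build a full tree. The column names (`chest_pain`, `high_bp`, `record_source`), the rows, and the two Likelihood values used are made up for the example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means all one class)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def split_impurity(rows, column, target):
    """Weighted Gini impurity after grouping rows by the values of one column."""
    groups = {}
    for row in rows:
        groups.setdefault(row[column], []).append(row[target])
    return sum(len(g) / len(rows) * gini(g) for g in groups.values())

def rank_columns(rows, target):
    """Return (column, impurity reduction) pairs, most useful column first."""
    base = gini([row[target] for row in rows])
    columns = [c for c in rows[0] if c != target]
    gains = {c: base - split_impurity(rows, c, target) for c in columns}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)

def patient(chest_pain, high_bp, record_source, likelihood):
    """Build one hypothetical patient row."""
    return {
        "chest_pain": chest_pain,
        "high_bp": high_bp,
        "record_source": record_source,
        "Likelihood": likelihood,
    }

# Made-up rows: chest_pain separates the classes fully, high_bp only partly,
# and record_source carries no information about the target.
patients = [
    patient("yes", "yes", "A", "likely"),
    patient("yes", "yes", "B", "likely"),
    patient("yes", "yes", "A", "likely"),
    patient("yes", "no", "B", "likely"),
    patient("no", "yes", "A", "absent"),
    patient("no", "no", "B", "absent"),
    patient("no", "no", "A", "absent"),
    patient("no", "no", "B", "absent"),
]

ranked = rank_columns(patients, "Likelihood")
for column, gain in ranked:
    print(column, gain)
```

In this toy data, `chest_pain` ranks first and `record_source` has zero impurity reduction, so a tree built this way would never split on it. That is the sense in which the algorithm picks its own driver columns.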