Before you Begin

This tutorial shows you how to compare machine learning predictive models so that you can pick the most accurate model to use in Oracle Analytics.

Background

You can use lift and gain charts to assess the accuracy of predictive models in Oracle Analytics. The lift and gain calculations work with machine learning classification models. Lift and gain charts show visually how much better a model is than random guessing (the baseline). The greater the area between a model's lift or gain curve and the baseline, the better the model is at predicting the desired result.
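
To make the idea of "better than random guessing" concrete, the following is a minimal Python sketch that computes gain and lift for the top-scored fraction of a small sample. It uses the common textbook definitions and is not taken from Oracle Analytics; the function name gain_and_lift, the scores, and the labels are all hypothetical.

```python
def gain_and_lift(scores, labels, top_fraction, positive="yes"):
    """Return (gain, lift) for the top `top_fraction` of rows ranked by score."""
    # Rank rows from the highest to the lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    cutoff = max(1, round(len(ranked) * top_fraction))
    total_positives = sum(1 for _, label in ranked if label == positive)
    captured = sum(1 for _, label in ranked[:cutoff] if label == positive)
    gain = captured / total_positives   # share of all positives captured
    baseline = cutoff / len(ranked)     # share a random pick would capture
    lift = gain / baseline              # how many times better than random
    return gain, lift

# Hypothetical predicted probabilities of "yes" and the actual outcomes.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = ["yes", "yes", "no", "yes", "no", "no", "no", "yes", "no", "no"]

gain, lift = gain_and_lift(scores, labels, top_fraction=0.3)
print(f"gain={gain:.2f} lift={lift:.2f}")
```

In this sample, contacting the top 30% of rows captures half of all positive responses, which is about 1.67 times what a random 30% would capture.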

In this tutorial, you use sample bank marketing campaign data to predict which customers are likely to purchase additional services. After training predictive models on the dataset with the Naive Bayes and CART algorithms, you apply the models to the dataset, and then you create lift and gain charts to compare the resulting _LIFT datasets.

What Do You Need?

  • Access to Oracle Analytics
  • The bmc_data.xlsx file, downloaded to your computer

Create a Dataset

In this section, you create the bmc_data dataset, and then you create a workbook to learn about the dataset.

  1. Sign in to Oracle Analytics.
  2. On the Home page, click Create, and then click Dataset.
  3. In Create Dataset, click Drop data file here or click to browse. In File Upload, click bmc_data.xlsx, and then click Open.
  4. In Create Dataset Table from bmc_data.xlsx, click OK.
  5. In New Dataset, double-click bmc_data in the Join Diagram.
    [Illustration: new_dataset_join_diagram.png]
  6. In the age column, click Measure, and then click Attribute.
  7. Select the balance column. In balance properties, click Measure in the Treat as row, and then click Attribute.

    [Illustration: transform_columns.png]
  8. Click Save. In Save Dataset as, enter bmc_data in Name, and then click OK.
  9. In bmc_data, click Create Workbook.
  10. If the Auto Insights panel is open, close it.

    [Illustration: auto_insights_panel.png]

Examine the Bank Marketing Campaign Data

In this section, you create a visualization in the workbook to examine the results of the bank marketing campaign.

  1. In the Data pane, hold down the Ctrl key, select campaign and outcome, and then drag them to the canvas.

    The number of no responses to the marketing campaign exceeds the number of yes responses.

    [Illustration: campaign_outcome.png]
  2. Click Save. In Save Workbook, enter bmc_initial_viz_results, and then click Save. Click Go back.
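
The imbalance between no and yes responses matters later, because the share of yes responses is what a random selection of customers would achieve, and lift is measured against that share. The following minimal Python sketch counts the two classes; the outcomes list is a hypothetical stand-in for the outcome column, not data from bmc_data.xlsx.

```python
from collections import Counter

# Hypothetical values standing in for the outcome column of the dataset.
outcomes = ["no", "no", "yes", "no", "no", "no", "yes", "no", "no", "no"]

counts = Counter(outcomes)
# Share of positive responses: the hit rate of a purely random selection.
positive_rate = counts["yes"] / len(outcomes)

print(f"no={counts['no']} yes={counts['yes']} positive rate={positive_rate:.0%}")
```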

Create a Naive Bayes Prediction Model

In this section, you create a Naive Bayes prediction model using the bmc_data dataset.

  1. On the Home page, click Create, and then click Data Flow.
  2. In Add Dataset, click bmc_data, and then click Add.
  3. In Data Flow Steps, double-click Train Binary Classifier.
  4. In Select Train Two-Classification Model Script, click Naive Bayes for Classification, and then click OK.
  5. In Train Binary Classifier next to Target, click Select a column, and then select outcome.

    Yes is the default value in the Positive Class in Target field, and it is the value that this tutorial requires.

    [Illustration: bmc_binary_nb_df.png]
  6. Click the Save Model node. In Save Model, enter bmc_binary_nb_model, and then click Save. In Save Data Flow As, enter bmc_binary_nb_df, and then click OK.
  7. Click Run Data Flow.
  8. Click Go back.
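
For readers who want to see what a Naive Bayes classifier does conceptually, the following is a minimal Python sketch for categorical columns with add-one (Laplace) smoothing. It is an illustration of the general technique, not the script that Oracle Analytics runs; the column names housing and contact, the sample rows, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, target):
    """Count class frequencies and per-class value frequencies."""
    class_counts = Counter(row[target] for row in rows)
    value_counts = defaultdict(Counter)  # (feature, class) -> value counts
    feature_values = defaultdict(set)    # feature -> distinct values seen
    for row in rows:
        for feature, value in row.items():
            if feature != target:
                value_counts[(feature, row[target])][value] += 1
                feature_values[feature].add(value)
    return class_counts, value_counts, feature_values

def predict_proba(model, row, positive="yes"):
    """Return P(positive | row); `row` must not contain the target column."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    joint = {}
    for cls, n_cls in class_counts.items():
        p = n_cls / total  # class prior
        for feature, value in row.items():
            n_values = len(feature_values[feature])
            # Add-one smoothing keeps unseen values from zeroing the product.
            p *= (value_counts[(feature, cls)][value] + 1) / (n_cls + n_values)
        joint[cls] = p
    return joint[positive] / sum(joint.values())

# Hypothetical training rows; the real dataset has many more columns and rows.
rows = [
    {"housing": "yes", "contact": "cellular", "outcome": "yes"},
    {"housing": "no", "contact": "cellular", "outcome": "yes"},
    {"housing": "yes", "contact": "telephone", "outcome": "no"},
    {"housing": "yes", "contact": "telephone", "outcome": "no"},
    {"housing": "no", "contact": "telephone", "outcome": "no"},
    {"housing": "yes", "contact": "cellular", "outcome": "no"},
]
model = train_naive_bayes(rows, target="outcome")
p_yes = predict_proba(model, {"housing": "no", "contact": "cellular"})
print(f"P(yes) = {p_yes:.3f}")
```

The predicted probability of the positive class is the kind of score that a lift and gain calculation ranks rows by.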

Create a CART Prediction Model

In this section, you create a CART predictive model using the bmc_data dataset.

  1. On the Home page, click Create, and then click Data Flow.
  2. In Add Dataset, click bmc_data, and then click Add.
  3. In Data Flow Steps, double-click Train Binary Classifier.
  4. In Select Train Two-Classification Model Script, click CART for model training, and then click OK.
  5. In Train Binary Classifier next to Target, click Select a column, and then select outcome.


    Yes is the value you need in the Positive Class in Target field.

  6. Click the Save Model node. In Save Model, enter bmc_binary_cart_model, and then click Save. In Save Data Flow As, enter bmc_binary_cart_df, and then click OK. Click Run Data Flow.
  7. Click Go back. On the Home page, click Machine Learning to see the Naive Bayes (bmc_binary_nb_model) and CART (bmc_binary_cart_model) models.

    [Illustration: predictive_models.png]
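
CART builds a decision tree by repeatedly choosing the split that leaves the purest groups. The following minimal Python sketch shows one such choice for a single numeric column, using Gini impurity. It illustrates the general technique only and is not the Oracle Analytics training script; the durations column, the sample values, and the function names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 means perfectly pure)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split `value <= t`."""
    best = (None, gini(labels))  # start from the impurity with no split
    # Skip the largest value: splitting there would leave the right side empty.
    for t in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical call durations in seconds and the campaign outcomes.
durations = [60, 90, 120, 300, 420, 600]
outcomes = ["no", "no", "no", "yes", "no", "yes"]

threshold, impurity = best_threshold(durations, outcomes)
print(f"split at duration <= {threshold}, weighted Gini = {impurity:.3f}")
```

A full tree repeats this search over every column and then recurses into each side of the chosen split.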

Apply NB Predictive Model to Compute Lift and Gain

In this section, you apply bmc_binary_nb_model to bmc_data to create two datasets. The first dataset contains the output from running the data flow, and the second dataset contains the lift and gain calculations. The name of the second dataset is the name of the first dataset with _LIFT appended.

  1. On the Home page, click Create, and then click Data Flow.
  2. In Add Dataset, click bmc_data, and then click Add.
  3. In Data Flow Steps, double-click Apply Model.
  4. In Select Model, click bmc_binary_nb_model, and then click OK.
  5. Click Toggle Data Preview.
  6. In Apply Model under Parameters, select Yes in Compute lift and gain.
  7. In Target Column to compute lift, click Select a column, and then select outcome from Available data. In Positive class to compute lift, enter Yes.
  8. On the Apply Model node, click Add a step, and then select Save Data.
  9. In Save Dataset, enter bmc_nb_lg in Name.
  10. Click Save As. In Save Data Flow as, enter bmc_nb_lg_df, and then click OK.
  11. Click Run Data Flow.

    The message Data Flow "bmc_nb_lg_df" complete appears when the data flow run is successful.

  12. Click Go back. On the Home page, click Data to view the datasets produced from running the data flow.

    [Illustration: nb_lift_gain_datasets.png]
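
To give a sense of what a _LIFT dataset contains, the following minimal Python sketch builds a comparable table from scores and actual outcomes. The column names match the ones used later in this tutorial, but the formulas are assumptions based on the usual definitions of cumulative gain and lift; the exact calculations that Oracle Analytics performs are not shown in this tutorial. The function name lift_table and the sample data are hypothetical.

```python
def lift_table(scores, labels, positive="Yes", n_bins=10):
    """Build one row per population bin with assumed lift and gain columns."""
    ranked = [lab for _, lab in
              sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)]
    n = len(ranked)
    total_pos = sum(lab == positive for lab in ranked)
    rows = []
    for b in range(1, n_bins + 1):
        cutoff = round(n * b / n_bins)          # rows in the top b bins
        captured = sum(lab == positive for lab in ranked[:cutoff])
        pct = 100 * b / n_bins
        gain = 100 * captured / total_pos
        rows.append({
            "PopulationPercentile": pct,
            "CumulativeGain": gain,             # % of positives captured so far
            "GainChartBaseline": pct,           # what random selection captures
            # Assumed: a perfect ranking finds every positive first.
            "OptimalGain": 100 * min(cutoff, total_pos) / total_pos,
            "LiftValue": gain / pct,            # assumed cumulative lift
        })
    return rows

# Hypothetical predicted probabilities of "Yes" and the actual outcomes.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = ["Yes", "Yes", "No", "Yes", "No", "No", "No", "Yes", "No", "No"]

table = lift_table(scores, labels, n_bins=5)
for row in table:
    print(row)
```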

Apply CART Predictive Model to Compute Lift and Gain

In this section, you update the existing bmc_nb_lg_df data flow to apply the bmc_binary_cart_model predictive model, specify the lift and gain parameters, and then save the dataset and data flow under new names to create the CART _LIFT dataset.

  1. In the bmc_nb_lg_df data flow, click Apply Model.
  2. In Apply Model, click bmc_binary_nb_model. In Select Model, click bmc_binary_cart_model, and then click OK.
  3. Click Toggle Data Preview.
  4. In Apply Model under Parameters, select Yes in Compute lift and gain.
  5. In Target Column to compute lift, click Select a column, and then select outcome from Available data. In Positive class to compute lift, enter Yes.
  6. Click the Save Data node that follows the Apply Model node.
  7. In Save Dataset, enter bmc_cart_lg to replace bmc_nb_lg in Name. Click Save As. In Save Data Flow as, enter bmc_cart_lg_df, and then click OK.
  8. Click Run Data Flow.

    The message Data Flow "bmc_cart_lg_df" complete appears when the data flow run is successful.

  9. Click Go back. On the Home page, click Data to view the results of the lift and gain data flows.

    [Illustration: cart_lift_gain_datasets.png]

Visualize Gain

In this section, you create a workbook to visualize the output from the _LIFT datasets.

  1. On the Home page, click Create, and then click Workbook.
  2. In Add Dataset, enter bmc in Search. Select bmc_nb_lg_LIFT, and then click Add to Workbook.
  3. In the Data pane, hold down the Ctrl key, select PopulationPercentile and CumulativeGain, right-click and select Pick Visualization, and then select Line.

    [Illustration: gain_by_pop.png]
  4. Select GainChartBaseline and drag it to Values (Y-axis).


    [Illustration: gain_chart_baseline.png]
  5. In the Data pane, select OptimalGain and drag it to Values (Y-axis).


    [Illustration: optimal_gain.png]
  6. In the Data pane, click Add, and then select Add Data. In Add Data, click bmc_cart_lg_LIFT, and then click Add to Workbook.
  7. In the Data pane, under the bmc_cart_lg_LIFT dataset, select CumulativeGain and drag it to Values (Y-axis).


    [Illustration: bn_cart_cumulgain_chart.png]
  8. In the Data pane, under the bmc_cart_lg_LIFT dataset, select OptimalGain and drag it to Values (Y-axis).


    The optimal gain value for the bmc_cart_lg_LIFT dataset is higher than that for the bmc_nb_lg_LIFT dataset. In this example, the Binary Classification Naive Bayes machine learning model is better than the Binary Classification CART model at predicting which customers are likely to respond positively.

    [Illustration: optimal_gain_cart.png]
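
The visual comparison of gain curves can also be expressed as a number: the area between each curve and the baseline. The following minimal Python sketch computes that area with the trapezoidal rule. The gain values for model_a and model_b are hypothetical and do not come from the bmc_nb_lg_LIFT or bmc_cart_lg_LIFT datasets; the function name is also an assumption.

```python
def area_above_baseline(percentiles, gains):
    """Trapezoidal area between a gain curve and the diagonal baseline."""
    xs = [0.0] + list(percentiles)  # every gain curve starts at the origin
    ys = [0.0] + list(gains)
    area = 0.0
    for i in range(1, len(xs)):
        width = xs[i] - xs[i - 1]
        # Height of the curve above the baseline (y = x) at both strip edges.
        above_prev = ys[i - 1] - xs[i - 1]
        above_curr = ys[i] - xs[i]
        area += width * (above_prev + above_curr) / 2
    return area

# Hypothetical CumulativeGain values at each PopulationPercentile.
percentiles = [20, 40, 60, 80, 100]
gain_model_a = [50, 75, 90, 100, 100]
gain_model_b = [40, 65, 80, 95, 100]

area_a = area_above_baseline(percentiles, gain_model_a)
area_b = area_above_baseline(percentiles, gain_model_b)
better = "model_a" if area_a > area_b else "model_b"
print(f"area_a={area_a} area_b={area_b} better={better}")
```

A larger area means the curve sits further above the baseline across the whole population, which is the same criterion you apply by eye in the chart.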

Visualize Lift

In this section, you create visualizations of the lift results.

  1. In the Data pane, under the bmc_nb_lg_LIFT dataset, select LiftValue and drag it to Values (Y-axis).

    [Illustration: lift_value_nb.png]
  2. In the Data pane, under the bmc_cart_lg_LIFT dataset, select LiftValue and drag it to Values (Y-axis).

    [Illustration: cart_lift.png]

Learn More