Evaluate Machine Learning Models Using Lift and Gain Charts

Lift and gain charts enable you to compare different machine learning models to determine the most accurate model.

Overview of Using Lift and Gain Charts

Lift and gain charts enable you to evaluate predictive machine learning models by charting modeling statistics in a visualization in Oracle Analytics.

When you use a data flow to apply a classification model to a dataset, Oracle Analytics enables you to compute lift and gain values. You can then visualize this data in a chart to help you assess the accuracy of predictive models and determine the best one to use.

[Illustration: ml-lift-and-gain-analysis-2.png]

Prerequisites

  • Oracle Database or Oracle Autonomous Data Warehouse
  • A classification model that includes prediction probability (for example, a multi-classifier model created using the Naive Bayes training script).

    You access existing predictive models in the Machine Learning area in Oracle Analytics.

Statistics Generated for Lift and Gain Analysis

When you apply a predictive classification model to a dataset and generate lift and gain statistics, you produce a dataset named <Data flow name>_LIFT with these columns:
  • PopulationPercentile - The dataset population split into 100 equal groups.
  • CumulativeGain - The ratio of the cumulative number of positive targets up to that percentile to the total number of positive targets. The closer the cumulative gains line is to the top-left corner of the chart, the greater the gain: a higher proportion of responders is reached by contacting a lower proportion of customers.
  • GainChartBaseline - The overall response rate. The line represents the percentage of positive records that you'd expect to get if you selected records randomly. For example, in a marketing campaign, if you contact X% of the customers at random, you receive X% of the total positive responses.
  • LiftChartBaseline - Value of 1 and used as a baseline for lift comparison.
  • LiftValue - The cumulative lift for a percentile. Lift is the ratio of the density of positive records in the selected data (up to that percentile) to the density of positive records across all the test data.
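  • IdealModelLine - The ratio of the cumulative number of positive targets to the total number of positive targets for an ideal model, that is, a model that ranks every positive record ahead of every negative record.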
  • OptimalGain - Indicates the optimum number of customers to contact. The cumulative gain curve flattens beyond this point.
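
To make the column definitions above concrete, the following Python sketch computes comparable statistics from a list of prediction probabilities and actual outcomes. It is an illustration only, not the calculation that Oracle Analytics performs: the function name lift_gain_table, the bins parameter, and the choice to express gain values as ratios between 0 and 1 are all assumptions made for this example.

```python
def lift_gain_table(scores, labels, positive="YES", bins=100):
    """Return one row of gain and lift statistics per population bucket.

    scores: predicted probability of the positive class for each record.
    labels: actual outcome for each record.
    All names and the 0..1 scaling are assumptions made for illustration.
    """
    # Rank records from the highest to the lowest predicted probability.
    ranked = [label for _, label in
              sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)]
    n = len(ranked)
    total_pos = sum(1 for label in ranked if label == positive)
    if n == 0 or total_pos == 0:
        raise ValueError("Need at least one record and one positive record")

    rows = []
    for b in range(1, bins + 1):
        # Number of top-ranked records covered by this bucket (integer ceiling).
        cutoff = (n * b + bins - 1) // bins
        fraction = cutoff / n
        captured = sum(1 for label in ranked[:cutoff] if label == positive)
        gain = captured / total_pos
        rows.append({
            "PopulationPercentile": 100 * b / bins,
            "CumulativeGain": gain,
            # Random selection of X% of records captures X% of the positives.
            "GainChartBaseline": fraction,
            "LiftChartBaseline": 1.0,
            # Positive density in the selection divided by overall density.
            "LiftValue": gain / fraction,
            # A perfect ranking places every positive record first.
            "IdealModelLine": min(cutoff, total_pos) / total_pos,
        })
    return rows


# Hypothetical sample data: ten records, four of them positive.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
example_labels = ["YES", "YES", "NO", "YES", "NO", "NO", "YES", "NO", "NO", "NO"]
for row in lift_gain_table(example_scores, example_labels, bins=5):
    print(row)
```

In the last bucket the whole population is selected, so CumulativeGain reaches 1 and LiftValue equals the LiftChartBaseline value of 1.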

You can then visualize the <Data flow name>_LIFT dataset in an Oracle Analytics chart. For example, to analyze gains, you might plot PopulationPercentile on the x-axis, and CumulativeGain, GainChartBaseline, IdealModelLine, and OptimalGain on the y-axis.

[Illustration: ml-lift-and-gain-analysis-1.png]

Generate Predictive Data for Lift and Gain Charts

When you use a data flow to apply a classification model to a dataset, Oracle Analytics enables you to compute statistics that you can visualize in lift and gain charts.

Before you start, create a classification model that includes prediction probability (for example, a multi-classifier model created using the Naive Bayes training script). Oracle Analytics displays available models on the Models tab on the Machine Learning page (from the Home page, click Machine Learning).
  1. On the Home page, click Create, then click Data Flow.
  2. Select a data source, then click Add.
  3. Click Add a Step, and select Apply Model.
  4. At Select Model, select a classification model that includes prediction probability, then click OK.
  5. In Apply Model, in the Parameters section:
    • In Compute lift and gain, select Yes.
    • In Target column to compute lift, select the column name of the value being predicted. For example, if your model predicts whether customers will sign up for a membership using a column named SIGNUP, select SIGNUP.
    • In Positive class to compute, specify the case-sensitive data value representing the positive class (or the preferred outcome) in the prediction. For example, if your model predicts whether customers will sign up for a membership using a column named SIGNUP with values YES or NO, specify YES.
  6. Add a Save Data node to your data flow.
  7. Execute this data flow.
The data flow produces a dataset named <Data flow name>_LIFT that contains lift and gain statistics, which you can evaluate.

Evaluate a Machine Learning Model Using a Lift and Gain Chart

Use a chart to analyze statistics generated by machine learning classification models to determine the best model to use.

Before you start, apply a predictive model to your data and generate lift and gain statistics in a dataset.
  1. On the Home page, click Create, then click Workbook.
  2. In Add Dataset, select the <Data flow name>_LIFT dataset that you generated in the previous task, then click Add to Workbook.
  3. In the Visualize panel, select the statistics to analyze, then right-click and select Pick Visualization, and choose Line Chart.
    For example, to analyze gains, you might place PopulationPercentile on the x-axis, and place CumulativeGain, GainChartBaseline, IdealModelLine, and OptimalGain on the y-axis.
    To analyze lift, you might place PopulationPercentile on the x-axis, and place LiftChartBaseline and LiftValue on the y-axis.
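
If you want a single number to back up a visual comparison of two models, one option is the area under each cumulative gain curve. The Python sketch below is an assumption-laden illustration, not an Oracle Analytics feature: it assumes that you have the rows of each <Data flow name>_LIFT dataset available as dictionaries, that PopulationPercentile runs up to 100, and that CumulativeGain is expressed as a ratio between 0 and 1. The function name area_under_gain and the sample values are hypothetical.

```python
def area_under_gain(rows):
    """Trapezoidal area under the cumulative gain curve.

    Assumes PopulationPercentile is on a 0..100 scale and CumulativeGain
    on a 0..1 scale; the curve is taken to start at the origin.
    """
    points = sorted((row["PopulationPercentile"] / 100.0, row["CumulativeGain"])
                    for row in rows)
    area, prev_x, prev_y = 0.0, 0.0, 0.0
    for x, y in points:
        # Add the trapezoid between the previous point and this one.
        area += (x - prev_x) * (y + prev_y) / 2.0
        prev_x, prev_y = x, y
    return area


# Hypothetical statistics for two models, sampled at four percentiles.
model_a = [
    {"PopulationPercentile": 25, "CumulativeGain": 0.50},
    {"PopulationPercentile": 50, "CumulativeGain": 0.80},
    {"PopulationPercentile": 75, "CumulativeGain": 0.95},
    {"PopulationPercentile": 100, "CumulativeGain": 1.00},
]
model_b = [
    {"PopulationPercentile": 25, "CumulativeGain": 0.35},
    {"PopulationPercentile": 50, "CumulativeGain": 0.60},
    {"PopulationPercentile": 75, "CumulativeGain": 0.85},
    {"PopulationPercentile": 100, "CumulativeGain": 1.00},
]

# A larger area means the gain curve sits closer to the top-left corner.
better = "model_a" if area_under_gain(model_a) > area_under_gain(model_b) else "model_b"
print(better)
```

A model that performs no better than random selection has an area of 0.5 under these assumptions, so values further above 0.5 indicate a curve further above GainChartBaseline.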