12 Testing and Tuning Models
Testing a model enables you to estimate how accurate the predictions of a model are. You can test Classification models and Regression models, and tune Classification models.
This section contains the following topics:
- Testing Classification Models
  Classification models are tested by comparing the predicted values to known target values in a set of test data.
- Tuning Classification Models
  When you tune a model, you create a derived cost matrix to use for subsequent Test and Apply operations.
- Testing Regression Models
  Regression models are tested by comparing the predicted values to known target values in a set of test data.
Testing Classification Models
Classification models are tested by comparing the predicted values to known target values in a set of test data.
The historical data for a Classification project is typically divided into two data sets:
- One for building the model
- One for testing the model
The test data must be compatible with the data used to build the model and must be prepared in the same way that the build data was prepared.
These are the ways to test Classification and Regression models:
- By splitting the input data into build data and test data. This is the default. The test data is created by randomly splitting the build data into two subsets; 40 percent of the input data is used for test data.
- By using all the build data as test data.
- By attaching two Data Source nodes to the build node:
  - The first data source that you connect to the build node is the source of the build data.
  - The second node that you connect is the source of the test data.
- By deselecting Perform Test in the Test section of the Properties pane and using a Test node. The Test section defines how tests are done. By default, all Classification and Regression models are tested.
Oracle Data Miner provides test metrics for Classification models so that you can evaluate the model.
After testing, you can tune the models.
- Test Metrics for Classification Models
  Test metrics assess how accurately the model predicts the known values.
- Compare Classification Test Results
  By using the Compare Test Results context menu option, you can compare test results for Classification models that are tested in a Test node or tested when the Classification node is run.
- Classification Model Test Viewer
  The Classification Model Test viewer displays all information related to the Classification Model test results.
- Viewing Test Results
  You can view results of models that are tested in a Classification node and a Test node.
Related Topics
Parent topic: Testing and Tuning Models
Test Metrics for Classification Models
Test metrics assess how accurately the model predicts the known values.
Test settings specify the metrics to be calculated and control the calculation of the metrics. By default, Oracle Data Miner calculates the following metrics for Classification models:
- Performance
  The performance measures that are calculated are Predictive Confidence, Average Accuracy, Overall Accuracy, and Cost.
- Performance Matrix
  A Performance Matrix displays the number of correct and incorrect predictions made by the model compared with the actual classifications in the test data.
- Receiver Operating Characteristics (ROC)
  Receiver Operating Characteristics (ROC) analysis is a useful method for evaluating Classification models. ROC applies to binary classification only.
- Lift
  Lift measures the degree to which the predictions of a Classification model are better than randomly generated predictions. Lift applies to both binary and non-binary classifications.
- Profit and ROI
  Profit uses user-supplied values for startup cost, incremental revenue, incremental cost, budget, and population to maximize the profit.
Related Topics
Parent topic: Testing Classification Models
Performance
The performance measures that are calculated are Predictive Confidence, Average Accuracy, Overall Accuracy, and Cost.
You can view these values separately, and also view all of them at the same time. To view the performance measures:
- Predictive Confidence
  Predictive Confidence provides an estimate of how accurate the model is. Predictive Confidence is a number between 0 and 1.
- Average Accuracy
  Average Accuracy refers to the percentage of correct predictions made by the model when compared with the actual classifications in the test data.
- Overall Accuracy
  Overall Accuracy refers to the percentage of correct predictions made by the model when compared with the actual classifications in the test data.
- Cost
  In a Classification model, it is important to specify the costs involved in making an incorrect decision. Specifying costs is useful when the costs of different misclassifications vary significantly.
Parent topic: Test Metrics for Classification Models
Predictive Confidence
Predictive Confidence provides an estimate of how accurate the model is. Predictive Confidence is a number between 0 and 1.
Oracle Data Miner displays Predictive Confidence as a percentage. For example, a Predictive Confidence of 59 means that the Predictive Confidence is 59 percent (0.59).
Predictive Confidence indicates how much better the predictions made by the tested model are than predictions made by a naive model. A naive model always predicts the mean for numerical targets and the mode for categorical targets.
Predictive Confidence is defined by the following formula:
Predictive Confidence = MAX[(1 - Error of Model / Error of Naive Model), 0] * 100
Where:
Error of Model is (1 - Average Accuracy/100)
Error of Naive Model is (Number of target classes - 1) / Number of target classes
- If the Predictive Confidence is 0, then the predictions of the model are no better than the predictions made by using the naive model.
- If the Predictive Confidence is 1, then the predictions are perfect.
- If the Predictive Confidence is 0.5, then the model has reduced the error of a naive model by 50 percent.
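For illustration, here is a minimal Python sketch of this formula. It returns a percentage, matching the formula above; the accuracy value and class count in the usage line are assumed example inputs.

```python
def predictive_confidence(average_accuracy_pct, n_classes):
    """Predictive Confidence (%) per the formula above."""
    error_of_model = 1 - average_accuracy_pct / 100
    error_of_naive = (n_classes - 1) / n_classes
    return max(1 - error_of_model / error_of_naive, 0) * 100

# A binary model with 80% average accuracy halves the naive error (50%):
print(predictive_confidence(80, 2))  # 60.0
```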
Parent topic: Performance
Average Accuracy
Average Accuracy refers to the percentage of correct predictions made by the model when compared with the actual classifications in the test data.
The formula to calculate the Average Accuracy is:
Average Accuracy = (TP/(TP+FP) + TN/(FN+TN)) / Number of classes * 100
Where:
- TP is True Positive.
- TN is True Negative.
- FP is False Positive.
- FN is False Negative.
The reported value is the average per-class accuracy achieved at a specific probability threshold; this value is greater than the average accuracy achieved at any other possible threshold.
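The following Python sketch is a direct transcription of the formula above for a binary target; the TP, FP, FN, and TN counts are assumed example values.

```python
def average_accuracy(tp, fp, fn, tn):
    """Average Accuracy (%) for a binary target, per the formula above."""
    n_classes = 2
    return (tp / (tp + fp) + tn / (fn + tn)) / n_classes * 100

print(average_accuracy(tp=40, fp=5, fn=10, tn=45))  # ~85.35
```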
Parent topic: Performance
Overall Accuracy
Overall Accuracy refers to the percentage of correct predictions made by the model when compared with the actual classifications in the test data.
The formula to calculate the Overall Accuracy is:
Overall Accuracy = (TP+TN)/(TP+FP+FN+TN)*100
Where:
- TP is True Positive.
- TN is True Negative.
- FP is False Positive.
- FN is False Negative.
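A matching Python sketch, using the same assumed counts as in the Average Accuracy example:

```python
def overall_accuracy(tp, fp, fn, tn):
    """Overall Accuracy (%): correct predictions over all predictions."""
    return (tp + tn) / (tp + fp + fn + tn) * 100

print(overall_accuracy(tp=40, fp=5, fn=10, tn=45))  # 85.0
```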
Parent topic: Performance
Cost
In a Classification model, it is important to specify the costs involved in making an incorrect decision. Specifying costs is useful when the costs of different misclassifications vary significantly.
For example, suppose the problem is to predict whether a user is likely to respond to a promotional mailing. The target has two categories: YES (the customer responds) and NO (the customer does not respond). Suppose a positive response to the promotion generates $500 and that it costs $5 to do the mailing. Then, the scenarios are:
- If the model predicts YES, and the actual value is YES, then the cost of misclassification is $0.
- If the model predicts YES, and the actual value is NO, then the cost of misclassification is $5.
- If the model predicts NO, and the actual value is YES, then the cost of misclassification is $500.
- If the model predicts NO, and the actual value is NO, then the cost of misclassification is $0.
Classification algorithms use the cost matrix during scoring to propose the least expensive solution. If you do not specify a cost matrix, then all misclassifications are counted as equally important.
If you are building an SVM model, then you must specify costs using model weights instead of a cost matrix.
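To illustrate how a cost matrix biases scoring, here is a minimal Python sketch that chooses the prediction with the lowest expected cost. The cost values come from the mailing example above; the probability passed in is an assumed example input, and this is a simplification of what the algorithms do internally.

```python
# Cost matrix for the mailing example: keys are (actual, predicted).
COSTS = {
    ("YES", "YES"): 0, ("YES", "NO"): 500,
    ("NO", "YES"): 5,  ("NO", "NO"): 0,
}

def least_cost_prediction(prob_yes):
    """Return the prediction with the lowest expected cost."""
    expected = {}
    for predicted in ("YES", "NO"):
        expected[predicted] = (prob_yes * COSTS[("YES", predicted)]
                               + (1 - prob_yes) * COSTS[("NO", predicted)])
    return min(expected, key=expected.get)

# Even a 5 percent chance of a response makes mailing worthwhile:
print(least_cost_prediction(0.05))  # YES (expected cost 4.75 versus 25.0)
```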
Parent topic: Performance
Performance Matrix
A Performance Matrix displays the number of correct and incorrect predictions made by the model compared with the actual classifications in the test data.
Performance Matrix is calculated by applying the model to a hold-out sample (the test set, created during the split step in a classification activity) taken from the build data. The values of the target are known. The known values are compared with the values predicted by the model. Performance Matrix does the following:
- Measures the likelihood of the model to predict incorrect and correct values
- Indicates the types of errors that the model is likely to make
The columns are predicted values and the rows are actual values. For example, if you are predicting a target with values 0 and 1, then the number in the upper right cell of the matrix indicates the false-positive predictions, that is, predictions of 1 when the actual value is 0.
Parent topic: Test Metrics for Classification Models
Receiver Operating Characteristics (ROC)
Receiver Operating Characteristics (ROC) analysis is a useful method for evaluating Classification models. ROC applies to binary classification only.
ROC is plotted as a curve. The area under the ROC curve measures the discriminating ability of a binary Classification model. The correct value for the ROC threshold depends on the problem that the model is trying to solve.
ROC curves are similar to lift charts in that they provide a means of comparison between individual models and determine thresholds that yield a high proportion of positive results. An ROC curve does the following:
- Provides a means to compare individual models and determine thresholds that yield a high proportion of positive results.
- Provides insight into the decision-making ability of the model. For example, you can determine how likely the model is to accurately predict the negative or the positive class.
- Compares predicted and actual target values in a Classification model.
- How to Use ROC
Receiver Operating Characteristics (ROC) supports what-if analysis.
Parent topic: Test Metrics for Classification Models
How to Use ROC
Receiver Operating Characteristics (ROC) supports what-if analysis.
You can use ROC to experiment with modified model settings to observe the effect on the Performance Matrix. For example, assume that a business problem requires that the false-negative value be reduced as much as possible within the confines of the requirement that the number of positive predictions be less than or equal to some fixed number. You might offer an incentive to each customer predicted to be high-value, but you are constrained by a budget with a maximum of 170 incentives. On the other hand, the false negatives represent missed opportunities, so you want to avoid such mistakes.
To view the changes in the Performance Matrix:
- Click Edit Custom Operating Point in the upper right corner. The Specify Custom Threshold dialog box opens.
- In the Specify Custom Threshold dialog box, specify the desired settings and view the changes in the Custom Accuracy field.
As you change the Performance Matrix, you are changing the probability threshold that results in a positive prediction. Typically, the probability assigned to each case is examined, and if the probability is 0.5 or higher, then a positive prediction is made. Changing the cost matrix changes the positive prediction threshold to a value other than 0.5; the changed value is displayed in the first column of the table beneath the graph.
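The following Python sketch illustrates the threshold effect: lowering the threshold turns some false negatives into positives. The probabilities and actual values are assumed example data.

```python
import numpy as np

def performance_matrix(probs, actuals, threshold=0.5):
    """Confusion counts when a positive prediction requires
    probability >= threshold. Returns (TP, FP, FN, TN)."""
    probs, actuals = np.asarray(probs), np.asarray(actuals)
    preds = probs >= threshold
    tp = int(np.sum(preds & (actuals == 1)))
    fp = int(np.sum(preds & (actuals == 0)))
    fn = int(np.sum(~preds & (actuals == 1)))
    tn = int(np.sum(~preds & (actuals == 0)))
    return tp, fp, fn, tn

probs   = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2]
actuals = [1,   1,   0,   1,   0,   0]
print(performance_matrix(probs, actuals, 0.5))   # (2, 1, 1, 2)
print(performance_matrix(probs, actuals, 0.35))  # (3, 1, 0, 2): fewer false negatives
```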
Parent topic: Receiver Operating Characteristics (ROC)
Lift
Lift measures the degree to which the predictions of a Classification model are better than randomly-generated predictions. Lift applies to binary classification and non-binary classifications.
Lift measures how rapidly the model finds the actual positive target values. For example, lift enables you to figure out how much of the customer database you must contact to reach 50 percent of the customers likely to respond to an offer.
The x-axis of the graph is divided into quantiles. To view exact values, place the cursor over the graph. Below the graph, you can select the quantile of interest using Selected Quantile. The default quantile is quantile 1.
To calculate lift, Oracle Data Mining does the following:
- Applies the model to test data to gather predicted and actual target values. This is the same data used to calculate the Performance Matrix.
- Sorts the predicted results by probability, that is, the confidence in a positive prediction.
- Divides the ranked list into equal parts (quantiles). The default is 100.
- Counts the actual positive values in each quantile.
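These steps can be sketched in a few lines of Python; the scores, actual values, and quantile count below are assumed example data.

```python
import numpy as np

def cumulative_lift(probs, actuals, n_quantiles=100):
    """Cumulative lift for each quantile of the ranked predictions."""
    order = np.argsort(probs)[::-1]            # highest probability first
    ranked = np.asarray(actuals)[order]
    quantiles = np.array_split(ranked, n_quantiles)
    overall_rate = ranked.mean()               # positive rate in the whole set
    lifts, seen, positives = [], 0, 0
    for q in quantiles:
        seen += len(q)
        positives += q.sum()
        lifts.append((positives / seen) / overall_rate)
    return lifts

probs   = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
actuals = [1,   1,   0,   1,   0,   1,   0,   0]
print([round(x, 2) for x in cumulative_lift(probs, actuals, n_quantiles=4)])
# [2.0, 1.5, 1.33, 1.0]
```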
You can graph the lift as either Cumulative Lift or as Cumulative Positive Cases (default). To change the graph, select the appropriate value from the Display list. You can also select a target value in the Target Value list.
Parent topic: Test Metrics for Classification Models
Profit and ROI
Profit uses user-supplied values for startup cost, incremental revenue, incremental cost, budget, and population to maximize the profit.
Oracle Data Miner calculates profit as follows:
Profit = -1 * Startup Cost + (Incremental Revenue * Targets Cumulative - Incremental Cost * (Targets Cumulative + Non Targets Cumulative)) * Population / Total Targets
Profit can be positive or negative, that is, it can be a loss.
To view the profit predicted by this model, select the Target Value that you are interested in. You can change the Selected Population%. The default is 1 percent.
Return on Investment (ROI) is the ratio of money gained or lost (whether realized or unrealized) on an investment relative to the amount of money invested. Oracle Data Mining uses this formula:
ROI = ((profit - cost) / cost) * 100
Where:
profit = Incremental Revenue * Targets Cumulative
cost = Incremental Cost * (Targets Cumulative + Non Targets Cumulative)
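Both formulas translate directly into Python. In this sketch, the cumulative counts (18 targets, 2 non-targets) are assumed values chosen so that profit is 180 and cost is 100, matching the example in the next topic.

```python
def profit(startup_cost, incr_revenue, incr_cost,
           targets_cum, non_targets_cum, population, total_targets):
    """Profit per the formula above; the result can be negative (a loss)."""
    return (-startup_cost
            + (incr_revenue * targets_cum
               - incr_cost * (targets_cum + non_targets_cum))
            * population / total_targets)

def roi(incr_revenue, incr_cost, targets_cum, non_targets_cum):
    """Return on Investment (%) per the formula above."""
    gain = incr_revenue * targets_cum
    cost = incr_cost * (targets_cum + non_targets_cum)
    return (gain - cost) / cost * 100

print(roi(incr_revenue=10, incr_cost=5, targets_cum=18, non_targets_cum=2))  # 80.0
```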
- Profit and ROI Example
  The Profit and ROI example illustrates how profit and ROI are calculated.
- Profit and ROI Use Case
  The Profit and ROI Use Case depicts how to interpret results for profit and ROI calculations.
Parent topic: Test Metrics for Classification Models
Profit and ROI Example
The Profit and ROI example illustrates how profit and ROI are calculated.
Profit is calculated by using the profit formula described in Profit and ROI.
To calculate ROI, use the formula:
ROI = ((profit - cost) / cost) * 100
Where:
profit = Incremental Revenue * Targets Cumulative
cost = Incremental Cost * (Targets Cumulative + Non Targets Cumulative)
Substituting the values in this example, where profit is 180 and cost is 100, results in:
ROI = ((180 - 100) / 100) * 100 = 80
Parent topic: Profit and ROI
Profit and ROI Use Case
The Profit and ROI Use Case depicts how to interpret results for profit and ROI calculations.
Suppose you run a mail order campaign. You will mail each customer a catalog. You want to mail catalogs to those customers who are likely to purchase things from the catalog.
Here is the input data from the Profit and ROI example:
- Startup cost = 1000. This is the total cost to start the campaign.
- Incremental revenue = 10. This is the estimated revenue that results from a sale or new customer.
- Incremental cost = 5. This is the cost of promoting to each case.
- Budget = 10000. This is the total amount of money that you can spend.
- Population = 2000. This is the total number of cases.
Therefore, each quantile contains 20 cases:
total population / number of quantiles = 2000/100 = 20
The cost to promote a sale in each quantile is Incremental Cost * number of cases per quantile = $5 * 20 = $100.
The cumulative costs per quantile are as follows:
- Quantile 1 costs $1000 (startup cost) + $100 (cost to promote a sale in Quantile 1) = $1100.
- Quantile 2 costs $1100 (cost of Quantile 1) + $100 (cost in Quantile 2) = $1200.
- Quantile 3 costs $1300.
If you calculate all of the intermediate values, then the cumulative cost for Quantile 90 is $10,000 and for Quantile 100 is $11,000. The budget is $10,000. If you look at the graph for profit in Oracle Data Miner, then you should see the budget line drawn in the profit chart on the 90th quantile.
In the Profit and ROI example, the calculated profit is $600 and the ROI is 80 percent. This means that if you mail catalogs to the first 20 quantiles of the population (400 cases), then the campaign generates a profit of $600, with an ROI of 80 percent.
If you randomly mail the catalogs to the first 20 quantiles of customers, then the profit is:
Profit = -1 * Startup Cost + (Incremental Revenue * Targets Cumulative - Incremental Cost * (Targets Cumulative + Non Targets Cumulative)) * Population / Total Targets
Profit = -1 * 1000 + (10 * 10 - 5 * (10 + 10)) * 2000 / 100 = -$1000
In other words, the random mailing loses the $1000 startup cost; there is no profit.
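As a quick Python check of this arithmetic (using the example's values of a $1000 startup cost and a $100 cost per quantile):

```python
# Cumulative cost per quantile: startup cost plus $100 for each quantile.
startup, cost_per_quantile, budget = 1000, 100, 10000
cumulative = [startup + cost_per_quantile * q for q in range(1, 101)]
print(cumulative[0], cumulative[89], cumulative[99])  # 1100 10000 11000
# The budget line falls on quantile 90, where the cumulative cost reaches $10,000.

# Random mailing to the first 20 quantiles, per the profit formula above:
profit = -1 * 1000 + (10 * 10 - 5 * (10 + 10)) * 2000 / 100
print(profit)  # -1000.0
```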
Related Topics
Parent topic: Profit and ROI
Compare Classification Test Results
By using the Compare Test Results context menu option, you can compare test results for Classification models that are tested in a Test node, and for models that are tested after running the Classification node.
To compare test results for all of the models in a Classification Build node:
- If you tested the models when you ran the Classification node: Right-click the Classification node that contains the models and select Compare Test Results.
- If you tested the Classification models in a Test node: Right-click the Test node that tests the models and select Compare Test Results.
The Classification Model Test viewer opens and compares the test results. The comparison enables you to select the model that best solves a business problem.
The graphs in the Performance tab for different models are in different colors. In the other tabs, the same color is used for the line indicating measures such as lift.
The color associated with each model is displayed in the bottom pane of each tab.
- Compare Test Results
Test result comparisons for a Classification node are displayed in categories for Performance, Performance Matrix, ROC, Lift, and Profit.
Related Topics
Parent topic: Testing Classification Models
Compare Test Results
Test result comparisons for a Classification node are displayed in categories for Performance, Performance Matrix, ROC, Lift, and Profit.
Compare Test Results for Classification are displayed in these tabs:
- Performance: Compares performance results in the top pane for the models listed in the bottom pane.
  To edit the list of models, click the icon above the pane that lists the models. This opens the Edit Test Selection (Classification and Regression) dialog box. By default, test results for all models are compared.
- Performance Matrix: Displays the Performance Matrix for each model. You can display either Compare models (a comparison of the performance matrices) or Details (the Performance Matrix for a selected model).
- ROC: Compares the ROC curves for the models listed in the lower pane.
  To see information for a curve, select a model and click the detail icon.
  To edit the list of models, click the icon above the pane that lists the models. This opens the Edit Test Selection (Classification and Regression) dialog box.
- Lift: Compares the lift for the models listed in the lower pane. For more information about lift, see Lift.
  To edit the list of models, click the icon above the pane that lists the models. This opens the Edit Test Selection (Classification and Regression) dialog box.
- Profit: Compares the profit curves for the models listed in the lower pane.
  To edit the list of models, click the icon above the pane that lists the models. This opens the Edit Test Selection (Classification and Regression) dialog box.
- Edit Test Selection (Classification and Regression)
By default, test results for all successfully built models in the build node are selected.
Related Topics
Parent topic: Compare Classification Test Results
Edit Test Selection (Classification and Regression)
By default, test results for all successfully built models in the build node are selected.
If you do not want to view test results for a model, then deselect the model. Click OK when you have finished.
Parent topic: Compare Test Results
Classification Model Test Viewer
The Classification Model Test viewer displays all information related to the Classification Model test results.
Open the test viewer by selecting either View Test Results or Compare Test Results in the context menu for a Classification node or a Test node that tests Classification models. Select the results to view.
- Models (default)
- Partitions:
  - If a partition has never been selected, then the Select Partition dialog box opens.
  - If a partition has been previously selected, then it is loaded. Click the partition name that is displayed in the Search field to view the details.
  - To change the selected partition, click the icon. This opens the Select Partition dialog box.
The Classification Model Test viewer shows the following tabs:
- Performance
  The Performance tab provides an overall summary of the performance of each model generated.
- Performance Matrix
  The Performance Matrix displays the number of correct and incorrect predictions made by the model compared with the actual classifications in the test data.
- ROC
  Receiver Operating Characteristics (ROC) compares predicted and actual target values in a binary Classification model.
- Lift
  The Lift graph shows the lift from the model (or models) and also shows the lift from a naive model (Random) and the ideal lift.
- Profit
  The Profit graph displays information related to profit, budget, and threshold for one or more models.
- Model Partitions
  The Model Partitions tab displays information about the model partitions on a node. Because the number of partitions can be very large, a fetch size limit is applied.
Related Topics
Parent topic: Testing Classification Models
Performance
The Performance tab provides an overall summary of the performance of each model generated.
It displays test results for several common test metrics:
- All Measures (default): The Measure list enables you to select the measures to display. By default, all measures are displayed. The selected measures are displayed as graphs. If you are comparing test results for two or more models, then different models have graphs in different colors.
- Predictive Confidence
- Average Accuracy
- Overall Accuracy
- Cost, if you specified costs or the system calculated costs
In the Sort By fields, you can specify the sort attribute and the sort order. The first list is the sort attribute: measure, creation date, or name (the default). The second list is the sort order: ascending or descending (the default).
Below the graphs, the Models table supplements the information presented in the graph. You can minimize the table using the splitter line. The Models table in the lower panel summarizes the data in the histograms:
- Name, the name of the model along with the color of the model in the graphs
- Predictive Confidence percent
- Overall Accuracy percent
- Average Accuracy percent
- Cost, if you specified cost (costs are calculated by Oracle Data Miner for decision trees)
- Algorithm (used to build the model)
- Build Rows
- Test Rows
- Creation date
By default, results for the selected model are displayed. To change the list of models, click the icon and deselect any models for which you do not want to see results. If you deselect a model, then both the histogram and the summary information are removed.
Related Topics
Parent topic: Classification Model Test Viewer
Performance Matrix
The Performance Matrix displays the number of correct and incorrect predictions made by the model compared with the actual classifications in the test data.
You can either view the detail for a selected model, or you can compare performance matrices for all models.
- Click Show Details to view test results for one model.
- Click Compare Models to compare test results.
- Show Detail
  The Show Detail view displays all information related to the selected model.
- Compare Models
  Compare Models compares performance information for all models in the node that were tested.
Parent topic: Classification Model Test Viewer
Show Detail
The Show Detail view displays all information related to the selected model.
First, select a model. If you are viewing test results for one model, then the details for that model are displayed automatically.
- In the top pane, Average Accuracy and Overall Accuracy are displayed with a grid that displays the correct predictions for each target value. Cost information is displayed if you have specified costs.
- In the bottom pane, a Performance Matrix with rows showing actual values and columns showing predicted values is displayed for the selected model. The percentage correct and the cost are displayed for each column.
  Select Show totals and cost to see the total, the percentage correct, and the cost for correct and incorrect predictions.
  Click the icon to filter your search based on a target.
Parent topic: Performance Matrix
Compare Models
Compare Models compares performance information for all models in the node that were tested.
- The top pane lists the following for each model:
  - Percentage of correct predictions
  - Count of correct predictions
  - Total case count
  - Cost information
  To see more detail, select a model and click the icon.
- The bottom pane displays the target value details for the model selected in the top pane. Select the measure to display. To filter your search by target value, click the icon.
  - Correct Predictions (default): Displays correct predictions for each value of the target attribute.
  - Costs: Displays costs for each value of the target.
Parent topic: Performance Matrix
ROC
Receiver Operating Characteristics (ROC) compares predicted and actual target values in a binary Classification model.
To edit and view an ROC:
- Edit Test Result Selection
  In the Edit Test Result Selection dialog box, you can select specific models that you want to compare.
- ROC Detail Dialog
  The ROC Detail dialog box displays statistics for probability thresholds.
Related Topics
Parent topic: Classification Model Test Viewer
Edit Test Result Selection
In the Edit Test Result Selection dialog box, you can select specific models that you want to compare.
By default, all models are selected in the Edit Test Result Selection dialog box. Deselect the check box for those models for which you do not want to see results. If you deselect a model, then neither the ROC curve nor the details for that model are displayed.
Click OK when you have finished.
Parent topic: ROC
ROC Detail Dialog
The ROC Detail Dialog displays statistics for probability thresholds.
For each probability threshold, the following are displayed:
- True Positive
- False Negative
- False Positive
- True Negative
- True Positive Fraction
- False Positive Fraction
- Overall Accuracy
- Average Accuracy
Click OK to dismiss the dialog box.
Parent topic: ROC
Lift
The Lift graph shows the lift from the model (or models) and also shows the lift from a naive model (Random) and the ideal lift.
The x-axis of the graph is divided into quantiles. The lift graph displays at least three lines:
- A line showing the lift for each model
- A red line for the random model
- A vertical blue line for the threshold
The Lift viewer compares lift results for a given target value in two or more models. It displays either the Cumulative Positive Cases or the Cumulative Lift.
If you are comparing the lift for two or more models, then the lines for different models are in different colors. The table below the graph shows the name of the model and the color used to display results for that model.
The viewer has the following controls:
-
Display: Selects the display option, either Cumulative Positive Cases (default) or Cumulative Lift.
-
Target Value: Selects the target value for comparison. The default target value is the least frequently occurring target value.
The threshold is a blue vertical line used to select a quantile. As the threshold moves, the details for each test result in the Lift Detail table change to the point on the Lift Chart that corresponds to the selected quantile. You move the threshold by dragging the indicator on the quantile line. For example, you can set the threshold to quantile 20.
Below the graph, a data table supplements the information presented in the graph. You can minimize the table using the splitter line.
The table has the following columns:
- Name, the name of the model along with the color of the model in the graph
- Lift Cumulative
- Gain Cumulative Percentage
- Percentage Records Cumulative
- Target Density Cumulative
- Algorithm
- Build Rows
- Test Rows
- Creation Date (date and time)
Above the Models grid is the Lift Detail icon. Select a model and click the icon to open the Lift Detail dialog box, which displays lift details for 100 quantiles.
To change the list of models, click the icon and deselect any models for which you do not want to see results. If you deselect a model, then neither the lift curve nor the detail information for that model is displayed. By default, results for all models in the node are displayed.
- Lift Detail
The Lift Detail dialog box displays statistics for each quantile from 1 to 100.
Related Topics
Parent topic: Classification Model Test Viewer
Lift Detail
The Lift Detail dialog box displays statistics for each quantile from 1 to 100.
Threshold probability does not always reflect standard probability. For example, the Classification node enables you to specify three different performance settings:
- Balanced: Apply balanced weighting to all target class values.
- Natural: Do not apply any weighting.
- Custom: Apply a user-created custom weights file.
The default for Classification models is Balanced.
Balanced is implemented by passing weights or costs into the model, depending on the algorithm used.
The threshold probability actually reflects cost rather than standard probability.
To see the difference between Balanced and Natural:
Parent topic: Lift
Profit
The Profit graph displays information related to profit, budget, and threshold for one or more models.
The Profit graph displays at least three lines:
- A line showing the profit for each model
- A line indicating the budget
- A line indicating the threshold
The threshold is a blue vertical line used to select a quantile. As the threshold moves, the details for each test result in the Profit Detail table change to the point on the Profit chart that corresponds to the selected quantile. You can move the threshold by dragging the indicator on the quantile line. For example, you can set the threshold to quantile 20.
To specify the values for profit, click Profit Settings to open the Profit Setting dialog box.
If you are comparing the profit for two or more models, then the lines for different models are different colors. The table below the graph shows the name of the model and the color used to display results for that model.
The bottom pane contains the Models grid and supplements the information presented in the graph. You can minimize the table using the splitter line.
The table has the following columns:
- Name, the name of the model along with the color of the model in the graphs
- Profit
- ROI Percentage
- Records Cumulative Percentage
- Target Density Cumulative
- Maximum Profit
- Maximum Profit Population Percentage
- Algorithm
- Build Rows
- Test Rows
- Creation Date (and time)
Above the Models grid is the Browse Detail icon. Select a model and click the icon to open the Profit Detail dialog box, which displays statistics for each quantile from 1 to 100.
To change the list of models, click the icon and deselect any models for which you do not want to see results. If you deselect a model, then neither the profit curve nor the detail information for that model is displayed. By default, results for all models in the node are displayed.
- Profit Detail Dialog
  The Profit Detail dialog box displays statistics about profit for quantiles 1 to 100.
- Profit Setting Dialog
  In the Profit Settings dialog box, you can provide values for profit settings such as budget, incremental cost, and so on.
Related Topics
Parent topic: Classification Model Test Viewer
Profit Detail Dialog
The Profit Detail dialog box displays statistics about profit for quantiles 1 to 100.
Click OK to dismiss the dialog box.
Parent topic: Profit
Profit Setting Dialog
In the Profit Settings dialog box, you can provide values for profit settings such as budget, incremental cost, and so on.
- Click Profit Settings to change the following values:
  - Startup Cost: The cost of starting the process that creates the profit. The default is 1.
  - Incremental Revenue: Incremental revenue earned for each correct prediction. The default is 1.
  - Incremental Cost: The cost of each additional item. The default is 1.
  - Budget: A total cost that cannot be exceeded. The default value is 1.
  - Population: The number of individual cases that the model is applied to. The default is 100.
- Click OK.
Parent topic: Profit
Model Partitions
The Model Partitions tab displays information about the model partitions on a node. Because the number of partitions can be very large, a fetch size limit is applied. The tab displays the following details:
- Model name
- Partition ID
- Partition Name
- Predictive Confidence
- Overall Accuracy
- Average Accuracy
- Build Rows
- Test Rows
- Cost
- Algorithm type
- Creation Date
- Sort data: To sort data, click the icon.
- Pin partition: The icon to pin or select a partition is enabled when you select a row. Select a row and click the icon to mark the selected partition as pinned in all the Test Result editors. This means that the partition is loaded when the editor is opened.
- View partition details: Double-click the partition name, or click the icon, to view the details of the partition, such as Partition ID, Partition Name, Partition Details Table, and Table Filtering.
- View model details: Click the icon to view the specific partition model details in the Model Viewer.
- Select and view models in the Edit Test Result Selection dialog box: Click the icon to select models and view them in the Edit Test Result Selection dialog box.
- Filter model partitions: You can filter and sort model partitions based on the model name, partition name, algorithm, and partition keys.
- Select Partition
In the Select Partition dialog box, you can view filtered partitions based on the Partition keys.
Parent topic: Classification Model Test Viewer
Select Partition
In the Select Partition dialog box, you can view filtered partitions based on the Partition keys.
Parent topic: Model Partitions
Viewing Test Results
You can view results of models that are tested in a Classification node and a Test node.
Parent topic: Testing Classification Models
Tuning Classification Models
When you tune a model, you create a derived cost matrix to use for subsequent Test and Apply operations.
Note:
To tune models, you must test the models in the same node that you build them.
If necessary, you can remove tuning and then re-run the node.
To tune a model:
You may have to repeat the tuning steps several times to get the desired results. If necessary, you can remove tuning for a model.
- Remove Tuning
  You can remove tuning of a Classification model by selecting the Automatic option.
- Cost
  The Cost tab in Tune Settings enables you to specify costs for target values for scoring purposes.
- Benefit
  In the Benefit tab, you can specify a benefit for each value of the target. Specifying benefits is useful when there are many target values.
- ROC
  ROC is only supported for binary models.
- Lift
  Lift measures the degree to which the predictions of a Classification model are better than randomly generated predictions.
- Profit
  The Profit tab provides a method for maximizing profit.
Related Topics
Parent topic: Testing and Tuning Models
Remove Tuning
You can remove tuning of a Classification model by selecting the Automatic option.
To remove tuning for a model:
- Right-click the node and select Go to Properties.
- Go to the Models section and click the icon.
- Select Automatic.
- Run the node.
Parent topic: Tuning Classification Models
Cost
The Cost tab in Tune Settings enables you to specify costs for target values for scoring purposes.
By default, the cost matrix is generated based on all the known target values in the build data source, with all cost values initially set to 1.
To specify costs:
To cancel the tuning, click Reset. Tuning returns to Automatic.
To see the impact of the tuning, rerun the model node.
- Costs and Benefits
In a classification problem, you must specify the cost or benefit associated with correct or incorrect classifications.
Related Topics
Parent topic: Tuning Classification Models
Costs and Benefits
In a classification problem, you must specify the cost or benefit associated with correct or incorrect classifications.
Specifying costs or benefits is valuable when the cost of different misclassifications varies significantly.
You can create a cost matrix to bias the model to minimize the cost or maximize the benefit. The cost/benefit matrix is taken into consideration when the model is scored.
Costs
Suppose the problem is to predict whether a customer is likely to respond to a promotional mail. The target has two categories: YES (the customer responds) and NO (the customer does not respond). Suppose a positive response to the promotion generates $500 and that it costs $5 to do the mailing. After building the model, you compare the model predictions with actual data held aside for testing. At this point, you can evaluate the relative cost of different misclassifications:
- If the model predicts YES and the actual value is YES, then the cost of misclassification is $0.
- If the model predicts YES and the actual value is NO, then the cost of misclassification is $5.
- If the model predicts NO and the actual value is YES, then the cost of misclassification is $495.
- If the model predicts NO and the actual value is NO, then the cost is $0.
Parent topic: Costs and Benefits
Benefits
Using the same costs, you can approach the relative value of the outcomes from a benefits perspective. When you correctly predict a YES (a responder), the benefit is $495. When you correctly predict a NO (a non-responder), the benefit is $5.00 because you can avoid sending out the mailing. Because the goal is to find the lowest cost solution, benefits are represented as negative numbers.
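As a concrete illustration of these numbers, a cost/benefit matrix for this example might be written as follows. This is a sketch only; benefits are entered as negative numbers because tuning minimizes total cost.

```python
# Rows: actual value; columns: predicted value (order: YES, NO).
cost_benefit = [
    [-495, 495],   # actual YES: a correct prediction earns $495
    [   5,  -5],   # actual NO: a correct prediction saves the $5 mailing
]
```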
Parent topic: Costs and Benefits
Benefit
In the Benefit tab, you can specify a benefit for each value of the target. Specifying benefits is useful when there are many target values.
The Benefit tab enables you to:
- Specify a benefit for each value of the target. The values specified are applied to the cost benefit matrix.
- Indicate the most important values.
To tune a model using the Benefit tab:
Related Topics
Parent topic: Tuning Classification Models
ROC
ROC is only supported for binary models.
The ROC Tuning tab adds a side panel to the standard ROC Test Viewer. The following information is displayed:
- Performance Matrix, in the upper right pane, displays these matrices:
  - Overall Accuracy: Cost matrix for the maximum Overall Accuracy point on the ROC chart.
  - Average Accuracy: Cost matrix for the maximum Average Accuracy point.
  - Custom Accuracy: Cost matrix for the custom operating point. You must specify a custom operating point for this option to be available.
  - Model Accuracy: The current Performance Matrix (approximately) of the current model. You can derive Model Accuracy from the ROC result as follows: if there is no embedded cost matrix, then find the 50 percent threshold point, or the point closest to it; if there is an embedded cost matrix, then find the lowest cost point. For a model to have an embedded cost matrix, it must either have been tuned, or it must have a cost matrix or cost benefit defined by the default settings of the Build node.
- The Performance Matrix grid shows the performance matrix for the option selected.
- Click Tune to:
  - Select the current performance option as the one to use to tune the model.
  - Derive a cost matrix from the ROC result at that probability threshold.
  Tune Settings, in the lower part of this panel, is updated to display the new matrix.
- Click Clear to clear any tuning specifications and set tuning to Automatic. In other words, no tuning is performed.
- ROC Tuning Steps
Lists the procedure to perform ROC tuning. - Receiver Operating Characteristics
Receiver Operating Characteristics (ROC) is a method for experimenting with changes in the probability threshold and observing the resultant effect on the predictive power of the model.
Related Topics
Parent topic: Tuning Classification Models
ROC Tuning Steps
Lists the procedure to perform ROC tuning.
To perform ROC tuning:
- Select Custom Operating Point
The Specify Custom Threshold dialog box allows you to edit the custom operating point for all the models in the node.
Related Topics
Parent topic: ROC
Select Custom Operating Point
The Specify Custom Threshold dialog box allows you to edit the custom operating point for all the models in the node.
- To change the Hit Rate or False Alarm, click the appropriate option and adjust the value that you want to use.
- Alternatively, you can specify the False Positive or False Negative ratio. To do this, click the appropriate option and specify the ratio.
Click OK when you have finished.
Parent topic: ROC Tuning Steps
Receiver Operating Characteristics
Receiver Operating Characteristics (ROC) is a method for experimenting with changes in the probability threshold and observing the resultant effect on the predictive power of the model.
- The horizontal axis of an ROC graph measures the False Positive Rate as a percentage.
- The vertical axis shows the True Positive Rate.
- The top left corner is the optimal location in an ROC curve, indicating a high TP (True Positive) rate versus a low FP (False Positive) rate.
- The area under the ROC curve measures the discriminating ability of a binary Classification model. This measure is especially useful for data sets with an unbalanced target distribution (one target class dominates the other). The larger the area under the curve, the higher the likelihood that an actual positive case is assigned a higher probability of being positive than an actual negative case.
ROC curves are similar to lift charts in that they provide a means of comparison between individual models and determine thresholds that yield a high proportion of positive hits. ROC was originally used in signal detection theory to gauge the true hit versus false alarm ratio when sending signals over a noisy channel.
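The following Python sketch computes the (False Positive Rate, True Positive Rate) points that make up an ROC curve as the probability threshold sweeps from high to low; the scores and actual values are assumed example data.

```python
import numpy as np

def roc_points(probs, actuals):
    """(FPR, TPR) pairs as the probability threshold sweeps from high to low."""
    probs, actuals = np.asarray(probs), np.asarray(actuals)
    n_pos, n_neg = (actuals == 1).sum(), (actuals == 0).sum()
    points = []
    for threshold in sorted(set(probs), reverse=True):
        preds = probs >= threshold
        tpr = (preds & (actuals == 1)).sum() / n_pos
        fpr = (preds & (actuals == 0)).sum() / n_neg
        points.append((round(float(fpr), 2), round(float(tpr), 2)))
    return points

probs   = [0.9, 0.7, 0.6, 0.4, 0.2]
actuals = [1,   1,   0,   1,   0]
print(roc_points(probs, actuals))
# [(0.0, 0.33), (0.0, 0.67), (0.5, 0.67), (0.5, 1.0), (1.0, 1.0)]
```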
Parent topic: ROC
Lift
Lift measures the degree to which the predictions of a Classification model are better than randomly generated predictions.
To tune a model using Lift:
- About Lift
Lift is the ratio of positive responders in a segment to the positive responders in the population as a whole.
Related Topics
Parent topic: Tuning Classification Models
About Lift
Lift is the ratio of positive responders in a segment to the positive responders in the population as a whole.
For example, if a population has a predicted response rate of 20 percent, but one segment of the population has a predicted response rate of 60 percent, then the lift of that segment is 3 (60 percent/20 percent). Lift measures the following:
- The concentration of positive predictions within segments of the population, specifying the improvement over the rate of positive predictions in the population as a whole.
- The performance of targeting models in marketing applications. The purpose of a targeting model is to identify segments of the population with potentially high concentrations of positive responders to a marketing campaign.
The notion of lift implies a binary target: either a Responder or a Non-responder, that is, either YES or NO. However, lift can be computed for multiclass targets by designating a preferred positive class and combining all other target class values, effectively turning a multiclass target into a binary target. In this way, lift applies to both binary and non-binary classifications.
The calculation of lift begins by applying the model to test data in which the target values are already known. Then, the predicted results are sorted by the probability of a positive prediction, from highest to lowest. The ranked list is divided into quantiles (equal parts). The default number of quantiles is 100.
Parent topic: Lift
Profit
- Profit Setting
  In the Profit Setting dialog box, you can change default values for profit-related settings.
- Profit
  Profit provides a method for maximizing profit.
Related Topics
Parent topic: Tuning Classification Models
Profit Setting
In the Profit Setting dialog box, you can change default values for profit-related settings.
The default values for Startup Cost, Incremental Revenue, Incremental Cost, and Budget are all 1.
The default value for Population is 100.
Change these values to ones appropriate for your business problem.
Click OK.
Parent topic: Profit
Profit
Profit provides a method for maximizing profit.
You can specify the information listed below. Oracle Data Miner uses this information to create a cost matrix that optimizes profit:
- Startup cost
- Incremental revenue
- Incremental cost
- Budget
- Population
Related Topics
Parent topic: Profit
Testing Regression Models
Regression models are tested by comparing the predicted values to known target values in a set of test data.
The historical data for a regression project is typically divided into two data sets:
- One for building the model
- One for testing the model
The test data must be compatible with the data used to build the model and must be prepared in the same way that the build data was prepared.
These are the ways to test Classification and Regression models:
- By splitting the input data into build data and test data. This is the default. The test data is created by randomly splitting the build data into two subsets; 40 percent of the input data is used for test data.
- By using all the build data as the test data.
- By attaching two Data Source nodes to the build node:
  - The first data source that you connect to the build node is the source of the build data.
  - The second node that you connect is the source of the test data.
- By deselecting Perform Test in the Test section of the Properties pane and then using a Test node. By default, all Classification and Regression models are tested.
Test settings specify which metrics to calculate and control the calculation of the metrics.
Oracle Data Mining provides several kinds of information to assess Regression models:
- Residual Plot
- Regression Statistics
- Regression Model Test Viewer
- Compare Regression Test Results
To view test results, first test the model or models in the node:
- If you tested the models using the default test in the Regression node: Run the node, right-click it, select View Test Results, and then select the model that you are interested in. The Regression Model Test viewer opens. To compare the test results for all models in the node, select Compare Test Results.
- If you tested the models using a Test node: Run the Test node, right-click it, select View Test Results, and then select the model that you are interested in. The Regression Model Test viewer opens. To compare the test results for all models in the node, select Compare Test Results.
You can also compare test results by going to the Models section of the Properties pane of the Build node where you tested the models and clicking the icon.
- Residual Plot
  The residual plot is a scatter plot of the residuals.
- Regression Statistics
  Oracle Data Mining calculates Root Mean Squared Error and Mean Absolute Error to help assess the overall quality of Regression models.
- Compare Regression Test Results
  You can compare the results of a Regression test for all models that are in a Regression node as well as in a Test node.
- Regression Model Test Viewer
  You can view the results of a Regression model test in the Regression Model Test Viewer.
Parent topic: Testing and Tuning Models
Residual Plot
The residual plot is a scatter plot of the residuals.
Each residual is the difference between the actual value and the value predicted by the model. Residuals can be positive or negative. If residuals are small (close to 0), then the predictions are accurate. A residual plot may indicate that predictions are better for some classes of values than others.
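A residual plot can be sketched with a few lines of matplotlib; the actual and predicted values below are made-up example data.

```python
import matplotlib.pyplot as plt

actual    = [10.0, 12.5, 15.0, 20.0, 22.0]
predicted = [11.0, 12.0, 16.5, 19.0, 23.5]
residuals = [a - p for a, p in zip(actual, predicted)]

plt.scatter(predicted, residuals)
plt.axhline(0, linestyle="--")   # accurate predictions cluster around 0
plt.xlabel("Predicted Value")
plt.ylabel("Residual")
plt.show()
```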
Parent topic: Testing Regression Models
Regression Statistics
Oracle Data Mining calculates the Root Mean Squared Error and Mean Absolute Error statistics to help assess the overall quality of Regression models.
- Root Mean Squared Error: The square root of the average squared distance of a data point from the fitted line.
- Mean Absolute Error: The average of the absolute values of the residuals (errors). The Mean Absolute Error is very similar to the Root Mean Squared Error but is less sensitive to large errors.
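Both statistics are straightforward to compute; this Python sketch uses made-up example values.

```python
import math

def rmse(actual, predicted):
    """Root Mean Squared Error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))

def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

actual, predicted = [10, 12, 15, 20], [11, 12, 17, 18]
print(rmse(actual, predicted))  # 1.5
print(mae(actual, predicted))   # 1.25
```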
Parent topic: Testing Regression Models
Compare Regression Test Results
You can compare the results of a Regression test for all models that are in a Regression node as well as in a Test node.
To compare test results for all the models in a Regression Build node:
- If you tested the models when you ran the Regression node, then:
  - Right-click the Regression node that contains the models.
  - Select Compare Test Results.
- If you tested the Regression models in a Test node, then:
  - Right-click the Test node that tests the models.
  - Select Compare Test Results.
- Compare Test Results
When you compare test results for two or more Regression models, each model has a color associated with it. This color indicates the results for that model.
Parent topic: Testing Regression Models
Compare Test Results
When you compare test results for two or more Regression models, each model has a color associated with it. This color indicates the results for that model.
For example, if model M1 has purple associated with it, then the bar graphs on the Performance tab for M1 are displayed in purple.
By default, test results for all models in the node are compared. If you do not want to compare all test results, then click the icon. The Edit Test Results Selection dialog box opens. Deselect the results that you do not want to see, and click OK when you have finished.
Compare Test Results opens in a new tab. Results are displayed in two tabs:
- Performance tab: The following metrics are compared on the Performance tab:
  - Predictive Confidence
  - Mean Absolute Error
  - Mean Predicted Value
  By default, test results for all models are compared. To edit the list of models, click the icon above the pane that lists the models to open the Edit Test Selection (Classification and Regression) dialog box.
- Residual tab: Displays the residual plot for each model.
  You can compare two plots side by side. By default, test results for all models are compared.
  To edit the list of models, click the icon above the pane that lists the models to open the Edit Test Selection (Classification and Regression) dialog box.
Related Topics
Parent topic: Compare Regression Test Results
Regression Model Test Viewer
You can view the results of a regression model test in the Regression Model Test Viewer.
To view information in the Regression Model Test Viewer:
- Performance (Regression)
  The Performance tab displays the test results for several common test metrics.
- Residual
  The Residual Plot tab shows the residual plot on a per-model basis.
Related Topics
Parent topic: Testing Regression Models
Performance (Regression)
The Performance tab displays the test results for several common test metrics. For Regression models, it displays the measures for all models in the node.
The test metrics are:
- All Measures (default): The Measure list enables you to select the measures to display. By default, All Measures are displayed. The selected measures are displayed as graphs. If you are comparing test results for two or more models, then the different models have graphs in different colors.
- Predictive Confidence: Measures how much better the predictions of the model are than those of the naive model. Predictive Confidence for regression is the same measure as Predictive Confidence for classification.
- Mean Absolute Error
- Root Mean Square Error
- Mean Predicted Value: The average of the predicted values.
- Mean Actual Value: The average of the actual values.
Two Sort By lists specify sort attribute and sort order. The first Sort By list contains Measure, Creation Date, or Name (the default). The second Sort By list contains the sort order: ascending or descending (default).
The top pane displays these measures as histograms.
The bottom pane contains the Models grid that supplements the information presented in the graphs. You can minimize the table using the splitter line.
The Models grid has the following columns:
- Name, the name of the model along with the color of the model in the graphs
- Predictive Confidence
- Mean Absolute Error
- Root Mean Square Error
- Mean Predicted Value
- Mean Actual Value
- Algorithm
- Creation Date (and time)
By default, results for the selected model are displayed. To change the list of models, click the icon and deselect any models for which you do not want to see results. If you deselect a model, then neither the histograms nor the detail information for that model are displayed.
Related Topics
Parent topic: Regression Model Test Viewer
Residual
The Residual Plot tab shows the residual plot on a per-model basis.
By default, the residual plots are displayed as graphs.
- To see numeric results, click the icon.
- To change the display back to a graph, click the icon.
- To see the plot for another model, select the model from the Show list and click Query.
You can control how the plot is displayed in several ways:
- Select the information displayed on the y-axis and on the x-axis. The default depictions are:
  - X axis: Predicted Value
  - Y axis: Residual
  To change this, select information from the lists.
- The default sample size is 2000. You can make this value larger or smaller.
- You can compare plots side by side. The default is to not compare plots.
If you change any of these fields, then click Query to see the results.
To compare plots side by side, select the model to compare with the current model from the Compare list and click Query. The residual plots are displayed side by side.
The bottom pane shows the Residual result summary table, which contains the Models grid. The Models grid supplements the information presented in the plots. You can minimize the table using the splitter line.
The table has the following columns:
- Model, the name of the model along with the color of the model in the graphs
- Predictive Confidence
- Mean Absolute Error
- Root Mean Square Error
- Mean Predicted Value
- Mean Actual Value
- Algorithm
- Creation Date (and time)
By default, results for all models in the node are displayed. To change the list of models, click the icon to open the Edit Test Selection dialog box.
Related Topics
Parent topic: Regression Model Test Viewer