36.6 Cost-Sensitive Decision Making

Costs are user-specified numbers that bias classification. The algorithm uses positive numbers to penalize more expensive outcomes over less expensive outcomes. Higher numbers indicate higher costs.

The algorithm uses negative numbers to favor more beneficial outcomes over less beneficial outcomes. Lower negative numbers indicate higher benefits.

All classification algorithms can use costs for scoring. You can specify the costs in a cost matrix table, or you can specify the costs inline when scoring. If you specify costs inline and the model also has an associated cost matrix, only the inline costs are used. The PREDICTION, PREDICTION_SET, and PREDICTION_COST functions support costs.

Only the Decision Tree algorithm can use costs to bias the model build. If you want to create a Decision Tree model with costs, create a cost matrix table and provide its name in the CLAS_COST_TABLE_NAME setting for the model. If you specify costs when building the model, the cost matrix used to create the model is used when scoring. If you want to use a different cost matrix table for scoring, first remove the existing cost matrix table then add the new one.

A sample cost matrix table is shown in the following table. The cost matrix specifies costs for a binary target. The matrix indicates that the algorithm must treat a misclassified 0 as twice as costly as a misclassified 1.

Table 36-1 Sample Cost Matrix

ACTUAL_TARGET_VALUE PREDICTED_TARGET_VALUE COST

0

0

0

0

1

2

1

0

1

1

1

0

Example 36-14 Sample Queries With Costs

The table nbmodel_costs contains the cost matrix described in Table 36-1.

SELECT * from nbmodel_costs;

ACTUAL_TARGET_VALUE PREDICTED_TARGET_VALUE       COST
------------------- ---------------------- ----------
                  0                      0          0
                  0                      1          2
                  1                      0          1
                  1                      1          0

The following statement associates the cost matrix with a Naive Bayes model called nbmodel.

BEGIN
  dbms_data_mining.add_cost_matrix('nbmodel', 'nbmodel_costs');
END;
/

The following query takes the cost matrix into account when scoring mining_data_apply_v. The output is restricted to those rows where a prediction of 1 is less costly then a prediction of 0.

SELECT cust_gender, COUNT(*) AS cnt, ROUND(AVG(age)) AS avg_age
    	FROM mining_data_apply_v
    	WHERE PREDICTION (nbmodel COST MODEL
       USING cust_marital_status, education, household_size) = 1
    	GROUP BY cust_gender
    	ORDER BY cust_gender;
 
C        CNT    AVG_AGE
- ---------- ----------
F         25         38
M        208         43

You can specify costs inline when you invoke the scoring function. If you specify costs inline and the model also has an associated cost matrix, only the inline costs are used. The same query is shown below with different costs specified inline. Instead of the "2" shown in the cost matrix table (Table 36-1), "10" is specified in the inline costs.

SELECT cust_gender, COUNT(*) AS cnt, ROUND(AVG(age)) AS avg_age
  	FROM mining_data_apply_v
  	WHERE PREDICTION (nbmodel
  			  COST (0,1) values ((0, 10),
  					     (1, 0))
  			  USING cust_marital_status, education, household_size) = 1
  	GROUP BY cust_gender
  	ORDER BY cust_gender;
 
C        CNT    AVG_AGE
- ---------- ----------
F         74         39
M        581         43

The same query based on probability instead of costs is shown below.

SELECT cust_gender, COUNT(*) AS cnt, ROUND(AVG(age)) AS avg_age
    	FROM mining_data_apply_v
    	WHERE PREDICTION (nbmodel
    	   USING cust_marital_status, education, household_size) = 1
    	GROUP BY cust_gender
    	ORDER BY cust_gender;
 
C        CNT    AVG_AGE
- ---------- ----------
F         73         39
M        577         44