This chapter describes Decision Tree, one of the classification algorithms supported by Oracle Data Mining.
See Also:
Chapter 5, "Classification"This chapter contains the following topics:
The Decision Tree algorithm, like Naive Bayes, is based on conditional probabilities. Unlike Naive Bayes, decision trees generate rules. A rule is a conditional statement that can easily be understood by humans and easily used within a database to identify a set of records.
In some applications of data mining, the accuracy of a prediction is the only thing that really matters. It may not be important to know how the model works. In others, the ability to explain the reason for a decision can be crucial. For example, a Marketing professional would need complete descriptions of customer segments in order to launch a successful marketing campaign. The Decision Tree algorithm is ideal for this type of application.
Oracle Data Mining supports several algorithms that provide rules. In addition to decision trees, clustering algorithms (described in Chapter 7) provide rules that describe the conditions shared by the members of a cluster, and association rules (described in Chapter 8) provide rules that describe associations between attributes.
Rules provide model transparency, a window on the inner workings of the model. Rules show the basis for the model's predictions. Oracle Data Mining supports a high level of model transparency. While some algorithms provide rules, all algorithms provide model details. You can examine model details to determine how the algorithm handles the attributes internally, including transformations and reverse transformations. Transparency is discussed in the context of data preparation in Chapter 19 and in the context of model building in Oracle Data Mining Application Developer's Guide..
Figure 11-1 shows a rule generated by a Decision Tree model. This rule comes from a decision tree that predicts the probability that customers will increase spending if given a loyalty card. A target value of 0 means not likely to increase spending; 1 means likely to increase spending.
The rule shown in Figure 11-1 represents the conditional statement:
IF (current residence > 3.5 and has college degree and is single) THEN predicted target value = 0
This rule is a full rule. A surrogate rule is a related attribute that can be used at apply time if the attribute needed for the split is missing.
Confidence and support are properties of rules. These statistical measures can be used to rank the rules and hence the predictions.
Support: The number of records in the training data set that satisfy the rule.
Confidence: The likelihood of the predicted outcome, given that the rule has been satisfied.
For example, consider a list of 1000 customers (1000 cases). Out of all the customers, 100 satisfy a given rule. Of these 100, 75 are likely to increase spending, and 25 are not likely to increase spending. The support of the rule is 100/1000 (10%). The confidence of the prediction (likely to increase spending) for the cases that satisfy the rule is 75/100 (75%).
The Decision Tree algorithm produces accurate and interpretable models with relatively little user intervention. The algorithm can be used for both binary and multiclass classification problems.
The algorithm is fast, both at build time and apply time. The build process for Decision Tree is parallelized. (Scoring can be parallelized irrespective of the algorithm.)
Decision tree scoring is especially fast. The tree structure, created in the model build, is used for a series of simple tests, (typically 2-7). Each test is based on a single predictor. It is a membership test: either IN or NOT IN a list of values (categorical predictor); or LESS THAN or EQUAL TO some value (numeric predictor).
A decision tree predicts a target value by asking a sequence of questions. At a given stage in the sequence, the question that is asked depends upon the answers to the previous questions. The goal is to ask questions that, taken together, uniquely identify specific target values. Graphically, this process forms a tree structure.
Figure 11-2 is a decision tree with nine nodes (and nine corresponding rules). The target attribute is binary: 1 if the customer will increase spending, 0 if the customer will not increase spending. The first split in the tree is based on the CUST_MARITAL_STATUS
attribute. The root of the tree (node 0) is split into nodes 1 and 3. Married customers are in node 1; single customers are in node 3.
The rule associated with node 1 is:
Node 1 recordCount=712,0 Count=382, 1 Count=330 CUST_MARITAL_STATUS isIN "Married",surrogate:HOUSEHOLD_SIZE isIn "3""4-5"
Node 1 has 712 records (cases). In all 712 cases, the CUST_MARITAL_STATUS
attribute indicates that the customer is married. Of these, 382 have a target of 0 (not likely to increase spending), and 330 have a target of 1 (likely to increase spending).
During the training process, the Decision Tree algorithm must repeatedly find the most efficient way to split a set of cases (records) into two child nodes. Oracle Data Mining offers two homogeneity metrics, gini and entropy, for calculating the splits. The default metric is gini.
Homogeneity metrics asses the quality of alternative split conditions and select the one that results in the most homogeneous child nodes. Homogeneity is also called purity; it refers to the degree to which the resulting child nodes are made up of cases with the same target value. The objective is to maximize the purity in the child nodes. For example, if the target can be either yes or no (will or will not increase spending), the objective is to produce nodes where most of the cases will increase spending or most of the cases will not increase spending.
All classification algorithms, including Decision Tree, support a cost-benefit matrix at apply time. You can use the same cost matrix for building and scoring a Decision Tree model, or you can specify a different cost/benefit matrix for scoring.
In principle, Decision Tree algorithms can grow each branch of the tree just deeply enough to perfectly classify the training examples. While this is sometimes a reasonable strategy, in fact it can lead to difficulties when there is noise in the data, or when the number of training examples is too small to produce a representative sample of the true target function. In either of these cases, this simple algorithm can produce trees that over-fit the training examples. Over-fit is a condition where a model is able to accurately predict the data used to create the model, but does poorly on new data presented to it.
To prevent over-fitting, Oracle Data Mining supports automatic pruning and configurable limit conditions that control tree growth. Limit conditions prevent further splits once the conditions have been satisfied. Pruning removes branches that have insignificant predictive power.
You can generate XML representing a decision tree model; the generated XML satisfies the definition specified in the Data Mining Group Predictive Model Markup Language (PMML) version 2.1 specification. The specification is available at http://www.dmg.org
.
The Decision Tree algorithm is implemented with reasonable defaults for splitting and termination criteria. It is unlikely that you will need to use any of the build settings that are supported for Decision Tree. The settings are described as follows.
Settings to specify the homogeneity metric for finding the optimal split condition:
TREE_IMPURITY_METRIC
can be either gini or entropy. The default is gini.
Settings to control the growth of the tree:
TREE_TERM_MAX_DEPTH
specifies the maximum depth of the tree, from root to leaf inclusive. The default is 7.
TREE_TERM_MINPCT_MODE
specifies the minimum number of cases required in a child node, expressed as a percentage of the rows in the training data. The default is .05%.
TREE_TERM_MINPCT_SPLIT
specifies the minimum number of cases required in a node in order for a further split to be possible. Expressed as a percentage of all the rows in the training data. The default is 1%.
TREE_TERM_MINREC_MODE
specifies the minimum number of cases required in a child node. Default is 10.
TREE_TERM_MINREC_SPLIT
specifies the minimum number of cases required in a node in order for a further split to be possible. Default is 20.
See Also:
Oracle Database PL/SQL Packages and Types Reference for specificsThe Decision Tree algorithm manages its own data preparation internally. It does not require pretreatment of the data. Decision Tree is not affected by Automatic Data Preparation.
Decision Tree interprets missing values as missing at random. The algorithm does not support nested tables and thus does not support sparse data.
See Also:
Chapter 19, "Automatic and Embedded Data Preparation"
Oracle Data Mining Application Developer's Guide for information about nested data and missing value treatment