16 Decision Tree
Oracle Machine Learning supports Decision Tree as one of the classification algorithms.
16.1 About Decision Tree
Decision Tree classifies data using a tree structure of rules, making predictions clear and easy to interpret.
Decision tree is a supervised machine learning algorithm used for classifying data. Decision tree has a tree structure built top-down that has a root node, branches, and leaf nodes. In some applications of machine learning, the reason for predicting one outcome or another may not be important in evaluating the overall quality of a model. In others, the ability to explain the reason for a decision can be crucial. You can use decision tree rules to validate models in such problems. The Decision Tree algorithm, like Naive Bayes, is based on conditional probabilities. Unlike Naive Bayes, decision trees generate rules. A rule is a conditional statement that can be understood by humans and used within a database to identify a set of records.
For example, a marketing professional requires complete descriptions of customer segments to launch a successful marketing campaign. The Decision Tree algorithm is ideal for this type of application.
Use decision tree rules to validate models. If the rules make sense to a subject matter expert, then this validates the model.
16.1.1 Decision Tree Rules
Decision Tree generates rules that provide transparency, helping validate models by showing the basis for predictions.
Oracle Machine Learning supports several algorithms that provide rules. In addition to decision trees, clustering algorithms provide rules that describe the conditions shared by the members of a cluster, and association rules provide rules that describe associations between attributes.
Rules provide model transparency, a window on the inner workings of the model. Rules show the basis for the model's predictions. Oracle Machine Learning supports a high level of model transparency. While some algorithms provide rules, all algorithms provide model details. You can examine model details to determine how the algorithm handles the attributes internally, including transformations and reverse transformations. Transparency is discussed in the context of data preparation and in the context of model building in Oracle Machine Learning for SQL User’s Guide.
The following figure shows a rule generated by a Decision Tree model. This rule comes from a decision tree that predicts the probability that customers increase spending if given a loyalty card. A target value of 0 means not likely to increase spending; 1 means likely to increase spending.
The rule shown in the figure represents the conditional statement:
IF (current residence > 3.5 and has college degree and is single) THEN predicted target value = 0
This rule is a full rule. A surrogate rule is a related attribute that can be used at apply time if the attribute needed for the split is missing.
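As an illustration only, the conditional statement above can be written as a predicate over a record. The following Python sketch uses hypothetical field names (years_at_residence, has_college_degree, marital_status); they are assumptions made for this example, not Oracle column names, and the code does not represent how Oracle Machine Learning stores or evaluates rules.

```python
# Illustrative sketch: the rule above expressed as a predicate.
# Field names are hypothetical, chosen only for this example.

def rule_matches(record):
    """Return True if the record satisfies every condition of the rule."""
    return (
        record["years_at_residence"] > 3.5
        and record["has_college_degree"]
        and record["marital_status"] == "single"
    )

def predict(record):
    """Return the rule's predicted target (0) if it applies, else None."""
    return 0 if rule_matches(record) else None

customers = [
    {"years_at_residence": 5, "has_college_degree": True, "marital_status": "single"},
    {"years_at_residence": 2, "has_college_degree": True, "marital_status": "single"},
]
predictions = [predict(c) for c in customers]
```

Only the first customer satisfies all three conditions, so only that record receives the rule's predicted target value of 0.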
16.1.1.1 Confidence and Support
Confidence and support are properties of rules. These statistical measures can be used to rank the rules and hence the predictions.
Support: The number of records in the training data set that satisfy the rule, often expressed as a fraction of all records.
Confidence: The likelihood of the predicted outcome, given that the rule has been satisfied.
For example, consider a list of 1000 customers (1000 cases). Out of all the customers, 100 satisfy a given rule. Of these 100, 75 are likely to increase spending, and 25 are not likely to increase spending. The support of the rule is 100/1000 (10%). The confidence of the prediction (likely to increase spending) for the cases that satisfy the rule is 75/100 (75%).
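The arithmetic in this example can be reproduced with a few lines of Python. This is a minimal sketch; the function and parameter names are made up for illustration and are not part of any Oracle API.

```python
# Illustrative sketch: support and confidence for one rule.

def support_and_confidence(total_cases, cases_matching_rule, matching_with_predicted_target):
    """Return (support, confidence) as fractions between 0 and 1."""
    support = cases_matching_rule / total_cases
    confidence = matching_with_predicted_target / cases_matching_rule
    return support, confidence

# 1000 customers, 100 satisfy the rule, 75 of those are likely to increase spending.
support, confidence = support_and_confidence(1000, 100, 75)
print(f"support={support:.0%}, confidence={confidence:.0%}")
# prints: support=10%, confidence=75%
```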
16.1.2 Advantages of Decision Trees
Decision Tree is fast, accurate, and interpretable, suitable for binary and multiclass classification with minimal intervention.
The Decision Tree algorithm produces accurate and interpretable models with relatively little user intervention. The algorithm can be used for both binary and multiclass classification problems.
The algorithm is fast, both at build time and apply time. The build process for Decision Tree supports parallel execution. (Scoring supports parallel execution irrespective of the algorithm.)
Decision Tree scoring is especially fast. The tree structure, created in the model build, is used for a series of simple tests (typically 2-7). Each test is based on a single predictor. It is a membership test: either IN or NOT IN a list of values (categorical predictor); or LESS THAN or EQUAL TO some value (numeric predictor).
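To make the scoring description concrete, the following Python sketch walks a small tree, applying one single-predictor test per level. The tree, the attribute names, and the dictionary layout are all assumptions invented for this example; they do not reflect the internal representation used by Oracle Machine Learning.

```python
# Illustrative sketch of scoring: walk from the root, applying one
# single-predictor test per level. The tree below is made up.

def passes(test, record):
    """Evaluate one split test against a record."""
    value = record[test["attribute"]]
    if test["kind"] == "in":            # categorical: IN a list of values
        return value in test["values"]
    return value <= test["threshold"]   # numeric: LESS THAN or EQUAL TO

def score(node, record):
    """Return (prediction, number_of_tests_applied) for a record."""
    tests = 0
    while "test" in node:               # internal node: apply its test
        tests += 1
        node = node["yes"] if passes(node["test"], record) else node["no"]
    return node["prediction"], tests    # leaf node reached

tree = {
    "test": {"attribute": "marital_status", "kind": "in", "values": {"Married"}},
    "yes": {
        "test": {"attribute": "age", "kind": "le", "threshold": 40},
        "yes": {"prediction": 1},
        "no": {"prediction": 0},
    },
    "no": {"prediction": 0},
}

result = score(tree, {"marital_status": "Married", "age": 35})
```

Each record is scored with at most as many tests as the tree is deep, which is why apply time stays small even when the training data set is large.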
16.1.3 XML for Decision Tree Models
Learn about generating XML representation of Decision Tree models.
You can generate XML representing a Decision Tree model; the generated XML satisfies the definition specified in the Predictive Model Markup Language (PMML) version 2.1 specification.
16.2 Growing a Decision Tree
Predict a target value by a sequence of questions to form or grow a decision tree. A sample here shows how to grow a decision tree.
A decision tree predicts a target value by asking a sequence of questions. At a given stage in the sequence, the question that is asked depends upon the answers to the previous questions. The goal is to ask questions that, taken together, uniquely identify specific target values. Graphically, this process forms a tree structure.
The figure is a decision tree with nine nodes (and nine corresponding rules). The target attribute is binary: 1 if the customer increases spending, 0 if the customer does not increase spending. The first split in the tree is based on the CUST_MARITAL_STATUS attribute. The root of the tree (node 0) is split into nodes 1 and 3. Married customers are in node 1; single customers are in node 3.
The rule associated with node 1 is:
Node 1 recordCount=712, 0 Count=382, 1 Count=330
CUST_MARITAL_STATUS isIn "Married", surrogate: HOUSEHOLD_SIZE isIn "3" "4-5"
Node 1 has 712 records (cases). In all 712 cases, the CUST_MARITAL_STATUS attribute indicates that the customer is married. Of these, 382 have a target of 0 (not likely to increase spending), and 330 have a target of 1 (likely to increase spending).
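The counts for node 1 and the role of its surrogate can be sketched in Python as follows. The data structure and function names are hypothetical, and the fallback behavior when both attributes are missing is an assumption made only so the example is complete.

```python
# Illustrative sketch based on the node 1 rule above.

node_1 = {"record_count": 712, "target_counts": {0: 382, 1: 330}}

def majority_target(node):
    """Return the most frequent target in the node and its share of the records."""
    counts = node["target_counts"]
    target = max(counts, key=counts.get)
    return target, counts[target] / node["record_count"]

def goes_to_node_1(record):
    """Apply the split; use the surrogate if the main attribute is missing."""
    status = record.get("CUST_MARITAL_STATUS")
    if status is not None:
        return status == "Married"
    # Surrogate: HOUSEHOLD_SIZE stands in for the missing CUST_MARITAL_STATUS.
    # Assumption: if HOUSEHOLD_SIZE is also missing, the test simply fails.
    return record.get("HOUSEHOLD_SIZE") in {"3", "4-5"}

target, share = majority_target(node_1)
print(target, round(share, 3))
```

Here the majority target in node 1 is 0, with 382 of the 712 records (about 54%), so further splits below node 1 are needed to separate the two outcomes more cleanly.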
16.2.1 Splitting
Decision Tree uses homogeneity metrics like gini and entropy to create the most homogeneous child nodes.
During the training process, the Decision Tree algorithm must repeatedly find the most efficient way to split a set of cases (records) into two child nodes. Oracle Machine Learning offers two homogeneity metrics, gini and entropy, for calculating the splits. The default metric is gini.
Homogeneity metrics assess the quality of alternative split conditions and select the one that results in the most homogeneous child nodes. Homogeneity is also called purity; it refers to the degree to which the resulting child nodes are made up of cases with the same target value. The objective is to maximize the purity in the child nodes. For example, if the target can be either yes or no (does or does not increase spending), the objective is to produce nodes where most of the cases either increase spending or most of the cases do not increase spending.
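The following Python sketch shows how gini and entropy can be used to compare two candidate splits. It uses the standard textbook definitions of both metrics; the source does not describe the exact computation inside Oracle Machine Learning, so treat this as an assumption-based illustration. The counts are invented.

```python
# Illustrative sketch: comparing candidate splits by impurity.
# Lower impurity means more homogeneous (purer) child nodes.
from math import log2

def gini(counts):
    """Gini impurity of a node given its per-target record counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def entropy(counts):
    """Entropy (in bits) of a node given its per-target record counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def weighted_impurity(children, metric):
    """Average impurity of the child nodes, weighted by their record counts."""
    total = sum(sum(child) for child in children)
    return sum(sum(child) / total * metric(child) for child in children)

parent = [50, 50]                  # 50 records with target 0, 50 with target 1
split_a = [[40, 10], [10, 40]]     # children are mostly one target each
split_b = [[30, 20], [20, 30]]     # children are still well mixed

for name, split in (("A", split_a), ("B", split_b)):
    print(name, weighted_impurity(split, gini), weighted_impurity(split, entropy))
```

Under both metrics, split A yields the lower weighted impurity, so it would be preferred over split B in this example.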
16.2.2 Cost Matrix
Use a cost matrix to optimize Decision Tree scoring.
All classification algorithms, including Decision Tree, support a cost-benefit matrix at apply time. You can use the same cost matrix for building and scoring a decision tree model, or you can specify a different cost or benefit matrix for scoring.
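One common way a cost matrix influences scoring is by selecting the prediction with the lowest expected cost instead of the most probable one. The Python sketch below illustrates that idea with invented probabilities and costs; it is a generic illustration, not a description of how Oracle Machine Learning applies the matrix internally.

```python
# Illustrative sketch: choosing the least-cost prediction at apply time.
# cost_matrix[actual][predicted] is the cost of predicting `predicted`
# when the true target is `actual`. All numbers are made up.

def least_cost_prediction(probabilities, cost_matrix):
    """Return (prediction, expected_costs) for one scored record."""
    expected = {
        predicted: sum(
            probabilities[actual] * cost_matrix[actual][predicted]
            for actual in probabilities
        )
        for predicted in probabilities
    }
    return min(expected, key=expected.get), expected

probabilities = {0: 0.7, 1: 0.3}   # model output for one customer
cost_matrix = {
    0: {0: 0.0, 1: 1.0},           # wrongly predicting 1 costs 1
    1: {0: 5.0, 1: 0.0},           # wrongly predicting 0 costs 5
}
prediction, expected = least_cost_prediction(probabilities, cost_matrix)
```

Although target 0 is the more probable outcome, missing a customer who would increase spending is five times as costly in this example, so the least-cost prediction is 1.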
16.2.3 Preventing Over-Fitting
Prevent over-fitting with automatic pruning and configurable limits.
In principle, the Decision Tree algorithm can grow each branch of the tree deeply enough to perfectly classify the training examples. While this is sometimes a reasonable strategy, in fact it can lead to difficulties when there is noise in the data, or when the number of training examples is too small to produce a representative sample of the true target function. In either of these cases, this simple algorithm can produce trees that over-fit the training examples. Over-fit is a condition where a model is able to accurately predict the data used to create the model, but does poorly on new data presented to it.
To prevent over-fitting, Oracle Machine Learning supports automatic pruning and configurable limit conditions that control tree growth. Limit conditions prevent further splits once the conditions have been satisfied. Pruning removes branches that have insignificant predictive power.
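The effect of limit conditions can be sketched as a check made before each split. In the Python example below, the parameter names and threshold values are hypothetical and chosen only for illustration; consult the algorithm settings reference for the limits that Oracle Machine Learning actually supports.

```python
# Illustrative sketch: limit conditions that stop further splitting.
# Parameter names and values are hypothetical.

def can_split(depth, record_count, target_counts,
              max_depth=7, min_records_to_split=20):
    """Return True only if no limit condition forbids another split."""
    if depth >= max_depth:
        return False                      # tree is already deep enough
    if record_count < min_records_to_split:
        return False                      # too few records to split reliably
    if max(target_counts) == record_count:
        return False                      # node is already pure
    return True

checks = [
    can_split(2, 712, [382, 330]),   # within limits and impure
    can_split(7, 712, [382, 330]),   # depth limit reached
    can_split(2, 10, [6, 4]),        # too few records
    can_split(2, 100, [100, 0]),     # pure node
]
```

Only the first node in this example may be split further; the other three are stopped by a limit condition or are already pure.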
16.3 Tuning the Decision Tree Algorithm
Fine-tune the Decision Tree algorithm with various parameters.
The Decision Tree algorithm is implemented with reasonable defaults for splitting and termination criteria. However, several build settings are available for fine-tuning.
You can specify a homogeneity metric for finding the optimal split condition for a tree. The default metric is gini. The entropy metric is also available.
The training data attributes are binned as part of the algorithm's data preparation. You can alter the number of bins used by the binning step. There is a trade-off between the number of bins used and the time required for the build.
Note:
The term hyperparameter is also used interchangeably with model setting.
See Also:
DBMS_DATA_MINING - Algorithm Settings: Decision Tree for a listing and description of the available model settings.