The ore.odmDT function uses the Oracle Data Mining Decision Tree algorithm, which is based on conditional probabilities. Decision trees generate rules. A rule is a conditional statement that can easily be understood by humans and be used within a database to identify a set of records.
Decision Tree models are classification models.
A decision tree predicts a target value by asking a sequence of questions. At a given stage in the sequence, the question that is asked depends upon the answers to the previous questions. The goal is to ask questions that, taken together, uniquely identify specific target values. Graphically, this process forms a tree structure.
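The question sequence can be pictured as nested conditionals. The following is a minimal sketch in base R, not Oracle R Enterprise code: it hard-codes the single split reported in the model summary of Example 4-10 (disp <= 196.3 predicts gear 4, otherwise gear 3) and applies it to the local mtcars data set. The function name predict_gear_rule is hypothetical, and the sketch ignores the surrogate split on cyl that the model reports.

```r
# Hypothetical helper: applies the one-split rule from the Example 4-10 model summary.
predict_gear_rule <- function(disp) {
  # Question 1: is the displacement at most 196.3?
  # Yes -> predict gear 4 (node 1); No -> predict gear 3 (node 2).
  ifelse(disp <= 196.3, "4", "3")
}

# Apply the rule to the mtcars data set that ships with R.
rule_pred <- predict_gear_rule(mtcars$disp)

# Cross-tabulate actual gear against the rule's prediction.
table(gear = mtcars$gear, PREDICTION = rule_pred)
```

A deeper tree would nest further questions inside each branch; here the tree stops after one question, so only two of the three gear values can ever be predicted.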
During the training process, the Decision Tree algorithm must repeatedly find the most efficient way to split a set of cases (records) into two child nodes. The ore.odmDT function offers two homogeneity metrics, gini and entropy, for calculating the splits. The default metric is gini.
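To make the two metrics concrete, the sketch below computes the textbook Gini impurity and entropy of a set of class labels, and the weighted impurity of a binary split, in base R. It assumes the standard definitions; the exact computation inside Oracle Data Mining may differ. The helper names gini, entropy, and split_impurity are hypothetical, and the split threshold 196.3 is taken from the model summary in Example 4-10.

```r
# Textbook Gini impurity: 1 minus the sum of squared class proportions.
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Textbook entropy in bits: -sum(p * log2(p)) over classes that occur.
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]                      # drop empty classes to avoid log2(0)
  -sum(p * log2(p))
}

# Weighted impurity of the two child nodes produced by a binary split.
# A lower value means the children are more homogeneous.
split_impurity <- function(labels, go_left, metric = gini) {
  n <- length(labels)
  (sum(go_left) / n)  * metric(labels[go_left]) +
    (sum(!go_left) / n) * metric(labels[!go_left])
}

labels  <- factor(mtcars$gear)
go_left <- mtcars$disp <= 196.3      # candidate split from Example 4-10

gini(labels)                               # impurity before the split
split_impurity(labels, go_left, gini)      # impurity after the split (gini)
split_impurity(labels, go_left, entropy)   # impurity after the split (entropy)
```

A split search evaluates many candidate conditions in this way and keeps the one that lowers the impurity the most.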
For information on the ore.odmDT function arguments, invoke help(ore.odmDT).
Example 4-10 Using the ore.odmDT Function
This example creates an input ore.frame, builds a model, makes predictions, and generates a confusion matrix.
m <- mtcars
m$gear <- as.factor(m$gear)
m$cyl <- as.factor(m$cyl)
m$vs <- as.factor(m$vs)
m$ID <- 1:nrow(m)
mtcars_of <- ore.push(m)
row.names(mtcars_of) <- mtcars_of$ID
# Build the model.
dt.mod <- ore.odmDT(gear ~ ., mtcars_of)
summary(dt.mod)
# Make predictions and generate a confusion matrix.
dt.res <- predict(dt.mod, mtcars_of, "gear")
with(dt.res, table(gear, PREDICTION))

Listing for Example 4-10
R> m <- mtcars
R> m$gear <- as.factor(m$gear)
R> m$cyl <- as.factor(m$cyl)
R> m$vs <- as.factor(m$vs)
R> m$ID <- 1:nrow(m)
R> mtcars_of <- ore.push(m)
R> row.names(mtcars_of) <- mtcars_of$ID
R> # Build the model.
R> dt.mod <- ore.odmDT(gear ~ ., mtcars_of)
R> summary(dt.mod)

Call:
ore.odmDT(formula = gear ~ ., data = mtcars_of)

n =  32

Nodes:
  parent node.id row.count prediction                         split
1     NA       0        32          3                          <NA>
2      0       1        16          4 (disp <= 196.299999999999995)
3      0       2        16          3  (disp > 196.299999999999995)
            surrogate                   full.splits
1                <NA>                          <NA>
2 (cyl in ("4" "6" )) (disp <= 196.299999999999995)
3     (cyl in ("8" ))  (disp > 196.299999999999995)

Settings:
                          value
prep.auto                    on
impurity.metric   impurity.gini
term.max.depth                7
term.minpct.node           0.05
term.minpct.split           0.1
term.minrec.node             10
term.minrec.split            20

R> # Make predictions and generate a confusion matrix.
R> dt.res <- predict(dt.mod, mtcars_of, "gear")
R> with(dt.res, table(gear, PREDICTION))
    PREDICTION
gear  3  4
   3 14  1
   4  0 12
   5  2  3
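The confusion matrix can be summarized as an overall accuracy. The following base R sketch re-enters the counts from the listing by hand, so it runs without a database connection. The variable names cm, correct, and accuracy are illustrative only.

```r
# Confusion matrix copied from the listing:
# rows are the actual gear values, columns are the predicted values.
cm <- matrix(c(14,  1,
                0, 12,
                2,  3),
             nrow = 3, byrow = TRUE,
             dimnames = list(gear = c("3", "4", "5"),
                             PREDICTION = c("3", "4")))

# A prediction is correct when the row label equals the column label.
correct  <- cm["3", "3"] + cm["4", "4"]

# Overall accuracy: correct predictions divided by all cases.
accuracy <- correct / sum(cm)
accuracy
```

Because the tree has only two leaf nodes, it never predicts gear 5, so all five cars with five gears are misclassified and the accuracy is 26 correct out of 32 cases.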