28 Random Forest

Learn how to use Random Forest as a classification algorithm.

28.1 About Random Forest

Random Forest is a classification algorithm that builds an ensemble (also called a forest) of decision trees.

The algorithm builds a number of Decision Tree models and predicts using the ensemble. An individual decision tree is built by choosing a random sample from the training data set as the input. At each node of the tree, only a random sample of predictors is chosen for computing the split point. This introduces variation in the data used by the different trees in the forest. The RFOR_SAMPLING_RATIO and RFOR_MTRY parameters specify the fraction of the training data sampled for each tree and the number of predictors considered at each node, respectively. Set ODMS_RANDOM_SEED to seed the random number generator before running the algorithm so that results are reproducible.
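As a sketch, these settings can be passed to the DBMS_DATA_MINING.CREATE_MODEL2 procedure; the model name, training query, and column names below are hypothetical, and the setting values are illustrative only:

```sql
DECLARE
  v_setlst DBMS_DATA_MINING.SETTING_LIST;
BEGIN
  v_setlst('ALGO_NAME')           := 'ALGO_RANDOM_FOREST';
  v_setlst('RFOR_SAMPLING_RATIO') := '0.5';  -- fraction of rows sampled per tree
  v_setlst('RFOR_MTRY')           := '5';    -- predictors sampled at each node
  v_setlst('ODMS_RANDOM_SEED')    := '1';    -- fixed seed for reproducibility

  DBMS_DATA_MINING.CREATE_MODEL2(
    model_name          => 'RF_SAMPLE',                    -- hypothetical name
    mining_function     => 'CLASSIFICATION',
    data_query          => 'SELECT * FROM training_data',  -- hypothetical view
    set_list            => v_setlst,
    case_id_column_name => 'CASE_ID',
    target_column_name  => 'TARGET');
END;
/
```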

28.2 Building a Random Forest

A Random Forest model is built on the existing infrastructure and Application Programming Interfaces (APIs) of Oracle Machine Learning for SQL (OML4SQL).

Random Forest models provide an attribute importance ranking of predictors. The model is built by specifying parameters in the existing APIs, and scoring is performed using the same SQL queries and APIs as the existing classification algorithms. OML4SQL implements a variant of the classical Random Forest algorithm that supports big data sets. The implementation differs from the classical algorithm in the following ways:

  • OML4SQL does not support bagging; instead, it samples the training data without replacement for each tree

  • Users can specify the maximum depth of each tree; trees are not grown to full depth
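For example, the tree depth cap and forest size can be supplied in a settings table passed to DBMS_DATA_MINING.CREATE_MODEL; the table, model, and column names below are hypothetical, and the setting values are illustrative only:

```sql
-- Settings table consumed by DBMS_DATA_MINING.CREATE_MODEL (names hypothetical)
CREATE TABLE rf_build_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000));

BEGIN
  INSERT INTO rf_build_settings VALUES
    (DBMS_DATA_MINING.ALGO_NAME, DBMS_DATA_MINING.ALGO_RANDOM_FOREST);
  INSERT INTO rf_build_settings VALUES
    (DBMS_DATA_MINING.RFOR_NUM_TREES, '25');      -- number of trees in the forest
  INSERT INTO rf_build_settings VALUES
    (DBMS_DATA_MINING.TREE_TERM_MAX_DEPTH, '8');  -- cap on individual tree depth
  COMMIT;

  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'RF_DEPTH_DEMO',       -- hypothetical name
    mining_function     => DBMS_DATA_MINING.CLASSIFICATION,
    data_table_name     => 'TRAINING_DATA',       -- hypothetical table
    case_id_column_name => 'CASE_ID',
    target_column_name  => 'TARGET',
    settings_table_name => 'RF_BUILD_SETTINGS');
END;
/
```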

Note:

The term hyperparameter is used interchangeably with model setting.

28.3 Scoring with Random Forest

Learn how to score with the Random Forest algorithm.

Scoring with Random Forest is the same as with any other classification algorithm. The following SQL functions are supported: PREDICTION, PREDICTION_PROBABILITY, PREDICTION_COST, PREDICTION_SET, and PREDICTION_DETAILS.
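A minimal scoring query, assuming a model named RF_SAMPLE (hypothetical) and a scoring_data table (hypothetical) containing the same predictor columns used at build time:

```sql
SELECT case_id,
       PREDICTION(RF_SAMPLE USING *)             AS predicted_class,
       PREDICTION_PROBABILITY(RF_SAMPLE USING *) AS probability,
       PREDICTION_DETAILS(RF_SAMPLE USING *)     AS details  -- XML listing influential attributes
FROM   scoring_data
ORDER  BY case_id;
```

The USING * clause passes all columns of the row as input attributes; a subset of columns can be listed instead when only some predictors are available at scoring time.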