5 Predicting With R Models

Predictive models allow you to predict future behavior based on past behavior. After you build a model, you use it to score new data, that is, to make predictions.

R allows you to build many kinds of models. When you score data with an R model in the usual way, the data to score must be in an R data.frame. With the ore.predict function, you can instead use an R model to score database-resident data in an ore.frame object.

The ore.predict function only makes predictions on ore.frame objects; it cannot rebuild the model. To build models in the database for scalability and performance, use the algorithms and functions described in Chapter 4, "Building Models in Oracle R Enterprise." These include both algorithms that are native to Oracle R Enterprise and those from Oracle Data Mining that are exposed in R.

The ore.predict function is a generic function. It has the following usage:

ore.predict(object, newdata, ...)

The value of the object argument is one of the R models or objects listed in Table 5-1. The value of the newdata argument is an ore.frame object that contains the data to score. The OREpredict package has methods for use with specific R model classes. The ... argument represents the various additional arguments that are accepted by the different methods.

Table 5-1 lists the methods employed by the generic ore.predict function, the class of the object the method accepts as the object argument, and a description of the type of model or object.

Table 5-1 Methods of the Generic ore.predict Function

OREpredict Method       Class of Object   Description of Object
----------------------  ----------------  -------------------------------------------------
ore.predict-glm         glm               Generalized linear model
ore.predict-kmeans      kmeans            k-Means clustering model
ore.predict-lm          lm                Linear regression model
ore.predict-matrix      matrix            A matrix with no more than 1000 rows
ore.predict-multinom    multinom          Multinomial log-linear model
ore.predict-nnet        nnet              Neural network models
ore.predict-ore.model   ore.model         An Oracle R Enterprise model
ore.predict-prcomp      prcomp            Principal components analysis on a matrix
ore.predict-princomp    princomp          Principal components analysis on a numeric matrix
ore.predict-rpart       rpart             Recursive partitioning and regression tree model

For the arguments of the ore.predict methods, invoke the help function on the method, such as help("ore.predict-glm").
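
Before invoking ore.predict, it can be useful to confirm that a locally built model has one of the classes listed in Table 5-1. The following is a minimal sketch in base R that relies only on the contents of that table. The helper is_ore_predict_candidate and the vector supported_classes are hypothetical names introduced for illustration; they are not part of Oracle R Enterprise.

```r
# Classes copied by hand from Table 5-1 (not queried from Oracle R Enterprise).
supported_classes <- c("glm", "kmeans", "lm", "matrix", "multinom",
                       "nnet", "ore.model", "prcomp", "princomp", "rpart")

# Hypothetical helper: TRUE if the object's class appears in Table 5-1.
# For a matrix, the 1000-row limit stated in the table is applied as well.
is_ore_predict_candidate <- function(object) {
  if (is.matrix(object)) {
    return(nrow(object) <= 1000)
  }
  inherits(object, supported_classes)
}

irisModel    <- lm(Sepal.Length ~ ., data = iris)
irisClusters <- kmeans(iris[, 1:4], centers = 3)
irisSmooth   <- loess(Sepal.Length ~ Petal.Length, data = iris)

is_ore_predict_candidate(irisModel)      # class lm is listed in Table 5-1
is_ore_predict_candidate(irisClusters)   # class kmeans is listed in Table 5-1
is_ore_predict_candidate(irisSmooth)     # class loess is not listed
```

A check of this kind only tells you whether a method is listed for the class; the arguments that each method accepts are still best confirmed through its help page.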

Example 5-1 builds a linear regression model, irisModel, using the lm function on the iris data.frame. It pushes the data set to the database as iris_of, an ore.frame object. It then scores the data in iris_of by invoking ore.predict with the model, and appends the predictions to iris_of.

Example 5-1 Using the ore.predict Function on an LM Model

irisModel <- lm(Sepal.Length ~ ., data = iris)
iris_of <- ore.push(iris)
iris_of_pred <- ore.predict(irisModel, iris_of, se.fit = TRUE, 
                            interval = "prediction")
iris_of <- cbind(iris_of, iris_of_pred)
head(iris_of)

Listing for Example 5-1

R> irisModel <- lm(Sepal.Length ~ ., data = iris)
R> iris_of <- ore.push(iris)
R> iris_of_pred <- ore.predict(irisModel, iris_of, se.fit = TRUE, 
+                              interval = "prediction")
R> iris_of <- cbind(iris_of, iris_of_pred)
R> head(iris_of)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species     PRED    SE.PRED
1          5.1         3.5          1.4         0.2  setosa 5.004788 0.04479188
2          4.9         3.0          1.4         0.2  setosa 4.756844 0.05514933
3          4.7         3.2          1.3         0.2  setosa 4.773097 0.04690495
4          4.6         3.1          1.5         0.2  setosa 4.889357 0.05135928
5          5.0         3.6          1.4         0.2  setosa 5.054377 0.04736842
6          5.4         3.9          1.7         0.4  setosa 5.388886 0.05592364
  LOWER.PRED UPPER.PRED
1   4.391895   5.617681
2   4.140660   5.373027
3   4.159587   5.386607
4   4.274454   5.504259
5   4.440727   5.668026
6   4.772430   6.005342
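
As a cross-check, the same model can be scored locally with the base R predict function, which needs no database connection. The sketch below assumes that the PRED, SE.PRED, LOWER.PRED, and UPPER.PRED columns in the listing correspond to the fit, se.fit, lwr, and upr results of predict for an lm model; that mapping is inferred from the argument names and the listing, not stated by the text. The names local_pred and local_scores are illustrative.

```r
# Same model as Example 5-1, built locally with base R.
irisModel <- lm(Sepal.Length ~ ., data = iris)

# Local scoring with predict(); the data stays in an R data.frame.
local_pred <- predict(irisModel, newdata = iris, se.fit = TRUE,
                      interval = "prediction")

# Arrange the local results under the column names used in the listing
# (assumed mapping: fit -> PRED, se.fit -> SE.PRED, lwr/upr -> LOWER/UPPER.PRED).
local_scores <- data.frame(PRED       = local_pred$fit[, "fit"],
                           SE.PRED    = local_pred$se.fit,
                           LOWER.PRED = local_pred$fit[, "lwr"],
                           UPPER.PRED = local_pred$fit[, "upr"])
head(local_scores)
```

Comparing head(local_scores) with the last four columns of the listing is a quick way to see whether in-database and local scoring agree for a given model.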

Example 5-2 builds a generalized linear model using the infert data set and then invokes the ore.predict function on the model.

Example 5-2 Using the ore.predict Function on a GLM Model

infertModel <-
  glm(case ~ age + parity + education + spontaneous + induced,
  data = infert, family = binomial())
INFERT <- ore.push(infert)
INFERTpred <- ore.predict(infertModel, INFERT, type = "response",
                          se.fit = TRUE)
INFERT <- cbind(INFERT, INFERTpred)
head(INFERT)

Listing for Example 5-2

R> infertModel <-
+   glm(case ~ age + parity + education + spontaneous + induced,
+   data = infert, family = binomial())
R> INFERT <- ore.push(infert)
R> INFERTpred <- ore.predict(infertModel, INFERT, type = "response",
+                           se.fit = TRUE)
R> INFERT <- cbind(INFERT, INFERTpred)
R> head(INFERT)
  education age parity induced case spontaneous stratum pooled.stratum
1    0-5yrs  26      6       1    1           2       1              3
2    0-5yrs  42      1       1    1           0       2              1
3    0-5yrs  39      6       2    1           0       3              4
4    0-5yrs  34      4       2    1           0       4              2
5   6-11yrs  35      3       1    1           1       5             32
6   6-11yrs  36      4       2    1           1       6             36
       PRED    SE.PRED
1 0.5721916 0.20630954
2 0.7258539 0.17196245
3 0.1194459 0.08617462
4 0.3684102 0.17295285
5 0.5104285 0.06944005
6 0.6322269 0.10117919
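
The generalized linear model can be checked locally in the same way. This sketch uses only base R and assumes that PRED and SE.PRED in the listing correspond to the fit and se.fit components returned by predict for a glm model with type = "response"; that correspondence is an inference from the argument names, and local_pred and local_scores are illustrative names.

```r
# Same model as Example 5-2, built locally with base R.
infertModel <-
  glm(case ~ age + parity + education + spontaneous + induced,
      data = infert, family = binomial())

# Local scoring on the response (probability) scale with standard errors.
local_pred <- predict(infertModel, newdata = infert, type = "response",
                      se.fit = TRUE)

# Assumed mapping: fit -> PRED, se.fit -> SE.PRED.
local_scores <- data.frame(PRED    = local_pred$fit,
                           SE.PRED = local_pred$se.fit)
head(local_scores)
```

Because type = "response" is used, the PRED values are probabilities between 0 and 1 rather than values on the logit scale.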