5 Predicting With R Models

This chapter describes the Oracle R Enterprise function ore.predict and provides some examples of its use. The chapter contains the following topics:

  • About the ore.predict Function

  • Using the ore.predict Function

About the ore.predict Function

Predictive models allow you to predict future behavior based on past behavior. After you build a model, you use it to score new data, that is, to make predictions.

R allows you to build many kinds of models. When you score data to predict new results using an R model, the data to score must be in an R data.frame. With the ore.predict function, you can use an R model to score database-resident data in an ore.frame object.
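For comparison, the conventional in-memory workflow can be sketched in base R as follows. This is a minimal illustration that uses the built-in cars data set; the names carsModel and newCars are hypothetical, and nothing in it involves Oracle R Enterprise.

```r
# Build a linear model on a local data.frame (the built-in cars data set).
carsModel <- lm(dist ~ speed, data = cars)

# The data to score must also be a local data.frame.
newCars <- data.frame(speed = c(10, 15, 20))

# predict() returns one prediction per row of newCars.
newCars$PRED <- predict(carsModel, newdata = newCars)
newCars
```

With ore.predict, the data to score is instead an ore.frame object, so the rows being scored remain in the database.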

The ore.predict function provides the fastest way to operationalize R-based models for scoring in Oracle Database. The function has no dependencies on PMML or any other plug-ins.

Some advantages of using the ore.predict function to score data in the database are the following:

  • Uses R-generated models to score in-database data.

    The data to score is in an ore.frame object.

  • Maximizes the use of Oracle Database as a compute engine.

    The database provides a commercial-grade, high-performance, scalable scoring engine.

  • Simplifies application workflow.

    You can go from a model to SQL scoring in one step.

The ore.predict function is a generic function. It has the following usage:

ore.predict(object, newdata, ...)

The value of the object argument is one of the model objects listed in Table 5-1. The value of the newdata argument is an ore.frame object that contains the data to score. The ore.predict function has methods for use with specific R model classes. The ... argument represents the various additional arguments that are accepted by the different methods.
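The idea of a generic function that selects a method by model class and forwards the ... arguments can be sketched with a toy generic in base R. The name score_sketch is hypothetical, and the code is only an illustration of the dispatch pattern; it is not how Oracle R Enterprise implements ore.predict.

```r
# A toy generic, NOT the Oracle R Enterprise implementation.
# score_sketch is a hypothetical name used only for this illustration.
score_sketch <- function(object, newdata, ...) UseMethod("score_sketch")

# Method chosen when 'object' has class "lm"; extra arguments in ... are
# passed on to the underlying scoring routine.
score_sketch.lm <- function(object, newdata, ...) {
  predict(object, newdata = newdata, ...)
}

# Fallback for model classes that have no method.
score_sketch.default <- function(object, newdata, ...) {
  stop("no scoring method for class ", class(object)[1])
}

fit <- lm(dist ~ speed, data = cars)

# interval = "confidence" travels through ... to predict.lm.
res <- score_sketch(fit, head(cars, 3), interval = "confidence")
colnames(res)
```

The caller always uses the same function name; which additional arguments are meaningful depends on the method that is selected for the class of the model.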

Table 5-1 lists the classes of model object for which the ore.predict function has methods.

Table 5-1 Models Supported by the ore.predict Function

Class of Model   Description of Model
--------------   ---------------------------------------------------------
glm              Generalized linear model
kmeans           k-Means clustering model
lm               Linear regression model
matrix           A matrix with no more than 1000 rows, for use in an
                 hclust hierarchical clustering model
multinom         Multinomial log-linear model
nnet             Neural network model
ore.model        An Oracle R Enterprise model from the OREModels package
prcomp           Principal components analysis on a matrix
princomp         Principal components analysis on a numeric matrix
rpart            Recursive partitioning and regression tree model


For the function signatures of the ore.predict methods, invoke the help function on the following, as in help("ore.predict-kmeans"):

  • ore.predict-glm

  • ore.predict-kmeans

  • ore.predict-lm

  • ore.predict-matrix

  • ore.predict-multinom

  • ore.predict-nnet

  • ore.predict-ore.model

  • ore.predict-prcomp

  • ore.predict-princomp

  • ore.predict-rpart
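The topic names in the list above follow the pattern ore.predict-<class>, so they can be generated rather than typed. The following is a small sketch; it assumes that Oracle R Enterprise is installed as the package ORE and that attaching it makes the help pages available, and it skips the help call otherwise.

```r
# Model classes from Table 5-1.
classes <- c("glm", "kmeans", "lm", "matrix", "multinom", "nnet",
             "ore.model", "prcomp", "princomp", "rpart")

# Help topic names follow the pattern "ore.predict-<class>".
topics <- paste0("ore.predict-", classes)
print(topics)

# Open one help page only if Oracle R Enterprise is available.
# Assumption: library(ORE) attaches the package that holds these topics.
if (requireNamespace("ORE", quietly = TRUE)) {
  library(ORE)
  print(help(topics[2]))   # same as help("ore.predict-kmeans")
}
```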

Using the ore.predict Function

The following examples demonstrate the use of the ore.predict function.

Example 5-1 builds a linear regression model, IRISModel, using the lm function on the iris data.frame. The example pushes the data set to the database as the temporary table IRIS and the corresponding ore.frame proxy, IRIS. The example scores the data in IRIS by invoking ore.predict with the model and then combines the predictions with the IRIS ore.frame object. Finally, it displays the first six rows of the resulting object.

Example 5-1 Using the ore.predict Function on a Linear Regression Model

IRISModel <- lm(Sepal.Length ~ ., data = iris)
IRIS <- ore.push(iris)
IRIS_pred <- ore.predict(IRISModel, IRIS, se.fit = TRUE, 
                            interval = "prediction")
IRIS <- cbind(IRIS, IRIS_pred)
head(IRIS)

Listing for Example 5-1

R> IRISModel <- lm(Sepal.Length ~ ., data = iris)
R> IRIS <- ore.push(iris)
R> IRIS_pred <- ore.predict(IRISModel, IRIS, se.fit = TRUE, 
+                              interval = "prediction")
R> IRIS <- cbind(IRIS, IRIS_pred)
R> head(IRIS)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species     PRED    SE.PRED
1          5.1         3.5          1.4         0.2  setosa 5.004788 0.04479188
2          4.9         3.0          1.4         0.2  setosa 4.756844 0.05514933
3          4.7         3.2          1.3         0.2  setosa 4.773097 0.04690495
4          4.6         3.1          1.5         0.2  setosa 4.889357 0.05135928
5          5.0         3.6          1.4         0.2  setosa 5.054377 0.04736842
6          5.4         3.9          1.7         0.4  setosa 5.388886 0.05592364
  LOWER.PRED UPPER.PRED
1   4.391895   5.617681
2   4.140660   5.373027
3   4.159587   5.386607
4   4.274454   5.504259
5   4.440727   5.668026
6   4.772430   6.005342
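As a sanity check that needs no database connection, the same model can be scored locally with the base R predict function. This is only a sketch: localPred and localCheck are hypothetical names, and the mapping of the predict.lm results onto the PRED, SE.PRED, LOWER.PRED, and UPPER.PRED column names is an assumption made for comparison with the listing. The values should be close to those shown above, although small numerical differences are possible.

```r
# Same model as Example 5-1, scored locally with base R.
IRISModel <- lm(Sepal.Length ~ ., data = iris)
localPred <- predict(IRISModel, newdata = iris, se.fit = TRUE,
                     interval = "prediction")

# Arrange the pieces under the column names used in the listing.
localCheck <- data.frame(PRED       = localPred$fit[, "fit"],
                         SE.PRED    = localPred$se.fit,
                         LOWER.PRED = localPred$fit[, "lwr"],
                         UPPER.PRED = localPred$fit[, "upr"])
head(localCheck)
```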

Example 5-2 builds a generalized linear model using the infert data set and then invokes the ore.predict function on the model.

Example 5-2 Using the ore.predict Function on a Generalized Linear Model

infertModel <-
  glm(case ~ age + parity + education + spontaneous + induced,
  data = infert, family = binomial())
INFERT <- ore.push(infert)
INFERTpred <- ore.predict(infertModel, INFERT, type = "response",
                          se.fit = TRUE)
INFERT <- cbind(INFERT, INFERTpred)
head(INFERT)

Listing for Example 5-2

R> infertModel <-
+   glm(case ~ age + parity + education + spontaneous + induced,
+   data = infert, family = binomial())
R> INFERT <- ore.push(infert)
R> INFERTpred <- ore.predict(infertModel, INFERT, type = "response",
+                           se.fit = TRUE)
R> INFERT <- cbind(INFERT, INFERTpred)
R> head(INFERT)
  education age parity induced case spontaneous stratum pooled.stratum
1    0-5yrs  26      6       1    1           2       1              3
2    0-5yrs  42      1       1    1           0       2              1
3    0-5yrs  39      6       2    1           0       3              4
4    0-5yrs  34      4       2    1           0       4              2
5   6-11yrs  35      3       1    1           1       5             32
6   6-11yrs  36      4       2    1           1       6             36
       PRED    SE.PRED
1 0.5721916 0.20630954
2 0.7258539 0.17196245
3 0.1194459 0.08617462
4 0.3684102 0.17295285
5 0.5104285 0.06944005
6 0.6322269 0.10117919
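The PRED values in the listing lie between 0 and 1 because the example requests type = "response". The relationship between the response scale and the link scale can be sketched locally with the base R predict function. The names linkPred and respPred are hypothetical, and the sketch assumes that the type argument of ore.predict has the same meaning as it has for predict on a glm object.

```r
# Same model as Example 5-2, built on the local infert data.frame.
infertModel <-
  glm(case ~ age + parity + education + spontaneous + induced,
      data = infert, family = binomial())

# Predictions on the link (log-odds) scale and on the response scale.
linkPred <- predict(infertModel, newdata = infert, type = "link")
respPred <- predict(infertModel, newdata = infert, type = "response")

# For a binomial model with the default logit link, the response-scale
# value is the inverse logit of the link-scale value.
all.equal(unname(plogis(linkPred)), unname(respPred))
range(respPred)
```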

Example 5-3 pushes the iris data set to the database as the temporary table IRIS and the corresponding ore.frame proxy, IRIS. The example builds a linear regression model, IRISModel2, using the ore.lm function. It then scores the data in IRIS with the model, adds the predictions to IRIS as the column PRED, and displays the first three rows.

Example 5-3 Using the ore.predict Function on an ore.model Model

IRIS <- ore.push(iris)
IRISModel2 <- ore.lm(Sepal.Length ~ ., data = IRIS)
IRIS$PRED <- ore.predict(IRISModel2, IRIS)
head(IRIS, 3)

Listing for Example 5-3

R> IRIS <- ore.push(iris)
R> IRISModel2 <- ore.lm(Sepal.Length ~ ., data = IRIS)
R> IRIS$PRED <- ore.predict(IRISModel2, IRIS)
R> head(IRIS, 3)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species     PRED
1          5.1         3.5          1.4         0.2  setosa 5.004788
2          4.9         3.0          1.4         0.2  setosa 4.756844
3          4.7         3.2          1.3         0.2  setosa 4.773097