5 Prediction With R Models

Use the Oracle Machine Learning for R function ore.predict on an OML4R model to predict future behavior.

5.1 About the ore.predict Function

Predictive models allow you to predict future behavior based on past behavior.

After you build a model, you use it to score new data, that is, to make predictions.

R allows you to build many kinds of models. When you score data to predict new results using an R model, the data to score must be in an R data.frame. With the ore.predict function, you can use an R model to score database-resident data in an ore.frame object.
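For contrast, conventional scoring with base R's predict function operates on an in-memory data.frame. The following is a minimal sketch that uses only base R and the built-in iris data set. The row split and the names train, new_rows, fit, and local_pred are illustrative assumptions and are not part of OML4R.

```r
# Conventional in-memory scoring with base R (no database involved).
# The row split and object names are illustrative only.
new_idx  <- seq(5, nrow(iris), by = 5)   # hold out every fifth row as "new" data
train    <- iris[-new_idx, ]
new_rows <- iris[new_idx, ]              # data to score: a plain R data.frame

fit <- lm(Sepal.Length ~ ., data = train)

# predict() expects newdata to be a data.frame held in R memory
local_pred <- predict(fit, newdata = new_rows)

head(cbind(new_rows, PRED = local_pred))
```

The ore.predict function accepts an ore.frame in place of new_rows, so the data to score can stay in the database.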

The ore.predict function provides the fastest way to operationalize R-based models for scoring in Oracle Database. The function has no dependencies on PMML or any other plug-ins.

Some advantages of using the ore.predict function to score data in the database are the following:

  • Uses R-generated models to score in-database data.

    The data to score is in an ore.frame object.

  • Maximizes the use of Oracle Database as a compute engine.

    The database provides a commercial-grade, high-performance, scalable scoring engine.

  • Simplifies application workflow.

    You can go from a model to SQL scoring in one step.

The ore.predict function is a generic function. It has the following usage:

ore.predict(object, newdata, ...)

The value of the object argument is one of the model objects listed in Table 5-1. The value of the newdata argument is an ore.frame object that contains the data to score. The ore.predict function has methods for use with specific R model classes. The ... argument represents the various additional arguments that are accepted by the different methods.
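The idea of a generic function with class-specific methods can be sketched in base R. The toy generic below is not the OML4R implementation: score, score.lm, and score.kmeans are hypothetical names, and S3 dispatch is used only because it is the simplest mechanism to show. The sketch illustrates how one call shape, score(object, newdata, ...), can route to different scoring logic depending on the model class, with method-specific arguments passed through the ... argument.

```r
# A toy generic, NOT the OML4R implementation: score() is a hypothetical name.
score <- function(object, newdata, ...) UseMethod("score")

# Method for lm models: delegate to base R's predict()
score.lm <- function(object, newdata, ...) {
  predict(object, newdata = newdata, ...)
}

# Method for kmeans models: assign each row to the nearest cluster center
score.kmeans <- function(object, newdata, ...) {
  centers <- object$centers
  x <- as.matrix(newdata[, colnames(centers), drop = FALSE])
  # Squared Euclidean distance from every row to every center (rows x centers)
  d <- matrix(
    vapply(seq_len(nrow(centers)),
           function(k) rowSums(sweep(x, 2, centers[k, ])^2),
           numeric(nrow(x))),
    nrow = nrow(x)
  )
  max.col(-d, ties.method = "first")
}

lm_fit <- lm(Sepal.Length ~ ., data = iris)
set.seed(42)
km_fit <- kmeans(iris[, 1:4], centers = 3)

# The same call shape dispatches to a different method for each model class;
# method-specific arguments travel through "..."
head(score(lm_fit, iris, interval = "prediction"))
head(score(km_fit, iris))
```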

The ore.predict function has methods for the model classes listed in the following table.

Table 5-1 Models Supported by the ore.predict Function

Class of Model   Description of Model

glm              Generalized linear model
kmeans           k-means clustering model
lm               Linear regression model
matrix           A matrix with no more than 1000 rows, for use in an hclust hierarchical clustering model
multinom         Multinomial log-linear model
nnet             Neural network model
ore.model        An OML4R model from the OREModels package
prcomp           Principal components analysis on a matrix
princomp         Principal components analysis on a numeric matrix
rpart            Recursive partitioning and regression tree model

For the function signatures of the ore.predict methods, invoke the help function on the following, as in help("ore.predict-kmeans"):

  • ore.predict-glm

  • ore.predict-kmeans

  • ore.predict-lm

  • ore.predict-matrix

  • ore.predict-multinom

  • ore.predict-nnet

  • ore.predict-ore.model

  • ore.predict-prcomp

  • ore.predict-princomp

  • ore.predict-rpart

5.2 Use the ore.predict Function

These examples demonstrate the use of the ore.predict function.

Example 5-1 Using the ore.predict Function on a Linear Regression Model

This example builds a linear regression model, IRISModel, using the lm function on the iris data.frame. It pushes the data set to the database as the temporary table IRIS and the corresponding ore.frame proxy, IRIS. The example scores the data in IRIS by invoking ore.predict with the model and then combines the predictions with the IRIS ore.frame object. Finally, it displays the first six rows of the resulting object.

IRISModel <- lm(Sepal.Length ~ ., data = iris)
IRIS <- ore.push(iris)
IRIS_pred <- ore.predict(IRISModel, IRIS, se.fit = TRUE, 
                            interval = "prediction")
IRIS <- cbind(IRIS, IRIS_pred)
head(IRIS)

Listing for This Example

R> IRISModel <- lm(Sepal.Length ~ ., data = iris)
R> IRIS <- ore.push(iris)
R> IRIS_pred <- ore.predict(IRISModel, IRIS, se.fit = TRUE, 
+                              interval = "prediction")
R> IRIS <- cbind(IRIS, IRIS_pred)
R> head(IRIS)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species     PRED    SE.PRED
1          5.1         3.5          1.4         0.2  setosa 5.004788 0.04479188
2          4.9         3.0          1.4         0.2  setosa 4.756844 0.05514933
3          4.7         3.2          1.3         0.2  setosa 4.773097 0.04690495
4          4.6         3.1          1.5         0.2  setosa 4.889357 0.05135928
5          5.0         3.6          1.4         0.2  setosa 5.054377 0.04736842
6          5.4         3.9          1.7         0.4  setosa 5.388886 0.05592364
  LOWER.PRED UPPER.PRED
1   4.391895   5.617681
2   4.140660   5.373027
3   4.159587   5.386607
4   4.274454   5.504259
5   4.440727   5.668026
6   4.772430   6.005342
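As a local cross-check, the same model can be scored in memory with base R's predict function. This is a minimal sketch that needs no database connection. Naming the columns to match PRED, SE.PRED, LOWER.PRED, and UPPER.PRED is an assumption made here by analogy with the listing above, and local and local_pred are illustrative names.

```r
# Local cross-check with base R only; no database connection is needed.
IRISModel <- lm(Sepal.Length ~ ., data = iris)
local <- predict(IRISModel, newdata = iris, se.fit = TRUE,
                 interval = "prediction")

# predict.lm returns a list: $fit is a matrix with columns fit, lwr, upr,
# and $se.fit holds the standard errors of the fitted means.
local_pred <- data.frame(PRED       = local$fit[, "fit"],
                         SE.PRED    = local$se.fit,
                         LOWER.PRED = local$fit[, "lwr"],
                         UPPER.PRED = local$fit[, "upr"])
head(local_pred)
```

Comparing head(local_pred) with the PRED columns in the listing is a quick way to confirm that in-database and in-memory scoring agree for this model.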

Example 5-2 Using the ore.predict Function on a Generalized Linear Model

This example builds a generalized linear model using the infert data set and then invokes the ore.predict function on the model.

infertModel <-
  glm(case ~ age + parity + education + spontaneous + induced,
  data = infert, family = binomial())
INFERT <- ore.push(infert)
INFERTpred <- ore.predict(infertModel, INFERT, type = "response",
                          se.fit = TRUE)
INFERT <- cbind(INFERT, INFERTpred)
head(INFERT)

Listing for This Example

R> infertModel <-
+   glm(case ~ age + parity + education + spontaneous + induced,
+   data = infert, family = binomial())
R> INFERT <- ore.push(infert)
R> INFERTpred <- ore.predict(infertModel, INFERT, type = "response",
+                           se.fit = TRUE)
R> INFERT <- cbind(INFERT, INFERTpred)
R> head(INFERT)
  education age parity induced case spontaneous stratum pooled.stratum
1    0-5yrs  26      6       1    1           2       1              3
2    0-5yrs  42      1       1    1           0       2              1
3    0-5yrs  39      6       2    1           0       3              4
4    0-5yrs  34      4       2    1           0       4              2
5   6-11yrs  35      3       1    1           1       5             32
6   6-11yrs  36      4       2    1           1       6             36
       PRED    SE.PRED
1 0.5721916 0.20630954
2 0.7258539 0.17196245
3 0.1194459 0.08617462
4 0.3684102 0.17295285
5 0.5104285 0.06944005
6 0.6322269 0.10117919
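The effect of type = "response" can be illustrated with base R's predict function for glm models. The sketch below assumes that ore.predict follows the same convention for the type argument as predict.glm, which the text above does not state explicitly. The names link_pred and resp_pred are illustrative.

```r
# Base R illustration of prediction scales for a binomial glm.
infertModel <- glm(case ~ age + parity + education + spontaneous + induced,
                   data = infert, family = binomial())

# Linear predictor (log-odds) and probability-scale predictions
link_pred <- predict(infertModel, newdata = infert, type = "link")
resp_pred <- predict(infertModel, newdata = infert, type = "response")

# For the binomial family's default logit link, the inverse link is plogis()
all.equal(plogis(link_pred), resp_pred)

# Every response-scale prediction is a probability between 0 and 1
range(resp_pred)
```

On the response scale, each prediction can be read directly as the estimated probability that case equals 1, which is consistent with the PRED values in the listing all lying between 0 and 1.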

Example 5-3 Using the ore.predict Function on an ore.model Model

This example pushes the iris data set to the database as the temporary table IRIS and the corresponding ore.frame proxy, IRIS. The example builds a linear regression model, IRISModel2, using the ore.lm function. It scores the data in IRIS with the model and adds the predictions to IRIS as the column PRED.

IRIS <- ore.push(iris)
IRISModel2 <- ore.lm(Sepal.Length ~ ., data = IRIS)
IRIS$PRED <- ore.predict(IRISModel2, IRIS)
head(IRIS, 3)

Listing for This Example

R> IRIS <- ore.push(iris)
R> IRISModel2 <- ore.lm(Sepal.Length ~ ., data = IRIS)
R> IRIS$PRED <- ore.predict(IRISModel2, IRIS)
R> head(IRIS, 3)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species     PRED
1          5.1         3.5          1.4         0.2  setosa 5.004788
2          4.9         3.0          1.4         0.2  setosa 4.756844
3          4.7         3.2          1.3         0.2  setosa 4.773097