The ore.rowApply
function invokes an R script with an ore.frame
as the input data. The ore.rowApply
function passes the ore.frame
to the user-defined input function as the first argument to that function. The rows
argument to the ore.rowApply
function specifies the number of rows to pass to each invocation of the user-defined R function. The last chunk or rows may have fewer rows than the number specified. The ore.rowApply
function can use data-parallel execution, in which one or more R engines perform the same R function, or task, on different partitions of data.
The syntax of the ore.rowApply
function is the following:
ore.rowApply(X, FUN, ..., FUN.VALUE = NULL, FUN.NAME = NULL, rows = 1, FUN.OWNER = NULL, parallel = getOption("ore.parallel", NULL))
The ore.rowApply
function returns an ore.list
object or an ore.frame
object.
See Also:
"Arguments for Functions that Run Scripts" for descriptions of the arguments to function ore.rowApply
Example 6-15 uses the e1071
package, previously downloaded from CRAN. The example also uses the nbmod
object, which is the Naive Bayes model created in Example 6-12.
Example 6-15 does the following:
Loads the package e1071
.
Pushes the iris
data set to the database as the IRIS
temporary table and ore.frame
object.
Creates a copy of IRIS
as IRIS_PRED
and adds the PRED column to IRIS_PRED
to contain the predictions.
Invokes the ore.rowApply
function, passing the IRIS
ore.frame
as the data source for user-defined R function and the user-defined R function itself.
The user-defined function does the following:
Loads the package e1071
so that it is available to the R engine or engines that run in the database.
Converts the Species column to a factor because, although the ore.frame
defined factors, when they are loaded to the user-defined function, factors appear as character vectors.
Invokes the predict
method and returns the res
object, which contains the predictions in the column added to the data set.
The example pulls the model to the client R session.
Passes IRIS_PRED
as the argument FUN.VALUE
, which specifies the structure of the object that the ore.rowApply
function returns.
Specifies the number of rows to pass to each invocation of the user-defined function.
Displays the class of res
, and invokes the table
function to display the Species column and the PRED column of the res
object.
Example 6-15 Using the ore.rowApply Function
library(e1071) IRIS <- ore.push(iris) IRIS_PRED <- IRIS IRIS_PRED$PRED <- "A" res <- ore.rowApply( IRIS, function(dat, nbmod) { library(e1071) dat$Species <- as.factor(dat$Species) dat$PRED <- predict(nbmod, newdata = dat) dat }, nbmod = ore.pull(nbmod), FUN.VALUE = IRIS_PRED, rows = 10) class(res) table(res$Species, res$PRED)Listing for Example 6-15
R> library(e1071) R> IRIS <- ore.push(iris) R> IRIS_PRED <- IRIS R> IRIS_PRED$PRED <- "A" R> res <- ore.rowApply( + IRIS , + function(dat, nbmod) { + library(e1071) + dat$Species <- as.factor(dat$Species) + dat$PRED <- predict(nbmod, newdata = dat) + dat + }, + nbmod = ore.pull(nbmod), + FUN.VALUE = IRIS_PRED, + rows = 10) R> class(res) [1] "ore.frame" attr(,"package") [1] "OREbase" R> table(res$Species, res$PRED) setosa versicolor virginica setosa 50 0 0 versicolor 0 47 3 virginica 0 3 47
As Example 6-13 does, Example 6-16 uses the C50
package to score churn data (that is, to predict which customers are likely to churn) using C5.0 models. However, instead of partitioning the data by a column, Example 6-16 partitions the data by a number of rows. The example scores the customers from the specified state in parallel. The example uses datastores and saves functions to the Oracle R Enterprise R script repository, which allows the functions to be used by the Oracle R Enterprise SQL API functions.
Example 6-16 first loads C50
package and the data sets. The example deletes the datastores with names containing myC5.0modelFL
, if they exist. It invokes ore.drop
to delete the CHURN_TEST table, if it exists, and then invokes ore.create
to create the CHURN_TEST table from the churnTest
data set.
The example next invokes ore.getLevels
, which returns a list
of the levels for each factor column. The invocation excludes the first column, which is state, because the levels for that column are not needed. Getting the levels first can ensure that all possible levels are provided during model building, even if some rows do not have values for some of the levels. The ore.delete
invocation ensures that no datastore with the specified name exists and the ore.save
invocation saves the xlevels
object in the datastore named myXLevels
.
Example 6-16 creates a user-defined function, myC5.0FunctionForLevels
, that generates a C5.0 model. The function uses the list of levels returned by function ore.getXlevels
instead of computing the levels using the as.factor
function as the user-defined function does in Example 6-13. It uses the levels to convert the column type from character vector to factor. The function myC5.0FunctionForLevels
returns the value TRUE.
The example saves the function in the Oracle R Enterprise R script repository.
The example next gets a list of datastores that have names that include the specified string and deletes those datastores if they exist.
The example then invokes ore.groupApply
, which invokes function myC5.0FunctionForLevels
on each state in the CHURN_TEST
data. To each myC5.0FunctionForLevels
invocation, ore.groupApply
passes the datastore that contains the xlevels
object and a prefix to use in naming the datastore generated by myC5.0FunctionForLevels
. It also passes the ore.connect
control argument to connect to the database in the embedded R function, which enables the use of objects stored in a datastore. The ore.groupApply
invocation returns a list that contains the results of all of the invocations of myC5.0FunctionForLevels
.
The example pulls the result over to the local R session and verifies that myC5.0FunctionForLevels
returned TRUE
for each state in the data source.
Example 6-16 next creates another user-defined another function, myScoringFunction
, and stores it in the Oracle R Enterprise R script repository. The function scores a C5.0 model for the levels of a state and returns the results in a data.frame
.
The example then invokes function ore.rowApply
. It filters the input data to use only data for the state of Massachusetts. It specifies myScoringFunction
as the function to invoke and passes that user-defined function the name of the datastore that contains the xlevels
object and a prefix to use in loading the datastore that contains the C5.0 model for the state. The ore.rowApply
invocation specifies invoking myScoringFunction
on 200 rows of the data set in each parallel R engine. It uses the FUN.VALUE
argument so that ore.rowApply
returns an ore.frame
that contains the results of all of the myScoringFunction
invocations. The variable scores
gets the results of the ore.rowApply
invocation.
Finally, Example 6-16 prints the scores
object and then uses the table function to display the confusion matrix for the scoring.
See Also:
Example A-8 for an invocation of the rqRowEval
function that produces the same result as the ore.rowApply
function in Example 6-16
Example 6-16 Using the ore.rowApply Function with Datastores and Scripts
library(C50) data(churn) ore.drop("CHURN_TEST" ore.create(churnTest, "CHURN_TEST") xlevels <- ore.getXlevels(~ ., CHURN_TEST[,-1]) ore.delete("myXLevels") ore.save(xlevels, name = "myXLevels") ore.scriptDrop("myC5.0FunctionForLevels") ore.scriptCreate("myC5.0FunctionForLevels", function(dat, xlevelsDatastore, datastorePrefix) { library(C50) state <- dat[1,"state"] datastoreName <- paste(datastorePrefix, dat[1, "state"], sep = "_") dat$state <- NULL ore.load(name = xlevelsDatastore) for (j in names(xlevels)) dat[[j]] <- factor(dat[[j]], levels = xlevels[[j]]) c5mod <- C5.0(churn ~ ., data = dat, rules = TRUE) ore.save(c5mod, name = datastoreName) TRUE }) ds.v <- ore.datastore(pattern= "myC5.0modelFL")$datastore.name for (ds in ds.v) ore.delete(name = ds) res <- ore.groupApply(CHURN_TEST, INDEX=CHURN_TEST$state, FUN.NAME = "myC5.0FunctionForLevels", xlevelsDatastore = "myXLevels", datastorePrefix = "myC5.0modelFL", ore.connect = TRUE) res <- ore.pull(res) all(as.logical(res) == TRUE) ore.scriptDrop("myScoringFunction") ore.scriptCreate("myScoringFunction", function(dat, xlevelsDatastore, datastorePrefix) { library(C50) state <- dat[1,"state"] datastoreName <- paste(datastorePrefix,state,sep="_") dat$state <- NULL ore.load(name = xlevelsDatastore) for (j in names(xlevels)) dat[[j]] <- factor(dat[[j]], levels = xlevels[[j]]) ore.load(name = datastoreName) res <- data.frame(pred = predict(c5mod, dat, type = "class"), actual = dat$churn, state = state) res } ) scores <- ore.rowApply( CHURN_TEST[CHURN_TEST$state == "MA",], FUN.NAME = "myScoringFunction", xlevelsDatastore = "myXLevels", datastorePrefix = "myC5.0modelFL", ore.connect = TRUE, parallel = TRUE, FUN.VALUE = data.frame(pred = character(0), actual = character(0), state = character(0)), rows=200) scores table(scores$actual, scores$pred)Listing for Example 6-16
R> library(C50) R> data(churn) R> R> ore.drop("CHURN_TEST" R> ore.create(churnTest, "CHURN_TEST") R> R> xlevels <- ore.getXlevels(~ ., CHURN_TEST[,-1]) R> ore.delete("myXLevels") [1] "myXLevels R> ore.save(xlevels, name = "myXLevels") R> R> ore.scriptDrop("myC5.0FunctionForLevels") R> ore.scriptCreate("myC5.0FunctionForLevels", + function(dat, xlevelsDatastore, datastorePrefix) { + library(C50) + state <- dat[1,"state"] + datastoreName <- paste(datastorePrefix, dat[1, "state"], sep = "_") + dat$state <- NULL + ore.load(name = xlevelsDatastore) + for (j in names(xlevels)) + dat[[j]] <- factor(dat[[j]], levels = xlevels[[j]]) + c5mod <- C5.0(churn ~ ., data = dat, rules = TRUE) + ore.save(c5mod, name = datastoreName) + TRUE + }) R> R> ds.v <- ore.datastore(pattern="myC5.0modelFL")$datastore.name R> for (ds in ds.v) ore.delete(name=ds) R> R> res <- ore.groupApply(CHURN_TEST, + INDEX=CHURN_TEST$state, + FUN.NAME="myC5.0FunctionForLevels", + xlevelsDatastore = "myXLevels", + datastorePrefix = "myC5.0modelFL", + ore.connect = TRUE) R> res <- ore.pull(res) R> all(as.logical(res) == TRUE) [1] TRUE R> R> ore.scriptDrop("myScoringFunction") R> ore.scriptCreate("myScoringFunction", + function(dat, xlevelsDatastore, datastorePrefix) { + library(C50) + state <- dat[1,"state"] + datastoreName <- paste(datastorePrefix,state,sep="_") + dat$state <- NULL + ore.load(name = xlevelsDatastore) + for (j in names(xlevels)) + dat[[j]] <- factor(dat[[j]], levels = xlevels[[j]]) + ore.load(name = datastoreName) + res <- data.frame(pred = predict(c5mod, dat, type="class"), + actual = dat$churn, + state = state) + res + } + ) R> R> scores <- ore.rowApply( + CHURN_TEST[CHURN_TEST$state =="MA",], + FUN.NAME = "myScoringFunction", + xlevelsDatastore = "myXLevels", + datastorePrefix = "myC5.0modelFL", + ore.connect = TRUE, parallel = TRUE, + FUN.VALUE = data.frame(pred=character(0), + actual=character(0), + state=character(0)), + rows=200 R> R> scores pred actual state 1 no no MA 2 no no MA 3 no no MA 4 no no MA 5 no no MA 6 no yes MA 7 yes yes MA 8 yes yes MA 9 no no MA 10 no no MA 11 no no MA 12 no no MA 13 no no MA 14 no no MA 15 yes yes MA 16 no no MA 17 no no MA 18 no no MA 19 no no MA 20 no no MA 21 no no MA 22 no no MA 23 no no MA 24 no no MA 25 no no MA 26 no no MA 27 no no MA 28 no no MA 29 no yes MA 30 no no MA 31 no no MA 32 no no MA 33 yes yes MA 34 no no MA 35 no no MA 36 no no MA 37 no no MA 38 no no MA Warning message: ORE object has no unique key - using random order R> table(scores$actual, scores$pred) no yes no 32 0 yes 2 4