6 Use Oracle Machine Learning for R Embedded R Execution
Embedded R execution in OML4R enables you to invoke R scripts in R sessions that run on the Oracle Database server.
These topics discuss embedded R execution:
- About Oracle Machine Learning for R Embedded R Execution
In OML4R, embedded R execution is the ability to run R scripts in R engines that are dynamically started and managed by the database. - R Interface for Embedded R Execution
Oracle Machine Learning for R provides functions that invoke R scripts that run in one or more R engines that are embedded in the Oracle database. - SQL Interface for Embedded R Execution
The SQL interface for Oracle Machine Learning for R embedded R execution allows you to execute R functions in production database applications.
6.1 About Oracle Machine Learning for R Embedded R Execution
In OML4R, embedded R execution is the ability to run R scripts in R engines that are dynamically started and managed by the database.
You can store R scripts in the OML4R script repository and to invoke such scripts with embedded R functions. When invoked, a script executes in one or more R engines that run on the database server. OML4R provides both an R interface and a SQL interface for embedded R execution. From the same R script you can get structured data, an XML representation of R objects and images, and even PNG images through a BLOB column in a database table.
The following topics describe embedded R execution:
- Benefits of Embedded R Execution
Embedded R execution has the following benefits: - APIs for Embedded R Execution
Oracle Machine Learning for R provides R and SQL application programming interfaces for embedded R execution. - Security for Scripts
Because R scripts allow access to the database server, the creation of scripts must be controlled. - Support for Parallel Execution
Some of the Oracle Machine Learning for R embedded R execution functions support the use of parallel execution in the database. - Install a Third-Party Package for Use in Embedded R Execution
Embedded R execution allows the use of CRAN or other third-party packages in user-defined R functions executed on the Oracle Database server.
Parent topic: Use Oracle Machine Learning for R Embedded R Execution
6.1.1 Benefits of Embedded R Execution
Embedded R execution has the following benefits:
-
Eliminates moving data from the Oracle Database server to your local R session.
As well as being more secure, the transfer of database data between Oracle Database and an internal R engine is much faster than to a separate client R engine.
-
Uses the database server to start, manage, and control the execution of R scripts in R engines running on the server.
-
Leverages the memory and processing power of the database server machine for R engine execution, which provides better scalability and performance.
-
Enables data-parallel and task-parallel execution of user-defined R functions that correspond to special cases of Hadoop Map-Reduce jobs.
-
Provides parallel simulations capability.
-
Allows the use of open source CRAN packages in R scripts running on the database server.
-
Provides the ability to develop and operationalize comprehensive scripts for analytical applications in a single step, without leaving the R environment.
You can directly integrate R scripts used in exploratory analysis into application tasks. You can also immediately invoke R scripts in production to drastically reduce time to market by eliminating porting and enabling instantaneous updates of changes to application code.
-
Executing R scripts from SQL enables integration of R script results with Oracle Business Intelligence Enterprise Edition (OBIEE), Oracle BI Publisher, and other SQL-enabled tools for structured data, R objects, and images.
Parent topic: About Oracle Machine Learning for R Embedded R Execution
6.1.2 APIs for Embedded R Execution
Oracle Machine Learning for R provides R and SQL application programming interfaces for embedded R execution.
The following table lists the R functions and the equivalent SQL functions and procedures for embedded R execution and OML4R script repository management. The function f refers to a named R function or an R function defined in a script in the OML4R script repository.
Table 6-1 R and SQL APIs for Embedded R Execution
R API | SQL API | Description |
---|---|---|
|
|
Executes f with no automatic transfer of data. |
|
|
Executes f by passing all rows of the provided input |
|
This function must be explicitly defined by the user. |
Executes f by partitioning data according to the values of a grouping column. Provides each data partition as a |
|
|
Executes f by passing a specified number of rows (a chunk) of the provided input |
|
No equivalent. |
Executes f with no automatic transfer of data but provides the index of the invocation, 1 through n, where n is the number of times to invoke the function. Supports parallel execution of each f invocation in the pool of R engines running on the database server. |
ore.grant |
rqGrant |
Grants read privilege access to a datastore or script. |
ore.revoke |
rqRevoke |
Revokes read privilege access to a datastore or script. |
|
|
Adds the provided R function into the OML4R script repository with the provided name. |
|
|
Removes the named R function from the OML4R script repository. |
ore.scriptList |
ALL_RQ_SCRIPTS
|
Lists information about scripts. |
ore.scriptLoad |
No equivalent. | Loads the R function of a script into the R environment. |
6.1.3 Security for Scripts
Because R scripts allow access to the database server, the creation of scripts must be controlled.
The RQADMIN role is a collection of Oracle Database privileges that a user must have to create scripts and store them in the Oracle Machine Learning for R script repository or drop scripts from the repository.
The installation of OML4R on the database server creates the RQADMIN role. The role must be explicitly granted to a user. To grant RQADMIN to a user, start SQL*Plus as sysdba
and enter a GRANT
statement such as the following, which grants the role to the user OML_USER:
GRANT RQADMIN to OML_USER
Note:
You should grant RQADMIN only to those users who need it.
When creating a script, the owner can use the global
argument to specify whether the script is public or private. If global = TRUE
, then all users have read privilege access to the script. If global = FALSE
, which is the default, then the owner can share the script by granting access to other users. The owner can revoke the access at any time.
See Also:
Parent topic: About Oracle Machine Learning for R Embedded R Execution
6.1.4 Support for Parallel Execution
Some of the Oracle Machine Learning for R embedded R execution functions support the use of parallel execution in the database.
The ore.groupApply
, ore.rowApply
, rq.groupEval, and rq.rowEval
functions support data-parallel execution and the ore.indexApply
function supports task-parallel execution. This parallel execution capability enables a script to take advantage of high-performance computing hardware such as an Oracle Exadata Database Machine.
The parallel
argument of the ore.groupApply
, ore.rowApply
, and ore.indexApply
functions specifies the degree of parallelism to use in the embedded R execution. The value of the argument can be one of the following:
-
A positive integer greater than or equal to
2
for a specific degree of parallelism -
FALSE
or1
for no parallelism -
TRUE
for the default parallelism of thedata
argument -
NULL
for the database default for the operation
The default value of the argument is the value of the global option ore.parallel
or FALSE
if ore.parallel
is not set.
A user-defined R function invoked using ore.doEval
or ore.tableApply
is not executed in parallel. The function executes in a single R engine.
For the rq.groupEval, and rq.rowEval
functions, the degree of parallelism is specified by a PARALLEL
hint in the input cursor argument.
In data-parallel execution for the ore.groupApply
and rq.groupEval functions, one or more R engines perform the same R function, or task, on different partitions of data. This functionality enables the building of large numbers of models, for example building tens or hundreds of thousands of predictive models, one model per customer.
In data-parallel execution for the ore.rowApply
and rq.rowEval
functions, one or more R engines perform the same R function on disjoint chunks of data. This functionality enables scalable model scoring and predictions on large data sets.
In task-parallel execution for the ore.indexApply
function, one or more R engines perform the same or different calculations, or task. A number, associated with the index of the execution, is provided to the function. This functionality is valuable in a variety of operations, such as in performing simulations.
Oracle Database handles the management and control of potentially multiple R engines at the database server, automatically partitioning and passing data to R engines executing in parallel. It ensures that all of the R function executions for all of the partitions complete; if not, the OML4R function returns an error. The result from the execution of each user-defined embedded R function is gathered in an ore.list
. This list remains in the database until the user requires the result.
Embedded R execution also allows for data-parallel execution of user-defined R functions that may use functions from an open source R package from The Comprehensive R Archive Network (CRAN) or other third-party R package. However, third-party packages do not leverage in-database parallelism and are subject to the parallelism constraints of R. Third-party packages can benefit from the data-parallel and task-parallel execution supported in embedded R execution.
Parent topic: About Oracle Machine Learning for R Embedded R Execution
6.1.5 Install a Third-Party Package for Use in Embedded R Execution
Embedded R execution allows the use of CRAN or other third-party packages in user-defined R functions executed on the Oracle Database server.
To use a third-party package in embedded R execution, the package must be installed on the database server. If you are going to use the package from the R interface for embedded R execution, then the package must also be installed on the client, as well. To avoid incompatibilities, you must install the same version of the package on both the client and server machines.
An Oracle Database Administrator (DBA) can install a package on a database server so that it can be used by embedded R execution functions or by any R user. The DBA can install a package on a single database server or on multiple database servers.
A DBA would typically do the following:
- Download and install the package from CRAN. Downloading a package from CRAN requires an Internet connection.
- In an Oracle Machine Learning for R session running on the server, load the package. Verify that the package is installed correctly by using a function in the package.
To install a package on a single database server, do one of the following:
-
In an OML4R session running on the server, invoke the
install.packages
function, as shown in Example 6-1. The function downloads the package and installs dependencies automatically. -
Download the package source from CRAN using
wget
. If the package depends on any packages that are not in the R distribution in use, then download those packages, also.From the operating system command line, use the
ORE CMD INSTALL
command to install the package or packages in the same location as the OML4R packages, which is$ORACLE_HOME/R/library
. See Example 6-2.
To install a package, and any dependent packages, on multiple database servers, such as those in an Oracle Real Application Clusters (Oracle RAC) or a multinode Oracle Exadata Database Machine environment, use the Exadata Distributed Command Line Interface (DCLI) utility, as shown in Example 6-3.
To verify that the package is installed correctly, load the package and use a function in the package, as shown in Example 6-4.
Example 6-1 Installing a Package for a Single Database in an OML4R Session
This example invokes the install.packages
function to download the C50
package from CRAN and to install it. The C50
package contains functions for creating C5.0 decision trees and rule-based models for pattern recognition.
The output this example, which is not shown, is almost identical to the output of the ORE CMD INSTALL
command in Example 6-2.
install.packages("c50")
Example 6-2 Installing a Package for a Single Database from the Command Line
This example demonstrates downloading the C50
package from CRAN and installing it with ORE CMD INSTALL
from a Linux command line.
wget http://cran.r-project.org/src/contrib/C50_0.1.0-19.tar.gz ORE CMD INSTALL C50_0.1.0-19.tar.gz
Listing for This Example
$ wget http://cran.r-project.org/src/contrib/C50_0.1.0-19.tar.gz # The output of wget is not shown. $ ORE CMD INSTALL C50_0.1.0-19.tar.gz * installing to library '/example/dbhome_1/R/library' * installing *source* package 'C50' ... ** package 'C50' successfully unpacked and MD5 sums checked checking for gcc... gcc checking whether the C compiler works... yes checking for C compiler default output file name... a.out checking for suffix of executables... checking whether we are cross compiling... no checking for suffix of object files... o checking whether we are using the GNU C compiler... yes checking whether gcc accepts -g... yes checking for gcc option to accept ISO C89... none needed configure: creating ./config.status config.status: creating src/Makevars ** libs gcc -m64 -std=gnu99 -I/usr/include/R -DNDEBUG -DNDEBUG -I/usr/local/include -ffloat-store -g -fpic -g -O2 -c attwinnow.c -o attwinnow.o gcc -m64 -std=gnu99 -I/usr/include/R -DNDEBUG -DNDEBUG -I/usr/local/include -ffloat-store -g -fpic -g -O2 -c classify.c -o classify.o gcc -m64 -std=gnu99 -I/usr/include/R -DNDEBUG -DNDEBUG -I/usr/local/include -ffloat-store -g -fpic -g -O2 -c confmat.c -o confmat.o gcc -m64 -std=gnu99 -I/usr/include/R -DNDEBUG -DNDEBUG -I/usr/local/include -ffloat-store -g -fpic -g -O2 -c construct.c -o construct.o gcc -m64 -std=gnu99 -I/usr/include/R -DNDEBUG -DNDEBUG -I/usr/local/include -ffloat-store -g -fpic -g -O2 -c contin.c -o contin.o gcc -m64 -std=gnu99 -I/usr/include/R -DNDEBUG -DNDEBUG -I/usr/local/include -ffloat-store -g -fpic -g -O2 -c discr.c -o discr.o gcc -m64 -std=gnu99 -I/usr/include/R -DNDEBUG -DNDEBUG -I/usr/local/include -ffloat-store -g -fpic -g -O2 -c formrules.c -o formrules.o gcc -m64 -std=gnu99 -I/usr/include/R -DNDEBUG -DNDEBUG -I/usr/local/include -ffloat-store -g -fpic -g -O2 -c formtree.c -o formtree.o gcc -m64 -std=gnu99 -I/usr/include/R -DNDEBUG -DNDEBUG -I/usr/local/include -ffloat-store -g -fpic -g -O2 -c getdata.c -o getdata.o gcc -m64 -std=gnu99 -I/usr/include/R -DNDEBUG -DNDEBUG -I/usr/local/include -ffloat-store -g -fpic -g -O2 -c getnames.c -o getnames.o gcc -m64 -std=gnu99 -I/usr/include/R -DNDEBUG -DNDEBUG -I/usr/local/include -ffloat-store -g -fpic -g -O2 -c global.c -o global.o gcc -m64 -std=gnu99 -I/usr/include/R -DNDEBUG -DNDEBUG -I/usr/local/include -ffloat-store -g -fpic -g -O2 -c hash.c -o hash.o gcc -m64 -std=gnu99 -I/usr/include/R -DNDEBUG -DNDEBUG -I/usr/local/include -ffloat-store -g -fpic -g -O2 -c hooks.c -o hooks.o gcc -m64 -std=gnu99 -I/usr/include/R -DNDEBUG -DNDEBUG -I/usr/local/include -ffloat-store -g -fpic -g -O2 -c implicitatt.c -o implicitatt.o gcc -m64 -std=gnu99 -I/usr/include/R -DNDEBUG -DNDEBUG -I/usr/local/include -ffloat-store -g -fpic -g -O2 -c info.c -o info.o gcc -m64 -std=gnu99 -I/usr/include/R -DNDEBUG -DNDEBUG -I/usr/local/include -ffloat-store -g -fpic -g -O2 -c mcost.c -o mcost.o gcc -m64 -std=gnu99 -I/usr/include/R -DNDEBUG -DNDEBUG -I/usr/local/include -ffloat-store -g -fpic -g -O2 -c modelfiles.c -o modelfiles.o gcc -m64 -std=gnu99 -I/usr/include/R -DNDEBUG -DNDEBUG -I/usr/local/include -ffloat-store -g -fpic -g -O2 -c p-thresh.c -o p-thresh.o gcc -m64 -std=gnu99 -I/usr/include/R -DNDEBUG -DNDEBUG -I/usr/local/include -ffloat-store -g -fpic -g -O2 -c prune.c -o prune.o gcc -m64 -std=gnu99 -I/usr/include/R -DNDEBUG -DNDEBUG -I/usr/local/include -ffloat-store -g -fpic -g -O2 -c rc50.c -o rc50.o gcc -m64 -std=gnu99 -I/usr/include/R -DNDEBUG -DNDEBUG -I/usr/local/include -ffloat-store -g -fpic -g -O2 -c redefine.c -o redefine.o gcc -m64 -std=gnu99 -I/usr/include/R -DNDEBUG -DNDEBUG -I/usr/local/include -ffloat-store -g -fpic -g -O2 -c rsample.c -o rsample.o gcc -m64 -std=gnu99 -I/usr/include/R -DNDEBUG -DNDEBUG -I/usr/local/include -ffloat-store -g -fpic -g -O2 -c rulebasedmodels.c -o rulebasedmodels.o gcc -m64 -std=gnu99 -I/usr/include/R -DNDEBUG -DNDEBUG -I/usr/local/include -ffloat-store -g -fpic -g -O2 -c rules.c -o rules.o gcc -m64 -std=gnu99 -I/usr/include/R -DNDEBUG -DNDEBUG -I/usr/local/include -ffloat-store -g -fpic -g -O2 -c ruletree.c -o ruletree.o gcc -m64 -std=gnu99 -I/usr/include/R -DNDEBUG -DNDEBUG -I/usr/local/include -ffloat-store -g -fpic -g -O2 -c siftrules.c -o siftrules.o gcc -m64 -std=gnu99 -I/usr/include/R -DNDEBUG -DNDEBUG -I/usr/local/include -ffloat-store -g -fpic -g -O2 -c sort.c -o sort.o gcc -m64 -std=gnu99 -I/usr/include/R -DNDEBUG -DNDEBUG -I/usr/local/include -ffloat-store -g -fpic -g -O2 -c strbuf.c -o strbuf.o gcc -m64 -std=gnu99 -I/usr/include/R -DNDEBUG -DNDEBUG -I/usr/local/include -ffloat-store -g -fpic -g -O2 -c subset.c -o subset.o gcc -m64 -std=gnu99 -I/usr/include/R -DNDEBUG -DNDEBUG -I/usr/local/include -ffloat-store -g -fpic -g -O2 -c top.c -o top.o gcc -m64 -std=gnu99 -I/usr/include/R -DNDEBUG -DNDEBUG -I/usr/local/include -ffloat-store -g -fpic -g -O2 -c trees.c -o trees.o gcc -m64 -std=gnu99 -I/usr/include/R -DNDEBUG -DNDEBUG -I/usr/local/include -ffloat-store -g -fpic -g -O2 -c update.c -o update.o gcc -m64 -std=gnu99 -I/usr/include/R -DNDEBUG -DNDEBUG -I/usr/local/include -ffloat-store -g -fpic -g -O2 -c utility.c -o utility.o gcc -m64 -std=gnu99 -I/usr/include/R -DNDEBUG -DNDEBUG -I/usr/local/include -ffloat-store -g -fpic -g -O2 -c xval.c -o xval.o gcc -m64 -std=gnu99 -shared -L/usr/local/lib64 -o C50.so attwinnow.o classify.o confmat.o construct.o contin.o discr.o formrules.o formtree.o getdata.o getnames.o global.o hash.o hooks.o implicitatt.o info.o mcost.o modelfiles.o p-thresh.o prune.o rc50.o redefine.o rsample.o rulebasedmodels.o rules.o ruletree.o siftrules.o sort.o strbuf.o subset.o top.o trees.o update.o utility.o xval.o -L/usr/lib64/R/lib -lR installing to /example/dbhome_1/R/library/C50/libs ** R ** data ** preparing package for lazy loading ** help *** installing help indices converting help for package 'C50' finding HTML links ... done C5.0 html C5.0Control html churn html predict.C5.0 html summary.C5.0 html varImp.C5.0 html ** building package indices ** testing if installed package can be loaded * DONE (C50)
Example 6-3 Installing a Package Using DCLI
This example shows the DLCI command for installing the C50
package. The dcli -g
flag designates a file containing a list of nodes to install on, and the -l
flag specifies the user ID to use when executing the commands.
dcli -g nodes -l oracle R CMD INSTALL C50_0.1.0-19.tar.gz
Example 6-4 Using a C50 Package Function
This example shows starting R, connecting to OML4R on the server, loading the C50
package, and using a function in the package. The example starts R by executing the ORE
command from the Linux command line. The example connects to OML4R and then loads the C50
package. It invokes the demo
function to look for example programs in the package. Because the package does not have examples, this example then gets help for the C5.0
function. The example invokes example code from the help.
ORE library(ORE) ore.connect(user = "OML_USER", sid = "orcl", host = "myhost", password = "oml_userStrongPassword", port = 1521, all=TRUE) library(C50) demo(package = "C50") ?C5.0 data(churn) treeModel <- C5.0(x = churnTrain[, -20], y = churnTrain$churn) treeModel
Listing for This Example
$ ORE R> library(ORE) Loading required package: OREbase Attaching package: 'OREbase' The following objects are masked from 'package:base': cbind, data.frame, eval, interaction, order, paste, pmax, pmin, rbind, table Loading required package: OREembed Loading required package: OREstats Loading required package: MASS Loading required package: OREgraphics Loading required package: OREeda Loading required package: OREmodels Loading required package: OREdm Loading required package: lattice Loading required package: OREpredict Loading required package: ORExml > ore.connect(user = "OML_USER", sid = "orcl", host = "myhost", + password = "oml_userStrongPassword", port = 1521, all=TRUE) Loading required package: ROracle Loading required package: DBI R> library(C50) R> demo(package = "C50") no demos found R> ?C5.0 # Output not shown. R> data(churn) R> treeModel <- C5.0(x = churnTrain[, -20], y = churnTrain$churn) R> treeModel Call: C5.0.default(x = churnTrain[, -20], y = churnTrain$churn) Classification Tree Number of samples: 3333 Number of predictors: 19 Tree size: 27 Non-standard options: attempt to group attributes
See Also:
-
Using DCLI to Install Oracle R Enterprise on Exadata in Oracle Machine Learning for R Installation and Administration Guide
Parent topic: About Oracle Machine Learning for R Embedded R Execution
6.2 R Interface for Embedded R Execution
Oracle Machine Learning for R provides functions that invoke R scripts that run in one or more R engines that are embedded in the Oracle database.
Other functions create and store an R function as a script in the OML4R script repository, grant or revoke read access to a script, list the available scripts, load a script function into the R environment, or drop a script from the repository. This section describes these functions in the following topics:
- Arguments for Functions that Run Scripts
The Oracle Machine Learning for R embedded R execution functionsore.doEval
,ore.tableApply
,ore.groupApply
,ore.rowApply
, andore.indexApply
have arguments that are common to some or all of the functions. - Manage Scripts in R
Embedded R execution functions can invoke R functions that are stored as scripts in the OML4R script repository. You can use the R functions described in this topic to create and manage scripts. - Use the ore.doEval Function
Theore.doEval
function executes the specified input function using data that is generated by the input function. - Use the ore.tableApply Function
Theore.tableApply
function invokes an R script with anore.frame
as the input data. - Use the ore.groupApply Function
Theore.groupApply
function invokes an R script with anore.frame
as the input data. - Use the ore.rowApply Function
Theore.rowApply
function invokes an R script with anore.frame
as the input data. - Use the ore.indexApply Function
Theore.indexApply
function executes the specified user-defined input function using data that is generated by the input function.
Parent topic: Use Oracle Machine Learning for R Embedded R Execution
6.2.1 Arguments for Functions that Run Scripts
The Oracle Machine Learning for R embedded R execution functions ore.doEval
, ore.tableApply
, ore.groupApply
, ore.rowApply
, and ore.indexApply
have arguments that are common to some or all of the functions.
Some of the functions also have an argument that is unique to the function. The following topics describe these arguments:
- Input Function to Execute
The embedded R execution functions all require an R function to apply during the execution of the script. - Optional and Control Arguments
All of the embedded R execution functions take optional arguments, which can be named or not. - Structure of Return Value
Another argument that applies to all of the embedded R execution functions isFUN.VALUE
. - Input Data
Theore.doEval
andore.indexApply
functions do not automatically receive any data from the database. - Parallel Execution
Theparallel
argument specifies the degree of parallelism to use in the embedded R execution of the input function. - Unique Arguments
Theore.groupApply
,ore.indexApply
, andore.rowApply
functions each take an argument that is unique to the function.
See Also:
-
For function signatures and more details about function arguments, see the online help displayed by invoking
help(ore.doEval)
-
For examples of the use of the arguments, see Using the ore.doEval Function and the other topics on using the embedded R execution functions
Parent topic: R Interface for Embedded R Execution
6.2.1.1 Input Function to Execute
The embedded R execution functions all require an R function to apply during the execution of the script.
You specify the input function with one of the following mutually exclusive arguments:
-
FUN
-
FUN.NAME
(and optionalFUN.OWNER
)
The FUN
argument takes a function object as a directly specified function or as one assigned to an R variable. Only a user with the RQADMIN role can use the FUN
argument when invoking an embedded R function.
The FUN.NAME
argument specifies a script that is stored in the OML4R R script repository. A stored script contains the function to apply when the script runs. Any OML4R user can use the FUN.NAME
argument when invoking an embedded R function.
The optional argument FUN.OWNER
specifies the owner of a script in the R script repository. The owner is the user who created the script. Use this argument only with the FUN.NAME
argument. When FUN.NAME
is a private script to which you have been granted read privilege access, use FUN.OWNNER
to specify the owner of the private script.
The RQSYS schema is the owner of public scripts and the predefined OML4R scripts. For a list of the predefined scripts, invoke help(“ore.doEval”)
and see the description of the FUN.NAME
argument. If FUN.OWNNER
is not specified or is NULL
, then OML4R looks for the owner in the following order: user of the current session, RQSYS. If the owner of the script is not current user or RQSYS, then an error occurs.
Note:
The OML4R functions in the OREmodels
package, ore.glm
, ore.lm
, ore.neural
, and ore.randomForest
, use the embedded R execution framework internally and cannot be used in embedded R execution functions.
Parent topic: Arguments for Functions that Run Scripts
6.2.1.2 Optional and Control Arguments
All of the embedded R execution functions take optional arguments, which can be named or not.
Oracle Machine Learning for R passes user-defined optional arguments to the input function. You can pass any number of optional arguments to the input function, including complex R objects such as models.
Arguments that start with ore.
are special control arguments. OML4R does not pass them to the input function, but instead uses them to control what happens before or after the execution of that function. The following control arguments are supported:
-
ore.connect
controls whether to automatically connect to OML4R inside the embedded R execution function. This is equivalent to doing anore.connect
call with the same credentials as the client session. The default value isFALSE
.If an automatic connection is enabled, the following functionality occurs:
-
The embedded R script is connected to the database.
-
The connection has the same credentials as the session that invokes the embedded R SQL function.
-
The script runs in an autonomous transaction.
-
ROracle queries can work with the automatic connection.
-
OML4R transparency layer functionality is enabled in the embedded script.
-
-
ore.drop
controls the input data. If the option value isTRUE
, a one columndata.frame
is converted to avector
. The default value isTRUE
. -
ore.envAsEmptyenv
controls whether an environment referenced in an object is replaced with an empty environment during serialization. Some types of input parameters and returned objects, such aslist
andformula
, are serialized before being saved to the database. If the control argument value isTRUE
, then the referenced environment in the object is replaced with an empty environment whose parent is.GlobalEnv
and the objects in the original referenced environment are not serialized. In some cases, this can significantly reduce the size of serialized objects. If the control argument value isFALSE
, then all of the objects in the referenced environment are serialized and can be unserialized and recovered later. The default value is regulated by the global optionore.envAsEmptyenv
. -
ore.na.omit
controls the handling of missing values in the input data. If you specifyore.na.omit = TRUE
, then rows or vector elements, depending on theore.drop
setting, that contain missing values are removed from the input data. If all of the rows in a chunk contain missing values, then the input data for that chunk will be an emptydata.frame
orvector
. The default value isFALSE
. -
ore.graphics
controls whether to start a graphical driver and look for images. The default value isTRUE
. -
ore.png.*
specifies additional arguments for thepng
graphics driver ifore.graphics
isTRUE
. The naming convention for these arguments is to add anore.png.
prefix to the arguments of thepng
function. For example, ifore.png.height
is supplied, argumentheight
is passed to thepng
function. If not set, the standard default values for thepng
function are used.
See Also:
For more details about control arguments, see the online help displayed by invoking help(ore.doEval)
Parent topic: Arguments for Functions that Run Scripts
6.2.1.3 Structure of Return Value
Another argument that applies to all of the embedded R execution functions is FUN.VALUE
.
If the FUN.VALUE
argument is NULL
, then the ore.doEval
and ore.tableApply
function can return a serialized R object as an ore.object
class object, and the ore.groupApply
, ore.indexApply
, and ore.rowApply
functions return an ore.list
object. However, if you specify a data.frame
or an ore.frame
with the FUN.VALUE
argument, then the function returns an ore.frame
that has the structure of the specified data.frame
or ore.frame
object.
To specify that the corresponding output column of an ore.frame
have a CLOB or BLOB database data type, you can apply the attribute ora.type
to a column of a FUN.VALUE
data.frame
. For an example of using ora.type
, see Example 6-11.
Parent topic: Arguments for Functions that Run Scripts
6.2.1.4 Input Data
The ore.doEval
and ore.indexApply
functions do not automatically receive any data from the database.
They simply execute the function specified by the FUN
or FUN.NAME
argument. Any data needed by the input function is either generated within that function or explicitly retrieved from a data source such as Oracle Database, other databases, or flat files. The input function can load data from a file or a table using the ore.pull
function or other transparency layer function.
The ore.tableApply
, ore.groupApply
, and ore.rowApply
functions require a database table as input data. The table is represented by an ore.frame
. You supply that data with an ore.frame
object that you specify with the X
argument, which is the first argument to the embedded R execution function. The embedded R execution function passes the ore.frame
object to the user-defined input function as the first argument to that function.
Note:
The data represented by the ore.frame
object passed to the user-defined R function is copied from Oracle Database to the database server R engine. The R memory limitations apply. If your database server machine has 32 GB RAM and your data table is 64 GB, then Oracle R Enterprise cannot load the data into the R engine memory.
Parent topic: Arguments for Functions that Run Scripts
6.2.1.5 Parallel Execution
The parallel
argument specifies the degree of parallelism to use in the embedded R execution of the input function.
The parallel
argument is accepted by the ore.groupApply
, ore.indexApply
, and ore.rowApply
functions.
Parent topic: Arguments for Functions that Run Scripts
6.2.1.6 Unique Arguments
The ore.groupApply
, ore.indexApply
, and ore.rowApply
functions each take an argument that is unique to the function.
The ore.groupApply
function takes the INDEX
argument, which specifies the name of a column by which the rows of the input data are partitioned for processing by the input function.
The ore.indexApply
function takes the times
argument, which specifies the number of times to execute the input function.
The ore.rowApply
function tales the rows
argument, which specifies the number of rows to pass to each invocation of the input function.
Parent topic: Arguments for Functions that Run Scripts
6.2.2 Manage Scripts in R
Embedded R execution functions can invoke R functions that are stored as scripts in the OML4R script repository. You can use the R functions described in this topic to create and manage scripts.
The embedded R execution functions can take a FUN.NAME
argument, which specifies the name of a script in the OML4R script repository. Scripts in the R script repository are also available through the SQL API for embedded R execution.
The R functions for managing scripts are the following:
-
ore.grant
-
ore.revoke
-
ore.scriptCreate
-
ore.scriptList
-
ore.scriptLoad
-
ore.scriptDrop
These functions are described in the following sections:
For an example that uses these functions, see Example 6-5.
Adding a Script
To add an R function as a script in the OML4R script repository, use the ore.createScript
function. To evoke this function, you must have the RQADMIN role. The ore.createScript
function has the following syntax:
ore.scriptCreate(name, FUN, global, overwrite)
The arguments are the following:
Argument | Description |
---|---|
name |
A name for the script in the OML4R script repository. |
fun |
An R function. |
global |
A logical value that indicates whether the script is public (global) or private. FALSE (the default) specifies that the script is not public and is visible only to the owner or to users to whom the owner has granted read privilege access; TRUE specifies that the script is public and therefore visible to all users.
|
overwrite |
A logical value that indicates whether to replace the R function of the script with the function specified in by the fun argument. TRUE specifies replacing the function, if it exists; FALSE (the default) specifies that the existing contents cannot be replaced.
|
If overwrite = FALSE
, an error condition occurs if a script by the same name already exists in the OML4R script repository; otherwise, ore.scriptCreate
returns NULL
.
Granting or Revoking Read Access to a Script
The creator of a script can use the ore.grant
function to grant read access privilege to the script and the ore.revoke
function to revoke that access. Those functions have the following syntax:
ore.grant(name, type = "rqscript", user)
ore.revoke(name, type = "rqscript", user)
The arguments are the following:
Argument | Description |
---|---|
name |
The name of a script in the OML4R script repository. |
type |
For a script, the type is rqscript. |
user |
The user to whom to grant or revoke read privilege access. |
The name
and type
arguments are required. If argument user
is not specified, then read privilege access is granted to or revoked from all users.
An error occurs when one of the following is true:
-
The named script is not in the OML4R script repository.
-
The
type
argument is not specified. -
The user is not found.
-
The read privilege has already been granted to or revoked from the user.
-
The named script is public.
Listing the Available Scripts
To list the scripts available to you, use ore.scriptList
. You can list scripts by name, by a pattern, or by type. If you have the RQADMIN role, you can list system scripts, as well. The function has the following syntax:
ore.scriptList(name, pattern, type)
The arguments are the following:
Argument | Description |
---|---|
name |
The name of a script in the OML4R script repository. Cannot be used when argument pattern is specified.
|
pattern |
A regular expression pattern. Scripts that match the pattern are listed. Cannot be used when argument name is specified.
|
type |
The type of the script, which can be one of the following:
|
The ore.scriptList
function returns a data.frame
that contains the names of the scripts in the OML4R script repository and the function in the script.
Loading a Script into an R Environment
To load the R function of a script into an R environment, use ore.scriptLoad
, which has the following syntax:
ore.scriptLoad(name, owner, newname, envir)
The arguments are the following:
Argument | Description |
---|---|
name |
The name of a script in the OML4R script repository. |
owner |
The owner of the script. |
newname |
A new function name in which to load the script. |
envir |
The R environment in which to load the script. |
Specifying the owner of a script is useful when access to the script has been granted to the user who is invoking ore.scriptLoad
.
Specifying a new function name is useful when the name of the script in the OML4R script repository is not a valid R function name.
An error occurs when one of the following is true:
-
The script is not in the OML4R script repository.
-
The current user does not have read access to the script.
-
The function specified by the
name
argument is not a valid R function name. -
The
newname
argument is not a valid R function name.
Dropping a Script
To remove a script from the OML4R script repository, use the ore.scriptDrop
function. To invoke this function, you must have the RQADMIN role. The ore.scriptDrop
function has the following syntax:
ore.scriptDrop(name, global, silent)
The arguments are the following:
Argument | Description |
---|---|
name |
A name for the script in the OML4R script repository. |
global |
A logical value that indicates whether the script is global (public) or private. TRUE specifies dropping a global script; FALSE (the default) specifies dropping a script owned by the current user.
|
silent |
A logical value that indicates whether to display an error message if ore.scriptDrop encounters an error condition. TRUE specifies the display of error messages; FALSE (the default) specifies no display.
|
An error condition occurs when one of the following is true:
-
The script is not in the OML4R script repository.
-
If
global = TRUE
, the script is a private script. -
If
global = FALSE
, the script is a public script.
If successful, ore.scriptDrop
returns NULL
.
Example 6-5 Using the R Script Management Functions
# Create an ore.frame object from the data.frame for the iris data set. IRIS <- ore.push(iris) # Create a private R script for the current user. ore.scriptCreate("myRandomRedDots", function(divisor = 100){ id <- 1:10 plot(1:100, rnorm(100), pch = 21, bg = "red", cex = 2 ) data.frame(id = id, val = id / divisor) }) # Create another private R script. ore.scriptCreate("MYLM", function(data, formula, ...) lm(formula, data, ...)) # Create a public script, available to any user. ore.scriptCreate("GLBGLM", function(data, formula, ...) glm(formula = formula, data = data, ...), global = TRUE) # List only my private scripts. ore.scriptList() # List my private scripts and the public scripts. ore.scriptList(type = "all") # List my private scripts that have the specified pattern. ore.scriptList(pattern = "MY") # Grant read access to a private script to all users. ore.grant("MYLM", type = "rqscript") # Grant read access to a private script to a specific user. ore.grant("myRandomRedDots", user = "SCOTT", type = "rqscript") # List the granted scripts. ore.scriptList(type = "grant") # Use the MYLM script in an embedded R execution function. ore.tableApply(IRIS[1:4], FUN.NAME = "MYLM", formula = Sepal.Length ~ .) # Use the GLBGLM script in an embedded R execution function. ore.tableApply(IRIS[1:4], FUN.NAME = "GLBGLM", formula = Sepal.Length ~ .) # Load an R script to an R function object ore.scriptLoad(name = "MYLM") # Invoke the function. MYLM(iris, formula = Sepal.Length ~ .) # Load another R script to an R function object ore.scriptLoad(name = "GLBGLM", newname = "MYGLM") # Invoke the function. MYGLM(iris, formula = Sepal.Length ~ .) # Drop some scripts. ore.scriptDrop("MYLM") ore.scriptDrop("GLBGLM", global = TRUE) # List all scripts. ore.scriptList(type = "all")
Listing for This Example
R> # Create an ore.frame object from the data.frame for the iris data set. R> IRIS <- ore.push(iris) R> R> # Create a private R script for the current user. R> ore.scriptCreate("myRandomRedDots", function(divisor = 100){ + id <- 1:10 + plot(1:100, rnorm(100), pch = 21, bg = "red", cex = 2 ) + data.frame(id = id, val = id / divisor) + }) R> R> # Create another private R script. R> ore.scriptCreate("MYLM", + function(data, formula, ...) lm(formula, data, ...)) R> R> # Create a public script, available to any user. R> ore.scriptCreate("GLBGLM", + function(data, formula, ...) + glm(formula = formula, data = data, ...), + global = TRUE) R> R> # List only my private scripts. R> ore.scriptList() NAME SCRIPT 1 MYLM function (data, formula, ...) \nlm(formula, data, ...) 2 myRandomRedDots function (divisor = 100) \n{\n id & lt\n -1:10\n plot(1:100, rnorm(100), pch = 21, bg = "red", cex = 2)\n data.frame(id = id, val = id/divisor)\n} R> R> # List my private scripts and the public scripts. R> ore.scriptList(type = "all") OWNER NAME SCRIPT 1 RQSYS GLBGLM function (data, formula, ...) \nglm(formula = formula, data = data, ...) 2 OML_USER MYLM function (data, formula, ...) \nlm(formula, data, ...) 3 OML_USER myRandomRedDots function (divisor = 100) \n{\n id & lt\n -1:10\n plot(1:100, rnorm(100), pch = 21, bg = "red", cex = 2)\n data.frame(id = id, val = id/divisor)\n} R> R> # List my private scripts that have the specified pattern. R> ore.scriptList(pattern = "MY") NAME SCRIPT 1 MYLM function (data, formula, ...) \nlm(formula, data, ...) R> R> # Grant read access to a private script to all users. R> ore.grant("MYLM", type = "rqscript") R> R> # Grant read access to a private script to a specific user. R> ore.grant("myRandomRedDots", user = "SCOTT", type = "rqscript") R> R> # List the granted scripts. R> ore.scriptList(type = "grant") NAME GRANTEE 1 MYLM PUBLIC 2 myRandomRedDots SCOTT R> R> # Use the MYLM script in an embedded R execution function. R> ore.tableApply(IRIS[1:4], FUN.NAME = "MYLM", + formula = Sepal.Length ~ .) Call: lm(formula = formula, data = data) Coefficients: (Intercept) Sepal.Width Petal.Length Petal.Width 1.8560 0.6508 0.7091 -0.5565 R> R> # Use the GLBGLM script in an embedded R execution function. R> ore.tableApply(IRIS[1:4], FUN.NAME = "GLBGLM", + formula = Sepal.Length ~ .) Call: glm(formula = formula, data = data) Coefficients: (Intercept) Sepal.Width Petal.Length Petal.Width 1.8560 0.6508 0.7091 -0.5565 Degrees of Freedom: 149 Total (i.e. Null); 146 Residual Null Deviance: 102.2 Residual Deviance: 14.45 AIC: 84.64 R> R> # Load an R script to an R function object R> ore.scriptLoad(name="MYLM") R> R> # Invoke the function. R> MYLM(iris, formula = Sepal.Length ~ .) R> R> # Load another R script to an R function object R> ore.scriptLoad(name = "GLBGLM", newname = "MYGLM") R> R> # Invoke the function. R> MYGLM(iris, formula = Sepal.Length ~ .) R> R> # Drop some scripts. R> ore.scriptDrop("MYLM") R> ore.scriptDrop("GLBGLM", global = TRUE) R> R> # List all scripts. R> ore.scriptList(type = "all") OWNER NAME SCRIPT OML_USER myRandomRedDots function (divisor = 100) \n{\n id & lt\n -1:10\n plot(1:100, rnorm(100), pch = 21, bg = "red", cex = 2)\n data.frame(id = id, val = id/divisor)\n}
See Also:
-
Using the ore.doEval Function for examples that use the
myRandomRedDots
script -
Example 6-14 for another example of using
ore.scriptCreate
andore.scriptDrop
Parent topic: R Interface for Embedded R Execution
6.2.3 Use the ore.doEval Function
The ore.doEval
function executes the specified input function using data that is generated by the input function.
It returns an ore.frame
object or a serialized R object as an ore.object
object.
The syntax of the ore.doEval
function is the following:
ore.doEval(FUN, ..., FUN.VALUE = NULL, FUN.NAME = NULL, FUN.OWNER = NULL)
See Also:
Arguments for Functions that Run Scripts for descriptions of the arguments to function ore.doEval
Example 6-6 Using the ore.doEval Function
In this example, RandomRedDots
gets a function that has an argument and that returns a data.frame
object that has two columns and that plots 100 random normal values. The example then invokes ore.doEval
function and passes it the RandomRedDots
function
object. The image is displayed at the client, but it is generated by the database server R engine that executed the RandomRedDots
function.
RandomRedDots <- function(divisor = 100){ id<- 1:10 plot(1:100, rnorm(100), pch = 21, bg = "red", cex = 2 ) data.frame(id=id, val=id / divisor) } ore.doEval(RandomRedDots)
Listing for This Example
R> RandomRedDots <- function(divisor = 100){ + id<- 1:10 + plot(1:100, rnorm(100), pch = 21, bg = "red", cex = 2 ) + data.frame(id=id, val=id / divisor) + } R> ore.doEval(RandomRedDots) id val 1 1 0.01 2 2 0.02 3 3 0.03 4 4 0.04 5 5 0.05 6 6 0.06 7 7 0.07 8 8 0.08 9 9 0.09 10 10 0.10
Example 6-7 Using the ore.doEval Function with an Optional Argument
You can provide arguments to the input function as optional arguments to the doEval
function. This example invokes the doEval
function with an optional argument that overrides the divisor
argument of the RandomRedDots
function.
ore.doEval(RandomRedDots, divisor = 50)
Listing for This Example
R> ore.doEval(RandomRedDots, divisor = 50) id val 1 1 0.02 2 2 0.04 3 3 0.06 4 4 0.08 5 5 0.10 6 6 0.12 7 7 0.14 8 8 0.16 9 9 0.18 10 10 0.20 # The graph displayed by the plot function is not shown.
Example 6-8 Using the ore.doEval Function with the FUN.NAME Argument
If the input function is stored in the OML4R script repository, then you can invoke the ore.doEval
function with the FUN.NAME
argument. This example first invokes ore.scriptDrop
to ensure that the script repository does not contain a script with the name myRandomRedDots
. The example adds the RandomRedDots
function from Example 6-6 to the repository under the name myRandomRedDots
. This example invokes the ore.doEval
function and specifies myRandomRedDots
. The result is assigned to the variable res
.
The return value of the RandomRedDots
function is a data.frame
but in this example the ore.doEval
function returns an ore.object
object. To get back the data.frame
object, the example invokes ore.pull
to pull the result to the client R session.
ore.scriptDrop("myRandomRedDots") ore.scriptCreate("myRandomRedDots", RandomRedDots) res <- ore.doEval(FUN.NAME = "myRandomRedDots", divisor = 50) class(res) res.local <- ore.pull(res) class(res.local)
Listing for This Example
R> ore.scriptDrop("myRandomRedDots") R> ore.scriptCreate("myRandomRedDots", RandomRedDots) R> res <- ore.doEval(FUN.NAME = "myRandomRedDots", divisor = 50) R> class(res) [1] "ore.object" attr(,"package") [1] "OREembed" R> res.local <- ore.pull(res) R> class(res.local) [1] "data.frame"
Example 6-9 Using the ore.doEval Function with the FUN.VALUE Argument
To have the doEval
function return an ore.frame
object instead of an ore.object
, use the argument FUN.VALUE
to specify the structure of the result, as shown in this example.
res.of <- ore.doEval(FUN.NAME="myRandomRedDots", divisor = 50, FUN.VALUE= data.frame(id = 1, val = 1)) class(res.of)Listing for Example 6-9
R> res.of <- ore.doEval(FUN.NAME="myRandomRedDots", divisor = 50, + FUN.VALUE= data.frame(id = 1, val = 1)) R> class(res.of) [1] "ore.frame" attr(,"package") [1] "OREbase"
Example 6-10 Using the doEval Function with the ore.connect Argument
This example demonstrates using the special optional argument ore.connect
to connect to the database in the embedded R function, which enables the use of objects stored in a datastore. The example creates the RandomRedDots2
function object, which is the same as the RandomRedDots
function from Example 6-6 except the RandomRedDots2
function has an argument that takes the name of a datastore. The example creates the myVar
variable and saves it in the datastore named datastore_1
. The example then invokes the doEval
function and passes it the name of the datastore and passes the ore.connect
control argument set to TRUE
.
RandomRedDots2 <- function(divisor = 100, datastore.name = "myDatastore"){ id <- 1:10 plot(1:100, rnorm(100), pch = 21, bg = "red", cex = 2 ) ore.load(datastore.name) # Contains the numeric variable myVar. data.frame(id = id, val = id / divisor, num = myVar) } myVar <- 5 ore.save(myVar, name = "datastore_1") ore.doEval(RandomRedDots2, datastore.name = "datastore_1", ore.connect = TRUE)
Listing for This Example
R> RandomRedDots2 <- function(divisor = 100, datastore.name = "myDatastore"){ + id <- 1:10 + plot(1:100, rnorm(100), pch = 21, bg = "red", cex = 2 ) + ore.load(datastore.name) # Contains the numeric variable myVar. + data.frame(id = id, val = id / divisor, num = myVar) + } R> ore.doEval(RandomRedDots2, datastore.name = "datastore_1", ore.connect = TRUE) id val num 1 1 0.01 5 2 2 0.02 5 3 3 0.03 5 4 4 0.04 5 5 5 0.05 5 6 6 0.06 5 7 7 0.07 5 8 8 0.08 5 9 9 0.09 5 10 10 0.10 5 # The graph displayed by the plot function is not shown.
Example 6-11 Using the ora.type Attribute
This example demonstrates using the ora.type
attribute to specify database data types of CLOB and BLOB for columns in the data.frame
object specified by the FUN.VALUE
argument.
eval1 <- ore.doEval(function() "Hello, world") eval2 <- ore.doEval(function() data.frame(x = "Hello, world", stringsAsFactors = FALSE)) eval3 <- ore.doEval(function() data.frame(x = "Hello, world", stringsAsFactors = FALSE), FUN.VALUE = data.frame(x = character(), stringsAsFactors = FALSE)) out.df <- data.frame(x = character(), y = raw(), stringsAsFactors = FALSE) attr(out.df$x, "ora.type") <- "clob" attr(out.df$y, "ora.type") <- "blob" eval4 <- ore.doEval(function() { res <- data.frame(x = "Hello, world",stringsAsFactors = FALSE) res$y[[1L]] <- charToRaw("Hello, world") res}, FUN.VALUE = out.df) eval1 class(eval1) # ore.object eval2 class(eval2) # ore.object eval3 class(eval3) # ore.frame eval4$x rawToChar(ore.pull(eval4$y))
Listing for This Example
R> eval1 <- ore.doEval(function() "Hello, world") R> eval2 <- + ore.doEval(function() + data.frame(x = "Hello, world", stringsAsFactors = FALSE)) R> eval3 <- + ore.doEval(function() + data.frame(x = "Hello, world", stringsAsFactors = FALSE), + FUN.VALUE = + data.frame(x = character(), stringsAsFactors = FALSE)) R> out.df <- data.frame(x = character(), y = raw(), stringsAsFactors = FALSE) R> attr(out.df$x, "ora.type") <- "clob" R> attr(out.df$y, "ora.type") <- "blob" R> eval4 <- + ore.doEval(function() { + res <- data.frame(x = "Hello, world",stringsAsFactors = FALSE) + res$y[[1L]] <- charToRaw("Hello, world") + res}, + FUN.VALUE = out.df) R> eval1 [1] "Hello, world" R> class(eval1) [1] "ore.object" attr(,"package") [1] "OREembed" R> eval2 x 1 Hello, world R> class(eval2) [1] "ore.object" attr(,"package") [1] "OREembed" R> eval3 x 1 Hello, world Warning message: ORE object has no unique key - using random order R> class(eval3) [1] "ore.frame" attr(,"package") [1] "OREbase" R> eval4$x [1] "Hello, world" Warning message: ORE object has no unique key - using random order R> rawToChar(ore.pull(eval4$y)) [1] "Hello, world" Warning message: ORE object has no unique key - using random order
Parent topic: R Interface for Embedded R Execution
6.2.4 Use the ore.tableApply Function
The ore.tableApply
function invokes an R script with an ore.frame
as the input data.
The ore.tableApply
function passes the ore.frame
to the user-defined input function as the first argument to that function. The ore.tableApply
function returns an ore.frame
object or a serialized R object as an ore.object
object.
The syntax of the ore.tableApply
function is the following:
ore.tableApply(X, FUN, ..., FUN.VALUE = NULL, FUN.NAME = NULL, FUN.OWNER = NULL)
See Also:
Arguments for Functions that Run Scripts for descriptions of the arguments to function ore.tableApply
Example 6-12 Using the ore.tableApply Function
This example uses the ore.tableApply
function to build a Naive Bayes model on the iris
data set. The naiveBayes
function is in the e1071
package, which must be installed on both the client and database server machine R engines. As the first argument to the ore.tableApply
function, the ore.push(iris)
invocation creates a temporary database table and an ore.frame
that is a proxy for the table. The second argument is the input function, which has as an argument dat
. The ore.tableApply
function passes the ore.frame
table proxy to the input function as the dat
argument. The input function creates a model, which the ore.tableApply
function returns as an ore.object
object.
library(e1071) nbmod <- ore.tableApply( ore.push(iris), function(dat) { library(e1071) dat$Species <- as.factor(dat$Species) naiveBayes(Species ~ ., dat) }) class(nbmod) nbmod
Listing for This Example
R> nbmod <- ore.tableApply( + ore.push(iris), + function(dat) { + library(e1071) + dat$Species <- as.factor(dat$Species) + naiveBayes(Species ~ ., dat) + }) R> class(nbmod) [1] "ore.object" attr(,"package") [1] "OREembed" R> nbmod Naive Bayes Classifier for Discrete Predictors Call: naiveBayes.default(x = X, y = Y, laplace = laplace) A-priori probabilities: Y setosa versicolor virginica 0.3333333 0.3333333 0.3333333 Conditional probabilities: Sepal.Length Y [,1] [,2] setosa 5.006 0.3524897 versicolor 5.936 0.5161711 virginica 6.588 0.6358796 Sepal.Width Y [,1] [,2] setosa 3.428 0.3790644 versicolor 2.770 0.3137983 virginica 2.974 0.3224966 Petal.Length Y [,1] [,2] setosa 1.462 0.1736640 versicolor 4.260 0.4699110 virginica 5.552 0.5518947 Petal.Width Y [,1] [,2] setosa 0.246 0.1053856 versicolor 1.326 0.1977527 virginica 2.026 0.2746501
Parent topic: R Interface for Embedded R Execution
6.2.5 Use the ore.groupApply Function
The ore.groupApply
function invokes an R script with an ore.frame
as the input data.
The ore.groupApply
function passes the ore.frame
to the user-defined input function as the first argument to that function. The INDEX
argument to the ore.groupApply
function specifies the name of a column of the ore.frame
by which Oracle Database partitions the rows for processing by the user-defined R function. The ore.groupApply
function can use data-parallel execution, in which one or more R engines perform the same R function, or task, on different partitions of data.
The syntax of the ore.groupApply
function is the following:
ore.groupApply(X, INDEX, FUN, ..., FUN.VALUE = NULL, FUN.NAME = NULL, FUN.OWNER = NULL, parallel = getOption("ore.parallel", NULL))
The ore.groupApply
function returns an ore.list
object or an ore.frame
object.
Examples of the use of the ore.groupApply
function are in the following topics:
- Partition on a Single Column
This example uses theore.groupApply
function and partitions the data on a single column. - Partition on Multiple Columns
This example uses theore.groupApply
function and partitions the data on multiple columns.
Parent topic: R Interface for Embedded R Execution
6.2.5.1 Partition on a Single Column
This example uses the ore.groupApply
function and partitions the data on a single column.
The example uses the C50
package, which has functions that build decision tree and rule-based models. The package also provides training and testing data sets. The example builds C5.0 models on the churnTrain
training data set from the churn
data set of the C50
package, with the goal of building one churn model on the data for each state. The example does the following:
-
Loads the
C50
package and then thechurn
data set. -
Uses the
ore.create
function to create theCHURN_TRAIN
database table and its proxyore.frame
object fromchurnTrain
, adata.frame
object. -
Specifies
CHURN_TRAIN
, the proxyore.frame
object, as the first argument to theore.groupApply
function and specifies thestate
column as theINDEX
argument. Theore.groupApply
function partitions the data on thestate
column and invokes the user-defined function on each partition. -
Creates the variable
modList
, which gets theore.list
object returned by theore.groupApply
function. Theore.list
object contains the results from the execution of the user-defined function on each partition of the data. In this case, it is one C5.0 model per state, with each model stored as anore.object
object. -
Specifies the user-defined function. The first argument of the user-defined function receives one partition of the data, which in this case is all of the data associated with a single state.
The user-defined function does the following:
-
Loads the C50 package so that it is available to the function when it executes in an R engine in the database.
-
Deletes the
state
column from thedata.frame
so that the column is not included in the model. -
Converts the columns to factors because, although the
ore.frame
defined factors, when they are loaded to the user-defined function, factors appear as character vectors. -
Builds a model for a state and returns it.
-
-
Uses the
ore.pull
function to retrieve the model from the database as themod.MA
variable and then invokes thesummary
function on it. The class ofmod.MA
isC5.0
.
Example 6-13 Using the ore.groupApply Function
library(C50) data("churn") ore.create(churnTrain, "CHURN_TRAIN") modList <- ore.groupApply( CHURN_TRAIN, INDEX=CHURN_TRAIN$state, function(dat) { library(C50) dat$state <- NULL dat$churn <- as.factor(dat$churn) dat$area_code <- as.factor(dat$area_code) dat$international_plan <- as.factor(dat$international_plan) dat$voice_mail_plan <- as.factor(dat$voice_mail_plan) C5.0(churn ~ ., data = dat, rules = TRUE) }); mod.MA <- ore.pull(modList$MA) summary(mod.MA)
Listing for This Example
R> library(C50) R> data(churn) R> R> ore.create(churnTrain, "CHURN_TRAIN") R> R> modList <- ore.groupApply( + CHURN_TRAIN, + INDEX=CHURN_TRAIN$state, + function(dat) { + library(C50) + dat$state <- NULL + dat$churn <- as.factor(dat$churn) + dat$area_code <- as.factor(dat$area_code) + dat$international_plan <- as.factor(dat$international_plan) + dat$voice_mail_plan <- as.factor(dat$voice_mail_plan) + C5.0(churn ~ ., data = dat, rules = TRUE) + }); R> mod.MA <- ore.pull(modList$MA) R> summary(mod.MA) Call: C5.0.formula(formula = churn ~ ., data = dat, rules = TRUE) C5.0 [Release 2.07 GPL Edition] Thu Feb 13 15:09:10 2014 ------------------------------- Class specified by attribute `outcome' Read 65 cases (19 attributes) from undefined.data Rules: Rule 1: (52/1, lift 1.2) international_plan = no total_day_charge <= 43.04 -> class no [0.963] Rule 2: (5, lift 5.1) total_day_charge > 43.04 -> class yes [0.857] Rule 3: (6/1, lift 4.4) area_code in {area_code_408, area_code_415} international_plan = yes -> class yes [0.750] Default class: no Evaluation on training data (65 cases): Rules ---------------- No Errors 3 2( 3.1%) << (a) (b) <-classified as ---- ---- 53 1 (a): class no 1 10 (b): class yes Attribute usage: 89.23% international_plan 87.69% total_day_charge 9.23% area_code Time: 0.0 secs
Parent topic: Use the ore.groupApply Function
6.2.5.2 Partition on Multiple Columns
This example uses the ore.groupApply
function and partitions the data on multiple columns.
The ore.groupApply
function takes a single column or multiple columns as the INDEX
argument. The following example uses data from the CHURN_TRAIN
data set to build an rpart
model that produces rules on the partitions of data specified, which are the voice_mail_plan
and international_plan
columns. The example uses the R table
function to show the number of rows to expect in each partition.
The example invokes the ore.scriptDrop
function to ensure that no script by the specified name exists in the OML4R script repository. It then uses the ore.scriptCreate
function to define a script named my_rpartFunction
and to store it in the repository. The stored script defines a function that takes a data source and a prefix to use for naming OML4R datastore objects. Each invocation of the function my_rpartFunction
receives data from one of the partitions identified by the values in the voice_mail_plan and international_plan columns. Because the source partition columns are constants, the function sets them to NULL
. It converts the character vectors to factors, builds a model to predict churn, and saves it in an appropriately named datastore. The function creates a list to return the specific partition column values, the distribution of churn values, and the model itself.
The example then loads the rpart
library, sets the datastore prefix, and invokes ore.groupApply
using the values from the voice_mail_plan and international_plan columns as the INDEX
argument and my_rpartFunction
as the value of the FUN.NAME
argument to invoke the user-defined function stored in the script repository. The ore.groupApply
function uses an optional argument to pass the datastorePrefix
variable to the user-defined function. It uses the optional argument ore.connect
to connect to the database when executing the user-defined function. The ore.groupApply
function returns an ore.list
object as the variable res
.
The example displays the first entry in the list returned. It then invokes the ore.load
function to load the model for the case where the customer has both the voice mail plan and the international plan.
Example 6-14 Using ore.groupApply for Partitioning Data on Multiple Columns
library(C50) data(churn) ore.drop("CHURN_TRAIN") ore.create(churnTrain, "CHURN_TRAIN") table(CHURN_TRAIN$international_plan, CHURN_TRAIN$voice_mail_plan) options(width = 80) head(CHURN_TRAIN, 3) ore.scriptDrop("my_rpartFunction") ore.scriptCreate("my_rpartFunction", function(dat, datastorePrefix) { library(rpart) vmp <- dat[1, "voice_mail_plan"] ip <- dat[1, "international_plan"] datastoreName <- paste(datastorePrefix, vmp, ip, sep = "_") dat$voice_mail_plan <- NULL dat$international_plan <- NULL dat$state <- as.factor(dat$state) dat$churn <- as.factor(dat$churn) dat$area_code <- as.factor(dat$area_code) mod <- rpart(churn ~ ., data = dat) ore.save(mod, name = datastoreName, overwrite = TRUE) list(voice_mail_plan = vmp, international_plan = ip, churn.table = table(dat$churn), rpart.model = mod) }) library(rpart) datastorePrefix = "my.rpartModel" res <- ore.groupApply(CHURN_TRAIN, INDEX = CHURN_TRAIN[, c("voice_mail_plan", "international_plan")], FUN.NAME = "my_rpartFunction", datastorePrefix = datastorePrefix, ore.connect = TRUE) res[[1]] ore.load(name=paste(datastorePrefix, "yes", "yes", sep = "_")) mod
Listing for This Example
R> library(C50) R> data(churn) R> ore.drop("CHURN_TRAIN") R> ore.create(churnTrain, "CHURN_TRAIN") R> R> table(CHURN_TRAIN$international_plan, CHURN_TRAIN$voice_mail_plan) no yes no 2180 830 yes 231 92 R> R> options(width = 80) R> head(CHURN_TRAIN, 3) state account_length area_code international_plan voice_mail_plan 1 KS 128 area_code_415 no yes 2 OH 107 area_code_415 no yes 3 NJ 137 area_code_415 no no number_vmail_messages total_day_minutes total_day_calls total_day_charge 1 25 265.1 110 45.07 2 26 161.6 123 27.47 3 0 243.4 114 41.38 total_eve_minutes total_eve_calls total_eve_charge total_night_minutes 1 197.4 99 16.78 244.7 2 195.5 103 16.62 254.4 3 121.2 110 10.30 162.6 total_night_calls total_night_charge total_intl_minutes total_intl_calls 1 91 11.01 10.0 3 2 103 11.45 13.7 3 3 104 7.32 12.2 5 total_intl_charge number_customer_service_calls churn 1 2.70 1 no 2 3.70 1 no 3 3.29 0 no Warning messages: 1: ORE object has no unique key - using random order 2: ORE object has no unique key - using random order R> R> ore.scriptDrop("my_rpartFunction") R> ore.scriptCreate("my_rpartFunction", + function(dat, datastorePrefix) { + library(rpart) + vmp <- dat[1, "voice_mail_plan"] + ip <- dat[1, "international_plan"] + datastoreName <- paste(datastorePrefix, vmp, ip, sep = "_") + dat$voice_mail_plan <- NULL + dat$international_plan <- NULL + dat$state <- as.factor(dat$state) + dat$churn <- as.factor(dat$churn) + dat$area_code <- as.factor(dat$area_code) + mod <- rpart(churn ~ ., data = dat) + ore.save(mod, name = datastoreName, overwrite = TRUE) + list(voice_mail_plan = vmp, + international_plan = ip, + churn.table = table(dat$churn), + rpart.model = mod) + }) R> R> library(rpart) R> datastorePrefix = "my.rpartModel" R> R> res <- ore.groupApply(CHURN_TRAIN, + INDEX = CHURN_TRAIN[, c("voice_mail_plan", "international_plan")], + FUN.NAME = "my_rpartFunction", + datastorePrefix = datastorePrefix, + ore.connect = TRUE) R> res[[1]] $voice_mail_plan [1] "no" $international_plan [1] "no" $churn.table no yes 1878 302 $rpart.model n= 2180 node), split, n, loss, yval, (yprob) * denotes terminal node 1) root 2180 302 no (0.86146789 0.13853211) 2) total_day_minutes< 263.55 2040 192 no (0.90588235 0.09411765) 4) number_customer_service_calls< 3.5 1876 108 no (0.94243070 0.05756930) 8) total_day_minutes< 223.25 1599 44 no (0.97248280 0.02751720) * 9) total_day_minutes>=223.25 277 64 no (0.76895307 0.23104693) 18) total_eve_minutes< 242.35 210 18 no (0.91428571 0.08571429) * 19) total_eve_minutes>=242.35 67 21 yes (0.31343284 0.68656716) 38) total_night_minutes< 174.2 17 4 no (0.76470588 0.23529412) * 39) total_night_minutes>=174.2 50 8 yes (0.16000000 0.84000000) * 5) number_customer_service_calls>=3.5 164 80 yes (0.48780488 0.51219512) 10) total_day_minutes>=160.2 95 22 no (0.76842105 0.23157895) 20) state=AL,AZ,CA,CO,DC,DE,FL,HI,KS,KY,MA,MD,ME,MI,NC,ND,NE,NH,NM,OK,OR,SC,TN,VA,VT,WY 56 2 no (0.96428571 0.03571429) * 21) state=AK,AR,CT,GA,IA,ID,MN,MO,NJ,NV,NY,OH,RI,TX,UT,WA,WV 39 19 yes (0.48717949 0.51282051) 42) total_day_minutes>=182.3 21 5 no (0.76190476 0.23809524) * 43) total_day_minutes< 182.3 18 3 yes (0.16666667 0.83333333) * 11) total_day_minutes< 160.2 69 7 yes (0.10144928 0.89855072) * 3) total_day_minutes>=263.55 140 30 yes (0.21428571 0.78571429) 6) total_eve_minutes< 167.3 29 7 no (0.75862069 0.24137931) 12) state=AK,AR,AZ,CO,CT,FL,HI,IN,KS,LA,MD,ND,NM,NY,OH,UT,WA,WV 21 0 no (1.00000000 0.00000000) * 13) state=IA,MA,MN,PA,SD,TX,WI 8 1 yes (0.12500000 0.87500000) * 7) total_eve_minutes>=167.3 111 8 yes (0.07207207 0.92792793) * R> ore.load(name = paste(datastorePrefix, "yes", "yes", sep = "_")) [1] "mod" R> mod n= 92 node), split, n, loss, yval, (yprob) * denotes terminal node 1) root 92 36 no (0.60869565 0.39130435) 2) total_intl_minutes< 13.1 71 15 no (0.78873239 0.21126761) 4) total_intl_calls>=2.5 60 4 no (0.93333333 0.06666667) 8) state=AK,AR,AZ,CO,CT,DC,DE,FL,GA,HI,ID,IL,IN,KS,MD,MI,MO,MS,MT,NC,ND,NE,NH,NJ,OH,SC,SD,UT,VA,WA,WV,WY 53 0 no (1.00000000 0.00000000) * 9) state=ME,NM,VT,WI 7 3 yes (0.42857143 0.57142857) * 5) total_intl_calls< 2.5 11 0 yes (0.00000000 1.00000000) * 3) total_intl_minutes>=13.1 21 0 yes (0.00000000 1.00000000) *
Parent topic: Use the ore.groupApply Function
6.2.6 Use the ore.rowApply Function
The ore.rowApply
function invokes an R script with an ore.frame
as the input data.
The ore.rowApply
function passes the ore.frame
to the user-defined input function as the first argument to that function. The rows
argument to the ore.rowApply
function specifies the number of rows to pass to each invocation of the user-defined R function. The last chunk or rows may have fewer rows than the number specified. The ore.rowApply
function can use data-parallel execution, in which one or more R engines perform the same R function, or task, on different partitions of data.
The syntax of the ore.rowApply
function is the following:
ore.rowApply(X, FUN, ..., FUN.VALUE = NULL, FUN.NAME = NULL, rows = 1, FUN.OWNER = NULL, parallel = getOption("ore.parallel", NULL))
The ore.rowApply
function returns an ore.list
object or an ore.frame
object.
See Also:
-
Arguments for Functions that Run Scripts for descriptions of the arguments to function
ore.rowApply
Example 6-15 Using the ore.rowApply Function
This example uses the e1071
package, previously downloaded from CRAN. The example does the following:
-
Loads the package
e1071
. -
Pushes the
iris
data set to the database as theIRIS
temporary table andore.frame
object. -
Creates the Naive Bayes model
nbmod
. -
Creates a copy of
IRIS
asIRIS_PRED
and adds the PRED column toIRIS_PRED
to contain the predictions. -
Invokes the
ore.rowApply
function, passing theIRIS
ore.frame
as the data source for user-defined R function and the user-defined R function itself. The user-defined function does the following:-
Loads the package
e1071
so that it is available to the R engine or engines that run in the database. -
Converts the Species column to a factor because, although the
ore.frame
defined factors, when they are loaded to the user-defined function, factors appear as character vectors. -
Invokes the
predict
method and returns theres
object, which contains the predictions in the column added to the data set.
-
-
Pulls the model to the client R session.
-
Passes
IRIS_PRED
as the argumentFUN.VALUE
, which specifies the structure of the object that theore.rowApply
function returns. -
Specifies the number of rows to pass to each invocation of the user-defined function.
-
Displays the class of
res
, and invokes thetable
function to display the Species column and the PRED column of theres
object.
library(e1071) IRIS <- ore.push(iris) nbmod <- ore.tableApply( ore.push(iris), function(dat) { library(e1071) dat$Species <- as.factor(dat$Species) naiveBayes(Species ~ ., dat) }) IRIS_PRED <- IRIS IRIS_PRED$PRED <- "A" res <- ore.rowApply( IRIS, function(dat, nbmod) { library(e1071) dat$Species <- as.factor(dat$Species) dat$PRED <- predict(nbmod, newdata = dat) dat }, nbmod = ore.pull(nbmod), FUN.VALUE = IRIS_PRED, rows = 10) class(res) table(res$Species, res$PRED)
Listing for This Example
R> library(e1071) R> IRIS <- ore.push(iris) R> nbmod <- ore.tableApply( + ore.push(iris), + function(dat) { + library(e1071) + dat$Species <- as.factor(dat$Species) + naiveBayes(Species ~ ., dat) + }) R> IRIS_PRED <- IRIS R> IRIS_PRED$PRED <- "A" R> res <- ore.rowApply( + IRIS , + function(dat, nbmod) { + library(e1071) + dat$Species <- as.factor(dat$Species) + dat$PRED <- predict(nbmod, newdata = dat) + dat + }, + nbmod = ore.pull(nbmod), + FUN.VALUE = IRIS_PRED, + rows = 10) R> class(res) [1] "ore.frame" attr(,"package") [1] "OREbase" R> table(res$Species, res$PRED) setosa versicolor virginica setosa 50 0 0 versicolor 0 47 3 virginica 0 3 47
This example uses the C50
package to score churn data (that is, to predict which customers are likely to churn) using C5.0 models. The example partitions the data by a number of rows. It scores the customers from the specified state in parallel. It uses datastores and saves functions to the OML4R script repository, which allows the functions to be used by the OML4R SQL API functions.
The example first loads C50
package and the data sets. It deletes the datastores with names containing myC5.0modelFL
, if they exist. It invokes ore.drop
to delete the CHURN_TEST table, if it exists, and then invokes ore.create
to create the CHURN_TEST table from the churnTest
data set.
The example next invokes ore.getLevels
, which returns a list
of the levels for each factor column. The invocation excludes the first column, which is state, because the levels for that column are not needed. Getting the levels first can ensure that all possible levels are provided during model building, even if some rows do not have values for some of the levels. The ore.delete
invocation ensures that no datastore with the specified name exists and the ore.save
invocation saves the xlevels
object in the datastore named myXLevels
.
The example creates a user-defined function, myC5.0FunctionForLevels
, that generates a C5.0 model. The function uses the list of levels returned by function ore.getXlevels
instead of computing the levels using the as.factor
function. It uses the levels to convert the column type from character vector to factor. The function myC5.0FunctionForLevels
returns the value TRUE.
The example saves the function in the script repository.
The example next gets a list of datastores that have names that include the specified string and deletes those datastores if they exist.
The example then invokes ore.groupApply
, which invokes function myC5.0FunctionForLevels
on each state in the CHURN_TEST
data. To each myC5.0FunctionForLevels
invocation, ore.groupApply
passes the datastore that contains the xlevels
object and a prefix to use in naming the datastore generated by myC5.0FunctionForLevels
. It also passes the ore.connect
control argument to connect to the database in the embedded R function, which enables the use of objects stored in a datastore. The ore.groupApply
invocation returns a list that contains the results of all of the invocations of myC5.0FunctionForLevels
.
The example pulls the result over to the local R session and verifies that myC5.0FunctionForLevels
returned TRUE
for each state in the data source.
The example next creates another user-defined another function, myScoringFunction
, and stores it in the script repository. The function scores a C5.0 model for the levels of a state and returns the results in a data.frame
.
The example then invokes function ore.rowApply
. It filters the input data to use only data for the state of Massachusetts. It specifies myScoringFunction
as the function to invoke and passes that user-defined function the name of the datastore that contains the xlevels
object and a prefix to use in loading the datastore that contains the C5.0 model for the state. The ore.rowApply
invocation specifies invoking myScoringFunction
on 200 rows of the data set in each parallel R engine. It uses the FUN.VALUE
argument so that ore.rowApply
returns an ore.frame
that contains the results of all of the myScoringFunction
invocations. The variable scores
gets the results of the ore.rowApply
invocation.
Finally, the example prints the scores
object and then uses the table function to display the confusion matrix for the scoring.
See Also:
Example A-8 for an invocation of the SQL rqRowEval
function that produces the same result as the ore.rowApply
function in this example
Example 6-16 Using the ore.rowApply Function with Datastores and Scripts
library(C50) data(churn) ore.drop("CHURN_TEST" ore.create(churnTest, "CHURN_TEST") xlevels <- ore.getXlevels(~ ., CHURN_TEST[,-1]) ore.delete("myXLevels") ore.save(xlevels, name = "myXLevels") ore.scriptDrop("myC5.0FunctionForLevels") ore.scriptCreate("myC5.0FunctionForLevels", function(dat, xlevelsDatastore, datastorePrefix) { library(C50) state <- dat[1,"state"] datastoreName <- paste(datastorePrefix, dat[1, "state"], sep = "_") dat$state <- NULL ore.load(name = xlevelsDatastore) for (j in names(xlevels)) dat[[j]] <- factor(dat[[j]], levels = xlevels[[j]]) c5mod <- C5.0(churn ~ ., data = dat, rules = TRUE) ore.save(c5mod, name = datastoreName) TRUE }) ds.v <- ore.datastore(pattern= "myC5.0modelFL")$datastore.name for (ds in ds.v) ore.delete(name = ds) res <- ore.groupApply(CHURN_TEST, INDEX=CHURN_TEST$state, FUN.NAME = "myC5.0FunctionForLevels", xlevelsDatastore = "myXLevels", datastorePrefix = "myC5.0modelFL", ore.connect = TRUE) res <- ore.pull(res) all(as.logical(res) == TRUE) ore.scriptDrop("myScoringFunction") ore.scriptCreate("myScoringFunction", function(dat, xlevelsDatastore, datastorePrefix) { library(C50) state <- dat[1,"state"] datastoreName <- paste(datastorePrefix,state,sep="_") dat$state <- NULL ore.load(name = xlevelsDatastore) for (j in names(xlevels)) dat[[j]] <- factor(dat[[j]], levels = xlevels[[j]]) ore.load(name = datastoreName) res <- data.frame(pred = predict(c5mod, dat, type = "class"), actual = dat$churn, state = state) res } ) scores <- ore.rowApply( CHURN_TEST[CHURN_TEST$state == "MA",], FUN.NAME = "myScoringFunction", xlevelsDatastore = "myXLevels", datastorePrefix = "myC5.0modelFL", ore.connect = TRUE, parallel = TRUE, FUN.VALUE = data.frame(pred = character(0), actual = character(0), state = character(0)), rows=200) scores table(scores$actual, scores$pred)
Listing for This Example
R> library(C50) R> data(churn) R> R> ore.drop("CHURN_TEST" R> ore.create(churnTest, "CHURN_TEST") R> R> xlevels <- ore.getXlevels(~ ., CHURN_TEST[,-1]) R> ore.delete("myXLevels") [1] "myXLevels R> ore.save(xlevels, name = "myXLevels") R> R> ore.scriptDrop("myC5.0FunctionForLevels") R> ore.scriptCreate("myC5.0FunctionForLevels", + function(dat, xlevelsDatastore, datastorePrefix) { + library(C50) + state <- dat[1,"state"] + datastoreName <- paste(datastorePrefix, dat[1, "state"], sep = "_") + dat$state <- NULL + ore.load(name = xlevelsDatastore) + for (j in names(xlevels)) + dat[[j]] <- factor(dat[[j]], levels = xlevels[[j]]) + c5mod <- C5.0(churn ~ ., data = dat, rules = TRUE) + ore.save(c5mod, name = datastoreName) + TRUE + }) R> R> ds.v <- ore.datastore(pattern="myC5.0modelFL")$datastore.name R> for (ds in ds.v) ore.delete(name=ds) R> R> res <- ore.groupApply(CHURN_TEST, + INDEX=CHURN_TEST$state, + FUN.NAME="myC5.0FunctionForLevels", + xlevelsDatastore = "myXLevels", + datastorePrefix = "myC5.0modelFL", + ore.connect = TRUE) R> res <- ore.pull(res) R> all(as.logical(res) == TRUE) [1] TRUE R> R> ore.scriptDrop("myScoringFunction") R> ore.scriptCreate("myScoringFunction", + function(dat, xlevelsDatastore, datastorePrefix) { + library(C50) + state <- dat[1,"state"] + datastoreName <- paste(datastorePrefix,state,sep="_") + dat$state <- NULL + ore.load(name = xlevelsDatastore) + for (j in names(xlevels)) + dat[[j]] <- factor(dat[[j]], levels = xlevels[[j]]) + ore.load(name = datastoreName) + res <- data.frame(pred = predict(c5mod, dat, type="class"), + actual = dat$churn, + state = state) + res + } + ) R> R> scores <- ore.rowApply( + CHURN_TEST[CHURN_TEST$state =="MA",], + FUN.NAME = "myScoringFunction", + xlevelsDatastore = "myXLevels", + datastorePrefix = "myC5.0modelFL", + ore.connect = TRUE, parallel = TRUE, + FUN.VALUE = data.frame(pred=character(0), + actual=character(0), + state=character(0)), + rows=200 R> R> scores pred actual state 1 no no MA 2 no no MA 3 no no MA 4 no no MA 5 no no MA 6 no yes MA 7 yes yes MA 8 yes yes MA 9 no no MA 10 no no MA 11 no no MA 12 no no MA 13 no no MA 14 no no MA 15 yes yes MA 16 no no MA 17 no no MA 18 no no MA 19 no no MA 20 no no MA 21 no no MA 22 no no MA 23 no no MA 24 no no MA 25 no no MA 26 no no MA 27 no no MA 28 no no MA 29 no yes MA 30 no no MA 31 no no MA 32 no no MA 33 yes yes MA 34 no no MA 35 no no MA 36 no no MA 37 no no MA 38 no no MA Warning message: ORE object has no unique key - using random order R> table(scores$actual, scores$pred) no yes no 32 0 yes 2 4
Parent topic: R Interface for Embedded R Execution
6.2.7 Use the ore.indexApply Function
The ore.indexApply
function executes the specified user-defined input function using data that is generated by the input function.
The function supports task-parallel execution, in which one or more R engines perform the same or different calculations, or task. The times
argument to the ore.indexApply
function specifies the number of times that the input function executes in the database. Any required data must be explicitly generated or loaded within the input function.
The syntax of the ore.indexApply
function is the following:
ore.indexApply(times, FUN, ..., FUN.VALUE = NULL, FUN.NAME = NULL, FUN.OWNER = NULL, parallel = getOption("ore.parallel", NULL))
The ore.indexApply
function returns an ore.list
object or an ore.frame
object.
See Also:
-
Arguments for Functions that Run Scripts for descriptions of the arguments to function
ore.indexApply
Examples of the use of the ore.indexApply
function are in the following topics:
- Simple Example of Using the ore.indexApply Function
The example invokesore.indexApply
and specifies that it execute the input function five times in parallel. - Column-Parallel Use Case
The example uses the Rsummary
function to compute in parallel summary statistics on the first four numeric columns of theiris
data set. - Simulations Use Case
You can use theore.indexApply
function in simulations, which can take advantage of high-performance computing hardware like an Oracle Exadata Database Machine.
Parent topic: R Interface for Embedded R Execution
6.2.7.1 Simple Example of Using the ore.indexApply Function
The example invokes ore.indexApply
and specifies that it execute the input function five times in parallel.
Example 6-17 Using the ore.indexApply Function
This example displays the class of the result, which is ore.list
, and then displays the result.
res <- ore.indexApply(5, function(index) { paste("IndexApply:", index) }, parallel = TRUE) class(res) res
Listing for This Example
R> res <- ore.indexApply(5, + function(index) { + paste("IndexApply:", index) + }, + parallel = TRUE) R> class(res) [1] "ore.list" attr(,"package") [1] "OREembed" R> res $`1` [1] "IndexApply: 1" $`2` [1] "IndexApply: 2" $`3` [1] "IndexApply: 3" $`4` [1] "IndexApply: 4" $`5` [1] "IndexApply: 5"
Parent topic: Use the ore.indexApply Function
6.2.7.2 Column-Parallel Use Case
The example uses the R summary
function to compute in parallel summary statistics on the first four numeric columns of the iris
data set.
Example 6-18 Using the ore.indexApply Function and Combining Results
The example combines the computations into a final result. The first argument to the ore.indexApply
function is 4, which specifies the number of columns to summarize in parallel. The user-defined input function takes one argument, index
, which will be a value between 1 and 4 and which specifies the column to summarize.
The example invokes the summary
function on the specified column. The summary
invocation returns a single row, which contains the summary statistics for the column. The example converts the result of the summary
invocation into a data.frame
and adds the column name to it.
The example next uses the FUN.VALUE
argument to the ore.indexApply
function to define the structure of the result of the function. The result is then returned as an ore.frame
object with that structure.
res <- NULL res <- ore.indexApply(4, function(index) { ss <- summary(iris[, index]) attr.names <- attr(ss, "names") stats <- data.frame(matrix(ss, 1, length(ss))) names(stats) <- attr.names stats$col <- names(iris)[index] stats }, FUN.VALUE=data.frame(Min. = numeric(0), "1st Qu." = numeric(0), Median = numeric(0), Mean = numeric(0), "3rd Qu." = numeric(0), Max. = numeric(0), Col = character(0)), parallel = TRUE) res
Listing for This Example
R> res <- NULL R> res <- ore.indexApply(4, + function(index) { + ss <- summary(iris[, index]) + attr.names <- attr(ss, "names") + stats <- data.frame(matrix(ss, 1, length(ss))) + names(stats) <- attr.names + stats$col <- names(iris)[index] + stats + }, + FUN.VALUE=data.frame(Min. = numeric(0), + "1st Qu." = numeric(0), + Median = numeric(0), + Mean = numeric(0), + "3rd Qu." = numeric(0), + Max. = numeric(0), + Col = character(0)), + parallel = TRUE) R> res Min. X1st.Qu. Median Mean X3rd.Qu. Max. Col 1 2.0 2.8 3.00 3.057 3.3 4.4 Sepal.Width 2 4.3 5.1 5.80 5.843 6.4 7.9 Sepal.Length 3 0.1 0.3 1.30 1.199 1.8 2.5 Petal.Width 4 1.0 1.6 4.35 3.758 5.1 6.9 Petal.Length Warning message: ORE object has no unique key - using random order
Parent topic: Use the ore.indexApply Function
6.2.7.3 Simulations Use Case
You can use the ore.indexApply
function in simulations, which can take advantage of high-performance computing hardware like an Oracle Exadata Database Machine.
Example 6-19 Using the ore.indexApply Function in a Simulation
This example takes multiple samples from a random normal distribution to compare the distribution of the summary statistics. Each simulation occurs in a separate R engine in the database, in parallel, up to the degree of parallelism allowed by the database. The example defines variables for the sample size, the mean and standard deviations of the random numbers, and the number of simulations to perform. The example specifies num.simulations
as the first argument to the ore.indexApply
function. The ore.indexApply
function passes num.simulations
to the user-defined function as the index
argument. This input function then sets the random seed based on the index so that each invocation of the input function generates a different set of random numbers.
The input function next uses the rnorm
function to produce sample.size
random normal values. It invokes the summary
function on the vector of random numbers, and then prepares a data.frame
as the result it returns. The ore.indexApply
function specifies the FUN.VALUE
argument so that it returns an ore.frame
that structures the combined results of the simulations. The res
variable gets the ore.frame
returned by the ore.indexApply
function.
To get the distribution of samples, the example invokes the boxplot
function on the data.frame
that is the result of using the ore.pull
function to bring selected columns from res
to the client.
res <- NULL sample.size = 1000 mean.val = 100 std.dev.val = 10 num.simulations = 1000 res <- ore.indexApply(num.simulations, function(index, sample.size = 1000, mean = 0, std.dev = 1) { set.seed(index) x <- rnorm(sample.size, mean, std.dev) ss <- summary(x) attr.names <- attr(ss, "names") stats <- data.frame(matrix(ss, 1, length(ss))) names(stats) <- attr.names stats$index <- index stats }, FUN.VALUE=data.frame(Min. = numeric(0), "1st Qu." = numeric(0), Median = numeric(0), Mean = numeric(0), "3rd Qu." = numeric(0), Max. = numeric(0), Index = numeric(0)), parallel = TRUE, sample.size = sample.size, mean = mean.val, std.dev = std.dev.val) options("ore.warn.order" = FALSE) head(res, 3) tail(res, 3) boxplot(ore.pull(res[, 1:6]), main=sprintf("Boxplot of %d rnorm samples size %d, mean=%d, sd=%d", num.simulations, sample.size, mean.val, std.dev.val))
Listing for This Example
R> res <- ore.indexApply(num.simulations, + function(index, sample.size = 1000, mean = 0, std.dev = 1) { + set.seed(index) + x <- rnorm(sample.size, mean, std.dev) + ss <- summary(x) + attr.names <- attr(ss, "names") + stats <- data.frame(matrix(ss, 1, length(ss))) + names(stats) <- attr.names + stats$index <- index + stats + }, + FUN.VALUE=data.frame(Min. = numeric(0), + "1st Qu." = numeric(0), + Median = numeric(0), + Mean = numeric(0), + "3rd Qu." = numeric(0), + Max. = numeric(0), + Index = numeric(0)), + parallel = TRUE, + sample.size = sample.size, + mean = mean.val, std.dev = std.dev.val) R> options("ore.warn.order" = FALSE) R> head(res, 3) Min. X1st.Qu. Median Mean X3rd.Qu. Max. Index 1 67.56 93.11 99.42 99.30 105.8 128.0 847 2 67.73 94.19 99.86 100.10 106.3 130.7 258 3 65.58 93.15 99.78 99.82 106.2 134.3 264 R> tail(res, 3) Min. X1st.Qu. Median Mean X3rd.Qu. Max. Index 1 65.02 93.44 100.2 100.20 106.9 134.0 5 2 71.60 93.34 99.6 99.66 106.4 131.7 4 3 69.44 93.15 100.3 100.10 106.8 135.2 3 R> boxplot(ore.pull(res[, 1:6]), + main=sprintf("Boxplot of %d rnorm samples size %d, mean=%d, sd=%d", + num.simulations, sample.size, mean.val, std.dev.val))
Figure 6-2 Display of the boxplot Function in Example 6-19
Description of "Figure 6-2 Display of the boxplot Function in Example 6-19"
Parent topic: Use the ore.indexApply Function
6.3 SQL Interface for Embedded R Execution
The SQL interface for Oracle Machine Learning for R embedded R execution allows you to execute R functions in production database applications.
The SQL interface has procedures for the following actions:
-
Adding and removing a script from the OML4R script repository
-
Granting or revoking read privilege access to a script by the owner to other users
-
Executing an R script in an embedded R session
-
Deleting an OML4R datastore
Data dictionary views provide information about scripts and datastores.
This SQL interface is described in the following topics:
- About Oracle Machine Learning for R SQL Table Functions
OML4R provides SQL table functions that are equivalents of most of the R interface functions for embedded R execution. - Manage Scripts in SQL
This topic lists the PL/SQL procedures and Oracle Database data dictionary views for creating and managing R scripts. - Manage Datastores in SQL
Oracle Machine Learning for R provides PL/SQL procedures and Oracle Database data dictionary views for the basic management of datastores in SQL.
Parent topic: Use Oracle Machine Learning for R Embedded R Execution
6.3.1 About Oracle Machine Learning for R SQL Table Functions
OML4R provides SQL table functions that are equivalents of most of the R interface functions for embedded R execution.
Executing a SELECT FROM TABLE
statement and specifying one of the table functions results in the invocation of the specified R script. The script runs in one or more R engines on the Oracle Database server.
The SQL table functions for embedded R execution are:
-
rqEval
-
rqGroupEval
-
rqRowEval
-
rqTableEval
The R interface functions and the SQL equivalents are listed in Table 6-1.
For the rqGroupEval
function, OML4R provides a generic implementation of the group apply functionality in SQL. You must write a table function that captures the structure of the input cursor.
See the reference pages for the functions for information about them, including examples of their use.
Some general aspects of the SQL table functions are described in the following topics:
- Parameters of the SQL Table Functions
The SQL table functions have some parameters in common and some functions have parameters that are unique to that function. - Return Value of SQL Table Functions
The Oracle Machine Learning for R SQL table functions return a table. - Connect to Oracle Machine Learning for R in Embedded R Execution
To establish a connection to OML4R on the Oracle Database server during the embedded R execution, you can specify the control argumentore.connect
in the parameters cursor.
Parent topic: SQL Interface for Embedded R Execution
6.3.1.1 Parameters of the SQL Table Functions
The SQL table functions have some parameters in common and some functions have parameters that are unique to that function.
The parameters of the SQL table functions are the following.
Table 6-2 SQL Table Function Parameters
Parameter | Description |
---|---|
|
A cursor that specifies the data that is input to the R function specified by |
|
A cursor that specifies arguments to pass to the R function. The parameters cursor consists of a single row of scalar values. An argument can be a string or a numeric value. You can specify multiple arguments in the cursor. Arguments to an R function are case sensitive, so you should put names, such as a column name, in double quotes. In the cursor, you can also specify as scalar values an OML4R control argument or the names of serialized R objects, such as predictive models, that are in an OML4R datastore. The value of this parameters cursor can be |
|
An output table definition. The value of this argument can be |
|
For the |
|
For the |
|
The name of a script in the OML4R script repository. |
Parent topic: About Oracle Machine Learning for R SQL Table Functions
6.3.1.2 Return Value of SQL Table Functions
The Oracle Machine Learning for R SQL table functions return a table.
The structure and contents of the table are determined by the results of the R function passed to the SQL table function and by the OUT_QRY
parameter. The R function can return a data.frame
object, other R objects, and graphics. The structure of the table that represents the results of the R function is specified by one of the following OUT_QRY
values:
-
NULL
, which results in a table that has a serialized object that can contain both data and image objects. -
A table signature specified in a
SELECT
statement, which results in a table that has the defined structure. The result of the R function must be adata.frame
. No images are returned. -
The string
'XML'
, which results in a table that has a CLOB that can contain both structured data and graph images in an XML string. The non-image R objects, such asdata.frame
ormodel
objects, are provided first, followed by the base 64 encoding of a PNG representation of the image. -
The string
'PNG'
, which results in a table that has a BLOB that contains graph images in PNG format. The table has the column namesname
,id
, andimage
.
Parent topic: About Oracle Machine Learning for R SQL Table Functions
6.3.1.3 Connect to Oracle Machine Learning for R in Embedded R Execution
To establish a connection to OML4R on the Oracle Database server during the embedded R execution, you can specify the control argument ore.connect
in the parameters cursor.
Doing so establishes a connection using the credentials of the user who invoked the embedded R function. It also automatically loads the ORE
package. Establishing an OML4R connection is required to save objects in an OML4R R object datastore or to load objects from a datastore. It also allows you to explicitly use the OML4R transparency layer.
See Also:
Optional and Control Arguments for information on other control arguments
Parent topic: About Oracle Machine Learning for R SQL Table Functions
6.3.2 Manage Scripts in SQL
This topic lists the PL/SQL procedures and Oracle Database data dictionary views for creating and managing R scripts.
The functions in the SQL API for embedded R execution require as an argument a named script that is stored in the OML4R script repository. The PL/SQL procedures sys.rqScriptCreate
and sys.rqScriptDrop
create and drop scripts. To create a script or drop one from the script repository requires the RQADMIN role.
When using the sys.rqScriptCreate
function, you must specify a name for the script and an R function script that contains a single R function definition. Calls to the functions sys.rqScriptCreate
and sys.rqScriptDrop
must be wrapped in a BEGIN-END
PL/SQL block. The script repository stores the R function as a character large object (a CLOB
), so you must enclose the function definition in single quotes to specify it as a string.
The owner of a script can use the rqGrant
procedure to grant to another user read privilege access to a script or use the rqRevoke
procedure to revoke the privilege. To use a script granted to you by another user, you must specify the owner by prepending the owner’s name and a period to the name of the script, as in the following:
select * from table(rqEval(NULL, ‘select 1 x from dual’, ‘owner_name.script_name’));
The owner prefix is not required for a public script or for a script owned by the user.
The following tables list the PL/SQL procedures for managing script repository scripts and the data dictionary views that contain information about scripts.
Table 6-3 PL/SQL Procedures for Managing Scripts
PL/SQL Procedure | Description |
---|---|
rqGrant |
Grants read privilege access to a datastore or script. |
rqRevoke |
Revokes read privilege access to a datastore or script. |
sys.rqScriptCreate |
Adds the provided R function into the script repository with the provided name. |
sys.rqScriptDrop |
Removes the named R function from the script repository. |
Table 6-4 Data Dictionary Views for Scripts
Data Dictionary View | Description |
---|---|
ALL_RQ_SCRIPTS |
Describes the scripts in the OML4R script repository that are available to the current user |
USER_RQ_SCRIPTS |
Describes the scripts in the script repository that are owned by the current user. |
USER_RQ_SCRIPT_PRIVS |
Describes the scripts in the script repository to which the current user has granted read access and the users to whom access has been granted. |
SYS.RQ_SCRIPTS |
Describes the system scripts in the script repository. |
Example 6-20 Create a Script with the SQL APIs
This example uses the sys.rqScriptCreate
procedure to create a script in the Oracle Machine Learning for R script repository.
The example creates the user-defined function named myRandomRedDots2
. The user-defined function accepts two arguments, and it returns a data.frame
object that has two columns and that plots the specified number of random normal values. The sys.rqScriptCreate
function stores the user-defined function in the OML4R script repository.
-- Create a script named myRandomRedDots2 and add it to the script repository.
-- Specify that the script is private and to overwrite a script with the same name.
BEGIN
sys.rqScriptCreate('myRandomRedDots2',
'function(divisor = 100, numDots = 100) {
id <- 1:10
plot(1:numDots, rnorm(numDots), pch = 21, bg = "red", cex = 2 )
data.frame(id = id, val = id / divisor)}',
v_global => FALSE,
v_overwrite => TRUE);
END;
/
-- Grant read privilege access to Scott.
BEGIN
rqGrant('myRandomRedDots2', 'rqscript', 'SCOTT');
END;
/
-- View the users granted read access to myRandomRedDots2.
select * from USER_RQ_SCRIPT_PRIVS;
NAME GRANTEE
---------------- -------
myRandomRedDots SCOTT
-- Revoke the read privilege access from Scott.
BEGIN
rqRevoke('myRandomRedDots2', 'rqscript', 'SCOTT');
END;
/
-- Remove the script from the script repository.
BEGIN
sys.rqScriptDrop('myRandomRedDots2');
END;
/
Parent topic: SQL Interface for Embedded R Execution
6.3.3 Manage Datastores in SQL
Oracle Machine Learning for R provides PL/SQL procedures and Oracle Database data dictionary views for the basic management of datastores in SQL.
The following tables list the procedures and views.
Table 6-5 PL/SQL Procedures for Managing Datastores
PL/SQL Procedures | Description |
---|---|
rqGrant |
Grants read privilege access to a datastore or script. |
rqRevoke |
Revokes read privilege access to a datastore or script. |
rqDropDataStore |
Deletes a datastore. |
Table 6-6 Data Dictionary Views for Datastores
Views | Description |
---|---|
ALL_RQ_DATASTORES |
Describes the datastores available to the current user, including whether the datastore is grantable. |
RQUSER_DATASTORELIST |
Describes the datastores in the Oracle Database schema.. |
RQUSER_DATASTORECONTENTS |
Describes the objects in the datastores in the Oracle Database schema. |
USER_RQ_DATASTORE_PRIVS |
Describes the datastores and the users to whom the current user has granted read privilege access. |
USER_RQ_DATASTORES |
Describes the datastores owned by the current user, including whether the datastore is grantable. |
Parent topic: SQL Interface for Embedded R Execution