The rqRowEval function executes the R function in the script specified by the EXP_NAM parameter. You pass data to the R function with the INP_CUR parameter. You can pass arguments to the R function with the PAR_CUR parameter. The ROW_NUM parameter specifies the number of rows that should be passed to each invocation of the R function. The last chunk may have fewer rows than the number specified.
The rqRowEval function supports data-parallel execution, in which one or more R engines perform the same R function, or task, on disjoint chunks of data. Oracle Database handles the management and control of the potentially multiple R engines that run on the database server machine, automatically chunking and passing data to the R engines executing in parallel. Oracle Database ensures that R function executions for all chunks of rows complete, or the rqRowEval function returns an error.
The R function returns an R data.frame object, which appears as a SQL table in the database. You define the form of the returned value with the OUT_QRY parameter.
Syntax
rqRowEval (
INP_CUR REF CURSOR IN
PAR_CUR REF CURSOR IN
OUT_QRY VARCHAR2 IN
ROW_NUM NUMBER IN
EXP_NAM VARCHAR2 IN)
Parameters
Table A-1 Parameters of the rqRowEval Function
| Parameter | Description |
|---|---|
|
|
A cursor that specifies the data to pass to the R function specified by the |
|
|
A cursor that contains argument values to pass to the R function. |
|
|
One of the following:
|
|
|
The number of rows to include in each invocation of the R function. |
|
|
The name of a script in the Oracle Database R script repository. |
Return Value
Function rqRowEval returns a table that has the structure specified by the OUT_QRY parameter value.
Examples
Example A-8 uses the C50 package to score churn data (that is, to predict which customers are likely to churn) using C5.0 decision tree models. The example scores the customers from the specified state in parallel. This example produces the same result as the invocation of function ore.rowApply in Example 6-16.
Tip:
Example A-8 uses the CHURN_TEST table and the myXLevels datastore created by Example 6-16 so in R you should invoke the functions that create the table and that get the xlevels object and save it in the myXLevels datastore in Example 6-16 before running Example A-8.
As Example A-6 does, Example A-8 creates a user-defined function and saves the function in the Oracle Database R script repository. The user-defined function creates a C5.0 model for a state and saves the model in a datastore. In Example A-8, however, the user-defined function myC5.0FunctionForLevels uses the list of levels created in Example 6-16 instead of computing the levels using the as.factor function as function myC5.0Function does in Example A-6. The function myC5.0FunctionForLevels returns the value TRUE.
As Example A-6 does, Example A-8 creates the PL/SQL package churnPkg and the function churnGroupEval. Example A-6 declares a cursor to get the names of the datastores that include the string myC5.0modelFL and then executes a PL/SQL block that deletes those datastores. The example next executes a SELECT statement that invokes the churnGroupEval function. The churnGroupEval function invokes the myC5.0FunctionForLevels function to generate the C5.0 models and save them in datastores.
Example A-8 then creates the myScoringFunction function and stores it in the R script repository. The function scores a C5.0 model for the levels of a state and returns the results in a data.frame.
Finally, Example A-8 executes a SELECT statement that invokes the rqRowEval function. The input cursor to the rqRowEval function uses the PARALLEL hint to specify the degree of parallelism to use. The cursor specifies the CHURN_TEST table as the data source and filters the rows to include only those for Massachusetts. All rows processed use the same predictive model.
The parameters cursor specifies the ore.connect control argument to connect to Oracle R Enterprise on the database server and specifies values for the datastorePrefix and xlevelsDatastore arguments to the myScoringFunction function.
The SELECT statement for the OUT_QRY parameter specifies the format of the output. The ROW_NUM parameter specifies 200 as the number of rows to process at a time in each parallel R engine. The EXP_NAME parameter specifies myScoringFunction in the R script repository as the R function to invoke.
Example A-8 Using an rqRowEval Function
BEGIN
sys.rqScriptDrop('myC5.0FunctionForLevels');
sys.rqScriptCreate('myC5.0FunctionForLevels',
'function(dat, xlevelsDatastore, datastorePrefix) {
library(C50)
state <- dat[1,"state"]
datastoreName <- paste(datastorePrefix, dat[1, "state"], sep = "_")
dat$state <- NULL
ore.load(name = xlevelsDatastore) # To get the xlevels object.
for (j in names(xlevels))
dat[[j]] <- factor(dat[[j]], levels = xlevels[[j]])
c5mod <- C5.0(churn ~ ., data = dat, rules = TRUE)
ore.save(c5mod, name = datastoreName)
TRUE
}');
END;
/
CREATE OR REPLACE PACKAGE churnPkg AS
TYPE cur IS REF CURSOR RETURN CHURN_TEST%ROWTYPE;
END churnPkg;
/
CREATE OR REPLACE FUNCTION churnGroupEval(
inp_cur churnPkg.cur,
par_cur SYS_REFCURSOR,
out_qry VARCHAR2,
grp_col VARCHAR2,
exp_txt CLOB)
RETURN SYS.AnyDataSet
PIPELINED PARALLEL_ENABLE (PARTITION inp_cur BY HASH ("state"))
CLUSTER inp_cur BY ("state")
USING rqGroupEvalImpl;
/
DECLARE
CURSOR c1
IS
SELECT dsname FROM rquser_DataStoreList WHERE dsname like 'myC5.0modelFL%';
BEGIN
FOR dsname_st IN c1
LOOP
rqDropDataStore(dsname_st.dsname);
END LOOP;
END;
SELECT *
FROM table(churnGroupEval(
cursor(SELECT * /*+ parallel(t,4) */ FROM CHURN_TEST t),
cursor(SELECT 1 AS "ore.connect",
'myXLevels' as "xlevelsDatastore",
'myC5.0modelFL' AS "datastorePrefix" FROM dual),
'XML', 'state', 'myC5.0FunctionForLevels'));
BEGIN
sys.rqScriptDrop('myScoringFunction');
sys.rqScriptCreate('myScoringFunction',
'function(dat, xlevelsDatastore, datastorePrefix) {
library(C50)
state <- dat[1, "state"]
datastoreName <- paste(datastorePrefix, state, sep = "_")
dat$state <- NULL
ore.load(name = xlevelsDatastore) # To get the xlevels object.
for (j in names(xlevels))
dat[[j]] <- factor(dat[[j]], levels = xlevels[[j]])
ore.load(name = datastoreName)
res <- data.frame(pred = predict(c5mod, dat, type = "class"),
actual= dat$churn,
state = state)
res
}');
END;
/
SELECT * FROM table(rqRowEval(
cursor(select /*+ parallel(t, 4) */ *
FROM CHURN_TEST t
WHERE "state" = 'MA'),
cursor(SELECT 1 as "ore.connect",
'myC5.0modelFL' as "datastorePrefix",
'myXLevels' as "xlevelsDatastore"
FROM dual),
'SELECT ''aaa'' "pred",''aaa'' "actual" , ''aa'' "state" FROM dual',
200, 'myScoringFunction'));
In Oracle SQL Developer, the results of the last SELECT statement are:
pred actual state ---- ------ ----- no no MA no no MA no no MA no no MA no no MA no no MA no no MA no yes MA yes yes MA yes yes MA no no MA no no MA no no MA no no MA no no MA no no MA yes yes MA no no MA no no MA no no MA no no MA no no MA no no MA no no MA no no MA no no MA no no MA no no MA no no MA no no MA no yes MA no no MA no no MA no no MA yes yes MA no no MA no no MA no no MA 38 rows selected