A.4 rqGroupEval Function

The rqGroupEval function is a user-defined function that identifies a grouping column. The user defines an rqGroupEval function in PL/SQL using the SQL object rqGroupEvalImpl, which is a generic implementation of the group apply functionality in SQL. The implementation supports data-parallel execution, in which one or more R engines perform the same R function, or task, on different partitions of data. The data is partitioned according to the values of the grouping column.

Only one grouping column is supported. If you have multiple columns, then combine the columns into one column and use the new column as the grouping column.

The rqGroupEval function executes the R function in the script specified by the EXP_NAM parameter. You pass data to the R function with the INP_CUR parameter. You can pass arguments to the R function with the PAR_CUR parameter.

The R function returns an R data.frame object, which appears as a SQL table in the database. You define the form of the returned value with the OUT_QRY parameter.

To create an rqGroupEval function, you create the following two PL/SQL objects:

  • A PL/SQL package that specifies the types of the result to return.

  • A function that takes the return value of the package and uses the return value with PIPELINED_PARALLEL_ENABLE set to indicate the column on which to partition data.

Syntax

rqGroupEval (
     INP_CUR     REF CURSOR     IN
     PAR_CUR     REF CURSOR     IN
     OUT_QRY     VARCHAR2       IN
     GRP_COL     VARCHAR2       IN
     EXP_NAM     VARCHAR2       IN)

Parameters

Parameter Description

INP_CUR

A cursor that specifies the data to pass to the R function specified by the EXP_NAME parameter.

PAR_CUR

A cursor that contains argument values to pass to the R function.

OUT_QRY

One of the following:

  • NULL, which returns a serialized object that can contain both data and image objects.

  • A SQL SELECT statement that specifies the column names and data types of the table returned by rqEval. Any image data is discarded. You can provide a prototype row using the dual dummy table or you can base the SELECT statement on an existing table or view. The R function must return a data.frame.

  • The string 'XML', which specifies that the table returned contains a CLOB that is an XML string. The XML can contain both structured data and images, with structured or semi-structured R objects first, followed by the image or images generated by the R function. Images are returned as a base 64 encoding of the PNG representation.

  • The string 'PNG', which specifies that the table returned contains a BLOB that has the image or images generated by the R function in PNG format.

GRP_COL

The name of the grouping column by which to partition the data.

EXP_NAM

The name of a script in the Oracle R Enterprise R script repository.

Return Value

The user-defined rqGroupEval function returns a table that has the structure specified by the OUT_QRY parameter value.

Examples

Example A-6 has a PL/SQL block that drops the script myC5.0Function to ensure that the script does not exist in the Oracle R Enterprise R script repository. It then creates a function and stores it as the script myC5.0Function in the R script repository.

The R function accepts two arguments: the data on which to operate and a prefix to use in creating datastores. The function uses the C50 package to build C5.0 models on the churn data set from C50. The function builds one churn model on the data for each state.

The myC5.0Function function loads the C50 package so that the function body has access to it when the function executes in an R engine on the database server. The function then creates a datastore name using the datastore prefix and the name of a state. To exclude the state name from the model, the function deletes the column from the data.frame. Because factors in the data.frame are converted to character vectors when they are loaded in the user-defined embedded R function, the myC5.0Function function explicitly converts the character vectors back to R factors.

The myC5.0Function function gets the data for the state from the specified columns and then creates a model for the state and saves the model in a datastore. The R function returns TRUE to have a simple value that can appear as the result of the function execution.

Example A-6 next creates a PL/SQL package, churnPkg, and a user-defined function, churnGroupEval. In defining an rqGroupEval function implementation, the PARALLEL_ENABLE clause is optional but the CLUSTER BY clause is required.

Finally, the example executes a SELECT statement that invokes the churnGroupEval function. In the INP_CUR argument of the churnGroupEval function, the SELECT statement specifies the PARALLEL hint to use parallel execution of the R function and the data set to pass to the R function. The INP_CUR argument of the churnGroupEval function specifies connecting to Oracle R Enterprise and the datastore prefix to pass to the R function. The OUT_QRY argument specifies returning the value in XML format, the GRP_NAM argument specifies using the state column of the data set as the grouping column, and the EXP_NAM argument specifies the myC5.0Function script in the R script repository as the R function to invoke.

For each of 50 states plus Washington, D.C., the SELECT statement returns from the churnGroupEval table function the name of the state and an XML string that contains the value TRUE.

See Also:

Example 6-13

Example A-6 Using an rqGroupEval Function

BEGIN
  sys.rqScriptDrop('myC5.0Function');
  sys.rqScriptCreate('myC5.0Function',
    'function(dat, datastorePrefix) {
       library(C50)
       datastoreName <- paste(datastorePrefix, dat[1, "state"], sep = "_")
       dat$state <- NULL
       dat$churn <- as.factor(dat$churn)
       dat$area_code <- as.factor(dat$area_code)
       dat$international_plan <- as.factor(dat$international_plan)
       dat$voice_mail_plan <- as.factor(dat$voice_mail_plan)
       mod <- C5.0(churn ~ ., data = dat, rules = TRUE)
       ore.save(mod, name = datastoreName)
       TRUE
      }');
END;
/

CREATE OR REPLACE PACKAGE churnPkg AS
  TYPE cur IS REF CURSOR RETURN CHURN_TRAIN%ROWTYPE;
END churnPkg;
/
CREATE OR REPLACE FUNCTION churnGroupEval(
  inp_cur churnPkg.cur,
  par_cur SYS_REFCURSOR, 
  out_qry VARCHAR2,
  grp_col VARCHAR2,
  exp_txt CLOB)
RETURN SYS.AnyDataSet
PIPELINED PARALLEL_ENABLE (PARTITION inp_cur BY HASH ("state"))
CLUSTER inp_cur BY ("state")
USING rqGroupEvalImpl;
/
 
SELECT *
  FROM table(churnGroupEval(
    cursor(SELECT * /*+ parallel(t,4) */ FROM CHURN_TRAIN t),
    cursor(SELECT 1 AS "ore.connect",
           'myC5.0model' AS "datastorePrefix" FROM dual),
    'XML', 'state', 'myC5.0Function'));