B Using External R Models in Oracle RTD

Oracle R Enterprise (ORE) is a component of the Oracle Advanced Analytics Option that integrates the R programming language with the Oracle Database. This appendix describes how Oracle RTD can trigger R script execution from within an Inline Service.

R is an open source programming language and software environment for statistical computing and graphics. It is widely used among statisticians for developing statistical software and data analysis. Oracle R Enterprise (ORE) integrates the R programming language with the Oracle Database, R users can develop, refine, and deploy R scripts that leverage the parallelism and scalability of the database to automate data analysis.

B.1 Oracle R Enterprise (ORE) and Oracle RTD

ORE integration enables Oracle RTD to trigger R script execution from within an Inline Service. The following are examples of business problems that can be addressed with an addition of R integration:

  • Data scientists can define an R computation that defines the segments of customers with different treatments applied to different segments. The R function may load and apply a previously built and saved (as an R object) predictive model. Caution: both R script execution and loading R objects bear certain performance costs

  • Offline data mining / forecasting models can be scored and saved into the database for use by Oracle RTD. These R calculations can be triggered by an Oracle RTD informant.

Oracle RTD itself has statistical capabilities embedded within it. However, the models natively available within Oracle RTD are specifically optimized for performance and are designed for continuous learning. This solves narrowly defined problems, such as choice outcome probability estimation. Integration with ORE significantly enhances the applicability of statistical data analysis and modeling to automating real-time decision making.

B.1.1 ORE Integration Architecture

This section describes the overall architecture of an integrated Oracle RTD / ORE application.

B.1.1.1 Overview of Oracle R Enterprise Architecture

Oracle R Enterprise has three main components:

This image is described in the surrounding text.

  1. The Client R Engine is a collection of R packages that allows you to connect to an Oracle Database and to interact with data in that database.

  2. The Server is a collection of PL/SQL procedures and libraries that augment Oracle Database with the capabilities required to support an Oracle R Enterprise client. The R engine is also installed on Oracle Database to support embedded execution. Oracle Database spawns R engines, which can provide data parallelism.

  3. R Engines spawned by Oracle Database support database-managed parallelism; provide lights-out scheduled execution of R scripts, that is scheduling or triggering R scripts packaged inside PL/SQL or SQL query. As of ORE 1.3, these R scripts can load R objects from in-database data stores and make use of them.

For more information on ORE architecture, please refer to ORE documentation: Oracle R Enterprise 1.3 User's Guide.

B.1.1.2 Joint Implementation Architecture for Oracle RTD with ORE

This section describes components of an integrated Oracle RTD / ORE solution.

This image is described in the surrounding text.

An Oracle RTD Inline Service using ORE would contain a Java function that uses JDBC to order a script execution by ORE embedded R engine. Deploying updated versions of R scripts could be facilitated by a special "Update Script" informant. Statistical models and other R objects could be built in R Development Environment and deployed to an Oracle Database using ore.save() API. They can then subsequently be loaded and used by R scripts that are invoked from Oracle RTD but executed within Oracle Database.

B.1.2 Setting up ORE for Use by Oracle RTD

In order to use ORE within an Oracle RTD Inline Service, the following prerequisites must be satisfied:

  • Oracle Database 11.2.0.3

  • Oracle R Enterprise 1.3

  • Oracle R Distribution (R 2.15.1)

B.1.2.1 Obtaining R

The recommended way to obtain R 2.15.1 is by downloading Oracle R Distribution.

B.1.2.2 Users, Roles, and Privileges

As stated in ORE documentation, the Oracle database user deploying R scripts must have RQADMIN role, and the database user executing R scripts must have RQROLE role. Further, this user must be given the following system privileges: "CREATE MINING MODEL", "CREATE TABLE", "CREATE PROCEDURE", and "CREATE VIEW".

Unless you plan to use a different database instance for hosting R execution, it is not necessary to create a dedicated Oracle schema for this purpose. The integration example described in the following section utilizes the same database schema as Oracle RTD system data source. The database user has been assigned both RQADMIN and RQROLE roles.

B.1.2.3 Special Considerations for Data Sources

If the database backing up your ORE installation is different from your Oracle RTD database you will need to apply slight modifications to the steps described in the Integration Example section.

Follow the instructions in Chapters 3 and 4 of the Oracle Real-Time Decisions Installation and Administration Guide to set up an additional data source in Oracle RTD. Remember to replace all the references to the SDDS data source name in the steps described in the Integration Example section with the JNDI name of the data source connecting to the database where ORE is installed.

Note: Make sure that the user connecting to the ORE database is given the appropriate privileges (RQADMIN and RQROLE).

B.1.3 Integration Example

A sample Inline Service, CrossSellR, a slightly modified version of CrossSell Inline Service, is available with Oracle RTD. It addresses the same use case as CrossSell with an addition of utilizing R for customer segmentation.

This Inline Service shows how the competency of using R to analyze data and build statistical models can be separated from competencies of Inline Service development and database application development: a data scientist using R would have a simple interface available for deploying R scripts.

Refer to Chapter 2.3 of Oracle Real-Time Decisions Installation and Administration Guide for instructions on how to initialize the database for use with CrossSellR (same as CrossSell).

B.1.3.1 Steps for Making Use of R Scripting in Oracle RTD Inline Services

CrossSellR contains modifications to CrossSell that illustrate the following steps you will need to take to make use of R in your Inline Service:

  • Create a script launching function, providing the input to, and making use of output of, an R script (in our example, this function is called SimpleRRegression). This function calls a SQL statement that executes an embedded R script deployed using UpdateScript informant, returning the result of the calculation performed in R.

  • Create a new Customer entity attribute: RiskByR mapped to SimpleRRegression function. This attribute is inherited by the Session entity (since it includes Customer entity).

  • Modify the filtering rule "Segment to Retain" to check the value of RiskByR session entity attribute.

  • Create a new function, DeployR, performing deployment of an updated R script to the Oracle RTD database.

  • Create a new informant: UpdateScript, enabling deployment of a new R script to the Oracle RTD database.

B.1.3.2 Creating a Script Launching Function

This image is described in the surrounding text.

Here is the full code of the function:

// risk by default:
double riskVal = .1;
StringBuffer rqEvalSB = new StringBuffer();
rqEvalSB.append("select *")
.append(" from table(rqtableeval(cursor (select ? CALLSABANDONED, ? COMPLAINTSPERYEAR from dual),")
.append(" cursor(select 1 as \"ore.connect\" from dual),")
.append(" 'SELECT 1 PRD FROM DUAL','execLm'))");
java.sql.Connection connection = null;
try {
DatabaseProviderInterface dp = newDatabaseProvider("SDDS");
 connection = dp.getConnection();
 java.sql.PreparedStatement statement = connection.prepareStatement(rqEvalSB.toString());
 statement.setInt(1, session().getCustomer().getCallsAbandoned());
 statement.setInt(2, session().getCustomer().getComplaintsPerYear());
 java.sql.ResultSet rs = statement.executeQuery();
 if (!rs.next()) {
 logInfo("Error: empty result set");
 return 0.;
 }
 
riskVal = rs.getDouble(1);
} catch(java.sql.SQLException e) {
 e.printStackTrace();
 logError("exception in GENSimpleRRegression " + e.getLocalizedMessage());
} finally {
 try {
 if (connection != null)
 connection.close();
 } catch( java.sql.SQLException cannotEvenClose) {
 }
}
return riskVal;

B.1.3.3 Mapping an Entity Attribute to the New Function

This image is described in the surrounding text.

We created a new attribute, RiskByR, and mapped it to the function SimpleRRegression().

B.1.3.4 Using the New Entity Attribute in Decision Logic

In our example we use the new entity attribute in a filtering rule:

This image is described in the surrounding text.

B.1.3.5 Creating a Deployment Function

This image is described in the surrounding text.

Here is the full code of the function:

logInfo("Deploying the script " + rScriptName + ":\n" + rScriptCode);
java.sql.Connection connection = dp.getConnection();
try {
DatabaseProviderInterface dp = newDatabaseProvider("SDDS");
 java.sql.CallableStatement dropStmt = connection.prepareCall("{call SYS.rqScriptDrop(?)}");
 dropStmt.setString(1, rScriptName);
 dropStmt.execute();
 
logInfo(" Dropped script " + rScriptName);
 
java.sql.CallableStatement statement = connection.prepareCall("{call SYS.rqScriptCreate(?, ?)}");
 statement.setString(1, rScriptName);
 statement.setString(2, rScriptCode); // the type of this parameter is CLOB; does setString work?
 boolean res = statement.execute();
 
logInfo("Creating script " + rScriptName + " returned " + res);
} catch( java.sql.SQLException e) {
 e.printStackTrace();
}

B.1.3.6 Creating a Deployment Informant

This image is described in the surrounding text.

On the Logic tab:

This image is described in the surrounding text.

B.1.3.7 Building and Saving a Linear Regression Model in R

CrossSellR Inline Service assumes a linear regression model named "linChurnMod" is saved in ORE. For the purposes of this sample we will generate random data and train a model on that data. To build the model, open an R session (using, for example, your R environment):

ore.connect(user="<db user>", sid="<sid>",host="<db host>", password="<db password>", port=<db port>, all=TRUE)
customers <- ore.pull(CROSSSELLCUSTOMERS)
customers$CHURNPROB <- 1./(1.+exp(5.-customers$COMPLAINTSPERYEAR)) * 1./(1.+exp(2.-customers$CALLSABANDONED))
uniformRandSeq <- runif(50000)
customers$CHURNEVENT <- (sign(customers$CHURNPROB - uniformRandSeq) + 1.) / 2.
linChurnMod <- lm(CHURNEVENT ~ CALLSABANDONED + COMPLAINTSPERYEAR, customers, model = FALSE)
summary(linChurnMod)
# trimming the model
linChurnMod$residuals <- NULL
linChurnMod$effects <- NULL
linChurnMod$fitted.values <- NULL
linChurnMod$qr$qr <- NULL
ore.save(linChurnMod, name="ds_1", description="linear churn model")

After executing the above script, the model is saved to the Oracle database.

B.1.3.8 Updating the Script

Upon deploying CrossSellR Inline Service, invoke its informant "Update Script" (from Decision Center or Decision Studio). The parameters to that informant are script name and script content. Provide the following values:

  • scriptName: execLm

  • scriptCode: function(dat) {;ore.load(name="ds_1", list="linChurnMod");prd <- predict(linChurnMod, newdata=dat);res <- data.frame(CHURNRISK=prd);res;}

Note: In R language, carriage return and semi-colon (";") can both be used to separate statements.

Note: The informant we created facilitates updating the scripts, hiding the somewhat obscure SQL statements that have to be executed to achieve that. It is, of course, possible to execute the same SQL manually from SQLPlus or your favorite SQL query editor.

Note: CrossSellR contains an Advisor, OfferRequest, that returns a likelihood. When this Inline Service is deployed, the Customer entity is populated. That in turn results in an attempt to invoke the R script, execLm. If you deploy your Inline Service before the script is created in the database, you will see an error in your server log that will look like the following:

ERROR [CrossSellRORE] exception in GENSimpleRRegression ORA-20000: RQuery error
ORA-06512: at "RQSYS.RQTABLEEVALIMPL", line 84
ORA-06512: at "RQSYS.RQTABLEEVALIMPL", line 80

After you run the UpdateScript informant, the script will be created, and the error should no longer appear on subsequent deployments of the Inline Service.

B.1.3.9 Testing the Inline Service

Now, if you invoke the CallStart informant, providing a valid Customer ID and channel, you should see a line appear on the Log tab mentioning the risk value, like this:

This image is described in the surrounding text.

B.1.4 Performance and Scalability Considerations

Oracle R Enterprise allows you to leverage the power of parallel execution of the Oracle database engine to perform R calculations in parallel, thereby achieving scalability that a standalone R installation is not capable of. However, care must be taken when using R calculations within decision logic in Oracle RTD Inline Services.

Due to the computational cost of running the R interpreter and the stateless nature of ORE embedded R engine, the latency and throughput of an Oracle RTD application can be significantly affected. It is generally not recommended to perform R calculations at each request or even once per Oracle RTD session in an environment with stringent latency and high throughput requirements, such as typical end user web marketing interactions.

ORE is currently more suitable for implementations where a somewhat higher latency is acceptable.