1.3 Embedded R Execution

Embedded R Execution enables users to run User-defined functions (UDFs) R functions using R engines spawned by the database environment. This feature provides functions that automate the process of loading database data into R functions and allows for the return of both structured and unstructured results, including images. Users can store and manage their R functions in the database script repository.

Embedded R Execution:

Both Oracle Database and Oracle Autonomous Database support the creation and deployment of Embedded R Execution UDFs, which are custom functions written in R. These functions can be invoked directly from R and SQL queries. Additionally, Oracle Autonomous Database extends this functionality to REST APIs, providing greater integration options with external applications and workflows.

Note:

Oracle Database can leverage REST APIs using Oracle REST Data Services (ORDS), enabling web-based access to in-database models.

Third-Party Packages:

Both Oracle Database and Oracle Autonomous Database allow users to leverage third-party packages in R or in UDFs called from R and SQL interfaces. Oracle Database provides access to third-party packages through R's native package management tools. Oracle Autonomous Database provides access to third-party packages through Conda environments and extends this functionality to REST APIs, offering greater flexibility for machine learning workflows.

Deployment and Usage

Embedded R Execution facilitates the deployment of user-defined R functions in production environments. These functions can be run using a SQL interface and, on Autonomous Database, through a REST interface for seamless integration into applications.

Oracle Database supports three primary types of functions for parallel processing:

  • Non-Parallel Invocation:
    • doEval: Runs a function in a single R engine and returns results, which can include data.frames and images. Runs a user-defined function for each row of a table or result set.
    • tableApply: Runs a function in a single R engine, passing database data referenced by proxy objects as an R data.frame. Applies a user-defined function to an entire table or result set.
  • Data Parallelism:
    • rowApply: Implements data parallelism by processing data in chunks of rows using potentially multiple R engines. Applies a user-defined function to each row of a table or result set in parallel.
    • groupApply: Enables data parallelism by partitioning database data by specified columns and applying user-defined R functions to each partition using potentially multiple R engines. Applies a user-defined function to each group of rows in a table or result set in parallel.
  • Task Parallelism:
    • indexApply: Facilitates task parallelism by running a user-defined R function multiple times with different index values using potentially multiple R engines. Applies a user-defined function to a range of indexes in parallel.

To manipulate database tables or views within a user-defined function (UDF), leverage the transparency layer. This feature is essential when embedded execution APIs, such as ore.doEval and ore.indexApply, do not directly load tables or views into R engines. In these cases, data interactions occur transparently between the R environment and the database. For UDF operations outside the transparency layer's scope, calculations are run within the R engine. Conversely, functions like ore.tableApply, ore.groupApply, and ore.rowApply explicitly transfer data from the database to the R engine as data frames for in-memory processing.

For UDFs that do not directly load tables or views into R engines, the transparency layer ensures seamless data transfer between the R environment and the database. Calculations are performed within the R engine for UDF operations outside the transparency layer's capabilities. Embedded R Execution functions such as ore.tableApply, ore.groupApply, and ore.rowApply explicitly transfer data from the database to the R engine as data frames for in-memory processing.

Example: Using transparency layers in embedded R UDFs

ore.doEval(function(){
    test <- ore.pull("TEST") # transparency layer
    dim(test) # R
  }
 
ore.doEval(function(){
    test <- ore.sync("TEST") # transparency layer
    ore.attach(test) # transparency layer
    dim(test) # transparency layer, proxy object
}

The code test <- ore.pull("TEST") retrieves data from the database table named TEST using the transparency layer function ore.pull. The fetched data is then stored in the variable test.

Using Embedded R Execution

To use these capabilities, user-defined R functions are stored in the database script repository. On Autonomous Database, the connection is established automatically without requiring an explicit connection step.

Example 1-1 Building a Linear Model using xyplot

In this example, the buildLM.1 function uses xyplot from the lattice library to create a scatter plot.

 
buildLM.1 <- function(dat){
  library(lattice)
   
  regr <- lm(Petal.Length~Petal.Width, dat)
 
  x <- dat[['Petal.Width']]
  y <- dat[['Petal.Length']]
 
  print(xyplot(y~x, xlab = "Petal Width", ylab = "Petal Length",
    panel = function(x,y) {
    panel.dotplot(x,y)
    panel.abline(lm(y ~ x))
    },
    xlim=c(0,2.7),
    ylim=c(0,7.5),
    main="Prediction of Petal Length"))
     
     
    return(regr)
}
%r
 
buildLM.1(iris_df)
Listing for Example
Call:
lm(formula = Petal.Length ~ Petal.Width, data = dat)
 
Coefficients:
(Intercept)  Petal.Width 
      1.084        2.230

Figure 1-2 Prediction of Petal Length



The tableApply function accepts the proxy object IRIS as input data and passes it to the user-defined function as a data frame. The user-defined function is provided as a string. Observe that the model is returned as an OML object, confirming that it is a linear model.

%r
 
buildLM.1 <- function(dat,dsname){
  library(lattice)
   
  regr <- lm(Petal.Length~Petal.Width, dat)
 
  x <- dat[['Petal.Width']]
  y <- dat[['Petal.Length']]
 
  xyplot(y~x, xlab = "Petal Width", ylab = "Petal Length",
    panel = function(x,y) {
    panel.dotplot(x,y)
    panel.abline(lm(y ~ x))
    },
    xlim=c(0,2.7),
    ylim=c(0,7.5),
    main="Prediction of Petal Length")
     
  return(regr)
}
%r
 
mod <- ore.tableApply(IRIS,
                      buildLM.1,
                      dsname = 'ds-00')
 
cat("Type :",class(mod))
cat("\n")
cat("Coefficient :",(mod$coefficient))
Listing for Example
Type : lm
Coefficient : 1.083558 2.22994

You can now save this user-defined function in the script repository and subsequently run it by specifying the function name.

%r
 
ore.scriptCreate(name = 'buildLM.1',
                 FUN  = buildLM.1,    
                 overwrite = TRUE)
                  
ore.scriptList(name = "buildLM.1")
Listing for Example
A data.frame: 1 x 2
NAME	SCRIPT
<chr>	<chr>
buildLM.1	function (dat) 
{
    library(lattice)
    regr <- lm(Petal.Length ~ Petal.Width, dat)
    x <- dat[["Petal.Width"]]
    y <- dat[["Petal.Length"]]
    print(xyplot(y ~ x, xlab = "Petal Width", ylab = "Petal Length", panel = function(x, y) {
        panel.dotplot(x, y)
        panel.abline(lm(y ~ x))
    }, xlim = c(0, 2.7), ylim = c(0, 7.5), main = "Prediction of Petal Length"))
    return(regr)
}