1.3 Embedded R Execution
Embedded R Execution enables users to run User-defined functions (UDFs) R functions using R engines spawned by the database environment. This feature provides functions that automate the process of loading database data into R functions and allows for the return of both structured and unstructured results, including images. Users can store and manage their R functions in the database script repository.
Embedded R Execution:
Both Oracle Database and Oracle Autonomous Database support the creation and deployment of Embedded R Execution UDFs, which are custom functions written in R. These functions can be invoked directly from R and SQL queries. Additionally, Oracle Autonomous Database extends this functionality to REST APIs, providing greater integration options with external applications and workflows.
Note:
Oracle Database can leverage REST APIs using Oracle REST Data Services (ORDS), enabling web-based access to in-database models.Third-Party Packages:
Both Oracle Database and Oracle Autonomous Database allow users to leverage third-party packages in R or in UDFs called from R and SQL interfaces. Oracle Database provides access to third-party packages through R's native package management tools. Oracle Autonomous Database provides access to third-party packages through Conda environments and extends this functionality to REST APIs, offering greater flexibility for machine learning workflows.
Deployment and Usage
Embedded R Execution facilitates the deployment of user-defined R functions in production environments. These functions can be run using a SQL interface and, on Autonomous Database, through a REST interface for seamless integration into applications.
Oracle Database supports three primary types of functions for parallel processing:
- Non-Parallel Invocation:
- doEval: Runs a function in a single R engine and returns results, which can include data.frames and images. Runs a user-defined function for each row of a table or result set.
- tableApply: Runs a function in a single R engine, passing database data referenced by proxy objects as an R data.frame. Applies a user-defined function to an entire table or result set.
- Data Parallelism:
- rowApply: Implements data parallelism by processing data in chunks of rows using potentially multiple R engines. Applies a user-defined function to each row of a table or result set in parallel.
- groupApply: Enables data parallelism by partitioning database data by specified columns and applying user-defined R functions to each partition using potentially multiple R engines. Applies a user-defined function to each group of rows in a table or result set in parallel.
- Task Parallelism:
- indexApply: Facilitates task parallelism by running a user-defined R function multiple times with different index values using potentially multiple R engines. Applies a user-defined function to a range of indexes in parallel.
To manipulate database tables or views within a user-defined function (UDF), leverage the transparency layer. This feature is essential when embedded execution APIs, such as ore.doEval and ore.indexApply, do not directly load tables or views into R engines. In these cases, data interactions occur transparently between the R environment and the database. For UDF operations outside the transparency layer's scope, calculations are run within the R engine. Conversely, functions like ore.tableApply
, ore.groupApply
, and ore.rowApply
explicitly transfer data from the database to the R engine as data frames for in-memory processing.
For UDFs that do not directly load tables or views into R engines, the transparency layer ensures seamless data transfer between the R environment and the database. Calculations are performed within the R engine for UDF operations outside the transparency layer's capabilities. Embedded R Execution functions such as ore.tableApply
, ore.groupApply
, and ore.rowApply
explicitly transfer data from the database to the R engine as data frames for in-memory processing.
Example: Using transparency layers in embedded R UDFs
ore.doEval(function(){
test <- ore.pull("TEST") # transparency layer
dim(test) # R
}
ore.doEval(function(){
test <- ore.sync("TEST") # transparency layer
ore.attach(test) # transparency layer
dim(test) # transparency layer, proxy object
}
The code test <- ore.pull("TEST")
retrieves data from the database table named TEST
using the transparency layer function ore.pull
. The fetched data is then stored in the variable test
.
Using Embedded R Execution
To use these capabilities, user-defined R functions are stored in the database script repository. On Autonomous Database, the connection is established automatically without requiring an explicit connection step.
Example 1-1 Building a Linear Model using xyplot
In this example, the buildLM.1
function uses xyplot
from the lattice
library to create a scatter plot.
buildLM.1 <- function(dat){ library(lattice) regr <- lm(Petal.Length~Petal.Width, dat) x <- dat[['Petal.Width']] y <- dat[['Petal.Length']] print(xyplot(y~x, xlab = "Petal Width", ylab = "Petal Length", panel = function(x,y) { panel.dotplot(x,y) panel.abline(lm(y ~ x)) }, xlim=c(0,2.7), ylim=c(0,7.5), main="Prediction of Petal Length")) return(regr) }
%r buildLM.1(iris_df)Listing for Example
Call:
lm(formula = Petal.Length ~ Petal.Width, data = dat)
Coefficients:
(Intercept) Petal.Width
1.084 2.230
The tableApply
function accepts the proxy object IRIS
as input data and passes it to the user-defined function as a data frame. The user-defined function is provided as a string. Observe that the model is returned as an OML object, confirming that it is a linear model.
%r buildLM.1 <- function(dat,dsname){ library(lattice) regr <- lm(Petal.Length~Petal.Width, dat) x <- dat[['Petal.Width']] y <- dat[['Petal.Length']] xyplot(y~x, xlab = "Petal Width", ylab = "Petal Length", panel = function(x,y) { panel.dotplot(x,y) panel.abline(lm(y ~ x)) }, xlim=c(0,2.7), ylim=c(0,7.5), main="Prediction of Petal Length") return(regr) }
%r mod <- ore.tableApply(IRIS, buildLM.1, dsname = 'ds-00') cat("Type :",class(mod)) cat("\n") cat("Coefficient :",(mod$coefficient))Listing for Example
Type : lm Coefficient : 1.083558 2.22994
You can now save this user-defined function in the script repository and subsequently run it by specifying the function name.
%r ore.scriptCreate(name = 'buildLM.1', FUN = buildLM.1, overwrite = TRUE) ore.scriptList(name = "buildLM.1")Listing for Example
A data.frame: 1 x 2
NAME SCRIPT
<chr> <chr>
buildLM.1 function (dat)
{
library(lattice)
regr <- lm(Petal.Length ~ Petal.Width, dat)
x <- dat[["Petal.Width"]]
y <- dat[["Petal.Length"]]
print(xyplot(y ~ x, xlab = "Petal Width", ylab = "Petal Length", panel = function(x, y) {
panel.dotplot(x, y)
panel.abline(lm(y ~ x))
}, xlim = c(0, 2.7), ylim = c(0, 7.5), main = "Prediction of Petal Length"))
return(regr)
}
Parent topic: About Oracle Machine Learning for R