2 About Oracle Machine Learning for R
The following topics describe Oracle Machine Learning for R (OML4R) and its advantages for the R user.
Note:
OML4R in previous releases was named Oracle R Enterprise.- What Is Oracle Machine Learning for R?
Oracle Machine Learning for R (OML4R) is a comprehensive, database-centric environment for end-to-end analytical processes in R, with immediate deployment to production environments. - Advantages of Oracle Machine Learning for R
Using OML4R to prepare and analyze data in an Oracle Database instance has many advantages for an R user. - Get Online Help for Oracle Machine Learning for R Classes, Functions, and Methods
- About Transparently Using R on Oracle Database Data
OML4R has overloaded open source R functions that you can use to operate directly on data in Oracle Database and Oracle Autonomous Database instances. - Oracle Machine Learning for R Global Options
OML4R has global options that affect various functions.
2.1 What Is Oracle Machine Learning for R?
Oracle Machine Learning for R (OML4R) is a comprehensive, database-centric environment for end-to-end analytical processes in R, with immediate deployment to production environments.
OML4R is a set of R packages and database features that enable an R user to operate on database-resident data without using SQL and to run R scripts in one or more embedded R engines that run in the database environment.
Using OML4R from your R session, you have easy access to data in an database instance. You can create and use R objects that correspond to database tables and views - referred to as proxy objects data in database tables. OML4R has overloaded functions that translate R operations into SQL that runs in the database. The database consolidates the SQL and can use the query optimization, parallel processing, and scalability features of the database when it runs the SQL statements. The database returns the results as R objects.
OML4R 2.0 is available in the R interpreter in Oracle Machine Learning Notebooks in your Oracle Autonomous Database instance. For more information, see Get Started with Notebooks for Data Analysis and Data Visualization. In this environment, all the required components are included, including R, required R libraries, and the R interpreter in OML Notebooks.
Embedded R execution provides some of the most significant advantages of using OML4R. Using Embedded R execution, you can store and run R scripts in the database through either an R interface or a SQL interface or both. The Embedded R can be run on Autonomous Database using the REST API. You can use the results of R scripts in SQL-enabled tools for structured data, R objects, and images.
Parent topic: About Oracle Machine Learning for R
2.2 Advantages of Oracle Machine Learning for R
Using OML4R to prepare and analyze data in an Oracle Database instance has many advantages for an R user.
With OML4R, you can do the following:
-
Operate on Database-Resident Data Without Using SQL. OML4R transparently translates many standard R functions into SQL. With OML4R, you can create R proxy objects that access, analyze, and manipulate data that resides in the database. OML4R can automatically optimize the SQL by taking advantage of column indexes, query optimization, table partitioning, and database parallelism.
OML4R overloaded functions are available for many commonly used R functions, including those on R data frames for in-database execution.
-
Minimize Data Movement. By keeping the data in the database whenever possible, you eliminate the time involved in transferring the data to your client R engine and the need to store the data locally. You also eliminate the need to manage the locally stored data, which includes tasks such as distributing the data files to the appropriate locations, synchronizing the data with changes that are made in the production database, and so on.
-
Keep Data Secure. By keeping the data in the database, you have the security, scalability, reliability, and backup features of Oracle Database for managing the data.
-
Use the Power of the Database. By operating directly on data in the database, you can use the memory and processing power of the database environment and avoid the memory constraints of your client R engine.
-
Use Current Data. As data is refreshed in the database, you have immediate access to current data.
-
Prepare Data in the Database. Using the transparency layer functions, prepare large database-resident data sets for predictive analysis through operations such as ordering, aggregating, filtering, recoding, and the use of comprehensive sampling techniques without having to write SQL code.
-
Save R Objects in the Database. You can save native R objects and OML4R proxy objects in the OML4R datastore for use across R sessions and to share with other users. You can store R and OML4R objects in an OML4R datastore, which is managed by the Oracle Database instance.
-
Build Models in the Database. Use in-database parallelized and distributed machine learning algorithms from OML4SQL to build more models on more data, and score large volume data – faster. Use in-database algorithms from OML4SQL via well-integrated R API, which now includes in-database algorithms for Neural Networks, Random Forest, Exponential Smooth, and XGBoost. Increase productivity from in-database algorithm automatic data preparation, partitioned models, and integrated text mining capabilities.
You can use functions in packages that you download from CRAN (The Comprehensive R Archive Network) to build models that use techniques such as ensemble modeling.
-
Run user-defined R functions in embedded R engines. Using OML4R Embedded R execution functionality, you can store user-defined R functions in the OML4R script repository, and run those functions in R engines spawned by the database environment. When a user-defined R function runs, the database spawns and manages one or more R engines that can run in parallel. With the Embedded R execution functionality, you can do the following:
-
Use a select set of R packages in user-defined functions that run in embedded R engines.
-
Use other R packages in user-defined R functions that run in Embedded R engines.
-
Operationalize user-defined R functions for use in production applications and eliminate porting R code and models into other languages; avoid reinventing code to integrate R results into existing applications.
-
Seamlessly leverage your Oracle database instance as a high-performance computing environment for user-defined R functions, providing data parallelism and resource management.
-
Perform simulations, for example, Monte Carlo analysis, using system-supported task parallelism.
-
Call user-defined R functions from R, SQL, and REST APIs.
-
Return structured data.frame results and PNG images from user-defined R functions as tables as well as XML containing both structured and image content.
-
-
Integrate with the Oracle Technology Stack. You can take advantage of all aspects of the Oracle technology stack to integrate your data analysis within a larger framework for business intelligence or scientific inquiry. For example, you can integrate the results of your OML4R analysis into Oracle Analytics Cloud and Oracle APEX using SQL to call R functionality.
Parent topic: About Oracle Machine Learning for R
2.3 Get Online Help for Oracle Machine Learning for R Classes, Functions, and Methods
To get help on OML4R classes, functions, and methods, use R functions such as help
and showMethods
. If the name of a class or function has an ore
prefix, you can supply the name to the help
function. To get help on an overloaded method of an open-source R function, supply the name of the method and the name of the ore
class.
Example 2-1 Getting Help on OML4R Classes, Functions, and Methods
This example shows several ways of getting information on OML4R classes, functions, and methods. In the listing following the example some code has been modified to display only a portion of the results and the output of some of the functions is not shown.
# List the contents of the OREbase package. ls("package:OREbase") # Get help for the OREbase package. help("OREbase") # Get help for the ore virtual class. help("ore-class") # Show the subclasses of the ore virtual class. showClass("ore") # Get help on the ore.frame class. help("ore.frame") # Get help on the ore.vector class. help("ore.vector") # Show the arguments for the aggregate method. showMethods("aggregate") # Get help on the aggregate method for an ore.vector object. help("aggregate,ore.vector-method") # Show the signatures for the merge method. showMethods("merge") # Get help on the merge method for an ore.frame object. help("merge,ore.frame,ore.frame-method") showMethods("scale") # Get help on the scale method for an ore.number object. help("scale,ore.number-method") # Get help on the ore.connect function. help("ore.connect")
R> options(width = 80) # List the contents of the OREbase package. R> head(ls("package:OREbase"), 12) [1] "%in%" "Arith" "Compare" "I" [5] "Logic" "Math" "NCOL" "NROW" [9] "Summary" "as.data.frame" "as.env" "as.factor" R> R># Get help for the OREbase package. R> help("OREbase") # Output not shown. R> R> # Get help for the ore virtual class. R> help("ore-class") # Output not shown. R> R># Show the subclasses of the ore virtual class. R> showClass("ore") Virtual Class "ore" [package "OREbase"] No Slots, prototype of class "ore.vector" Known Subclasses: Class "ore.vector", directly Class "ore.frame", directly Class "ore.matrix", directly Class "ore.number", by class "ore.vector", distance 2 Class "ore.character", by class "ore.vector", distance 2 Class "ore.factor", by class "ore.vector", distance 2 Class "ore.date", by class "ore.vector", distance 2 Class "ore.datetime", by class "ore.vector", distance 2 Class "ore.difftime", by class "ore.vector", distance 2 Class "ore.logical", by class "ore.vector", distance 3 Class "ore.integer", by class "ore.vector", distance 3 Class "ore.numeric", by class "ore.vector", distance 3 Class "ore.tblmatrix", by class "ore.matrix", distance 2 Class "ore.vecmatrix", by class "ore.matrix", distance 2 R> # Get help on the ore.frame class. R> help("ore.frame") # Output not shown. R># Get help on the ore.vector class. R> help("ore.vector") # Output not shown. R> R># Show the arguments for the aggregate method. R> showMethods("aggregate") Function: aggregate (package stats) x="ANY" x="ore.vector" # Get help on the aggregate method for an ore.vector object. R> help("aggregate,ore.vector-method") # Output not shown. # Show the signatures for the merge method. R> showMethods("merge") Function: merge (package base) x="ANY", y="ANY" x="data.frame", y="ore.frame" x="ore.frame", y="data.frame" x="ore.frame", y="ore.frame # Get help on the merge method for an ore.frame object. R> help("merge,ore.frame,ore.frame-method") # Output not shown. R> showMethods("scale") Function: scale (package base) x="ANY" x="ore.frame" x="ore.number" x="ore.tblmatrix" x="ore.vecmatrix" # Get help on the scale method for an ore.number object. R> help("scale,ore.number-method") # Output not shown. # Get help on the ore.connect function. R> help("ore.connect") # Output not shown.
Parent topic: About Oracle Machine Learning for R
2.4 About Transparently Using R on Oracle Database Data
OML4R has overloaded open source R functions that you can use to operate directly on data in Oracle Database and Oracle Autonomous Database instances.
The methods and functions are in packages that implement a transparency layer that translates R functions into SQL.
The OML4R transparency layer packages and the limitations of converting R into SQL are described in the following topics:
- About the Transparency Layer
The Oracle Machine Learning for R transparency layer is implemented by theOREbase
,OREgraphics
,OREdplyr
andOREstats
packages. - Transparency Layer Support for R Data Types and Classes
Oracle Machine Learning for R transparency layer has classes and data types that map R data types to Oracle Database data types.
Parent topic: About Oracle Machine Learning for R
2.4.1 About the Transparency Layer
The Oracle Machine Learning for R transparency layer is implemented by the OREbase
, OREgraphics
, OREdplyr
and OREstats
packages.
These OML4R packages contain overloaded methods of functions in the open source R base
, graphics
, and stats
packages, respectively.
With the methods and functions in these packages, you can create data.frame and other R proxy objects that correspond to tables and views in your Oracle Database instance. When you run an R expression that uses such an object, the method or function transparently generates a SQL query and sends it to the database. The database then runs the query and typically returns the result also as a proxy object to minimize data movement between the data and the client, as some results may be large.
A database table or view is represented by an ore.frame
object, which is a subclass of data.frame
. Other OML4R classes inherit from corresponding R classes, such as ore.vector
inheriting from vector
. OML4R maps Oracle Database data types to OML4R classes, such as NUMBER
to the complete mapping list.
You can use the transparency layer methods and functions to prepare database-resident data for analysis. You can then use functions in other OML4R packages to build models and use them to score data.
Example 2-2 Finding the Mean of the Petal Lengths by Species in R
This example shows creating a table called IRIS_TABLE from the R data frame iris. Then using the overloaded aggregate function to compute the mean of petal length for each species.
ore.create(iris, table = 'IRIS_TABLE') aggplen = aggregate(IRIS_TABLE$Petal.Length, by = list(species = IRIS_TABLE$Species), FUN = mean) aggplenListing for This Example
R> ore.create(iris, table = 'IRIS_TABLE') R> aggplen = aggregate(IRIS_TABLE$Petal.Length, by = list(species = IRIS_TABLE$Species), FUN = mean) R> aggplen species x setosa setosa 1.462 versicolor versicolor 4.260 virginica virginica 5.552
Example 2-3 SQL Equivalent of the Previous Example
This example shows the SQL equivalent of the aggregate
function in the previous example.
SELECT "Species", AVG("Petal.Length") FROM IRIS_TABLE GROUP BY "Species" ORDER BY "Species"; Species AVG("PETAL.LENGTH") ----------- ------------------- setosa 1.4620000000000002 versicolor 4.26 virginica 5.552
Parent topic: About Transparently Using R on Oracle Database Data
2.4.2 Transparency Layer Support for R Data Types and Classes
Oracle Machine Learning for R transparency layer has classes and data types that map R data types to Oracle Database data types.
Those classes and data types are described in the following topics:
- About Oracle Machine Learning for R Data Types and Classes
OML4R has data types that map R data types to SQL data types. - About the ore.frame Class
- Support for R Naming Conventions
OML4R uses R naming conventions forore.frame
columns instead of the more restrictive Oracle Database naming conventions. - About Coercing R and Oracle Machine Learning for R Class Types
Some OML4R functions coerce R objects and class types to OML4Rore
objects and types.
Parent topic: About Transparently Using R on Oracle Database Data
2.4.2.1 About Oracle Machine Learning for R Data Types and Classes
OML4R has data types that map R data types to SQL data types.
In an R session, when you create database objects from R objects or you create R objects from database data, OML4R translates R data types to SQL data types and the reverse where possible.
OML4R creates objects that are instances of OML4R classes. OML4R overloads many standard R functions so that they use OML4R classes and data types. R language constructs and syntax are supported for objects that are mapped to Oracle Database objects.
Table 2-1 Mappings Between R, OML4R, and SQL Data Types
R Data Type | OML4R Data Type | SQL Data Type |
---|---|---|
character mode |
|
|
integer mode |
|
|
logical mode |
|
The |
numeric mode |
|
|
|
|
|
|
|
|
|
|
|
None |
Not supported |
User defined data types Reference data types |
Note:
-
Objects of type
ore.datetime
do not support a time zone setting, instead they use the system time zoneSys.timezone
if it is available or GMT ifSys.timezone
is not available. -
The SQL VARCHAR2 data type is mapped to the R character data type through the embedded R input data argument. Users can convert the character variable to a factor in R if needed by using
as.factor()
.
2.4.2.2 About the ore.frame Class
It is the OML4R equivalent of a data.frame
. Typically, you get ore.frame
objects that are proxies for database tables. You can then add new columns, or make other changes, to the ore.frame
proxy object. Any such change does not affect the underlying table. If you then request data from the source table of the ore.frame
object, the transparency layer function generates a SQL query that has the additional columns in the select list, but the table is not changed.
In R, the elements of a data.frame
have an explicit order. You can specify elements by using integer indexing. In contrast, relational database tables do not define any order of rows and therefore cannot be directly mapped to R data structures.
OML4R has both ordered and unordered ore.frame
objects. If a table has a primary key, which is a set of one or more columns that form a distinct tuple within a row, you can produce ordered results by performing a sort using an ORDER BY
clause in a SELECT
statement. However, ordering relational data can be expensive and is often unnecessary for transparency layer operations. For example, ordering is not required to compute summary statistics when invoking the summary
function on an ore.frame
.
See Also:
"Create Temporary Database Tables " for information on ore.create
Example 2-4 Classes of a data.frame and a Corresponding ore.frame
This example creates a data.frame
with columns that contain different data types and displays the structure of the data.frame
. The example then calls the ore.push
function to create a temporary table in the database that contains a copy of the data of the data.frame
. The ore.push
invocation also generates an ore.frame
object that is a proxy for the table. The example displays the classes of the ore.frame
object and of the columns in the data.frame
and the ore.frame
objects.
df <- data.frame(a="abc", b=1.456, c=TRUE, d=as.integer(1), e=Sys.Date(), f=as.difftime(c("0:3:20", "11:23:15"))) ore.push(df) class(df$a) class(df$b) class(df$c) class(df$d) class(df$e) class(df$f)Listing for Example
R> df <- data.frame(a="abc", + b=1.456, + c=TRUE, + d=as.integer(1), + e=Sys.Date(), + f=as.difftime(c("0:3:20", "11:23:15"))) R> ore.push(df) R> class(df$a) [1] "character" R> class(df$b) [1] "numeric" R> class(df$c) [1] "logical" R> class(df$d) [1] "integer" R> class(df$e) [1] "Date" R> class(df$f) [1] "difftime"
Parent topic: Transparency Layer Support for R Data Types and Classes
2.4.2.3 Support for R Naming Conventions
OML4R uses R naming conventions for ore.frame
columns instead of the more restrictive Oracle Database naming conventions.
The column names of an ore.frame
can be longer than 30 bytes, can contain double quotes, and can be non-unique.
Parent topic: Transparency Layer Support for R Data Types and Classes
2.4.2.4 About Coercing R and Oracle Machine Learning for R Class Types
Some OML4R functions coerce R objects and class types to OML4R ore
objects and types.
The generic as.ore
function coerces in-memory R objects to
ore
objects. The more specific functions, such as
as.ore.character
, coerce objects to specific types. The
ore.push
function implicitly coerces R class types to ore
class types and the ore.pull
function coerces ore
class types
to R class types. For information on those functions, see "Create Temporary Database Tables".
Example 2-5 Coercing R and OML4R Class Types
This example illustrates coercing R objects to ore
objects. creates an R integer
object and then uses the generic method as.ore
to coerce it to an ore
object, which is an ore.integer
. The example coerces the R object to various other ore
class types. For an example of using as.factor
in Embedded R Execution function, see Example 10-9.
x <- 1:10 class(x) X <- as.ore(x) class(X) Xn <- as.ore.numeric(x) class(Xn) Xc <- as.ore.character(x) class(Xc) Xc Xf <- as.ore.factor(x) XfListing for Example 2-5
R> x <- 1:10 R> class(x) [1] "integer" R> X <- as.ore(x) R> class(X) [1] "ore.integer" attr(,"package") [1] "OREbase" R> Xn <- as.ore.numeric(x) R> class(Xn) [1] "ore.numeric" attr(,"package") [1] "OREbase" R> Xc <- as.ore.character(x) R> class(Xc) [1] "ore.character" attr(,"package") [1] "OREbase" R> Xc [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" R> Xf <- as.ore.factor(x) R> Xf [1] 1 2 3 4 5 6 7 8 9 10 Levels: 1 10 2 3 4 5 6 7 8 9
Parent topic: Transparency Layer Support for R Data Types and Classes
2.5 Oracle Machine Learning for R Global Options
OML4R has global options that affect various functions.
Table 2-2 lists the OML4R global options and descriptions of them.
Table 2-2 OML4R Global Options
Global | Description |
---|---|
A logical value that specifies whether an environment referenced in an object should be replaced with an empty environment during serialization to an Oracle Database. When The following OML4R functions use this global option:
|
|
A logical value used during logical subscripting of an When |
|
A preferred degree of parallelism to use in Embedded R Execution. One of the following:
The default value is |
|
A character string that specifies the separator to use between multi-column row names of an |
|
A logical value that specifies whether iterative OML4R functions should print output at each iteration. The default value is |
|
A logical value that specifies whether OML4R displays a warning message when an |
Parent topic: About Oracle Machine Learning for R