2 About Oracle Machine Learning for R

The following topics describe Oracle Machine Learning for R (OML4R) and its advantages for the R user.

Note:

OML4R in previous releases was named Oracle R Enterprise.

2.1 What Is Oracle Machine Learning for R?

Oracle Machine Learning for R (OML4R) is a comprehensive, database-centric environment for end-to-end analytical processes in R, with immediate deployment to production environments.

OML4R is a set of R packages and database features that enable an R user to operate on database-resident data without using SQL and to run R scripts in one or more embedded R engines that run in the database environment.

Using OML4R from your R session, you have easy access to data in an database instance. You can create and use R objects that correspond to database tables and views - referred to as proxy objects data in database tables. OML4R has overloaded functions that translate R operations into SQL that runs in the database. The database consolidates the SQL and can use the query optimization, parallel processing, and scalability features of the database when it runs the SQL statements. The database returns the results as R objects.

OML4R 2.0 is available in the R interpreter in Oracle Machine Learning Notebooks in your Oracle Autonomous Database instance. For more information, see Get Started with Notebooks for Data Analysis and Data Visualization. In this environment, all the required components are included, including R, required R libraries, and the R interpreter in OML Notebooks.

Embedded R execution provides some of the most significant advantages of using OML4R. Using Embedded R execution, you can store and run R scripts in the database through either an R interface or a SQL interface or both. The Embedded R can be run on Autonomous Database using the REST API. You can use the results of R scripts in SQL-enabled tools for structured data, R objects, and images.

2.2 Advantages of Oracle Machine Learning for R

Using OML4R to prepare and analyze data in an Oracle Database instance has many advantages for an R user.

With OML4R, you can do the following:

  • Operate on Database-Resident Data Without Using SQL. OML4R transparently translates many standard R functions into SQL. With OML4R, you can create R proxy objects that access, analyze, and manipulate data that resides in the database. OML4R can automatically optimize the SQL by taking advantage of column indexes, query optimization, table partitioning, and database parallelism.

    OML4R overloaded functions are available for many commonly used R functions, including those on R data frames for in-database execution.

  • Minimize Data Movement. By keeping the data in the database whenever possible, you eliminate the time involved in transferring the data to your client R engine and the need to store the data locally. You also eliminate the need to manage the locally stored data, which includes tasks such as distributing the data files to the appropriate locations, synchronizing the data with changes that are made in the production database, and so on.

  • Keep Data Secure. By keeping the data in the database, you have the security, scalability, reliability, and backup features of Oracle Database for managing the data.

  • Use the Power of the Database. By operating directly on data in the database, you can use the memory and processing power of the database environment and avoid the memory constraints of your client R engine.

  • Use Current Data. As data is refreshed in the database, you have immediate access to current data.

  • Prepare Data in the Database. Using the transparency layer functions, prepare large database-resident data sets for predictive analysis through operations such as ordering, aggregating, filtering, recoding, and the use of comprehensive sampling techniques without having to write SQL code.

  • Save R Objects in the Database. You can save native R objects and OML4R proxy objects in the OML4R datastore for use across R sessions and to share with other users. You can store R and OML4R objects in an OML4R datastore, which is managed by the Oracle Database instance.

  • Build Models in the Database. Use in-database parallelized and distributed machine learning algorithms from OML4SQL to build more models on more data, and score large volume data – faster. Use in-database algorithms from OML4SQL via well-integrated R API, which now includes in-database algorithms for Neural Networks, Random Forest, Exponential Smooth, and XGBoost. Increase productivity from in-database algorithm automatic data preparation, partitioned models, and integrated text mining capabilities.

    You can use functions in packages that you download from CRAN (The Comprehensive R Archive Network) to build models that use techniques such as ensemble modeling.

  • Run user-defined R functions in embedded R engines. Using OML4R Embedded R execution functionality, you can store user-defined R functions in the OML4R script repository, and run those functions in R engines spawned by the database environment. When a user-defined R function runs, the database spawns and manages one or more R engines that can run in parallel. With the Embedded R execution functionality, you can do the following:

    • Use a select set of R packages in user-defined functions that run in embedded R engines.

    • Use other R packages in user-defined R functions that run in Embedded R engines.

    • Operationalize user-defined R functions for use in production applications and eliminate porting R code and models into other languages; avoid reinventing code to integrate R results into existing applications.

    • Seamlessly leverage your Oracle database instance as a high-performance computing environment for user-defined R functions, providing data parallelism and resource management.

    • Perform simulations, for example, Monte Carlo analysis, using system-supported task parallelism.

    • Call user-defined R functions from R, SQL, and REST APIs.

    • Return structured data.frame results and PNG images from user-defined R functions as tables as well as XML containing both structured and image content.

  • Integrate with the Oracle Technology Stack. You can take advantage of all aspects of the Oracle technology stack to integrate your data analysis within a larger framework for business intelligence or scientific inquiry. For example, you can integrate the results of your OML4R analysis into Oracle Analytics Cloud and Oracle APEX using SQL to call R functionality.

2.3 Get Online Help for Oracle Machine Learning for R Classes, Functions, and Methods

To get help on OML4R classes, functions, and methods, use R functions such as help and showMethods. If the name of a class or function has an ore prefix, you can supply the name to the help function. To get help on an overloaded method of an open-source R function, supply the name of the method and the name of the ore class.

Example 2-1 Getting Help on OML4R Classes, Functions, and Methods

This example shows several ways of getting information on OML4R classes, functions, and methods. In the listing following the example some code has been modified to display only a portion of the results and the output of some of the functions is not shown.

# List the contents of the OREbase package.
ls("package:OREbase")

# Get help for the OREbase package.
help("OREbase")

# Get help for the ore virtual class.
help("ore-class")

# Show the subclasses of the ore virtual class.
showClass("ore")

# Get help on the ore.frame class.
help("ore.frame")

# Get help on the ore.vector class.
help("ore.vector")

# Show the arguments for the aggregate method.
showMethods("aggregate")

# Get help on the aggregate method for an ore.vector object.
help("aggregate,ore.vector-method")

# Show the signatures for the merge method.
showMethods("merge")

# Get help on the merge method for an ore.frame object.
help("merge,ore.frame,ore.frame-method")

showMethods("scale")

# Get help on the scale method for an ore.number object.
help("scale,ore.number-method")

# Get help on the ore.connect function.
help("ore.connect")
Listing for Example 2-1
R> options(width = 80)
# List the contents of the OREbase package.
R> head(ls("package:OREbase"), 12)
 [1] "%in%"          "Arith"         "Compare"       "I"            
 [5] "Logic"         "Math"          "NCOL"          "NROW"         
 [9] "Summary"       "as.data.frame" "as.env"        "as.factor"
R>
R># Get help for the OREbase package.
R> help("OREbase")      # Output not shown.
R>
R> # Get help for the ore virtual class.
R> help("ore-class")    # Output not shown.
R>
R># Show the subclasses of the ore virtual class.
R> showClass("ore")
Virtual Class "ore" [package "OREbase"]
 
No Slots, prototype of class "ore.vector"
 
Known Subclasses: 
Class "ore.vector", directly
Class "ore.frame", directly
Class "ore.matrix", directly
Class "ore.number", by class "ore.vector", distance 2
Class "ore.character", by class "ore.vector", distance 2
Class "ore.factor", by class "ore.vector", distance 2
Class "ore.date", by class "ore.vector", distance 2
Class "ore.datetime", by class "ore.vector", distance 2
Class "ore.difftime", by class "ore.vector", distance 2
Class "ore.logical", by class "ore.vector", distance 3
Class "ore.integer", by class "ore.vector", distance 3
Class "ore.numeric", by class "ore.vector", distance 3
Class "ore.tblmatrix", by class "ore.matrix", distance 2
Class "ore.vecmatrix", by class "ore.matrix", distance 2
R>
# Get help on the ore.frame class.
R> help("ore.frame")       # Output not shown.

R># Get help on the ore.vector class.
R> help("ore.vector")    # Output not shown.
R>
R># Show the arguments for the aggregate method.
R> showMethods("aggregate")
Function: aggregate (package stats)
x="ANY"
x="ore.vector"

# Get help on the aggregate method for an ore.vector object.
R> help("aggregate,ore.vector-method")      # Output not shown.

# Show the signatures for the merge method.
R> showMethods("merge")
Function: merge (package base)
x="ANY", y="ANY"
x="data.frame", y="ore.frame"
x="ore.frame", y="data.frame"
x="ore.frame", y="ore.frame

# Get help on the merge method for an ore.frame object.
R> help("merge,ore.frame,ore.frame-method")  # Output not shown.

R> showMethods("scale")
Function: scale (package base)
x="ANY"
x="ore.frame"
x="ore.number"
x="ore.tblmatrix"
x="ore.vecmatrix"

# Get help on the scale method for an ore.number object.
R> help("scale,ore.number-method")    # Output not shown.

# Get help on the ore.connect function.
R> help("ore.connect")                # Output not shown.

2.4 About Transparently Using R on Oracle Database Data

OML4R has overloaded open source R functions that you can use to operate directly on data in Oracle Database and Oracle Autonomous Database instances.

The methods and functions are in packages that implement a transparency layer that translates R functions into SQL.

The OML4R transparency layer packages and the limitations of converting R into SQL are described in the following topics:

2.4.1 About the Transparency Layer

The Oracle Machine Learning for R transparency layer is implemented by the OREbase, OREgraphics, OREdplyr and OREstats packages.

These OML4R packages contain overloaded methods of functions in the open source R base, graphics, and stats packages, respectively.

With the methods and functions in these packages, you can create data.frame and other R proxy objects that correspond to tables and views in your Oracle Database instance. When you run an R expression that uses such an object, the method or function transparently generates a SQL query and sends it to the database. The database then runs the query and typically returns the result also as a proxy object to minimize data movement between the data and the client, as some results may be large.

A database table or view is represented by an ore.frame object, which is a subclass of data.frame. Other OML4R classes inherit from corresponding R classes, such as ore.vector inheriting from vector. OML4R maps Oracle Database data types to OML4R classes, such as NUMBER to the complete mapping list.

You can use the transparency layer methods and functions to prepare database-resident data for analysis. You can then use functions in other OML4R packages to build models and use them to score data.

Example 2-2 Finding the Mean of the Petal Lengths by Species in R

This example shows creating a table called IRIS_TABLE from the R data frame iris. Then using the overloaded aggregate function to compute the mean of petal length for each species.

ore.create(iris, table = 'IRIS_TABLE')
aggplen = aggregate(IRIS_TABLE$Petal.Length,
                    by = list(species = IRIS_TABLE$Species),
                    FUN = mean)
aggplen
Listing for This Example
R> ore.create(iris, table = 'IRIS_TABLE')
R> aggplen = aggregate(IRIS_TABLE$Petal.Length,
                       by = list(species = IRIS_TABLE$Species),
                       FUN = mean)
R> aggplen
              species     x
setosa         setosa 1.462
versicolor versicolor 4.260
virginica   virginica 5.552

Example 2-3 SQL Equivalent of the Previous Example

This example shows the SQL equivalent of the aggregate function in the previous example.

SELECT "Species", AVG("Petal.Length")
FROM IRIS_TABLE
GROUP BY "Species"
ORDER BY "Species";
 
Species     AVG("PETAL.LENGTH") 
----------- ------------------- 
setosa       1.4620000000000002 
versicolor   4.26
virginica    5.552

2.4.2 Transparency Layer Support for R Data Types and Classes

Oracle Machine Learning for R transparency layer has classes and data types that map R data types to Oracle Database data types.

Those classes and data types are described in the following topics:

2.4.2.1 About Oracle Machine Learning for R Data Types and Classes

OML4R has data types that map R data types to SQL data types.

In an R session, when you create database objects from R objects or you create R objects from database data, OML4R translates R data types to SQL data types and the reverse where possible.

OML4R creates objects that are instances of OML4R classes. OML4R overloads many standard R functions so that they use OML4R classes and data types. R language constructs and syntax are supported for objects that are mapped to Oracle Database objects.

Table 2-1 Mappings Between R, OML4R, and SQL Data Types

R Data Type OML4R Data Type SQL Data Type

character mode vector

ore.character

VARCHAR2

INTERVAL YEAR TO MONTH

integer mode vector

ore.integer

NUMBER

logical mode vector

ore.logical

The NUMBER 0 for FALSE and 1 for TRUE

numeric mode vector

ore.number

BINARY_DOUBLE

BINARY_FLOAT

FLOAT

NUMBER

Date

ore.date

DATE

POSIXct

POSIXlt

ore.datetime

TIMESTAMP

TIMESTAMP WITH TIME ZONE

TIMESTAMP WITH LOCAL TIME ZONE

difftime

ore.difftime

INTERVAL DAY TO SECOND

None

Not supported

LONG

LONG RAW

RAW

User defined data types

Reference data types

Note:

  • Objects of type ore.datetime do not support a time zone setting, instead they use the system time zone Sys.timezone if it is available or GMT if Sys.timezone is not available.

  • The SQL VARCHAR2 data type is mapped to the R character data type through the embedded R input data argument. Users can convert the character variable to a factor in R if needed by using as.factor().

2.4.2.2 About the ore.frame Class

It is the OML4R equivalent of a data.frame. Typically, you get ore.frame objects that are proxies for database tables. You can then add new columns, or make other changes, to the ore.frame proxy object. Any such change does not affect the underlying table. If you then request data from the source table of the ore.frame object, the transparency layer function generates a SQL query that has the additional columns in the select list, but the table is not changed.

In R, the elements of a data.frame have an explicit order. You can specify elements by using integer indexing. In contrast, relational database tables do not define any order of rows and therefore cannot be directly mapped to R data structures.

OML4R has both ordered and unordered ore.frame objects. If a table has a primary key, which is a set of one or more columns that form a distinct tuple within a row, you can produce ordered results by performing a sort using an ORDER BY clause in a SELECT statement. However, ordering relational data can be expensive and is often unnecessary for transparency layer operations. For example, ordering is not required to compute summary statistics when invoking the summary function on an ore.frame.

See Also:

"Create Temporary Database Tables " for information on ore.create

Example 2-4 Classes of a data.frame and a Corresponding ore.frame

This example creates a data.frame with columns that contain different data types and displays the structure of the data.frame. The example then calls the ore.push function to create a temporary table in the database that contains a copy of the data of the data.frame. The ore.push invocation also generates an ore.frame object that is a proxy for the table. The example displays the classes of the ore.frame object and of the columns in the data.frame and the ore.frame objects.

df <- data.frame(a="abc",
                 b=1.456,
                 c=TRUE,
                 d=as.integer(1),
                 e=Sys.Date(),
                 f=as.difftime(c("0:3:20", "11:23:15")))
ore.push(df)
class(df$a)
class(df$b)
class(df$c)
class(df$d)
class(df$e)
class(df$f)
Listing for Example
R> df <- data.frame(a="abc",
+                   b=1.456,
+                   c=TRUE,
+                   d=as.integer(1),
+                   e=Sys.Date(),
+                   f=as.difftime(c("0:3:20", "11:23:15")))
R> ore.push(df)
R> class(df$a)
[1] "character"
R> class(df$b)
[1] "numeric"
R> class(df$c)
[1] "logical"
R> class(df$d)
[1] "integer"
R> class(df$e)
[1] "Date"
R> class(df$f)
[1] "difftime"
2.4.2.3 Support for R Naming Conventions

OML4R uses R naming conventions for ore.frame columns instead of the more restrictive Oracle Database naming conventions.

The column names of an ore.frame can be longer than 30 bytes, can contain double quotes, and can be non-unique.

2.4.2.4 About Coercing R and Oracle Machine Learning for R Class Types

Some OML4R functions coerce R objects and class types to OML4R ore objects and types.

The generic as.ore function coerces in-memory R objects to ore objects. The more specific functions, such as as.ore.character, coerce objects to specific types. The ore.push function implicitly coerces R class types to ore class types and the ore.pull function coerces ore class types to R class types. For information on those functions, see "Create Temporary Database Tables".

Example 2-5 Coercing R and OML4R Class Types

This example illustrates coercing R objects to ore objects. creates an R integer object and then uses the generic method as.ore to coerce it to an ore object, which is an ore.integer. The example coerces the R object to various other ore class types. For an example of using as.factor in Embedded R Execution function, see Example 10-9.

x <- 1:10
class(x)
X <- as.ore(x)
class(X)
Xn <- as.ore.numeric(x)
class(Xn)
Xc <- as.ore.character(x)
class(Xc)
Xc
Xf <- as.ore.factor(x)
Xf
Listing for Example 2-5
R> x <- 1:10
R> class(x)
[1] "integer"
R> X <- as.ore(x)
R> class(X)
[1] "ore.integer"
attr(,"package")
[1] "OREbase"
R> Xn <- as.ore.numeric(x)
R> class(Xn)
[1] "ore.numeric"
attr(,"package")
[1] "OREbase"
R> Xc <- as.ore.character(x)
R> class(Xc)
[1] "ore.character"
attr(,"package")
[1] "OREbase"
R> Xc
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"
R> Xf <- as.ore.factor(x)
R> Xf
 [1] 1  2  3  4  5  6  7  8  9  10
Levels: 1 10 2 3 4 5 6 7 8 9

2.5 Oracle Machine Learning for R Global Options

OML4R has global options that affect various functions.

Table 2-2 lists the OML4R global options and descriptions of them.

Table 2-2 OML4R Global Options

Global Description

ore.envAsEmptyenv

A logical value that specifies whether an environment referenced in an object should be replaced with an empty environment during serialization to an Oracle Database. When TRUE, the referenced environment in the object is replaced with an empty environment whose parent is .GlobalEnv, and the objects in the original referenced environment are not serialized. In some cases, this can significantly reduce the size of serialized objects. When FALSE, all of the objects in the referenced environment are serialized, and can be unserialized and loaded into memory. The default value for this option is FALSE.

The following OML4R functions use this global option:

  • ore.push, in saving a serialized list object to the database

  • ore.save, in saving objects to an OML4R datastore

  • ore.doEval and the other Embedded R Execution functions for serializing parameters of list type and for serializing some objects returned by an R function during Embedded R Execution

ore.na.extract

A logical value used during logical subscripting of an ore.frame or ore.vector object. When TRUE, rows or elements with an NA logical subscript produce rows or elements with NA values, which mimics how R treats missing value logical subscripting of data.frame and vector objects.

When FALSE, an NA logical subscript is interpreted as a FALSE value, resulting in the removal of the corresponding row or element. The default value is FALSE.

ore.parallel

A preferred degree of parallelism to use in Embedded R Execution. One of the following:

  • A positive integer greater than or equal to 2 for a specific degree of parallelism

  • FALSE or 1 for no parallelism

  • TRUE for the default parallelism of the data argument

  • NULL for the database default for the operation

The default value is NULL.

ore.sep

A character string that specifies the separator to use between multi-column row names of an ore.frame. The default value is |.

ore.trace

A logical value that specifies whether iterative OML4R functions should print output at each iteration. The default value is FALSE.

ore.warn.order

A logical value that specifies whether OML4R displays a warning message when an ore.frame that lacks row names or an ore.vector that lacks element names is used in a function that requires ordering. The default value is TRUE.