1 Introduction to Oracle Machine Learning for R
This chapter introduces Oracle Machine Learning for R (OML4R).
In previous releases, OML4R was named Oracle R Enterprise. The chapter contains the following topics:
- About Oracle Machine Learning for R
  Oracle Machine Learning for R (OML4R) is a comprehensive, database-centric environment for end-to-end analytical processes in R, with immediate deployment to production environments.
- Advantages of Oracle Machine Learning for R
  Using OML4R to prepare and analyze data in an Oracle Database instance has many advantages for an R user.
- Get Online Help for Oracle Machine Learning for R Classes, Functions, and Methods
  The OML4R client packages contain the R components that you use to interact with data in an Oracle database.
- About Transparently Using R on Oracle Database Data
  OML4R has overloaded open source R methods and functions that you can use to operate directly on data in an Oracle Database instance.
- Typical Operations in Using Oracle Machine Learning for R
  In using OML4R, the following is a typical progression of operations:
- Oracle Machine Learning for R Global Options
  OML4R has global options that affect various functions.
1.1 About Oracle Machine Learning for R
Oracle Machine Learning for R (OML4R) is a comprehensive, database-centric environment for end-to-end analytical processes in R, with immediate deployment to production environments.
OML4R is a set of R packages and Oracle Database features that enable an R user to operate on database-resident data without using SQL and to execute R scripts in one or more embedded R engines that run on the database server.
Using OML4R from your local R session, you have easy access to data in an Oracle Database instance. You can create and use R objects that specify data in database tables. OML4R has overloaded functions that translate R operations into SQL that executes in the database. The database consolidates the SQL and can use the query optimization, parallel processing, and scalability features of the database when it executes the SQL statements. The database returns the results as R objects.
Embedded R execution provides some of the most significant advantages of using OML4R. Using embedded R execution, you can store and run R scripts in the database through either an R interface or a SQL interface or both. You can use the results of R scripts in SQL-enabled tools for structured data, R objects, and images.
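As a rough sketch of the embedded R execution flow described above, the following R function stores a script in the database and then runs it in an R engine on the database server. It assumes that the ORE client packages are installed and that a connection was already opened with ore.connect. The helper name run_embedded_sketch and the script name "RandomNormalSummary" are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: nothing here touches the database until it is called.
run_embedded_sketch <- function(script_name = "RandomNormalSummary") {
  library(ORE)  # assumes the OML4R (ORE) client packages are installed
  stopifnot(ore.is.connected())  # assumes ore.connect() was called earlier

  # Store an R function in the database under the given script name.
  ore.scriptCreate(script_name, function(n = 100) {
    x <- rnorm(n)
    data.frame(n = n, mean = mean(x), sd = sd(x))
  })

  # Run the stored script in an R engine on the database server.
  res <- ore.doEval(FUN.NAME = script_name, n = 500)

  # Copy the result into the local session, then remove the stored script.
  local_res <- ore.pull(res)
  ore.scriptDrop(script_name)
  local_res
}
```

Wrapping the calls in a function keeps the sketch loadable in any R session; the database is contacted only when run_embedded_sketch() is invoked in a connected session.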
1.2 Advantages of Oracle Machine Learning for R
Using OML4R to prepare and analyze data in an Oracle Database instance has many advantages for an R user.
With OML4R, you can do the following:
- Operate on Database-Resident Data Without Using SQL. OML4R has overloaded open source R methods and functions that transparently convert standard R syntax into SQL. These methods and functions are in packages that implement the OML4R transparency layer. With these functions and methods, you can create R objects that access, analyze, and manipulate data that resides in the database. The database can automatically optimize the SQL to improve the efficiency of the query.
- Eliminate Data Movement. By keeping the data in the database, you eliminate the time involved in transferring the data to your desktop computer and the need to store the data locally. You also eliminate the need to manage the locally stored data, which includes tasks such as distributing the data files to the appropriate locations, synchronizing the data with changes that are made in the production database, and so on.
- Keep Data Secure. By keeping the data in the database, you have the security, scalability, reliability, and backup features of Oracle Database for managing the data.
- Use the Power of the Database. By operating directly on database-resident data, you can use the memory and processing power of the database and avoid the memory constraints of your client R session.
- Use Current Data. As data is refreshed in the database, you have immediate access to current data.
- Prepare Data in the Database. Using the transparency layer functions, prepare large database-resident data sets for predictive analysis through operations such as ordering, aggregating, filtering, recoding, and the use of comprehensive sampling techniques without having to write SQL code.
- Save R Objects in the Database. You can save R objects in an Oracle Database instance as persistent database objects that are available to others. You can store R and OML4R objects in an OML4R datastore, which is managed by the Oracle Database instance.
- Build Models in the Database. You can build models in the database and store and manage them in an OML4R datastore. You can use functions in packages that you download from CRAN (The Comprehensive R Archive Network) to build models that require large amounts of memory and that use techniques such as ensemble modeling.
- Score Data in the Database. You can include your R models in scripts to score database-resident data. You can perform tasks such as the following:
  - Go from model building to scoring in one step because you can use the same R code for scoring. You do not need to translate the scoring logic as required by some standalone analytic servers.
  - Schedule scripts to be run automatically to perform tasks such as bulk scoring.
  - Score data in the context of a transaction.
  - Perform online what-if scoring.
  - Optionally convert a model to SQL, which Oracle Database does automatically for you. You can then deploy the resulting SQL for low-latency scoring tasks.
- Execute R Scripts in the Database. Using OML4R embedded R execution functionality, you can create, store, and execute R scripts in the database. When the script executes, Oracle Database starts, controls, and manages one or more R engines that can run in parallel on the database server. By executing scripts on the database server, you can take advantage of scalability and performance of the server.
  With the embedded R execution functionality, you can do the following:
  - Develop and test R scripts interactively and make the scripts available for use by SQL applications
  - Use CRAN and other packages in R scripts on the database server
  - Operationalize entire R scripts in production applications and eliminate porting R code; avoid reinventing code to integrate R results into existing applications
  - Seamlessly leverage Oracle Database as a high performance computing (HPC) environment for R scripts, providing data parallelism and resource management
  - Use the processing and memory resources of Oracle Database and the increased efficiency of read/write operations between the database and the embedded R execution R engines
  - Use the parallel processing capabilities of the database for data-parallel or task-parallel operations
  - Perform parallel simulations
  - Generate XML and PNG images that can be used by R or SQL applications
- Integrate with the Oracle Technology Stack. You can take advantage of all aspects of the Oracle technology stack to integrate your data analysis within a larger framework for business intelligence or scientific inquiry. For example, you can integrate the results of your OML4R analysis into Oracle Business Intelligence Enterprise Edition (OBIEE).
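To make the "Save R Objects in the Database" item more concrete, the following is a minimal sketch of a datastore round trip. It assumes that the ORE client packages are installed and that the session is already connected. The helper name datastore_round_trip and the datastore name "ds_sketch" are hypothetical.

```r
# Hypothetical helper: saves a local R object to an OML4R datastore and
# loads it back. Nothing runs against the database until it is called.
datastore_round_trip <- function(ds_name = "ds_sketch") {
  library(ORE)  # assumes the OML4R (ORE) client packages are installed
  stopifnot(ore.is.connected())  # assumes ore.connect() was called earlier

  # An ordinary R model fitted locally on the built-in iris data.
  fit <- lm(Petal.Length ~ Sepal.Length, data = iris)

  # Save the object under a datastore name, replacing any earlier copy.
  ore.save(fit, name = ds_name, overwrite = TRUE)

  # Load the datastore contents into a fresh environment and return the model.
  env <- new.env()
  ore.load(name = ds_name, envir = env)
  get("fit", envir = env)
}
```

Loading into a separate environment rather than the calling frame is a design choice of this sketch; it avoids silently overwriting an object of the same name in the session.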
1.3 Get Online Help for Oracle Machine Learning for R Classes, Functions, and Methods
The OML4R client packages contain the R components that you use to interact with data in an Oracle database.
For a list and brief descriptions of the client packages, and for information on installing them, see Oracle Machine Learning for R Installation and Administration Guide.
To get help on OML4R classes, functions, and methods, use R functions such as help and showMethods. If the name of a class or function has an ore prefix, you can supply the name to the help function. To get help on an overloaded method of an open-source R function, supply the name of the method and the name of the ore class.
Example 1-1 Getting Help on OML4R Classes, Functions, and Methods
This example shows several ways of getting information on OML4R classes, functions, and methods. In the listing following the example, some code has been modified to display only a portion of the results, and the output of some of the functions is not shown.

    # List the contents of the OREbase package.
    ls("package:OREbase")

    # Get help for the OREbase package.
    help("OREbase")

    # Get help for the ore virtual class.
    help("ore-class")

    # Show the subclasses of the ore virtual class.
    showClass("ore")

    # Get help on the ore.frame class.
    help("ore.frame")

    # Get help on the ore.vector class.
    help("ore.vector")

    # Show the arguments for the aggregate method.
    showMethods("aggregate")

    # Get help on the aggregate method for an ore.vector object.
    help("aggregate,ore.vector-method")

    # Show the signatures for the merge method.
    showMethods("merge")

    # Get help on the merge method for an ore.frame object.
    help("merge,ore.frame,ore.frame-method")

    # Show the signatures for the scale method.
    showMethods("scale")

    # Get help on the scale method for an ore.number object.
    help("scale,ore.number-method")

    # Get help on the ore.connect function.
    help("ore.connect")

Listing for This Example

    R> options(width = 80)
    R> # List the contents of the OREbase package.
    R> head(ls("package:OREbase"), 12)
     [1] "%in%"          "Arith"         "Compare"       "I"
     [5] "Logic"         "Math"          "NCOL"          "NROW"
     [9] "Summary"       "as.data.frame" "as.env"        "as.factor"
    R>
    R> # Get help for the OREbase package.
    R> help("OREbase") # Output not shown.
    R>
    R> # Get help for the ore virtual class.
    R> help("ore-class") # Output not shown.
    R>
    R> # Show the subclasses of the ore virtual class.
    R> showClass("ore")
    Virtual Class "ore" [package "OREbase"]

    No Slots, prototype of class "ore.vector"

    Known Subclasses:
    Class "ore.vector", directly
    Class "ore.frame", directly
    Class "ore.matrix", directly
    Class "ore.number", by class "ore.vector", distance 2
    Class "ore.character", by class "ore.vector", distance 2
    Class "ore.factor", by class "ore.vector", distance 2
    Class "ore.date", by class "ore.vector", distance 2
    Class "ore.datetime", by class "ore.vector", distance 2
    Class "ore.difftime", by class "ore.vector", distance 2
    Class "ore.logical", by class "ore.vector", distance 3
    Class "ore.integer", by class "ore.vector", distance 3
    Class "ore.numeric", by class "ore.vector", distance 3
    Class "ore.tblmatrix", by class "ore.matrix", distance 2
    Class "ore.vecmatrix", by class "ore.matrix", distance 2
    R>
    R> # Get help on the ore.frame class.
    R> help("ore.frame") # Output not shown.
    R>
    R> # Get help on the ore.vector class.
    R> help("ore.vector") # Output not shown.
    R>
    R> # Show the arguments for the aggregate method.
    R> showMethods("aggregate")
    Function: aggregate (package stats)
    x="ANY"
    x="ore.vector"

    R> # Get help on the aggregate method for an ore.vector object.
    R> help("aggregate,ore.vector-method") # Output not shown.
    R>
    R> # Show the signatures for the merge method.
    R> showMethods("merge")
    Function: merge (package base)
    x="ANY", y="ANY"
    x="data.frame", y="ore.frame"
    x="ore.frame", y="data.frame"
    x="ore.frame", y="ore.frame"

    R> # Get help on the merge method for an ore.frame object.
    R> help("merge,ore.frame,ore.frame-method") # Output not shown.
    R>
    R> # Show the signatures for the scale method.
    R> showMethods("scale")
    Function: scale (package base)
    x="ANY"
    x="ore.frame"
    x="ore.number"
    x="ore.tblmatrix"
    x="ore.vecmatrix"

    R> # Get help on the scale method for an ore.number object.
    R> help("scale,ore.number-method") # Output not shown.
    R>
    R> # Get help on the ore.connect function.
    R> help("ore.connect") # Output not shown.
1.4 About Transparently Using R on Oracle Database Data
OML4R has overloaded open source R methods and functions that you can use to operate directly on data in an Oracle Database instance.
The methods and functions are in packages that implement a transparency layer that translates R functions into SQL.
The OML4R transparency layer packages and the limitations of converting R into SQL are described in the following topics:
- About the Transparency Layer
  The Oracle Machine Learning for R transparency layer is implemented by the OREbase, OREgraphics, and OREstats packages.
- Transparency Layer Support for R Data Types and Classes
  Oracle Machine Learning for R transparency layer has classes and data types that map R data types to Oracle Database data types.
1.4.1 About the Transparency Layer
The Oracle Machine Learning for R transparency layer is implemented by the OREbase, OREgraphics, and OREstats packages.
These OML4R packages contain overloaded methods of functions in the open source R base, graphics, and stats packages, respectively. The OML4R packages also contain OML4R versions of some of the open source R functions.
With the methods and functions in these packages, you can create R objects that specify data in an Oracle Database instance. When you execute an R expression that uses such an object, the method or function transparently generates a SQL query and sends it to the database. The database then executes the query and returns the results of the operation as an R object.
A database table or view is represented by an ore.frame object, which is a subclass of data.frame. Other OML4R classes inherit from corresponding R classes, such as ore.vector and vector. OML4R maps Oracle Database data types to OML4R classes, such as NUMBER to ore.integer.
You can use the transparency layer methods and functions to prepare database-resident data for analysis. You can then use functions in other OML4R packages to build and fit models and use them to score data. For large data sets, you can do the modeling and scoring using R engines embedded in Oracle Database.
See Also:
"Transparency Layer Support for R Data Types and Classes" for information on OML4R data types and object mappings and on the correspondences between R, OML4R, and SQL data types and objects
Example 1-2 Finding the Mean of the Petal Lengths by Species in R
This example illustrates the translation of an R function invocation into SQL. It uses the overloaded OML4R aggregate function to get the mean of the petal lengths for each species from the IRIS_TABLE object.

    ore.create(iris, table = 'IRIS_TABLE')
    aggplen = aggregate(IRIS_TABLE$Petal.Length,
                        by = list(species = IRIS_TABLE$Species),
                        FUN = mean)
    aggplen

Listing for This Example

    R> ore.create(iris, table = 'IRIS_TABLE')
    R> aggplen = aggregate(IRIS_TABLE$Petal.Length,
    +                      by = list(species = IRIS_TABLE$Species),
    +                      FUN = mean)
    R> aggplen
                  species     x
    setosa         setosa 1.462
    versicolor versicolor 4.260
    virginica   virginica 5.552
Example 1-3 SQL Equivalent of the Previous Example
This example shows the SQL equivalent of the aggregate function in the previous example.

    SELECT "Species", AVG("Petal.Length")
    FROM IRIS_TABLE
    GROUP BY "Species"
    ORDER BY "Species";

    Species     AVG("PETAL.LENGTH")
    ----------- -------------------
    setosa      1.4620000000000002
    versicolor  4.26
    virginica   5.552
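As a quick local check that needs no database connection, the same aggregate call can be run on the in-memory iris data frame from which IRIS_TABLE was created. This is plain R rather than the overloaded OML4R method, so it shows only that the group means agree with the two listings; the variable name local_means is illustrative.

```r
# Plain R on the built-in iris data frame; no OML4R packages are involved.
local_means <- aggregate(iris$Petal.Length,
                         by = list(species = iris$Species),
                         FUN = mean)

# One row per species, with the mean petal length in column x.
print(local_means)
```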
1.4.2 Transparency Layer Support for R Data Types and Classes
Oracle Machine Learning for R transparency layer has classes and data types that map R data types to Oracle Database data types.
Those classes and data types are described in the following topics:
- About Oracle Machine Learning for R Data Types and Classes
  OML4R has data types that map R data types to SQL data types.
- About the ore.frame Class
  An ore.frame object represents a relational query for an Oracle Database instance.
- Support for R Naming Conventions
  OML4R uses R naming conventions for ore.frame columns instead of the more restrictive Oracle Database naming conventions.
- About Coercing R and Oracle Machine Learning for R Class Types
  Some OML4R functions coerce R objects and class types to OML4R ore objects and types.
1.4.2.1 About Oracle Machine Learning for R Data Types and Classes
OML4R has data types that map R data types to SQL data types.
In an R session, when you create database objects from R objects or you create R objects from database data, OML4R translates R data types to SQL data types and the reverse where possible.
OML4R creates objects that are instances of OML4R classes. OML4R overloads many standard R functions so that they use OML4R classes and data types. R language constructs and syntax are supported for objects that are mapped to Oracle Database objects.
Table 1-1 Mappings Between R, OML4R, and SQL Data Types
R Data Type | OML4R Data Type | SQL Data Type |
---|---|---|
character mode |
|
|
integer mode |
|
|
logical mode |
|
The |
numeric mode |
|
|
|
|
|
|
|
|
|
|
|
None |
Not supported |
User defined data types Reference data types |
Note:
- Objects of type ore.datetime do not support a time zone setting; instead, they use the system time zone Sys.timezone if it is available, or GMT if Sys.timezone is not available.
- The SQL VARCHAR2 data type is mapped to the R character data type through the embedded R input data argument. Users can convert the character variable to a factor in R if needed by using as.factor().
1.4.2.2 About the ore.frame Class
An ore.frame object represents a relational query for an Oracle Database instance.
It is the OML4R equivalent of a data.frame. Typically, you get ore.frame objects that are proxies for database tables. You can then add new columns, or make other changes, to the ore.frame proxy object. Any such change does not affect the underlying table. If you then request data from the source table of the ore.frame object, the transparency layer function generates a SQL query that has the additional columns in the select list, but the table is not changed.
In R, the elements of a data.frame have an explicit order. You can specify elements by using integer indexing. In contrast, relational database tables do not define any order of rows and therefore cannot be directly mapped to R data structures.
OML4R has both ordered and unordered ore.frame objects. If a table has a primary key, which is a set of one or more columns that form a distinct tuple within a row, you can produce ordered results by performing a sort using an ORDER BY clause in a SELECT statement. However, ordering relational data can be expensive and is often unnecessary for transparency layer operations. For example, ordering is not required to compute summary statistics when invoking the summary function on an ore.frame.
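A minimal sketch of the proxy behavior described above: adding a derived column changes only the object in the R session. The helper name add_petal_area and the Petal.Area column are hypothetical, and the argument is assumed to be an ore.frame with Petal.Length and Petal.Width columns, such as a proxy for the IRIS_TABLE table used earlier in this chapter.

```r
# Hypothetical helper: 'proxy' is assumed to be an ore.frame with
# Petal.Length and Petal.Width columns, such as a proxy for IRIS_TABLE.
add_petal_area <- function(proxy) {
  # The assignment changes only the object held in the R session;
  # the caller's object and the underlying table keep their original columns.
  proxy$Petal.Area <- proxy$Petal.Length * proxy$Petal.Width

  # For an ore.frame, later operations on the result are translated into SQL
  # that computes the derived column in the select list.
  proxy
}
```

Because the function uses only standard R syntax, it also accepts an ordinary data.frame such as iris, which is a convenient way to try it without a database connection. With a connected session, a call such as head(add_petal_area(IRIS_TABLE)) would request the data, including the derived column, from the database.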
See Also:
"Moving Data to and from the Database" for information on ore.create
Example 1-4 Classes of a data.frame and a Corresponding ore.frame
This example creates a data.frame with columns that contain different data types and displays the structure of the data.frame. The example then invokes the ore.push function to create a temporary table in the database that contains a copy of the data of the data.frame. The ore.push invocation also generates an ore.frame object that is a proxy for the table. The example displays the classes of the ore.frame object and of the columns in the data.frame and the ore.frame objects.

    df <- data.frame(a="abc",
                     b=1.456,
                     c=TRUE,
                     d=as.integer(1),
                     e=Sys.Date(),
                     f=as.difftime(c("0:3:20", "11:23:15")))
    # Assign the proxy returned by ore.push so that it can be inspected.
    of <- ore.push(df)
    class(of)
    class(df$a)
    class(of$a)
    class(df$b)
    class(of$b)
    class(df$c)
    class(of$c)
    class(df$d)
    class(of$d)
    class(df$e)
    class(of$e)
    class(df$f)
    class(of$f)

Listing for Example 1-4

    R> df <- data.frame(a="abc",
    +                   b=1.456,
    +                   c=TRUE,
    +                   d=as.integer(1),
    +                   e=Sys.Date(),
    +                   f=as.difftime(c("0:3:20", "11:23:15")))
    R> of <- ore.push(df)
    R> class(of)
    [1] "ore.frame"
    attr(,"package")
    [1] "OREbase"
    R> class(df$a)
    [1] "factor"
    R> class(of$a)
    [1] "ore.factor"
    attr(,"package")
    [1] "OREbase"
    R> class(df$b)
    [1] "numeric"
    R> class(of$b)
    [1] "ore.numeric"
    attr(,"package")
    [1] "OREbase"
    R> class(df$c)
    [1] "logical"
    R> class(of$c)
    [1] "ore.logical"
    attr(,"package")
    [1] "OREbase"
    R> class(df$d)
    [1] "integer"
    R> class(of$d)
    [1] "ore.integer"
    attr(,"package")
    [1] "OREbase"
    R> class(df$e)
    [1] "Date"
    R> class(of$e)
    [1] "ore.date"
    attr(,"package")
    [1] "OREbase"
    R> class(df$f)
    [1] "difftime"
    R> class(of$f)
    [1] "ore.difftime"
    attr(,"package")
    [1] "OREbase"
1.4.2.3 Support for R Naming Conventions
OML4R uses R naming conventions for ore.frame columns instead of the more restrictive Oracle Database naming conventions.
The column names of an ore.frame can be longer than 30 bytes, can contain double quotes, and can be non-unique.
1.4.2.4 About Coercing R and Oracle Machine Learning for R Class Types
Some OML4R functions coerce R objects and class types to OML4R ore objects and types.
The generic as.ore function coerces in-memory R objects to ore objects. The more specific functions, such as as.ore.character, coerce objects to specific types. The ore.push function implicitly coerces R class types to ore class types and the ore.pull function coerces ore class types to R class types. For information on those functions, see "Moving Data to and from the Database".
Example 1-5 Coercing R and OML4R Class Types
This example illustrates coercing R objects to ore objects. The example creates an R integer object and then uses the generic method as.ore to coerce it to an ore object, which is an ore.integer. The example then coerces the R object to various other ore class types. For an example of using as.factor in an embedded R execution function, see Example 6-13.

    x <- 1:10
    class(x)
    X <- as.ore(x)
    class(X)
    Xn <- as.ore.numeric(x)
    class(Xn)
    Xc <- as.ore.character(x)
    class(Xc)
    Xc
    Xf <- as.ore.factor(x)
    Xf

Listing for Example 1-5

    R> x <- 1:10
    R> class(x)
    [1] "integer"
    R> X <- as.ore(x)
    R> class(X)
    [1] "ore.integer"
    attr(,"package")
    [1] "OREbase"
    R> Xn <- as.ore.numeric(x)
    R> class(Xn)
    [1] "ore.numeric"
    attr(,"package")
    [1] "OREbase"
    R> Xc <- as.ore.character(x)
    R> class(Xc)
    [1] "ore.character"
    attr(,"package")
    [1] "OREbase"
    R> Xc
     [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"
    R> Xf <- as.ore.factor(x)
    R> Xf
     [1] 1  2  3  4  5  6  7  8  9  10
    Levels: 1 10 2 3 4 5 6 7 8 9
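The level order in the last line of the listing (1 10 2 3 and so on) is what results when level labels are sorted as character strings rather than as numbers. The plain R analog below reproduces the same ordering without a database. It illustrates the sorting behavior only and makes no claim about how OML4R computes the levels internally; the names f_chr and f_int are illustrative.

```r
x <- 1:10

# Converting to character first makes factor() sort the labels as strings.
f_chr <- factor(as.character(x))
levels(f_chr)

# Building the factor directly from the integers keeps numeric order.
f_int <- factor(x)
levels(f_int)
```

When numeric order matters after coercing to a factor, comparing the two level vectors in this way is a simple check to run before relying on the level positions.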
1.5 Typical Operations in Using Oracle Machine Learning for R
A typical OML4R session progresses through the following operations:
- In an R session, connect to a schema in an Oracle Database instance.
- Attach the schema and synchronize with the schema objects, which generates OML4R proxy objects for database tables.
- Prepare the data for analysis and possibly perform exploratory data analysis and data visualization.
- Build models using functions in the OREmodels or OREdm packages.
- Score data using the models either in your local R session or by using embedded R execution.
- Deploy the results of the analysis to end users.
Figure 1-1 Typical OML4R Workflow
This figure illustrates these steps and typical reiterations of them.
![Figure 1-1 Typical OML4R Workflow](img/workflow2.png)
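The first five steps of this progression can be sketched in R as follows; deployment is left out because it depends on the target application. This is a minimal sketch under stated assumptions, not a definitive implementation: the helper name run_workflow_sketch is hypothetical, all connection values are placeholders supplied by the caller, the built-in iris data stands in for real schema data, and the predict call assumes the predict method that ore.lm models provide.

```r
# Hypothetical helper that walks through the steps listed above.
# All connection values are placeholders supplied by the caller.
run_workflow_sketch <- function(user, password, host, sid, port = 1521) {
  library(ORE)  # assumes the OML4R (ORE) client packages are installed

  # 1. Connect to a schema in an Oracle Database instance.
  ore.connect(user = user, sid = sid, host = host,
              password = password, port = port)
  on.exit(ore.disconnect(), add = TRUE)

  # 2. Synchronize with the schema objects and attach the schema.
  ore.sync()
  ore.attach()

  # 3. Prepare the data: push a local data frame to get a proxy object.
  iris_proxy <- ore.push(iris)

  # 4. Build a model with a function from the OREmodels package.
  fit <- ore.lm(Petal.Length ~ Sepal.Length, data = iris_proxy)

  # 5. Score data with the model and bring the predictions to the client.
  pred <- predict(fit, newdata = iris_proxy)
  head(ore.pull(pred))
}
```

The on.exit call closes the connection even if a later step fails, which keeps an interrupted run from leaving a session open.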
1.6 Oracle Machine Learning for R Global Options
OML4R has global options that affect various functions.
Table 1-2 lists the OML4R global options and descriptions of them.
Table 1-2 OML4R Global Options
Global | Description |
---|---|
A logical value that specifies whether an environment referenced in an object should be replaced with an empty environment during serialization to an Oracle Database. When The following OML4R functions use this global option:
|
|
A logical value used during logical subscripting of an When |
|
A preferred degree of parallelism to use in embedded R execution. One of the following:
The default value is |
|
A character string that specifies the separator to use between multiple column row names of an |
|
A logical value that specifies whether iterative OML4R functions should print output at each iteration. The default value is |
|
A logical value that specifies whether OML4R displays a warning message when an |
See Also:
"Global Options Related to Ordering" for information on using ore.sep and ore.warn.order
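Global options of this kind are read and set with R's standard getOption and options functions. The sketch below uses the two option names referenced above, ore.sep and ore.warn.order; the values "/" and FALSE are illustrative assumptions, not recommendations, and the variable name old_opts is hypothetical.

```r
# Read the current values; NULL means the option has not been set in this
# session (for example, when the OML4R packages are not loaded).
getOption("ore.sep")
getOption("ore.warn.order")

# Set new values and keep the previous ones so that they can be restored.
# The values "/" and FALSE are illustrative assumptions.
old_opts <- options(ore.sep = "/", ore.warn.order = FALSE)
getOption("ore.sep")
getOption("ore.warn.order")

# Restore the previous settings.
options(old_opts)
```

Saving the list returned by options and passing it back afterward limits the change to the code that needs it.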