This chapter introduces Oracle R Enterprise. The chapter contains the following topics:
Oracle R Enterprise is a component of the Oracle Advanced Analytics Option of Oracle Database Enterprise Edition. Oracle R Enterprise is comprehensive, database-centric environment for end-to-end analytical processes in R, with immediate deployment to production environments. It is a set of R packages and Oracle Database features that enable an R user to operate on database-resident data without using SQL and to execute R scripts in one or more embedded R engines that run on the database server.
Using Oracle R Enterprise from your local R session, you have easy access to data in an Oracle Database instance. You can create and use R objects that specify data in database tables. Oracle R Enterprise has overloaded functions that translate R operations into SQL that executes in the database. The database consolidates the SQL and can use the query optimization, parallel processing, and scalability features of the database when it executes the SQL statements. The database returns the results as R objects.
Embedded R execution provides some of the most significant advantages of using Oracle R Enterprise. Using embedded R execution, you can store and run R scripts in the database through either an R interface or a SQL interface or both. You can use the results of R scripts in SQL-enabled tools for structured data, R objects, and images.
See Also:
"Advantages of Oracle R Enterprise"Using Oracle R Enterprise to prepare and analyze data in an Oracle Database instance has many advantages for an R user. With Oracle R Enterprise, you can do the following:
Operate on Database-Resident Data Without Using SQL. Oracle R Enterprise has overloaded open source R methods and functions that transparently convert standard R syntax into SQL. These methods and functions are in packages that implement the Oracle R Enterprise transparency layer. With these functions and methods, you can create R objects that access, analyze, and manipulate data that resides in the database. The database can automatically optimize the SQL to improve the efficiency of the query.
Eliminate Data Movement. By keeping the data in the database, you eliminate the time involved in transferring the data to your desktop computer and the need to store the data locally. You also eliminate the need to manage the locally stored data, which includes tasks such as distributing the data files to the appropriate locations, synchronizing the data with changes that are made in the production database, and so on.
Keep Data Secure. By keeping the data in the database, you have the security, scalability, reliability, and backup features of Oracle Database for managing the data.
Use the Power of the Database. By operating directly on database-resident data, you can use the memory and processing power of the database and avoid the memory constraints of your client R session.
Use Current Data. As data is refreshed in the database, you have immediate access to current data.
Prepare Data in the Database. Using the transparency layer functions, prepare large database-resident data sets for predictive analysis through operations such as ordering, aggregating, filtering, recoding, and the use of comprehensive sampling techniques without having to write SQL code.
Save R Objects in the Database. You can save R objects in an Oracle Database instance as persistent database objects that are available to others. You can store R and Oracle R Enterprise objects in an Oracle R Enterprise datastore, which is managed by the Oracle database.
Build Models in the Database. You can build models in the database and store and manage them in an Oracle R Enterprise datastore. You can use functions in packages that you download from CRAN (The Comprehensive R Archive Network) to build models that require large amounts of memory and that use techniques such as ensemble modeling.
Score Data in the Database. You can include your R models in scripts to score database-resident data. You can perform tasks such as the following:
Go from model building to scoring in one step because you can use the same R code for scoring. You do not need to translate the scoring logic as required by some standalone analytic servers.
Schedule scripts to be run automatically to perform tasks such as bulk scoring.
Score data in the context of a transaction.
Perform online what-if scoring.
Optionally convert a model to SQL, which Oracle Database does automatically for you. You can then deploy the resulting SQL for low-latency scoring tasks.
Execute R Scripts in the Database. Using Oracle R Enterprise embedded R execution functionality, you can create, store, and execute R scripts in the database. When the script executes, Oracle Database starts, controls, and manages one or more R engines that can run in parallel on the database server. By executing scripts on the database server, you can take advantage of scalability and performance of the server.
With the embedded R execution functionality, you can do the following:
Develop and test R scripts interactively and make the scripts available for use by SQL applications
Use CRAN and other packages in R scripts on the database server
Operationalize entire R scripts in production applications and eliminate porting R code; avoid reinventing code to integrate R results into existing applications
Seamlessly leverage Oracle Database as a high performance computing (HPC) environment for R scripts, providing data parallelism and resource management
Use the processing and memory resources of Oracle Database and the increased efficiency of read/write operations between the database and the embedded R execution R engines
Use the parallel processing capabilities of the database for data-parallel or task-parallel operations
Perform parallel simulations
Generate XML and PNG images that can be used by R or SQL applications
Integrate with the Oracle Technology Stack. You can take advantage of all aspects of the Oracle technology stack to integrate your data analysis within a larger framework for business intelligence or scientific inquiry. For example, you can integrate the results of your Oracle R Enterprise analysis into Oracle Business Intelligence Enterprise Edition (OBIEE).
The Oracle R Enterprise client packages contain the R components that you use to interact with data in an Oracle database. For a list and brief descriptions of the client packages, see Oracle R Enterprise Installation and Administration Guide.
To get help on Oracle R Enterprise classes, functions, and methods, use R functions such as help
and showMethods
. If the name of a class or function has an ore
prefix, you can supply the name to the help
function. To get help on an overloaded method of an open-source R function, supply the name of the method and the name of the ore
class.
Example 1-1 has several examples of getting information on Oracle R Enterprise classes, functions, and methods. In the listing following the example some code has been modified to display only a portion of the results and the output of some of the functions is not shown.
Example 1-1 Getting Help on Oracle R Enterprise Classes, Functions, and Methods
# List the contents of the OREbase package. ls("package:OREbase") # Get help for the OREbase package. help("OREbase") # Get help for the ore virtual class. help("ore-class") # Show the subclasses of the ore virtual class. showClass("ore") # Get help on the ore.frame class. help("ore.frame") # Get help on the ore.vector class. help("ore.vector") # Show the arguments for the aggregate method. showMethods("aggregate") # Get help on the aggregate method for an ore.vector object. help("aggregate,ore.vector-method") # Show the signatures for the merge method. showMethods("merge") # Get help on the merge method for an ore.frame object. help("merge,ore.frame,ore.frame-method") showMethods("scale") # Get help on the scale method for an ore.number object. help("scale,ore.number-method") # Get help on the ore.connect function. help("ore.connect")
Listing for Example 1-1
R> options(width = 80) # List the contents of the OREbase package. R> head(ls("package:OREbase"), 12) [1] "%in%" "Arith" "Compare" "I" [5] "Logic" "Math" "NCOL" "NROW" [9] "Summary" "as.data.frame" "as.env" "as.factor" R> R># Get help for the OREbase package. R> help("OREbase") # Output not shown. R> R> # Get help for the ore virtual class. R> help("ore-class") # Output not shown. R> R># Show the subclasses of the ore virtual class. R> showClass("ore") Virtual Class "ore" [package "OREbase"] No Slots, prototype of class "ore.vector" Known Subclasses: Class "ore.vector", directly Class "ore.frame", directly Class "ore.matrix", directly Class "ore.number", by class "ore.vector", distance 2 Class "ore.character", by class "ore.vector", distance 2 Class "ore.factor", by class "ore.vector", distance 2 Class "ore.date", by class "ore.vector", distance 2 Class "ore.datetime", by class "ore.vector", distance 2 Class "ore.difftime", by class "ore.vector", distance 2 Class "ore.logical", by class "ore.vector", distance 3 Class "ore.integer", by class "ore.vector", distance 3 Class "ore.numeric", by class "ore.vector", distance 3 Class "ore.tblmatrix", by class "ore.matrix", distance 2 Class "ore.vecmatrix", by class "ore.matrix", distance 2 R> # Get help on the ore.frame class. R> help("ore.frame") # Output not shown. R># Get help on the ore.vector class. R> help("ore.vector") # Output not shown. R> R># Show the arguments for the aggregate method. R> showMethods("aggregate") Function: aggregate (package stats) x="ANY" x="ore.vector" # Get help on the aggregate method for an ore.vector object. R> help("aggregate,ore.vector-method") # Output not shown. # Show the signatures for the merge method. R> showMethods("merge") Function: merge (package base) x="ANY", y="ANY" x="data.frame", y="ore.frame" x="ore.frame", y="data.frame" x="ore.frame", y="ore.frame # Get help on the merge method for an ore.frame object. R> help("merge,ore.frame,ore.frame-method") # Output not shown. R> showMethods("scale") Function: scale (package base) x="ANY" x="ore.frame" x="ore.number" x="ore.tblmatrix" x="ore.vecmatrix" # Get help on the scale method for an ore.number object. R> help("scale,ore.number-method") # Output not shown. # Get help on the ore.connect function. R> help("ore.connect") # Output not shown.
From an R session, you can view the Oracle R Enterprise documentation in HTML or PDF formats by invoking the OREShowDoc
function, as shown in Example 1-2. The function starts a browser that displays the Oracle documentation library for this release.
See Also:
Oracle R Enterprise Installation and Administration Guide for information on installing the Oracle R Enterprise client packagesOracle R Enterprise has overloaded open source R methods and functions that you can use to operate directly on data in an Oracle Database instance. The methods and functions are in packages that implement a transparency layer that translates R functions into SQL.
The Oracle R Enterprise transparency layer packages and the limitations of converting R into SQL are described in the following topics:
The Oracle R Enterprise transparency layer is implemented by the OREbase
, OREgraphics
, and OREstats
packages. These Oracle R Enterprise packages contain overloaded methods of functions in the open source R base
, graphics
, and stats
packages, respectively. The Oracle R Enterprise packages also contain Oracle R Enterprise versions of some of the open source R functions.
With the methods and functions in these packages, you can create R objects that specify data in an Oracle Database instance. When you execute an R expression that uses such an object, the method or function transparently generates a SQL query and sends it to the database. The database then executes the query and returns the results of the operation as an R object.
A database table or view is represented by an ore.frame
object, which is a subclass of data.frame
. Other Oracle R Enterprise classes inherit from corresponding R classes, such as ore.vector
and vector
. Oracle R Enterprise maps Oracle Database data types to Oracle R Enterprise classes, such as NUMBER
to ore.integer
. For more information on Oracle R Enterprise data types and object mappings, see "Transparency Layer Support for R Data Types and Classes".
Example 1-3 illustrates the translation of an R function invocation into SQL. It uses the overloaded Oracle R Enterprise aggregate
function to get the mean of the petal lengths from the IRIS_TABLE
object from Example 1-3.
Example 1-3 Finding the Mean of the Petal Lengths by Species in R
aggplen = aggregate(IRIS_TABLE$Petal.Length, by = list(species = IRIS_TABLE$Species), FUN = mean) aggplen
Listing for Example 1-3
R> aggplen = aggregate(IRIS_TABLE$Petal.Length, by = list(species = IRIS_TABLE$Species), FUN = mean) R> aggplen species x setosa setosa 1.462 versicolor versicolor 4.260 virginica virginica 5.552
Example 1-4 shows the SQL equivalent of the aggregate
function in Example 1-3.
Example 1-4 SQL Equivalent of Example 1-3
SELECT "Species", AVG("Petal.Length") FROM IRIS_TABLE GROUP BY "Species" ORDER BY "Species"; Species AVG("PETAL.LENGTH") ----------- ------------------- setosa 1.4620000000000002 versicolor 4.26 virginica 5.552
You can use the transparency layer methods and functions to prepare database-resident data for analysis. You can then use functions in other Oracle R Enterprise packages to build and fit models and use them to score data. For large data sets, you can do the modeling and scoring using R engines embedded in Oracle Database.
See Also:
"Transparency Layer Support for R Data Types and Classes" for information on the correspondences between R, Oracle R Enterprise, and SQL data types and objects
Oracle R Enterprise transparency layer has classes and data types that map R data types to Oracle Database data types. Those classes and data types are described in the following topics:
Oracle R Enterprise has data types that map R data types to SQL data types. In an R session, when you create database objects from R objects or you create R objects from database data, Oracle R Enterprise translates R data types to SQL data types and the reverse where possible. See Table 1-1 for a list of data type mappings.
Oracle R Enterprise creates objects that are instances of Oracle R Enterprise classes. Oracle R Enterprise overloads many standard R functions so that they use Oracle R Enterprise classes and data types.
R language constructs and syntax are supported for objects that are mapped to Oracle Database objects. For information on the R operators and functions that are supported by Oracle R Enterprise, see Appendix A.
Table 1-1 lists the mappings between R, Oracle R Enterprise, and SQL data types.
Table 1-1 Mappings Between R, Oracle R Enterprise, and SQL Data Types
R Data Type | Oracle R Enterprise Data Type | SQL Data Type |
---|---|---|
character mode |
|
|
integer mode |
|
|
logical mode |
|
The |
numeric mode |
|
|
|
|
|
|
|
|
|
|
|
Not supported |
User defined data types Reference data types |
Note:
Objects of typeore.datetime
do not support a time zone setting, instead they use the system time zone Sys.timezone
if it is available or GMT if Sys.timezone
is not available.An ore.frame
object represents a relational query for an Oracle Database instance. It is the Oracle R Enterprise equivalent of a data.frame
. Typically, you get ore.frame
objects that are proxies for database tables. You can then add new columns, or make other changes, to the ore.frame
proxy object. Any such change does not affect the underlying table. If you then request data from the source table of the ore.frame
object, the transparency layer function generates a SQL query that has the additional columns in the select list, but the table is not changed.
In R, the elements of a data.frame
have an explicit order. You can specify elements by using integer indexing. In contrast, relational database tables do not define any order of rows and therefore cannot be directly mapped to R data structures.
If a table has a primary key, which is a set of one or more columns that form a distinct tuple within a row, you can produce ordered results by performing a sort using an ORDER BY
clause in a SELECT
statement. However, ordering relational data can be expensive and is often unnecessary for transparency layer operations. For example, ordering is not required to compute summary statistics when invoking the summary
function on an ore.frame
.
Oracle R Enterprise has both ordered and unordered ore.frame
objects. For information on creating and using these objects, see "Creating Ordered and Unordered ore.frame Objects".
Example 1-5 creates a data.frame
with columns that contain different data types and displays the structure of the data.frame
. The example then invokes the ore.push
function to create a temporary table in the database that contains a copy of the data of the data.frame
. The ore.push
invocation also generates an ore.frame
object that is a proxy for the table. The example displays the classes of the ore.frame
object and of the columns in the data.frame
and the ore.frame
objects.
Example 1-5 Classes of a data.frame and a Corresponding ore.frame
df <- data.frame(a="abc", b=1.456, c=TRUE, d=as.integer(1), e=Sys.Date(), f=as.difftime(c("0:3:20", "11:23:15"))) ore.push(df) class(of) class(df$a class(of$a) class(df$b) class(of$b) class(df$c) class(of$c) class(df$d) class(of$d) class(df$e) class(of$e) class(df$f) class(of$f)
Listing for Example 1-5
R> df <- data.frame(a="abc", + b=1.456, + c=TRUE, + d=as.integer(1), + e=Sys.Date(), + f=as.difftime(c("0:3:20", "11:23:15"))) R> ore.push(df) R> class(of) [1] "ore.frame" attr(,"package") [1] "OREbase" R> class(df$a) [1] "factor" R> class(of$a) [1] "ore.factor" attr(,"package") [1] "OREbase" R> class(df$b) [1] "numeric" R> class(of$b) [1] "ore.numeric" attr(,"package") [1] "OREbase" R> class(df$c) [1] "logical" R> class(of$c) [1] "ore.logical" attr(,"package") [1] "OREbase" R> class(df$d) [1] "integer" R> class(of$d) [1] "ore.integer" attr(,"package") [1] "OREbase" R> class(df$e) [1] "Date" R> class(of$e) [1] "ore.date" attr(,"package") [1] "OREbase" R> class(df$f) [1] "difftime" R> class(of$f) [1] "ore.difftime" attr(,"package") [1] "OREbase"
Oracle R Enterprise uses R naming conventions for ore.frame
columns instead of the more restrictive Oracle Database naming conventions. The column names of an ore.frame
can be longer than 30 bytes, can contain double quotes, and can be non-unique.
The generic as.ore
function coerces in-memory R objects to ore
objects. The more specific functions, such as as.ore.character
, coerce objects to specific types. The ore.push
function implicitly coerces R class types to ore
class types and the ore.pull
function coerces ore
class types to R class types. For information on those functions, see "Moving Data to and from the Database".
Example 1-6 illustrates coercing R objects to ore
objects. creates an R integer
object and then uses the generic method as.ore
to coerce it to an ore
object, which is an ore.integer
. The example coerces the R object to various other ore
class types. For an example of using as.factor
in embedded R execution function, see Example 6-12, "Using the ore.groupApply Function".
Example 1-6 Coercing R and Oracle R Enterprise Class Types
x <- 1:10 class(x) X <- as.ore(x) class(X) Xn <- as.ore.numeric(x) class(Xn) Xc <- as.ore.character(x) class(Xc) Xc Xf <- as.ore.factor(x) Xf
Listing for Example 1-6
R> x <- 1:10 R> class(x) [1] "integer" R> X <- as.ore(x) R> class(X) [1] "ore.integer" attr(,"package") [1] "OREbase" R> Xn <- as.ore.numeric(x) R> class(Xn) [1] "ore.numeric" attr(,"package") [1] "OREbase" R> Xc <- as.ore.character(x) R> class(Xc) [1] "ore.character" attr(,"package") [1] "OREbase" R> Xc [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" R> Xf <- as.ore.factor(x) R> Xf [1] 1 2 3 4 5 6 7 8 9 10 Levels: 1 10 2 3 4 5 6 7 8 9
In using Oracle R Enterprise, the following is a typical progression of operations:
In an R session, connect to a schema in an Oracle Database instance.
Attach the schema and synchronize with the schema objects, which generates Oracle R Enterprise proxy objects for database tables.
Prepare the data for analysis and possibly perform exploratory data analysis and data visualization.
Build models using functions in the OREmodels
or OREdm
packages.
Score data using the models either in your local R session or by using embedded R execution.
Deploy the results of the analysis to end users.
Figure 1-1 illustrates these steps and typical reiterations of them.
Figure 1-1 Typical Oracle R Enterprise Workflow
Chapter 2, "Getting Started with Oracle R Enterprise" describes the following operations:
Connecting to a database.
Creating Oracle R Enterprise proxy objects for database tables.
Moving data from a data.frame
in your local R session to a database table, represented by an ore.frame
proxy object, and the reverse.
Chapter 3, "Preparing and Exploring Data in the Database" describes preparing data for analysis and exploring data. Preparing and exploring data may include operations such as the following:
Selecting data from a data set or table.
Cleaning the data by filtering out unneeded information.
Ordering the data.
Intermediate aggregations of data.
Time-series analysis.
Recoding or formatting of data.
Exploratory data analysis.
Chapter 4, "Building Models in Oracle R Enterprise" describes building models, including Oracle Data Mining models, using functions in the OREmodels
and OREdm
packages.
Chapter 5, "Predicting With R Models" describes using the ore.predict
function on Oracle R Enterprise models.
Chapter 6, "Using Oracle R Enterprise Embedded R Execution" describes how to create and execute R scripts in one or more R engines that run on the database server, and how to save those scripts in the Oracle Database R script repository.
Oracle R Enterprise has global options that affect various functions. Table 1-2 lists the Oracle R Enterprise global options and descriptions of them.
Table 1-2 Oracle R Enterprise Global Options
Global | Description |
---|---|
A logical value that specifies whether an environment referenced in an object should be replaced with an empty environment during serialization to an Oracle Database. When The following Oracle R Enterprise functions use this global option:
|
|
A logical value used during logical subscripting of an When |
|
A preferred degree of parallelism to use in embedded R execution. One of the following:
The default value is |
|
A character string that specifies the separator to use between multiple column row names of an |
|
A logical value that specifies whether iterative Oracle R Enterprise functions should print output at each iteration. The default value is |
|
A logical value that specifies whether Oracle R Enterprise displays a warning message when an |
See Also:
"Global Options Related to Ordering" for information on using ore.sep
and ore.warn.order
Oracle R Enterprise includes several example scripts that demonstrate the use of Oracle R Enterprise functions. This section contains the following topics:
See Also:
"Oracle R Enterprise Online Resources"You can display a list of the Oracle R Enterprise example scripts with the demo
function as shown in Example 1-7.
Listing for Example 1-7
R> demo(package = "ORE") Demos in package 'ORE': aggregate Aggregation analysis Basic analysis & data processing operations basic Basic connectivity to database binning Binning logic columnfns Column functions cor Correlation matrix crosstab Frequency cross tabulations datastore Datastore operations datetime Date/Time operations derived Handling of derived columns distributions Distribution, density, and quantile functions do_eval Embedded R processing esm Exponential smoothing method freqanalysis Frequency cross tabulations glm Generalized Linear Models graphics Demonstrates visual analysis group_apply Embedded R processing by group hypothesis Hyphothesis testing functions matrix Matrix related operations nulls Handling of NULL in SQL vs. NA in R odm_ai Oracle Data Mining: attribute importance odm_ar Oracle Data Mining: association rules odm_dt Oracle Data Mining: decision trees odm_glm Oracle Data Mining: generalized linear models odm_kmeans Oracle Data Mining: enhanced k-means clustering odm_nb Oracle Data Mining: naive Bayes classification odm_nmf Oracle Data Mining: non-negative matrix factorization odm_svm Oracle Data Mining: support vector machines push_pull RDBMS <-> R data transfer rank Attributed-based ranking of observations reg Ordinary least squares linear regression row_apply Embedded R processing by row chunks sampling Random row sampling and partitioning of an ore.frame sql_like Mapping of R to SQL commands stepwise Stepwise OLS linear regression summary Summary functionality table_apply Embedded R processing of entire table
You can run an Oracle R Enterprise example script with the demo
function. Most of the examples use the iris
data set that is in the datasets
package that is included in the R distribution.
To run an example script, start R, load the ORE
packages with library(ORE)
, connect to the database, and then use the demo
function.
Example 1-8 runs the basic.R
example script. In the listing that follows the example, only the first several lines of the output of the script are shown. The script creates an in-memory database object, IRIS_TABLE
, which is an ore.frame
object. The script then demonstrates that the iris
data.frame
and the IRIS_TABLE
ore.frame
have the same structure and contain the same data.
Listing for Example 1-8
R> demo("basic", package = "ORE") demo(basic) ---- ~~~~~ Type <Return> to start : R> # R> # O R A C L E R E N T E R P R I S E S A M P L E L I B R A R Y R> # R> # Name: basic.R R> # Description: Demonstrates basic connectivity to database R> # R> # R> # R> R> ## Set page width R> options(width = 80) R> # Push the built-in iris data frame to the database R> IRIS_TABLE <- ore.push(iris) R> # Display the class of IRIS_TABLE R> class(IRIS_TABLE) [1] "ore.frame" attr(,"package") [1] "OREbase" R> # Basic commands R> R> # Number of rows R> nrow(iris) [1] 150 R> nrow(IRIS_TABLE) [1] 150 R> # Column names of the data frame R> names(iris) [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species" R> names(IRIS_TABLE) [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species" # The rest of the output is not shown.
See Also:
Chapter 2, "Getting Started with Oracle R Enterprise" for more information on using basic Oracle R Enterprise functions