1.4 About Transparently Using R on Oracle Database Data

Oracle R Enterprise has overloaded open source R methods and functions that you can use to operate directly on data in an Oracle Database instance. The methods and functions are in packages that implement a transparency layer that translates R functions into SQL.

The Oracle R Enterprise transparency layer packages and the limitations of converting R into SQL are described in the following topics:

1.4.1 About the Transparency Layer

The Oracle R Enterprise transparency layer is implemented by the OREbase, OREgraphics, and OREstats packages. These Oracle R Enterprise packages contain overloaded methods of functions in the open source R base, graphics, and stats packages, respectively. The Oracle R Enterprise packages also contain Oracle R Enterprise versions of some of the open source R functions.

With the methods and functions in these packages, you can create R objects that specify data in an Oracle Database instance. When you execute an R expression that uses such an object, the method or function transparently generates a SQL query and sends it to the database. The database then executes the query and returns the results of the operation as an R object.

A database table or view is represented by an ore.frame object, which is a subclass of data.frame. Other Oracle R Enterprise classes inherit from corresponding R classes, such as ore.vector and vector. Oracle R Enterprise maps Oracle Database data types to Oracle R Enterprise classes, such as NUMBER to ore.integer.

You can use the transparency layer methods and functions to prepare database-resident data for analysis. You can then use functions in other Oracle R Enterprise packages to build and fit models and use them to score data. For large data sets, you can do the modeling and scoring using R engines embedded in Oracle Database.

See Also:

Example 1-2 Finding the Mean of the Petal Lengths by Species in R

This example illustrates the translation of an R function invocation into SQL. It uses the overloaded Oracle R Enterprise aggregate function to get the mean of the petal lengths from the IRIS_TABLE object.

ore.create(iris, table = 'IRIS_TABLE')
aggplen = aggregate(IRIS_TABLE$Petal.Length,
                    by = list(species = IRIS_TABLE$Species),
                    FUN = mean)
aggplen
Listing for This Example
R> ore.create(iris, table = 'IRIS_TABLE')
R> aggplen = aggregate(IRIS_TABLE$Petal.Length,
                       by = list(species = IRIS_TABLE$Species),
                       FUN = mean)
R> aggplen
              species     x
setosa         setosa 1.462
versicolor versicolor 4.260
virginica   virginica 5.552

Example 1-3 SQL Equivalent of the Previous Example

This example shows the SQL equivalent of the aggregate function in the previous example.

SELECT "Species", AVG("Petal.Length")
FROM IRIS_TABLE
GROUP BY "Species"
ORDER BY "Species";
 
Species     AVG("PETAL.LENGTH") 
----------- ------------------- 
setosa       1.4620000000000002 
versicolor   4.26
virginica    5.552

1.4.2 Transparency Layer Support for R Data Types and Classes

Oracle R Enterprise transparency layer has classes and data types that map R data types to Oracle Database data types. Those classes and data types are described in the following topics:

1.4.2.1 About Oracle R Enterprise Data Types and Classes

Oracle R Enterprise has data types that map R data types to SQL data types. In an R session, when you create database objects from R objects or you create R objects from database data, Oracle R Enterprise translates R data types to SQL data types and the reverse where possible.

Oracle R Enterprise creates objects that are instances of Oracle R Enterprise classes. Oracle R Enterprise overloads many standard R functions so that they use Oracle R Enterprise classes and data types. R language constructs and syntax are supported for objects that are mapped to Oracle Database objects.

Table 1-1 Mappings Between R, Oracle R Enterprise, and SQL Data Types

R Data Type Oracle R Enterprise Data Type SQL Data Type

character mode vector

ore.character

VARCHAR2

INTERVAL YEAR TO MONTH

integer mode vector

ore.integer

NUMBER

logical mode vector

ore.logical

The NUMBER 0 for FALSE and 1 for TRUE

numeric mode vector

ore.number

BINARY_DOUBLE

BINARY_FLOAT

FLOAT

NUMBER

Date

ore.date

DATE

POSIXct

POSIXlt

ore.datetime

TIMESTAMP

TIMESTAMP WITH TIME ZONE

TIMESTAMP WITH LOCAL TIME ZONE

difftime

ore.difftime

INTERVAL DAY TO SECOND

None

Not supported

LONG

LONG RAW

RAW

User defined data types

Reference data types

Note:

  • Objects of type ore.datetime do not support a time zone setting, instead they use the system time zone Sys.timezone if it is available or GMT if Sys.timezone is not available.

  • The SQL VARCHAR2 data type is mapped to the R character data type through the embedded R input data argument. Users can convert the character variable to a factor in R if needed by using as.factor().

1.4.2.2 About the ore.frame Class

An ore.frame object represents a relational query for an Oracle Database instance. It is the Oracle R Enterprise equivalent of a data.frame. Typically, you get ore.frame objects that are proxies for database tables. You can then add new columns, or make other changes, to the ore.frame proxy object. Any such change does not affect the underlying table. If you then request data from the source table of the ore.frame object, the transparency layer function generates a SQL query that has the additional columns in the select list, but the table is not changed.

In R, the elements of a data.frame have an explicit order. You can specify elements by using integer indexing. In contrast, relational database tables do not define any order of rows and therefore cannot be directly mapped to R data structures.

Oracle R Enterprise has both ordered and unordered ore.frame objects. If a table has a primary key, which is a set of one or more columns that form a distinct tuple within a row, you can produce ordered results by performing a sort using an ORDER BY clause in a SELECT statement. However, ordering relational data can be expensive and is often unnecessary for transparency layer operations. For example, ordering is not required to compute summary statistics when invoking the summary function on an ore.frame.

Example 1-4 Classes of a data.frame and a Corresponding ore.frame

This example creates a data.frame with columns that contain different data types and displays the structure of the data.frame. The example then invokes the ore.push function to create a temporary table in the database that contains a copy of the data of the data.frame. The ore.push invocation also generates an ore.frame object that is a proxy for the table. The example displays the classes of the ore.frame object and of the columns in the data.frame and the ore.frame objects.

df <- data.frame(a="abc",
                 b=1.456,
                 c=TRUE,
                 d=as.integer(1),
                 e=Sys.Date(),
                 f=as.difftime(c("0:3:20", "11:23:15")))
ore.push(df)
class(of)
class(df$a
class(of$a)
class(df$b)
class(of$b)
class(df$c)
class(of$c)
class(df$d)
class(of$d)
class(df$e)
class(of$e)
class(df$f)
class(of$f)
Listing for Example 1-4
R> df <- data.frame(a="abc",
+                   b=1.456,
+                   c=TRUE,
+                   d=as.integer(1),
+                   e=Sys.Date(),
+                   f=as.difftime(c("0:3:20", "11:23:15")))
R> ore.push(df)
R> class(of)
[1] "ore.frame"
attr(,"package")
[1] "OREbase"
R> class(df$a)
[1] "factor"
R> class(of$a)
[1] "ore.factor"
attr(,"package")
[1] "OREbase"
R> class(df$b)
[1] "numeric"
R> class(of$b)
[1] "ore.numeric"
attr(,"package")
[1] "OREbase"
R> class(df$c)
[1] "logical"
R> class(of$c)
[1] "ore.logical"
attr(,"package")
[1] "OREbase"
R> class(df$d)
[1] "integer"
R> class(of$d)
[1] "ore.integer"
attr(,"package")
[1] "OREbase"
R> class(df$e)
[1] "Date"
R> class(of$e)
[1] "ore.date"
attr(,"package")
[1] "OREbase"
R> class(df$f)
[1] "difftime"
R> class(of$f)
[1] "ore.difftime"
attr(,"package")
[1] "OREbase"

1.4.2.3 Support for R Naming Conventions

Oracle R Enterprise uses R naming conventions for ore.frame columns instead of the more restrictive Oracle Database naming conventions. The column names of an ore.frame can be longer than 30 bytes, can contain double quotes, and can be non-unique.

1.4.2.4 About Coercing R and Oracle R Enterprise Class Types

The generic as.ore function coerces in-memory R objects to ore objects. The more specific functions, such as as.ore.character, coerce objects to specific types. The ore.push function implicitly coerces R class types to ore class types and the ore.pull function coerces ore class types to R class types. For information on those functions, see "Moving Data to and from the Database".

Example 1-5 Coercing R and Oracle R Enterprise Class Types

This example illustrates coercing R objects to ore objects. creates an R integer object and then uses the generic method as.ore to coerce it to an ore object, which is an ore.integer. The example coerces the R object to various other ore class types. For an example of using as.factor in embedded R execution function, see Example 6-13.

x <- 1:10
class(x)
X <- as.ore(x)
class(X)
Xn <- as.ore.numeric(x)
class(Xn)
Xc <- as.ore.character(x)
class(Xc)
Xc
Xf <- as.ore.factor(x)
Xf
Listing for Example 1-5
R> x <- 1:10
R> class(x)
[1] "integer"
R> X <- as.ore(x)
R> class(X)
[1] "ore.integer"
attr(,"package")
[1] "OREbase"
R> Xn <- as.ore.numeric(x)
R> class(Xn)
[1] "ore.numeric"
attr(,"package")
[1] "OREbase"
R> Xc <- as.ore.character(x)
R> class(Xc)
[1] "ore.character"
attr(,"package")
[1] "OREbase"
R> Xc
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"
R> Xf <- as.ore.factor(x)
R> Xf
 [1] 1  2  3  4  5  6  7  8  9  10
Levels: 1 10 2 3 4 5 6 7 8 9