Skip Headers
Oracle® R Enterprise User's Guide
Release 11.2 for Linux, Solaris, AIX, and Windows

Part Number E26499-05
Go to Documentation Home
Home
Go to Book List
Book List
Go to Table of Contents
Contents
Go to Index
Index
Go to Feedback page
Contact Us

Go to previous page
Previous
Go to next page
Next
PDF · Mobi · ePub

3 Using Oracle R Enterprise

This chapter explains how to use Oracle R Enterprise to analyze data stored in tables or views in an Oracle Database. Before you analyze data in tables, you must connect to a database, as described in Tables in the Oracle Database.

This chapter discusses these topics:

We assume familiarity with R in the remainder of this section.

These examples were all created using R Console, the default graphical user interface for Open Source R.

For information about ore.connect, ore.attach, ore.sync, and ore.ls, see Start the Oracle R Enterprise Client on Microsoft Windows, Start the Oracle R Enterprise Client on Linux. Solaris, or AIX, and Connect to an Oracle Database.

Oracle R Enterprise also includes the Oracle R Enterprise Statistical Functions, described in Chapter 4.

Tables in the Oracle Database

Before you can use Oracle R Enterprise to analyze data stored in database tables, you must Start the Oracle R Enterprise Client on Microsoft Windows or Start the Oracle R Enterprise Client on Linux. Solaris, or AIX.

Objects created by Oracle R Enterprise are identified with the ore prefix. Pick any object returned by ore.ls() and type either class(OBJECTNAME) or class(OBJECTNAME$COLUMN_NAME).

For example,

R> class(NARROW)
[1] "ore.frame"
attr(,"package")
[1] "OREbase"

The prefix ore is applied to the class names. This indicates that the object is an Oracle R Enterprise created object that holds metadata (instead of contents) of the corresponding object in Oracle Database.

Two important metadata objects are

ore.frame or ore.matrix can be returned by the class() function. For an example of creating ore.frame data, see Load Data into the Database.

View Oracle R Enterprise Documentation

Use this command to view the Oracle R Enterprise documentation library:

R> OREShowDoc()

Oracle R Enterprise Data

When you install Oracle R Enterprise, two tables NARROW and ONTIME_S are loaded into the rquser schema:

R> ore.ls()
[1] "NARROW"   "ONTIME_S"

Oracle R Enterprise includes these functions:

Load Data into the Database

Follow these steps to load data from files on your system to the Oracle Database:

  1. Load contents of the file to an R data frame using read.table() or read.csv() functions documented in the R manuals.

  2. Then use ore.create()to load a data frame to a table:

    ore.create(data_frame, table="TABLE_NAME")
    

loads data_frame into the database table TABLE_NAME.

This example creates an R data frame df consisting of pairs of numbers and letters and then loads the data frame into the Oracle table DF_TABLE. The example shows that the data frame and the table have the same dimensions and the same first few elements, but different values for class. The class for DF_TABLE is ore.frame.

R> df <- data.frame(A=1:26, B=letters[1:26])
R> dim(df)
[1] 26  2
R> class(df)
[1] "data.frame"
R> head(df)
  A B
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
6 6 f
R> ore.create(df, table="DF_TABLE")
R> ore.ls()
[1] "DF_TABLE" "NARROW"   "ONTIME_S"
R> class(DF_TABLE)
[1] "ore.frame"
attr(,"package")
[1] "OREbase"
R> dim(DF_TABLE)
[1] 26  2
R> head(DF_TABLE)
  A B
0 1 a
1 2 b
2 3 c
3 4 d
4 5 e
5 6 f
R> 

Materialize R Data

ore.push(data-frame) stores an R object in the database as a temporary object, and returns a handle to that object. It converts data frame, matrix, and vector to a table, and list, model, and others to a serialized object.

This example pushes the numerical vector created by the R command c(1,2,3,4,5) to v, an Oracle R Enterprise object:

v <- ore.push(c(1,2,3,4,5))
R> class(v)
[1] "ore.numeric"
attr(,"package")
[1] "OREbase"
R> head(v)
[1] 1 2 3 4 5

Drop a Database Table

To drop a table in the database use

ore.drop(table="NAMEOFTABLE")

For example, this command drops DF_TABLE:

ore.drop(table="DF_TABLE")

Pull a Database Table to an R Frame

To pull the contents of an Oracle Database table or view into an in-memory R data frame use ore.pull(OBJECT_NAME)for the name of an object returned by ore.ls().

Note:

You can pull a table or view to an R frame only if the.if the size of the data can fit into R's memory.

For example, use ore.pull() to create the data frame df_narrow from the table NARROW and then verify that df_narrow is a data frame:

R> df_narrow <- ore.pull(NARROW)
R> class(df_narrow)
[1] "data.frame"

Oracle R Enterprise Transparency Framework

The Oracle R Enterprise transparency framework allows R users to continue to use R syntax to work directly with database-resident objects without having to pull data from Oracle into R's memory on the user's desktop.

R language constructs and syntax are supported for objects mapped to Oracle Database objects. The following R data types have been overloaded so that they are mapped to database objects and hence enabled for in-database execution:

class(object) reports the data type of such mapped objects. For example,

R> class(NARROW$AGE)
[1] "ore.numeric"
attr(,"package")
[1] "OREbase"

The following operators and functions are supported. See R documentation for syntax and semantics of these operators and functions. Syntax and semantics for these items remain unchanged when used on a corresponding database-mapped data type (also known as an Oracle R Enterprise data type):

The following additional categories of functions exist to accomplish conversions to/from and checks on Oracle R Enterprise data types:

The Oracle R Enterprise sample programs, described in Oracle R Enterprise Examples include several examples using each category of the functions listed above with Oracle R Enterprise data types.

Our design principle has been to support data pre-processing functionality extensively so all data preparation and analysis can take place directly in the database.If you need to use a statistical technique that is not available in Oracle R Enterprise, having used Oracle R Enterprise to preprocess and filter the data, a much smaller amount of data can be pulled into R.

If a specific function that you need is not in the list above, you must explicitly pull data from the database into the R engine memory using ore.pull() to create an in-memory R object first.

Using R with Oracle R Enterprise Data Types

The following examples illustrate using R with Oracle R Enterprise data types:

  • Simple column and row selection in R:

    # Push built-in R data set iris to database
    ore.create(iris, table="IRIS")
    head(iris)
    iris_projected = IRIS[, c("PETAL_LENGTH", "SPECIES")]
    R> head (iris_projected)
      PETAL_LENGTH SPECIES
    0          1.4  setosa
    1          1.4  setosa
    2          1.3  setosa
    3          1.5  setosa
    4          1.4  setosa
    5          1.7  setosa
    
  • Database JOIN using R:

    df1 <- data.frame(x1=1:5, y1=letters[1:5])
    df2 <- data.frame(x2=5:1, y2=letters[11:15])
    merge (df1, df2, by.x="x1", by.y="x2")
     x1 y1 y2
    1  1  a  o
    2  2  b  n
    3  3  c  m
    4  4  d  l
    5  5  e  k
    # Create database objects to correspond to in-memory R objects df1 and df2
    ore.df1 <- ore.create(df1, table="DF1")
    ore.df2 <- ore.create(df2, table="DF2")
    # Compare results
    R> merge (DF1, DF2, by.x="X1", by.y="X2")
      X1 Y1 Y2
    0  1  a  o
    1  2  b  n
    2  3  c  m
    3  4  d  l
    4  5  e  k
    
  • Database aggregation using R:

    # Push built-in data set iris to database
    ore.create(iris, table="IRIS")
    aggdata <- aggregate(IRIS, by = list(IRIS$SPECIES), FUN = summary)
    class(aggdata)
    head(aggdata)
    
  • Data formatting and creating derived columns in R

    Note that adding derived columns does not change the database table. See Derived Columns in Oracle R Enterprise.

    diverted_fmt <- function (x) {
    ifelse(x==0, 'Not Diverted',
    ifelse(x==1, 'Diverted',''))
    }
    cancellationCode_fmt <- function(x) {
    ifelse(x=='A', 'A CODE',
    ifelse(x=='B', 'B CODE',
    ifelse(x=='C', 'C CODE',
    ifelse(x=='D', 'D CODE', 'NOT CANCELLED'))))
    }
    delayCategory_fmt <- function(x) {
    ifelse(x>200,'LARGE',
    ifelse(x>=30,'MEDIUM','SMALL'))
    }
    zscore <- function(x) {
    (x-mean(x,na.rm=TRUE))/sd(x,na.rm=TRUE)
    # ONTIME_S is a database table
    ONTIME_S$DIVERTED <- diverted_fmt(DIVERTED)
    ONTIME_S$CANCELLATIONCODE <- cancellationCode_fmt(CANCELLATIONCODE)
    ONTIME_S$ARRDELAY <- delayCategory_fmt(ARRDELAY)
    ONTIME_S$DEPDELAY <- delayCategory_fmt(DEPDELAY)
    ONTIME_S$DISTANCE_ZSCORE <- zscore(DISTANCE)
    

Derived Columns in Oracle R Enterprise

When you add derived columns using Oracle R Enterprise, the derived do not affect the underlying table in the database. All that is generated is a SQL query that has the additional derived columns in the select list.

Oracle R Enterprise Database-Embedded R Engine

The embedded R engine in the Oracle Database allows R users to off load desktop calculations that may require either more resources such as those available to Oracle Database or database-driven data parallelism. The embedded R engine also executes R scripts embedded in SQL or PL/SQL programs (lights-out processing).

These examples illustrate using Oracle R Enterprise embedded R engine with standard R packages downloaded from CRAN:

Build a Regression Model

This example illustrates building a regression model using a CRAN package. Prepare the data used for training in the database (filtering out observations that are not of interest, selecting attributes, imputing missing values, etc.) to create the table ONTIME_S_PREPROCESSED_SUBSET. Pull the prepared training set (which is usually small enough to fit in desktop R memory) into R client to execute the model build. The resulting model is then used to score vast numbers of rows, in parallel, in the Oracle Database.

Note that scoring is a trivially parallelizable operation because one row can be scored independent of and in parallel with another row. The model built on the desktop is shipped to the database to perform scoring on vast number of rows in the database.

The computations are divided into these steps:

  1. Build a model in the desktop:

    dat <- ore.pull(ONTIME_S_PREPROCESSED_SUBSET)
    mod <- glm(ARRDELAY ~ DISTANCE + DEPDELAY, dat)
    mod
    summary(mod)
    
  2. Score in-parallel in the database using embedded R:

    prd <- predict(mod, newdata=ONTIME_S_FINAL_DATA_TO_BE_SCORED)
    class(prd)
    # Add predictions as a new column
    res <- cbind(newdat, PRED = prd)
    head(res)
    

Perform R Computation in the Oracle Database

This example illustrates off-loading R computation to execute in the embedded R engine. The R user simply includes his code within a closure (that is, function() {}) and invokes ore.doEval(). ore.doEval() schedules execution of the R code with the database-embedded R engine and returns the results back to the desktop for continued analysis:

mod <- ore.doEval(
   function() {
      library(biglm)
      dat <- ore.pull(ONTIME_S)
      mod <- biglm(ARRDELAY ~ DISTANCE + DEPDELAY, dat)
      mod
    });
print(mod)
mod=ore.pull(mod)
print(mod)

Build a Series of Regression Models Using Data Parallelism

This example illustrates database-driven data parallelism at work in building a series of regression models using a CRAN package. One model is built per unique value of a factor. The database orchestrates parallel and concurrent building of the models, one per factor and bringing the list of all models built to the user desktop for further analysis:

modList <- ore.groupApply(
  # Organize input to the R script – This is always an Oracle R Enterprise   
  # data frame
   X=ONTIME_S,
  # Specify the grouping column. Here we request one model per unique value of
  # ONTIME_S$DEST
   INDEX=ONTIME_S$DEST,
  # Model building code goes inside the closure. Input and grouping 
  # conditions can be referenced as parameters to the function
    function(x, param) {
     library(biglm)
     biglm(ARRDELAY ~ DISTANCE + DEPDELAY, x)
     });

   modList_local <- ore.pull(modList)
# Print the model for just one destination - BOSTON
summary(modList_local$BOS)

Oracle R Enterprise Additional R Functions

These functions are available to enable richer statistical analysis. See the Oracle R Enterprise Sample Library described in Oracle R Enterprise Examples for usage examples of each function. The functions all operate on an Oracle R Enterprise data frame:

Oracle R Enterprise SQL Functions

Oracle R Enterprise users who are allowed to execute R code via SQL queries must be granted the RQROLE role.

To enable execution of an R script in the database (lights-out processing), Oracle R Enterprise provides variants of ore.doEval(), ore.groupApply(), and ore.indexApply() in SQL. (ore.doEval(), ore.groupApply(), and ore.indexApply() are described in Oracle R Enterprise Database-Embedded R Engine.)

The SQL functions are

You can also code an rqGroupEval() Function function.

The rq*:Eval()functions have the same syntax:

rq*Eval(
     cursor(select * from table-1,
     cursor(select * from table-2,
     'select <column list> from table-3 t',
     <grouping col-name from table-1 or num_rows>,
      'function(x,param) {
        registered-R-code
     ] ')

where

The following examples illustrate using these functions:

rqGroupEval() Function

There is no rqGroupEval() function as such. You must define a private version of rqGroupEval() based on the data and grouping column. This is the limitation of the table function infrastructure.

Here is an example based on the ONTIME_S sample data. The data cursor uses all data, but you could also define cursors that use some columns using PL/SQL records. Then you must define as many table functions as the number of grouping columns that you are interested in using for a particular data cursor:

CREATE PACKAGE ontimePkg AS
  TYPE cur IS REF CURSOR RETURN ontime_s%ROWTYPE;
END ontimePkg;
/
 
CREATE FUNCTION ontimeGroupEval(
  inp_cur  ontimePkg.cur,
  par_cur  SYS_REFCURSOR,
  out_qry  VARCHAR2,
  grp_col  VARCHAR2,
  exp_txt  CLOB)
RETURN SYS.AnyDataSet
PIPELINED PARALLEL_ENABLE (PARTITION inp_cur BY HASH (month))
CLUSTER inp_cur BY (month)
USING rqGroupEvalImpl;
/

At this time, only one grouping column is supported. If you have multiple columns you combine the columns into one column and use the new column as a grouping column. PARALLEL_ENABLE clause is optional but CLUSTER BY is not.

Registering R Scripts

For security purposes, you must first register the R script under some system unique name and use new name instead of the actual script in the call to rq*Eval table functions.

There are two administrative functions that create and drop scripts and a view that lists scripts:

  • sys.rqScriptCreate

  • sys.rqScriptDrop

  • sys.rq_scripts view allows you to list and use scripts that were created

The scripts and the view require grants as described in Roles Required to Create and Use Scripts.

Here is an example of registering the scripts and using the registered scripts:

begin
  sys.rqScriptCreate('tmrqfun2',
'function() {
ID <- 1:10
res <- data.frame(ID = ID, RES = ID / 100)
res
}');
end;
/
 
select *
  from table(rqEval(
        NULL,
       'select 1 id, 1 res from dual',
       'tmrqfun2'));
 
begin
  sys.rqScriptDrop('tmrqfun2');
end;

Roles Required to Create and Use Scripts

To execute sys.rqScriptCreate and sys.rqScriptDrop, you must be granted the administrative role RQADMIN.

Select privilege for the sys.rq_scripts view is granted to RQROLE role.

The RQADMIN and RQROLE role are created when you Install Server on Linux

Oracle R Enterprise Examples

Oracle R Enterprise is shipped with a collection of examples that illustrate how to use Oracle R Enterprise. These examples are a collection of self-contained R scripts.

Most of the sample programs use the data frame iris, which is included in the R distribution. iris is loaded into a table as described in Load Data Frame to a Table.

The rest of this section describes two examples in detail and includes a list of all examples:

Load Data Frame to a Table

Follow these steps to load an R data frame to a database table:

  1. Starts R, load the ORE packages via library(ORE), and then connect to the database. The latter steps are automatic if Rprofile is in place.

  2. Most of these examples use the R data frame iris. iris is shipped with R. Use the R command class to verify that iris is an R data frame:

    R> class(iris)
    [1] "data.frame"
    

    iris consist of measurements of parts of iris flowers. Use the R command head to see a small sample of the data in iris.

    R> head(iris)
          Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    1          5.1         3.5          1.4         0.2  setosa
    2          4.9         3.0          1.4         0.2  setosa
    3          4.7         3.2          1.3         0.2  setosa
    4          4.6         3.1          1.5         0.2  setosa
    5          5.0         3.6          1.4         0.2  setosa
    6          5.4         3.9          1.7         0.4  setosa
    
  3. Now load the data frame iris into the database that you are connected to.

    In these examples, the database table version of iris is named IRIS_TABLE. Drop IRIS_TABLE to make sure that no table of this name exists in the connected schema:

    ore.drop(table = "IRIS_TABLE")
    

    If IRIS_TABLE doesn't exist, you do not get a message.

  4. Now create a database table with the data contained in iris:

    ore.create(iris, table = "IRIS_TABLE")
    

    Use ore.ls() to verify that the table was created:

    R> ore.ls()
    [1] "IRIS_TABLE" "NARROW"     "ONTIME_S" 
    
  5. IRIS_TABLE is a database-resident table with just metadata on the R side:

    R> class(IRIS_TABLE)
    [1] "ore.frame"
    attr(,"package")
    [1] "OREbase"
    
  6. Use head to see the column names and the first few values in IRIS_TABLE:

    R> head(IRIS_TABLE)
      SEPAL_LENGTH SEPAL_WIDTH PETAL_LENGTH PETAL_WIDTH SPECIES
    0          5.1         3.5          1.4         0.2  setosa
    1          4.9         3.0          1.4         0.2  setosa
    2          4.7         3.2          1.3         0.2  setosa
    3          4.6         3.1          1.5         0.2  setosa
    4          5.0         3.6          1.4         0.2  setosa
    5          5.4         3.9          1.7         0.4  setosa
    
  7. Use mode to see the data type of the column SPECIES.

    mode(IRIS_TABLE$SPECIES)
    [1] "raw"
    
  8. Some algorithms only work if all of the data is numerical. Follow these steps to create IRIS_TABLE_N that does not contain SPECIES, the nonnumeric column:

    IRIS_TABLE_N=IRIS_TABLE[,c("SEPAL_LENGTH", "SEPAL_WIDTH",  "PETAL_LENGTH", "PETAL_WIDTH")]
    

You can use R functions to analyze the data in the table. Here are some simple examples taken from the example basic.R:

  • Use unique to get a list of the unique entries in a column. This example find the unique SPECIES:

    R> unique(IRIS_TABLE$SPECIES)
    [1] setosa     versicolor virginica 
    Levels: setosa versicolor virginica
    
  • Find the minimum, maximum, and mean of PETAL_LENGTH:

    R> min(IRIS_TABLE$PETAL_LENGTH)
    [1] 1
    R> max(IRIS_TABLE$PETAL_LENGTH)
    [1] 6.9
    R> mean(IRIS_TABLE$PETAL_LENGTH)
    [1] 3.758
    

If you need information about an R function, use the command help(function-name).

Handle NULL Values Using airquality

The sample null.R is the only sample that does not use iris as data. null.R compares the handling of NULLs in SQL with the handling of NAs in R.

In R, NA is a logical constant of length 1 which contains a missing value indicator. In the database, null refers to the absence of a value in a column of a row. Nulls indicate missing, unknown, or inapplicable data.

Follow these steps to execute the sample:

  1. This example uses the data frame airquality. Verify that the data set is a data frame and look at the few rows of the data frame:

    R> class(airquality)
    [1] "data.frame"
    R> head(airquality)
      Ozone Solar.R Wind Temp Month Day
    1    41     190  7.4   67     5   1
    2    36     118  8.0   72     5   2
    3    12     149 12.6   74     5   3
    4    18     313 11.5   62     5   4
    5    NA      NA 14.3   56     5   5
    6    28      NA 14.9   66     5   6
    
  2. Load airquality into the database as "AIRQUALITY":

    ore.drop(table = "AIRQUALITY")
    ore.create(airquality, table = "AIRQUALITY")
    

    Use ore.ls() to verify that the table was created. If you wish, use class(AIRQUALITY) to verify that AIRQUALITY is a database-resident table with just metadata on the R side.

  3. Examine how R handles NAs. Return all observations where ozone < 30:

    R> nrow(airquality[airquality$Ozone < 30,])[1] 92
    

    Compare this with the results when NAs are explicitly excluded:

    R> nrow(airquality[airquality$Ozone < 30 & !is.na(airquality$Ozone),])
    [1] 55
    
  4. The default behavior for SQL tables is to exclude NULLS in output:

    nrow(AIRQUALITY[AIRQUALITY$OZONE < 30,])
    [1] 55
    

    To handle NULLs the same way that R handles NA, request the behavior explicitly:

    options(ore.na.extract = TRUE)
    nrow(AIRQUALITY[AIRQUALITY$OZONE < 30,])
    [1] 92
    

List of Examples

These scripts have been added as demos to the ORE package.

To access a complete listing of them type

R> demo(package = "ORE")

To run one of these scripts, specify the name of the demo in a demo function call. For example, to run aggregate.R, type

R> demo("aggregate", package = "ORE")

These examples are shipped with Oracle R Enterprise:

table_apply.R   Execute R code on all rows of a table passed in at once
aggregate.R     Demonstrates aggregations. See also summary.R
analysis.R      Demonstrates basic analysis and data processing operations
basic.R         Demonstrates basic connectivity to database
binning.R       Demonstrates binning in R
columnfns.R     Demonstrates use of column functions
corr.R          Correlation matrix (Pearson's, Spearman/Kendalls)
crosstab.R      Frequency cross-tabulations. Also see freq.R
derived.R       Handling derived columns
distributions.R Distribution, Density, and Quantile Functions
doEval.R        Demonstrates support for database-enabled parallel simulations
freqanalysis.R  Frequency cross-tabulations. Also see crosstab.R
graphics.R      Demonstrates visual analysis (boxplot, histogram)
group_apply.R   Execute R code for different sets of rows, one set per group
hypothesis.R    Hypothesis Testing Functions(binomial, chi square, T test, etc.)
matrix.R        Matrix operations
nulls.R         Demonstrates handling of nulls in SQL vs. NAs in R
push_pull.R     Demonstrates collaborative processing between database and client
rank.R          Ranking of observations (ranking, handling ties, etc.)
reg.R           Multivariate Regression
row_apply.R     Execute R code on each row
sql_like.R      Demonstrates how R commands map to SQL operations
stepwise.R      Stepwise Multivariate Regression
summary.R       Demonstrates summary functionality