Skip Headers
Oracle® R Enterprise User's Guide
Release 1.3 for Windows, Linux, Solaris, and AIX

E36761-08
Go to Documentation Home
Home
Go to Book List
Book List
Go to Table of Contents
Contents
Go to Index
Index
Go to Feedback page
Contact Us

Go to previous page
Previous
Go to next page
Next
PDF · Mobi · ePub

3 Using Oracle R Enterprise

This chapter explains how to use Oracle R Enterprise to analyze data stored in tables or views in an Oracle Database.

This chapter discusses these topics:

We assume familiarity with R in the remainder of this section.

For additional examples of using Oracle R Enterprise functionality, see Oracle R Enterprise Statistical Functions. For examples of building statistical models, including models created using Oracle Data Mining algorithm, see In-Database Predictive Models in Oracle R Enterprise.

Tables in Oracle Database

Before you can use Oracle R Enterprise to analyze data stored in database tables, you must install Oracle R Enterprise, start a client, and connect to the database, as described in Oracle R Enterprise Administrator's Guide.

By convention, most of the functions and methods defined in Oracle R Enterprise begin with the prefix ore. This is done to avoid name collisions with other R software. However, the objects created by those functions and methods can be anything the end user wants them to be. The end user has complete control over object naming.

Pick any object returned by ore.ls() and type either class(OBJECTNAME) or class(OBJECTNAME$COLUMN_NAME). For example, the following code shows that the class of DF_TABLE is ore.frame. The DF_TABLE object is created in Example: Load Data.

R> class(DF_TABLE)
[1] "ore.frame"

The prefix ore indicates that the object is an Oracle R Enterprise created object that holds metadata for the corresponding object in Oracle Database.

ore.frame is the Oracle R Enterprise metadata object that maps to a database table. The ore.frame object is the counterpart to an R data.frame.

ore.frame or can be returned by the class() function. For an example of creating ore.frame data, see Load an R Data Frame into the Database.

View Oracle R Enterprise Documentation

Use this command to view the Oracle R Enterprise documentation library:

R> OREShowDoc()

Oracle R Enterprise Data

Oracle R Enterprise supports this functionality:

Long Names

Oracle R Enterprise handles R naming conventions for ore.frame columns, instead of a more restrictive Database names. ore.frame column names can be longer than 30 bytes, contain double quotes, and be non-unique.

Load an R Data Frame into the Database

Follow these steps to load data from R data frames on your system to the Oracle database:

  1. Load contents of the file to an R data frame using read.table() or read.csv() functions documented in R online help.

  2. Then use ore.create()to load a data frame to a table:

    ore.create(data_frame, table="TABLE_NAME")
    

Step 2 loads data_frame into the database table TABLE_NAME.

For an example, see Example: Load Data.

Example: Load Data

This example creates an R data frame df consisting of pairs of numbers and letters and then loads the data frame into the table DF_TABLE. The example shows that the data frame and the table have the same dimensions and the same first few elements, but different values for class. The class for DF_TABLE is ore.frame. At the end of the example is a check that DF_TABLE exists in the current schema.

R> df <- data.frame(A=1:26, B=letters[1:26])
R> dim(df)
[1] 26  2
R> class(df)
[1] "data.frame"
R> head(df)
  A B
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
6 6 f
R> ore.create(df, table="DF_TABLE")
R> ore.ls()
[1] "DF_TABLE" 
R> class(DF_TABLE)
[1] "ore.frame"
attr(,"package")
[1] "OREbase"
R> dim(DF_TABLE)
[1] 26  2
R> head(DF_TABLE)
  A B
0 1 a
1 2 b
2 3 c
3 4 d
4 5 e
5 6 f
R> exists("DF_TABLE")[1] TRUE

If you connect to the database using a tool such as SQL Developer, you can view DF_TABLE directly in the database.

Materialize R Data

The ore.push(data.frame) function stores an R object in the database as a temporary object, and returns a handle to that object. It converts data frame, matrix, and vector objects to a table, and list, model, and other objectss to a serialized object.

The object that you create exists during the R session; to store the data in a permanent way, see Persist and Manage R Objects in the Database

This example pushes the numerical vector created by the R command c(1,2,3,4,5) to v, an Oracle R Enterprise object:

R> v <- ore.push(c(1,2,3,4,5))
R> class(v)
[1] "ore.numeric"
attr(,"package")
[1] "OREbase"
R> head(v)
[1] 1 2 3 4 5

Verify that an ore.frame Exists

ore.exists() checks for the existence of an ore.frame object in the ORE schema environment. For ore.exists()to find an ore.frame object the object must have been synchronized with ore.sync() first.

The objects available in the ORE environment are not necessarily the same as the database objects. One should not use ore.exists() to check for table existence.

For an example, see Example: Load Data.

ore.exists(name, schema)has these arguments:

  • name: A character string specifying the name of the ore.frame object

  • schema: A character string specifying the name of database schema to check

ore.exists() returns TRUE if the object exists in the ORE schema and FALSE, if it does not exist.

Drop a Database Table

To drop a table in the database use

ore.drop(table="NAMEOFTABLE")

For example, these commands drop the table v and verifies that it does not exist:

R> ore.drop(table="v")
R> ore.exists("v")
[1] FALSE

If you drop a table that does not exist, there is no error message.

Pull a Database Table to an R Frame

To pull the contents of an Oracle Database table or view to an in-memory R data frame use ore.pull(OBJECT_NAME)for the name of an object returned by ore.ls().

Note:

You can pull a table or view to an R frame only if the data can fit into R's memory.

Suppose that your Oracle Database contains the table NARROW. Then ore.pull() creates the data frame df_narrow from the table NARROW. When you verify that df_narrow is a data frame. The warning message appears because the table NARROW is not indexed:

R> df_narrow <- ore.pull(NARROW)
Warning message:
ORE object has no unique key - using random order 
R> class(df_narrow)
[1] "data.frame"

Order in Tables

Almost all data in R is a vector or is based on vectors (vectors themselves, lists, matrices, data frames, and so forth). The elements of a vector have an explicit order. Each element has an index. R code actively uses this order of elements.

However, database-backed relational data (tables and views) does not define any order of rows and thus cannot be directly mapped to R data structures. You can define an explicit order on database tables and views via an ORDER BY clause. The order is usually achieved by having a unique identifier (single- or multi- column key). Ordering in this way can be inefficient and slow for some operations that lead to unnecessary sorting.

row.names<- defines ordering but doesn't actually index a table. The assignment option provides a way to specify a unique column. Initially it supports at least one column but may support multi-column specifications as well. When row.names<- is applied to unordered frames, it returns an error.

You can use the integer indexing created by the ordering infrastructure to perform sampling and partitioning, as described in Sampling and Partitioning.

Suppose that the table NARROW is not indexed. The following example illustrates using row.names to create an indexed table:

R> row.names(head(NARROW))
Error: ORE object has no unique key
In addition: Warning message:
ORE object has no unique key - using random order 
R> 
R> row.names(NARROW) <- NARROW$ID
R> 
R> row.names(head(NARROW[,1:3]))
[1] "101501" "101502" "101503" "101504" "101505" "101506"
R> 
R> head(NARROW[,1:3])
           ID GENDER AGE
101501 101501   <NA>  41
101502 101502   <NA>  27
101503 101503   <NA>  20
101504 101504   <NA>  45
101505 101505   <NA>  34
101506 101506   <NA>  38

Sampling and Partitioning

The ordering (indexing) for tables described in Order in Tables can be used to perform sampling and partitioning.

This section provides examples of

Indexing

R supports powerful constructions using vectors as indices. Oracle R Enterprise supports similar functionality with these differences:

  • Integer indexing is not supported for ore.vector objects.

  • Negative integer indexes are not supported.

  • Row order is not preserved.

This example illustrates indexing:

R> tmp <- ASTHMA
R> tmp[c(1L, 2L, 1L),]
Error: ORE object has no unique key
R> rownames(tmp) <- tmp
R> tmp[c(1L, 2L, 1L),]
         CITY ASTHMA COUNT
1|0|65      1      0    65
1|0|65.1    1      0    65
1|1|35      1      1    35
R> tmp[c(1L, 2L, 1L),]@dataQry

Sampling

This code illustrates several sampling techniques:

# Generate random data
set.seed(123)
N <- 1000000
mydata <- data.frame(x = rnorm(N, mean = 20, sd = 2),
                     group =
                     sample(letters, N, replace = TRUE,
                            prob = (26:1)/sum(26:1)))
mydata$y <-
    rbinom(N, 1,
           1/(1+exp(-(.5 - 0.25 * mydata$x + .1 * as.integer(mydata$group)))))
MYDATA <- ore.push(mydata)
rm(mydata)
 
# Create a function that creates random row indices from large tables
mysampler <- function(n, size, replace = FALSE)
{
    #' Random Whole Number Sampler
    #' @param n       number of observations in sample
    #' @param size    total number of observations
    #' @param replace indicator for sampling with replacement
    #' @return numeric vector containing the sample indices
    n    <- round(n)
    size <- round(size)
    if    (n < 0) stop("'n' must be a non-negative number")
    if (size < 1) stop("'size' must be a positive number")
    if (!replace && (n > size))
        stop("'n' cannot exceed 'size' when 'replace = FALSE'")
    if (n == 0)
        numeric()
    else if (replace)
        round(runif(n, min = 0.5, max = size + 0.5))
    else
    {
        maxsamp <- seq(size + 0.5, by = -1, length.out = n)
        samp <- round(runif(n, min = 0.5, max = maxsamp))
        while(length(bump1 <- which(duplicated(samp))))
            samp[bump1] <- samp[bump1] + 1
        samp
    }
}
 
# Data set and sample size
N <- nrow(MYDATA)
sampleSize <- 500
 
# 1. Simple random sampling
srs <- mysampler(sampleSize, N)
simpleRandomSample <- ore.pull(MYDATA[srs, , drop = FALSE])
 
# 2. Systematic sampling
systematic <- round(seq(1, N, length.out = sampleSize))
systematicSample <- ore.pull(MYDATA[systematic, , drop = FALSE])
 
# 3. Stratified sampling
stratifiedSample <-
    do.call(rbind,
            lapply(split(MYDATA, MYDATA$group),
                   function(y)
                   {
                       ny <- nrow(y)
                       y[mysampler(sampleSize * ny/N, ny), , drop = FALSE]
                   }))
 
# 4. Cluster sampling
clusterSample <- do.call(rbind, sample(split(MYDATA, MYDATA$group), 2))
 
# 5a. Accidental/Convenience sampling (via row order access)
convenientSample1 <- head(MYDATA, sampleSize)
 
# 5b. Accidental/Convenience sampling (via hashing)
maxHash <- 2^32 # maximum allowed in ore.hash
convenient2 <- (ore.hash(rownames(MYDATA), maxHash)/maxHash) <= (sampleSize/N)
convenientSample2 <- ore.pull(MYDATA[convenient2, , drop = FALSE])
Random

Random Partitioning

For Oracle R Enterprise random partitions can be generated in the transparency layer by adding a partition or group column to an ore.frame object in the following manner:

nrowX <- nrow(x)
x$partition <- sample(rep(1:k, each = nrowX/k, length.out = nrowX), replace = TRUE)

After these partitions have been joined to the original data set, the ore.groupApply function can be used to perform the little bootstraps:

results <- ore.groupApply(x, x$partition, function(y) {...}, parallel = TRUE)

Persist and Manage R Objects in the Database

R objects exist for the duration of the current session, unless they are explicitly saved. For example, if you build a model in a particular R session, the model is not available when the session is closed, unless the model was explicitly saved.

Oracle R Enterprise supports persistence for R objects onto the database.

Persistence provides these advantages:

  • You can access the same R and Oracle R Enterprise object (for example, a model) among different R sessions.

  • You can build a model in R and use it for prediction and scoring in embedded Oracle R Enterprise.

Oracle R Enterprise creates datastores to contain persisted objects.

Persisted objects reside in a datastore. The following Oracle R Enterprise functionality allows you manage persistence:

ore.save()

ore.save() saves an R object or a list of R objects to the specified datastore in the connected database in the current user's schema:

ore.save({...}, list = character(0), name, envir = parent.frame(), overwrite = FALSE, append = FALSE, description = character(0)))

The parameters for ore.save() are as follows:

  • {...} is the list of R objects to save; the names of the objects to be saved (as symbols or character strings)

  • list is a character vector containing the names of objects to be saved

  • envir is the environment to search for objects to be saved

  • overwrite is a logical value specifying whether to overwrite the datastore if already exists; the default is FALSE (do not overwrite)

  • name is the name of the datastore; name must be specified

  • description is a comment describing the datastore

  • append is a logical value specifying whether to append objects to the datastore if already exists; the default is FALSE (do not append)

Examples of ore.save()

Save all objects in the current workspace environment to the datastore ds_1 in the user's current schema:

ore.save(list=ls(), name="ds_1", description = "example datastore")

Overwrite existing datastore ds_2 with objects x, y, and z in the current workspace environment:

ore.save(x, y, z, name="ds_2", overwrite=TRUE)

Add objects x, y, and z in the current workspace environment to the existing datastore ds_3 (that is append the objects to the datastore):

ore.save(x, y, z, name="ds_3", append=TRUE)

ore.load()

ore.load() loads all of the R objects stored in a specified datastore in the current user schema in the connected database to R:

ore.load(name, list = character(0), envir = parent.frame())

The parameters for ore.load() are

  • name is a character string specifying the name of datastore to load the objects from; you must specify a name

  • list is a character vector containing the names of objects to be loaded

  • envir is the R environment that objects are loaded to

ore.load() returns a character vector containing the names of objects loaded from the datastore.

Examples of ore.load()

Load all objects in the datastore ds_1:

ore.load("ds_1")

Load just the objects x, y, and z from datastore ds_1:

ore.load("ds_1", list=c("x", "Y", "z"))

ore.delete()

ore.delete() deletes the specified datastore (and all of the R objects in it) from the current user schema in the connected database:

ore.delete(name)

The parameter for ore.delete() is

  • name is a character string specifying the name of datastore to delete; you must specify a name

Use ore.datastore() to list the datastores that exist in the user's Oracle Database schema.

Example of ore.delete()

Delete the datastore ds_1 from the user's current schema:

ore.delete("ds_1")

ore.datastore()

ore.datastore() lists the datastores and basic information about each datastore in the current schema:

ore.datastore(name, pattern)

The parameters for ore.datastore() are

  • name is a character string specifying the name of datastore to list

  • pattern is a regular expression character string specifying the names of the datastores to list.

ore.datastore() lists information about the datastore with name specified in name or information about the datastores whose names match the regular expression specified in pattern.

If neither name nor pattern is provided, ore.datastore() returns information about all datastores in user's schema.

Either name or pattern can be specified but not both.

ore.datastore() returns a data.frame object with these columns:

  • datastore.name name of the datastore

  • object.count number of objects in the datastore identified by datastore.name

  • size size of the datastore in bytes

  • creation.date date of datastore creation

  • description comment for datastore (comment is specified in the description parameter of ore.save)

Each row of the data.frame lists one datastore. Rows are sorted by column datastore.name in alphabetical order.

Example of ore.datastore()

List all of the datastores in the connected schema:

ore.datastore()

ore.datastoreSummary()

ore.datastoreSummary() returns a data.frame that lists the names and summary information for the R objects saved in the specified datastore in the schema in the connected database:

ore.datastoreSummary(name)

The parameter for ore.datastoreSummary() is

  • name is a character string specifying the name of datastore to summarize; you must specify a name

If the specified datastore does not exist, an error is returned.

ore.datastoreSummary() returns a data.frame object with these columns:

  • object.name is the name of the R object

  • class.name is the class name of the R object

  • size is the size of the R object in bytes

  • length is the length of the R object

  • row.count is the number of rows for the R object

  • col.count is number of columns of the R object

Each row of the data.frame lists one R object. Rows are sorted by column datastore.name in alphabetical order.

Example of ore.datastoreSummary()

List summary information for all of the R objects in the datastore ds_1:

ore.datastoreSummary(name = "ds_1")

Using R with Oracle R Enterprise Data Types

The following examples illustrate using R with Oracle R Enterprise data types:

  • Simple column and row selection in R:

    # Push built-in R data set iris to database
    R> ore.create(iris, table="IRIS")
    R> head(iris)
      Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    1          5.1         3.5          1.4         0.2  setosa
    2          4.9         3.0          1.4         0.2  setosa
    3          4.7         3.2          1.3         0.2  setosa
    4          4.6         3.1          1.5         0.2  setosa
    5          5.0         3.6          1.4         0.2  setosa
    6          5.4         3.9          1.7         0.4  setosa
    R> iris_projected = IRIS[, c("PETAL_LENGTH", "SPECIES")]
    R> head (iris_projected)
      PETAL_LENGTH SPECIES
    0          1.4  setosa
    1          1.4  setosa
    2          1.3  setosa
    3          1.5  setosa
    4          1.4  setosa
    5          1.7  setosa
    
  • Database JOIN using R:

    df1 <- data.frame(x1=1:5, y1=letters[1:5])
    df2 <- data.frame(x2=5:1, y2=letters[11:15])
    merge (df1, df2, by.x="x1", by.y="x2")
     x1 y1 y2
    1  1  a  o
    2  2  b  n
    3  3  c  m
    4  4  d  l
    5  5  e  k
    # Create database objects to correspond to in-memory R objects df1 and df2
    ore.df1 <- ore.create(df1, table="DF1")
    ore.df2 <- ore.create(df2, table="DF2")
    # Compare results
    R> merge (DF1, DF2, by.x="X1", by.y="X2")
      X1 Y1 Y2
    0  1  a  o
    1  2  b  n
    2  3  c  m
    3  4  d  l
    4  5  e  k
    
  • Database aggregation using R:

    # Push built-in data set iris to database
    ore.create(iris, table="IRIS")
    aggdata <- aggregate(IRIS, by = list(IRIS$SPECIES), FUN = summary)
    class(aggdata)
    head(aggdata)
    
  • Data formatting and creating derived columns in R

    Note that adding derived columns does not change the database table. See Derived Columns in Oracle R Enterprise.

    diverted_fmt <- function (x) {
    ifelse(x==0, 'Not Diverted',
    ifelse(x==1, 'Diverted',''))
    }
    cancellationCode_fmt <- function(x) {
    ifelse(x=='A', 'A CODE',
    ifelse(x=='B', 'B CODE',
    ifelse(x=='C', 'C CODE',
    ifelse(x=='D', 'D CODE', 'NOT CANCELLED'))))
    }
    delayCategory_fmt <- function(x) {
    ifelse(x>200,'LARGE',
    ifelse(x>=30,'MEDIUM','SMALL'))
    }
    zscore <- function(x) {
    (x-mean(x,na.rm=TRUE))/sd(x,na.rm=TRUE)
    # ONTIME_S is a database table
    ONTIME_S$DIVERTED <- diverted_fmt(DIVERTED)
    ONTIME_S$CANCELLATIONCODE <- cancellationCode_fmt(CANCELLATIONCODE)
    ONTIME_S$ARRDELAY <- delayCategory_fmt(ARRDELAY)
    ONTIME_S$DEPDELAY <- delayCategory_fmt(DEPDELAY)
    ONTIME_S$DISTANCE_ZSCORE <- zscore(DISTANCE)
    

Derived Columns in Oracle R Enterprise

When you add derived columns using Oracle R Enterprise, the derived columns do not affect the underlying table in the database. A SQL query is generated that has the additional derived columns in the select list, but the table is not changed.

Using CRAN Packages with Oracle R Enterprise

This example illustrates using Oracle R Enterprise with a standard R package downloaded from CRAN:

Build and Use a Regression Model

This example illustrates building a regression model using a CRAN package. You can prepare the data used for training in the database (filtering out observations that are not of interest, selecting attributes, imputing missing values, and so forth). Suppose that the preprocessed data is in the table ONTIME_S_PREPROCESSED_SUBSET. Then pull the prepared training set (which is usually small enough to fit in desktop R memory) into the R client to execute the model build.

You can use the resulting model to score (predict) large numbers of rows, in parallel, in Oracle Database. The data are stored in ONTIME_S_FINAL_DATA_TO_BE_SCORED.

Note that scoring is a trivially parallelizable operation because one row can be scored independent of and in parallel with another row. The model built on the desktop is shipped to the database to perform scoring on vast numbers of rows in the database.

The computations are divided into these steps:

  1. Build a model in the desktop:

    dat <- ore.pull(ONTIME_S_PREPROCESSED_SUBSET)
    mod <- glm(ARRDELAY ~ DISTANCE + DEPDELAY, dat)
    mod
    summary(mod)
    
  2. Score in-parallel in the database using embedded R:

    prd <- predict(mod, newdata=ONTIME_S_FINAL_DATA_TO_BE_SCORED)
    class(prd)
    # Add predictions as a new column
    res <- cbind(newdat, PRED = prd)
    head(res)
    

R provides many other ways to build regression models, such as lm().

For other ways to build regression models, see Oracle R Enterprise Versions of R Models and In-Database Predictive Models in Oracle R Enterprise.

Oracle R Enterprise Database-Embedded R Engine

The embedded R engine in Oracle Database allows R users to off load desktop calculations that may require either more resources such as those available to Oracle Database or database-driven data parallelism. The embedded R engine also executes R scripts embedded in SQL or PL/SQL programs (lights-out processing).

These examples illustrate using Oracle R Enterprise embedded R engine with standard R packages downloaded from CRAN:

Perform R Computation in Oracle Database

This example illustrates off loading R computation to execute in the embedded R engine. To off load an R computation, simply include the R code within a closure (that is, function() {}) and invoke ore.doEval(). ore.doEval() schedules execution of the R code with the database-embedded R engine and returns the results back to the desktop for continued analysis:

library(biglm)
mod <- ore.doEval(
   function() {
      library(biglm)
      dat <- ore.pull(ore.get("ONTIME_S"))
      mod <- biglm(ARRDELAY ~ DISTANCE + DEPDELAY, dat)
      mod
     }, ore.connect = TRUE);
print(mod)
mod=ore.pull(mod)
print(mod)

Build a Series of Regression Models Using Data Parallelism

This example illustrates database-driven data parallelism at work in building a series of regression models using a CRAN package. One model is built per unique value of a factor. The database orchestrates the parallel and concurrent building of the models, one per factor and brings the list of all models built to the user desktop for further analysis:

modList <- ore.groupApply(
  # Organize input to the R script – This is always an Oracle R Enterprise   
  # data frame
   X=ONTIME_S,
  # Specify the grouping column. Here we request one model per unique value of
  # ONTIME_S$DEST
   INDEX=ONTIME_S$DEST,
  # Model building code goes inside the closure. Input and grouping 
  # conditions can be referenced as parameters to the function
    function(x) {
     library(biglm)
     biglm(ARRDELAY ~ DISTANCE + DEPDELAY, x)
     });

   modList_local <- ore.pull(modList)
# Print the model for just one destination - BOSTON
summary(modList_local$BOS)

Oracle R Enterprise Examples

Oracle R Enterprise is shipped with a collection of demos, examples that illustrate how to use Oracle R Enterprise. These examples are a collection of self-contained R scripts.

Most of the sample programs use the data frame iris, which is included in the R distribution. iris is loaded into a table as described in Load a Data Frame to a Table.

The rest of this section describes two examples in detail and includes a list of all of the examples:

Load a Data Frame to a Table

Start R, load the ORE packages via library(ORE), and then connect to the database.

Follow these steps to load an R data frame to a database table:

  1. This example uses the R data set iris.

    The iris data set is located in the datasets package that is part of the R distribution:

    R> find("iris")
    [1] "package:datasets"
    

    Use the R command class to verify that iris is an R data frame:

    R> class(iris)
    [1] "data.frame"
    

    iris consist of measurements of parts of iris flowers. Use the R command head to see a small sample of the data in iris.

    R> head(iris)
          Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    1          5.1         3.5          1.4         0.2  setosa
    2          4.9         3.0          1.4         0.2  setosa
    3          4.7         3.2          1.3         0.2  setosa
    4          4.6         3.1          1.5         0.2  setosa
    5          5.0         3.6          1.4         0.2  setosa
    6          5.4         3.9          1.7         0.4  setosa
    
  2. Now load the data frame iris into the database that you are connected to.

    Suppose that the database table version of iris is named IRIS_TABLE. Drop IRIS_TABLE to make sure that no table of this name exists in the connected schema:

    ore.drop(table = "IRIS_TABLE")
    

    If IRIS_TABLE doesn't exist, you do not get a message.

  3. Now create a database table with the data contained in iris:

    ore.create(iris, table = "IRIS_TABLE")
    

    Use ore.ls() to verify that the table was created:

    R> ore.ls()
    [1] "IRIS_TABLE" "NARROW"     "ONTIME_S" 
    
  4. IRIS_TABLE is a database-resident table with just metadata on the R side:

    R> class(IRIS_TABLE)
    [1] "ore.frame"
    attr(,"package")
    [1] "OREbase"
    
  5. Use head to see the column names and the first few values in IRIS_TABLE:

    R> head(IRIS_TABLE)
      SEPAL_LENGTH SEPAL_WIDTH PETAL_LENGTH PETAL_WIDTH SPECIES
    0          5.1         3.5          1.4         0.2  setosa
    1          4.9         3.0          1.4         0.2  setosa
    2          4.7         3.2          1.3         0.2  setosa
    3          4.6         3.1          1.5         0.2  setosa
    4          5.0         3.6          1.4         0.2  setosa
    5          5.4         3.9          1.7         0.4  setosa
    
  6. Use class to see the data type of the column SPECIES.

    R > class(IRIS_TABLE$SPECIES)
    [1] "raw"
    

You can use R functions to analyze the data in the table. Here are some simple examples taken from the example basic.R:

  • Use unique to get a list of the unique entries in a column. This example finds the unique SPECIES:

    R> unique(IRIS_TABLE$SPECIES)
    [1] setosa     versicolor virginica 
    Levels: setosa versicolor virginica
    
  • Find the minimum, maximum, and mean of PETAL_LENGTH:

    R> min(IRIS_TABLE$PETAL_LENGTH)
    [1] 1
    R> max(IRIS_TABLE$PETAL_LENGTH)
    [1] 6.9
    R> mean(IRIS_TABLE$PETAL_LENGTH)
    [1] 3.758
    

If you need information about an R function, use the command help(function-name).

Handle NULL Values Using airquality

Oracle Database has logical values: TRUE, FALSE, NULL. There is a 3x3 table that defines truth values for propositions with AND and OR. NULL is treated as an unknown value. For some operations the result is either deterministic (for example TRUE OR NULL) or unknown (TRUE AND NULL). If logical values are used in a WHERE clause, only rows with the condition TRUE are selected; FALSE and NULL are ignored. R, on the other hand, keeps TRUE and NA. Rows with NA are selected with value NA.

The option ore.na.extract controls whether NAs are selected or not. The default is to treat NA as SQL treats FALSE.

The demo nulls.R is the only sample that does not use iris as data. nulls.R compares the handling of NULLs in SQL with the handling of NAs in R.

In R, NA is a logical constant of length 1 which contains a missing value indicator. In the database, null refers to the absence of a value in a column of a row. Nulls indicate missing, unknown, or inapplicable data.

Follow these steps to understand the demo nulls.R:

  1. This demo uses the data frame airquality. Verify that the data set is a data frame and look at the few rows of the data frame:

    R> class(airquality)
    [1] "data.frame"
    R> head(airquality)
      Ozone Solar.R Wind Temp Month Day
    1    41     190  7.4   67     5   1
    2    36     118  8.0   72     5   2
    3    12     149 12.6   74     5   3
    4    18     313 11.5   62     5   4
    5    NA      NA 14.3   56     5   5
    6    28      NA 14.9   66     5   6
    
  2. Load airquality into the database as "AIRQUALITY":

    ore.drop(table = "AIRQUALITY")
    ore.create(airquality, table = "AIRQUALITY")
    

    Use ore.ls() to verify that the table was created. If you wish, use class(AIRQUALITY) to verify that AIRQUALITY is a database-resident table with just metadata on the R side.

  3. Examine how R handles NAs. Return all observations where ozone < 30 is TRUE:

    R> nrow(airquality[airquality$Ozone < 30,])[1] 92
    

    Compare this with the results when NAs are explicitly excluded:

    R> nrow(airquality[airquality$Ozone < 30 & !is.na(airquality$Ozone),])
    [1] 55
    
  4. The default behavior for SQL tables is to exclude NULLS in output:

    nrow(AIRQUALITY[AIRQUALITY$OZONE < 30,])
    [1] 55
    

    To handle NULLs the same way that R handles NA, request the behavior explicitly:

    options(ore.na.extract = TRUE)
    nrow(AIRQUALITY[AIRQUALITY$OZONE < 30,])
    [1] 92
    

Oracle R Enterprise Demos

These scripts have been added as demos to the ORE package.

To list all of the demos included with Oracle R Enterprise, type

R> demo(package = "ORE")

To run one of these scripts, specify the name of the demo in a demo function call. For example, to run aggregate.R, type

R> demo("aggregate", package = "ORE")

These demos are shipped with Oracle R Enterprise:

aggregate      Aggregation
analysis       Basic analysis & data processing operations
basic          Basic connectivity to database
binning        Binning logic
columnfns      Column functions
cor            Correlation matrix
crosstab       Frequency cross tabulations
datastore      DataStore operations
datetime       Date/Time operations
derived        Handling of derived columns
distributions  Distribution, density, and quantile functions
do_eval        Embedded R processing
freqanalysis   Frequency cross tabulations
graphics       Demonstrates visual analysis
group_apply    Embedded R processing by group
hypothesis     Hyphothesis testing functions
matrix         Matrix related operations
nulls          Handling of NULL in SQL vs. NA in R
odm_ai         Oracle Data Mining: attribute importance
odm_dt         Oracle Data Mining: decision trees
odm_glm        Oracle Data Mining: generalized linear models
odm_kmeans     Oracle Data Mining: enhanced k-means clustering
odm_nb         Oracle Data Mining: naive Bayes classification
odm_svm        Oracle Data Mining: support vector machines
push_pull      RDBMS <-> R data transfer
rank           Attributed-based ranking of observations
reg            Ordinary least squares linear regression
row_apply      Embedded R processing by row chunks
sampling       Random row sampling and partitioning of an ore.frame
sql_like       Mapping of R to SQL commands
stepwise       Stepwise OLS linear regression
summary        Summary functionality
table_apply    Embedded R processing of entire table