2 Oracle R Enterprise Transparency Layer

Oracle R Enterprise Transparency Layer performs these functions:

  • Traps all R commands and scripts prior to execution and looks for opportunities to ship them to Oracle Database for execution in the database.

  • Enables transparent grandparent SQL generation for R expressions that use mapped data types.

  • Converts R commands and scripts to SQL equivalents to leverage Oracle Database as a high-performance compute engine, taking advantage of query optimization, tables indexes, deferred evaluation, and parallel execution.

The Oracle R Enterprise transparency layer allows R users to use R syntax to work directly with database-resident objects without having to pull data from Oracle into R's memory on the user's desktop. It thus enables R users to work with data larger than desktop memory allows.

R language constructs and syntax are supported for objects mapped to Oracle Database objects.

This chapter summarizes the functionality provided by the Transparency Layer. These topics are discussed:

Data Types Supported

The following R data types have been overloaded so that they are mapped to database objects and hence enabled for in-database execution:

  • Character, Integer, Numeric, and Logical vectors

  • Date and Time Data Types

  • Factors

  • Data Frame

  • Matrix is overloaded in two situations:

    • Linear algebra cross-products

    • Creating input matrices for advanced analytics

class(object) reports the data type of such mapped objects. For example, if the table NARROW contains the column AGE and AGE is numeric,

R> class(NARROW$AGE)
[1] "ore.numeric"
attr(,"package")
[1] "OREbase"

Date and Time Data Types

This section describes how Oracle database supports Date and Time Data Types and illustrates how to use these data types in Oracle R Enterprise.

Date and Time Data Types in Oracle

Oracle Database supports these data and time data types:

  • The DATE data type stores date and time information. For each DATE value, Oracle stores the following fields: YEAR, MONTH, DAY, HOUR, MINUTE, and SECOND.

    The valid date range is January 1, 4712 BC, to December 31, 9999 AD.

  • The TIMESTAMP data type is an extension of the DATE data type. It stores the year, month, and day of the DATE data type, plus hour, minute, and second values. Supports an optional fractional_seconds_precision, the number of digits in the fractional part of the SECOND field in DATE. You can specify 0 to 9 digits; the default is 6 digits.

    There are two extensions of TIMESTAMP:

    • TIMESTAMP WITH TIME ZONE is TIMESTAMP as well as time zone displacement value TIMEZONE_HOUR and TIMEZONE_MINUTE.

    • TIMESTAMP WITH LOCAL TIME ZONE is TIMESTAMP WITH TIME ZONE with data normalized to the database time zone when it is stored in the database. When the data is retrieved, users see the data in the session time zone.

  • INTERVAL YEAR TO MONTH stores a period of time using the YEAR and MONTH fields. This data type is useful for representing the difference between two data time values when only the year and month values are significant.

  • INTERVAL DAY TO SECOND stores a period of time in terms of days, hours, minutes, and seconds. This data type is useful for representing the precise difference between two date time values.

  • INTERVAL YEAR TO MONTH stores a period of time in years and months, where optional year_precision, which is the number of digits in the YEAR date time field. Accepted values are 0 to 9.

  • INTERVAL DAY TO SECOND stores a period of time in days, hours, minutes, and seconds. Supports an optional day_precision, the maximum number of digits in the DAY date time field (value is 0 to 9 with a default of 2.) Also supports optional fractional_seconds_precision, the number of digits in the fractional part of the SECOND field. (value 0 to 9 with a default of 6).

For detailed information about Oracle Data Types, see “Data Types” in Oracle Database SQL Language Reference.

You can perform all expected operations on dates.

Oracle R Enterprise Support for Date and Time

Oracle R Enterprise provides these classes to support date and time calculations:

  • ore.date (Oracle DATE)

  • ore.datetime (TIMESTAMP, TIMESTAMP WITH TIME ZONE, TIMESTAMP WITH LOCAL TIME ZONE)

  • ore.difftime (INTERVAL DAY TO SECOND)

Note that ore.datetime objects do not support a time zone setting, instead they use the system time zone Sys.timezone() if it is available or GMT if Sys.timezone() is not available.

Operators and Functions Supported

Oracle R Enterprise supports data pre-processing functionality extensively so all data preparation and analysis can take place directly in the database.

You are not restricted to using this list of functions. If a specific function that you need is not supported by Oracle R Enterprise, you can pull data from the database into the R engine memory using ore.pull() to create an in-memory R object first, and use any R function.

The following operators and functions are supported. See R documentation for syntax and semantics of these operators and functions. Syntax and semantics for these items are unchanged when used on a corresponding database-mapped data type (also known as an Oracle R Enterprise data type).

  • Mathematical transformations: abs, sign, sqrt, ceiling, floor, trunc, cummax, cummin, cumprod, cumsum, log, loglo, log10, log2, log1p, acos, acosh, asin, asinh, atan, atanh, exp, expm1, cos, cosh, sin, sinh, tan, atan2, tanh, gamma, lgamma, digamma, trigamma, factorial, lfactorial, round, signif, pmin, pmax, zapsmall, rank, diff, besselI, besselJ, besselK, besselY

  • Basic statistics: mean, summary, min, max, sum, any, all, median, range, IQR, fivenum, mad, quantile, sd, var, table, tabulate, rowSums, colSums, rowMeans, colMeans, cor, cov

  • Arithmetic operators: +, -, *, /, ^, %%, %/%

  • Comparison operators: ==, >, <, !=, <=, >=

  • Logical operators: &, |, xor

  • Set operations: unique, %in%, subset

  • String operations: tolower, toupper, casefold, toString, chartr, sub, gsub, substr, substring, paste, nchar, grepl

  • Combine Data Frame: cbind, rbind, merge

  • Combine vectors: append

  • Vector creation: ifelse

  • Subset selection: [, [[, $, head, tail, window, subset, Filter, na.omit, na.exclude, complete.cases

  • Subset replacement: [<-, [[<-, $<-

  • Data reshaping: split, unlist

  • Data processing: eval, with, within, transform

  • Apply variants: tapply, aggregate, by

  • Special value checks: is.na, is.finite, is.infinite, is.nan

  • Metadata functions: nrow, NROW, ncol, NCOL, nlevels, names, names<-, row, col, dimnames, dimnames<-, dim, length, row.names, row.names<-, rownames, rownames<-, colnames, levels, reorder

  • Graphics: arrows, boxplot, cdplot, co.intervals, coplot, hist, identify, lines, matlines, matplot, matpoints, pairs, plot, points, polygon, polypath, rug, segments, smoothScatter, sunflowerplot, symbols, text, xspline, xy.coords

  • Conversion functions: as.logical, as.integer, as.numeric, as.character, as.vector, as.factor, as.data.frame

  • Type check functions: is.logical, is.integer, is.numeric, is.character, is.vector, is.factor, is.data.frame

  • Character manipulation: nchar, tolower, toupper, casefold, chartr, sub, gsub, substr.

  • Other ore.frame functions: data.frame, max.col, scale

  • Hypothesis testing: binom.test, chisq.test, ks.test, prop.test, t.test, var.test, wilcox.test

  • Various Distributions: Density, cumulative distribution, and quantile functions for standard distributions

  • ore.matrix function: show, is.matrix, as.matrix, %*% (matrix multiplication), t, crossprod (matrix cross-product), tcrossprod (matrix cross-product A times transpose of B), solve (invert), backsolve, forwardsolve, all appropriate mathematical functions (abs, sign, and so on), summary (max, min, all, and so on), mean

The Oracle R Enterprise sample programs described in Oracle R Enterprise Examples include several examples using each category of these functions with Oracle R Enterprise data types.