The ore.indexApply function executes the specified user-defined input function using data that is generated by the input function. It supports task-parallel execution, in which one or more R engines perform the same or different calculations, or tasks. The times argument to the ore.indexApply function specifies the number of times that the input function executes in the database. Any required data must be explicitly generated or loaded within the input function.
The syntax of the ore.indexApply function is the following:
ore.indexApply(times, FUN, ..., FUN.VALUE = NULL, FUN.NAME = NULL, FUN.OWNER = NULL, parallel = getOption("ore.parallel", NULL))
The ore.indexApply function returns an ore.list object or an ore.frame object.
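The calling convention described above can be sketched in plain R. The helper index_apply_local below is a hypothetical local stand-in, not part of Oracle R Enterprise: it runs sequentially in the client session rather than in database R engines, and it only illustrates that the input function is invoked once for each index from 1 through times, with any extra arguments passed through.

```r
# Hypothetical local stand-in for illustration only; NOT the ORE implementation.
# It runs sequentially in the client R session, not in database R engines.
index_apply_local <- function(times, FUN, ...) {
  idx <- seq_len(times)              # index values 1, 2, ..., times
  res <- lapply(idx, FUN, ...)       # one invocation of FUN per index
  names(res) <- as.character(idx)    # label each result by its index
  res
}

res_local <- index_apply_local(3, function(index) paste("IndexApply:", index))
res_local[["2"]]
```

The result here is an ordinary R list; with ore.indexApply the analogous result is an ore.list object that resides in the database.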
Examples of the use of the ore.indexApply function are in the following topics:
See Also:
"Arguments for Functions that Run Scripts" for descriptions of the arguments to function ore.indexApply
Example 6-17 invokes ore.indexApply and specifies that it execute the input function five times in parallel. It displays the class of the result, which is ore.list, and then displays the result.
Example 6-17 Using the ore.indexApply Function
res <- ore.indexApply(5,
      function(index) {
        paste("IndexApply:", index)
      },
      parallel = TRUE)
class(res)
res

Listing for Example 6-17
R> res <- ore.indexApply(5,
+       function(index) {
+         paste("IndexApply:", index)
+       },
+       parallel = TRUE)
R> class(res)
[1] "ore.list"
attr(,"package")
[1] "OREembed"
R> res
$`1`
[1] "IndexApply: 1"

$`2`
[1] "IndexApply: 2"

$`3`
[1] "IndexApply: 3"

$`4`
[1] "IndexApply: 4"

$`5`
[1] "IndexApply: 5"
Example 6-18 uses the R summary function to compute summary statistics in parallel on the first four columns of the iris data set, which are numeric. The example combines the computations into a final result. The first argument to the ore.indexApply function is 4, which specifies the number of columns to summarize in parallel. The user-defined input function takes one argument, index, which is an integer from 1 through 4 that specifies the column to summarize.
The example invokes the summary function on the specified column. The summary invocation returns a single row, which contains the summary statistics for the column. The example converts the result of the summary invocation into a data.frame and adds the column name to it.
The example next uses the FUN.VALUE argument to the ore.indexApply function to define the structure of the result of the function. The result is then returned as an ore.frame object with that structure.
Example 6-18 Using the ore.indexApply Function and Combining Results
res <- NULL
res <- ore.indexApply(4,
      function(index) {
        ss <- summary(iris[, index])
        attr.names <- attr(ss, "names")
        stats <- data.frame(matrix(ss, 1, length(ss)))
        names(stats) <- attr.names
        stats$col <- names(iris)[index]
        stats
      },
      FUN.VALUE=data.frame(Min. = numeric(0),
        "1st Qu." = numeric(0),
        Median = numeric(0),
        Mean = numeric(0),
        "3rd Qu." = numeric(0),
        Max. = numeric(0),
        Col = character(0)),
      parallel = TRUE)
res

Listing for Example 6-18
R> res <- NULL
R> res <- ore.indexApply(4,
+       function(index) {
+         ss <- summary(iris[, index])
+         attr.names <- attr(ss, "names")
+         stats <- data.frame(matrix(ss, 1, length(ss)))
+         names(stats) <- attr.names
+         stats$col <- names(iris)[index]
+         stats
+       },
+       FUN.VALUE=data.frame(Min. = numeric(0),
+         "1st Qu." = numeric(0),
+         Median = numeric(0),
+         Mean = numeric(0),
+         "3rd Qu." = numeric(0),
+         Max. = numeric(0),
+         Col = character(0)),
+       parallel = TRUE)
R> res
  Min. X1st.Qu. Median  Mean X3rd.Qu. Max.          Col
1  2.0      2.8   3.00 3.057      3.3  4.4  Sepal.Width
2  4.3      5.1   5.80 5.843      6.4  7.9 Sepal.Length
3  0.1      0.3   1.30 1.199      1.8  2.5  Petal.Width
4  1.0      1.6   4.35 3.758      5.1  6.9 Petal.Length
Warning message:
ORE object has no unique key - using random order
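The way the per-column results line up with the FUN.VALUE template can be explored locally in plain R. The sketch below is an assumption-laden illustration, not the ORE mechanism: it uses lapply and rbind in the client session in place of ore.indexApply, and the names summarize_column, template, and combined are hypothetical. It also shows why the listing displays X1st.Qu. and X3rd.Qu.: data.frame converts the quoted names "1st Qu." and "3rd Qu." into syntactically valid R names.

```r
# Local illustration only; lapply and rbind stand in for ore.indexApply here.
summarize_column <- function(index) {
  ss <- summary(iris[, index])
  stats <- as.data.frame(t(as.numeric(ss)))  # one row of summary statistics
  names(stats) <- names(ss)
  stats$col <- names(iris)[index]            # record which column was summarized
  stats
}

# Zero-row template with the same shape as the FUN.VALUE argument in Example 6-18.
template <- data.frame(Min. = numeric(0),
                       "1st Qu." = numeric(0),
                       Median = numeric(0),
                       Mean = numeric(0),
                       "3rd Qu." = numeric(0),
                       Max. = numeric(0),
                       Col = character(0))

combined <- do.call(rbind, lapply(1:4, summarize_column))
names(combined) <- names(template)           # adopt the template's column names
names(combined)
```

Unlike the ore.frame in the listing, which has no unique key and is displayed in random order, the local data.frame keeps the rows in index order.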
You can use the ore.indexApply function in simulations, which can take advantage of high-performance computing hardware like an Oracle Exadata Database Machine. Example 6-19 takes multiple samples from a random normal distribution to compare the distribution of the summary statistics. Each simulation occurs in a separate R engine in the database, in parallel, up to the degree of parallelism allowed by the database.
Example 6-19 defines variables for the sample size, the mean and standard deviation of the random numbers, and the number of simulations to perform. The example specifies num.simulations as the first argument to the ore.indexApply function. The ore.indexApply function invokes the user-defined function num.simulations times and passes the number of the current invocation, from 1 through num.simulations, as the index argument. The input function then sets the random seed based on the index so that each invocation of the input function generates a different set of random numbers.
The input function next uses the rnorm function to produce sample.size random normal values. It invokes the summary function on the vector of random numbers, and then prepares a data.frame as the result it returns. The ore.indexApply function specifies the FUN.VALUE argument so that it returns an ore.frame that structures the combined results of the simulations. The res variable gets the ore.frame returned by the ore.indexApply function.
To get the distribution of samples, the example invokes the boxplot function on the data.frame that is the result of using the ore.pull function to bring selected columns from res to the client.
Example 6-19 Using the ore.indexApply Function in a Simulation
res <- NULL
sample.size = 1000
mean.val = 100
std.dev.val = 10
num.simulations = 1000

res <- ore.indexApply(num.simulations,
      function(index, sample.size = 1000, mean = 0, std.dev = 1) {
        set.seed(index)
        x <- rnorm(sample.size, mean, std.dev)
        ss <- summary(x)
        attr.names <- attr(ss, "names")
        stats <- data.frame(matrix(ss, 1, length(ss)))
        names(stats) <- attr.names
        stats$index <- index
        stats
      },
      FUN.VALUE=data.frame(Min. = numeric(0),
        "1st Qu." = numeric(0),
        Median = numeric(0),
        Mean = numeric(0),
        "3rd Qu." = numeric(0),
        Max. = numeric(0),
        Index = numeric(0)),
      parallel = TRUE,
      sample.size = sample.size,
      mean = mean.val, std.dev = std.dev.val)
options("ore.warn.order" = FALSE)
head(res, 3)
tail(res, 3)
boxplot(ore.pull(res[, 1:6]),
  main=sprintf("Boxplot of %d rnorm samples size %d, mean=%d, sd=%d",
               num.simulations, sample.size, mean.val, std.dev.val))

Listing for Example 6-19
R> res <- ore.indexApply(num.simulations,
+       function(index, sample.size = 1000, mean = 0, std.dev = 1) {
+         set.seed(index)
+         x <- rnorm(sample.size, mean, std.dev)
+         ss <- summary(x)
+         attr.names <- attr(ss, "names")
+         stats <- data.frame(matrix(ss, 1, length(ss)))
+         names(stats) <- attr.names
+         stats$index <- index
+         stats
+       },
+       FUN.VALUE=data.frame(Min. = numeric(0),
+         "1st Qu." = numeric(0),
+         Median = numeric(0),
+         Mean = numeric(0),
+         "3rd Qu." = numeric(0),
+         Max. = numeric(0),
+         Index = numeric(0)),
+       parallel = TRUE,
+       sample.size = sample.size,
+       mean = mean.val, std.dev = std.dev.val)
R> options("ore.warn.order" = FALSE)
R> head(res, 3)
   Min. X1st.Qu. Median   Mean X3rd.Qu.  Max. Index
1 67.56    93.11  99.42  99.30    105.8 128.0   847
2 67.73    94.19  99.86 100.10    106.3 130.7   258
3 65.58    93.15  99.78  99.82    106.2 134.3   264
R> tail(res, 3)
   Min. X1st.Qu. Median   Mean X3rd.Qu.  Max. Index
1 65.02    93.44  100.2 100.20    106.9 134.0     5
2 71.60    93.34   99.6  99.66    106.4 131.7     4
3 69.44    93.15  100.3 100.10    106.8 135.2     3
R> boxplot(ore.pull(res[, 1:6]),
+   main=sprintf("Boxplot of %d rnorm samples size %d, mean=%d, sd=%d",
+                num.simulations, sample.size, mean.val, std.dev.val))
Figure 6-2 Display of the boxplot Function in Example 6-19
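The effect of seeding each invocation from its index can be checked locally in plain R before running the simulation in the database. The following is a minimal sketch under stated assumptions: simulate_once, run_a, and run_b are hypothetical names, lapply stands in for ore.indexApply, and only the sample mean is kept to shorten the example. It suggests that each index yields a different sample, while repeating the whole run reproduces the same values.

```r
# Local illustration only; lapply stands in for ore.indexApply here.
simulate_once <- function(index, sample.size = 1000, mean.val = 0, std.dev.val = 1) {
  set.seed(index)                                # seed depends on the index
  x <- rnorm(sample.size, mean.val, std.dev.val)
  data.frame(Mean = mean(x), Index = index)      # keep one statistic per sample
}

run_simulations <- function(n) {
  do.call(rbind, lapply(seq_len(n), simulate_once,
                        sample.size = 1000, mean.val = 100, std.dev.val = 10))
}

run_a <- run_simulations(5)
run_b <- run_simulations(5)
identical(run_a, run_b)            # same seeds, so the runs match
length(unique(run_a$Mean))         # each index produced a different sample
```

Because the seed is derived only from the index, the results do not depend on which R engine handles which invocation or on the order in which the invocations complete.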