6.2.7 Using the ore.indexApply Function

The ore.indexApply function executes the specified user-defined input function using data that is generated by the input function. It supports task-parallel execution, in which one or more R engines perform the same or different calculations, or task. The times argument to the ore.indexApply function specifies the number of times that the input function executes in the database. Any required data must be explicitly generated or loaded within the input function.

The syntax of the ore.indexApply function is the following:

ore.indexApply(times, FUN, ..., FUN.VALUE = NULL, FUN.NAME = NULL, FUN.OWNER = NULL,
               parallel = getOption("ore.parallel", NULL))

The ore.indexApply function returns an ore.list object or an ore.frame object.

Examples of the use of the ore.indexApply function are in the following topics:

6.2.7.1 Simple Example of Using the ore.indexApply Function

Example 6-17 invokes ore.indexApply and specifies that it execute the input function five times in parallel. It displays the class of the result, which is ore.list, and then displays the result.

Example 6-17 Using the ore.indexApply Function

res <- ore.indexApply(5,
      function(index) {
        paste("IndexApply:", index)
      },
      parallel = TRUE)
class(res)
res
Listing for Example 6-17
R> res <- ore.indexApply(5,
+       function(index) {
+         paste("IndexApply:", index)
+       },
+       parallel = TRUE)
R> class(res)
[1] "ore.list"
attr(,"package")
[1] "OREembed"
R> res
$`1`
[1] "IndexApply: 1"
 
$`2`
[1] "IndexApply: 2"
 
$`3`
[1] "IndexApply: 3"
 
$`4`
[1] "IndexApply: 4"
 
$`5`
[1] "IndexApply: 5"

6.2.7.2 Column-Parallel Use Case

Example 6-18 uses the R summary function to compute in parallel summary statistics on the first four numeric columns of the iris data set. The example combines the computations into a final result. The first argument to the ore.indexApply function is 4, which specifies the number of columns to summarize in parallel. The user-defined input function takes one argument, index, which will be a value between 1 and 4 and which specifies the column to summarize.

The example invokes the summary function on the specified column. The summary invocation returns a single row, which contains the summary statistics for the column. The example converts the result of the summary invocation into a data.frame and adds the column name to it.

The example next uses the FUN.VALUE argument to the ore.indexApply function to define the structure of the result of the function. The result is then returned as an ore.frame object with that structure.

Example 6-18 Using the ore.indexApply Function and Combining Results

res <- NULL
res <- ore.indexApply(4,
      function(index) {
        ss <- summary(iris[, index])
        attr.names <- attr(ss, "names")
        stats <- data.frame(matrix(ss, 1, length(ss)))
        names(stats) <- attr.names
        stats$col <- names(iris)[index]
        stats
      },
      FUN.VALUE=data.frame(Min. = numeric(0),
        "1st Qu." = numeric(0),
        Median = numeric(0),
        Mean = numeric(0),
        "3rd Qu." = numeric(0),
        Max. = numeric(0),
        Col = character(0)), 
      parallel = TRUE)
res
Listing for Example 6-18
R> res <- NULL
R> res <- ore.indexApply(4,
+       function(index) {
+         ss <- summary(iris[, index])
+         attr.names <- attr(ss, "names")
+         stats <- data.frame(matrix(ss, 1, length(ss)))
+         names(stats) <- attr.names
+         stats$col <- names(iris)[index]
+         stats
+       },
+       FUN.VALUE=data.frame(Min. = numeric(0),
+         "1st Qu." = numeric(0),
+         Median = numeric(0),
+         Mean = numeric(0),
+         "3rd Qu." = numeric(0),
+         Max. = numeric(0),
+         Col = character(0)),
+       parallel = TRUE)
R> res
  Min. X1st.Qu. Median  Mean X3rd.Qu. Max.          Col
1  2.0      2.8   3.00 3.057      3.3  4.4  Sepal.Width
2  4.3      5.1   5.80 5.843      6.4  7.9 Sepal.Length
3  0.1      0.3   1.30 1.199      1.8  2.5  Petal.Width
4  1.0      1.6   4.35 3.758      5.1  6.9 Petal.Length
Warning message:
ORE object has no unique key - using random order

6.2.7.3 Simulations Use Case

You can use the ore.indexApply function in simulations, which can take advantage of high-performance computing hardware like an Oracle Exadata Database Machine. Example 6-19 takes multiple samples from a random normal distribution to compare the distribution of the summary statistics. Each simulation occurs in a separate R engine in the database, in parallel, up to the degree of parallelism allowed by the database.

Example 6-19 defines variables for the sample size, the mean and standard deviations of the random numbers, and the number of simulations to perform. The example specifies num.simulations as the first argument to the ore.indexApply function. The ore.indexApply function passes num.simulations to the user-defined function as the index argument. This input function then sets the random seed based on the index so that each invocation of the input function generates a different set of random numbers.

The input function next uses the rnorm function to produce sample.size random normal values. It invokes the summary function on the vector of random numbers, and then prepares a data.frame as the result it returns. The ore.indexApply function specifies the FUN.VALUE argument so that it returns an ore.frame that structures the combined results of the simulations. The res variable gets the ore.frame returned by the ore.indexApply function.

To get the distribution of samples, the example invokes the boxplot function on the data.frame that is the result of using the ore.pull function to bring selected columns from res to the client.

Example 6-19 Using the ore.indexApply Function in a Simulation

res <- NULL
sample.size = 1000
mean.val = 100
std.dev.val = 10
num.simulations = 1000
 
res <- ore.indexApply(num.simulations,
      function(index, sample.size = 1000, mean = 0, std.dev = 1) {
        set.seed(index)
        x <- rnorm(sample.size, mean, std.dev)
        ss <- summary(x)
        attr.names <- attr(ss, "names")
        stats <- data.frame(matrix(ss, 1, length(ss)))
        names(stats) <- attr.names
        stats$index <- index
        stats
      },
      FUN.VALUE=data.frame(Min. = numeric(0),
        "1st Qu." = numeric(0),
        Median = numeric(0),
        Mean = numeric(0),
        "3rd Qu." = numeric(0),
        Max. = numeric(0),
        Index = numeric(0)),
      parallel = TRUE,
      sample.size = sample.size,
      mean = mean.val, std.dev = std.dev.val)
options("ore.warn.order" = FALSE)
head(res, 3)
tail(res, 3)
boxplot(ore.pull(res[, 1:6]),
  main=sprintf("Boxplot of %d rnorm samples size %d, mean=%d, sd=%d",
               num.simulations, sample.size, mean.val, std.dev.val))
Listing for Example 6-19
R> res <- ore.indexApply(num.simulations,
+       function(index, sample.size = 1000, mean = 0, std.dev = 1) {
+         set.seed(index)
+         x <- rnorm(sample.size, mean, std.dev)
+         ss <- summary(x)
+         attr.names <- attr(ss, "names")
+         stats <- data.frame(matrix(ss, 1, length(ss)))
+         names(stats) <- attr.names
+         stats$index <- index
+         stats
+       },
+       FUN.VALUE=data.frame(Min. = numeric(0),
+         "1st Qu." = numeric(0),
+         Median = numeric(0),
+         Mean = numeric(0),
+         "3rd Qu." = numeric(0),
+         Max. = numeric(0),
+         Index = numeric(0)),
+       parallel = TRUE,
+       sample.size = sample.size,
+       mean = mean.val, std.dev = std.dev.val)
R> options("ore.warn.order" = FALSE)
R> head(res, 3)
   Min. X1st.Qu. Median   Mean X3rd.Qu.  Max. Index
1 67.56    93.11  99.42  99.30    105.8 128.0   847
2 67.73    94.19  99.86 100.10    106.3 130.7   258
3 65.58    93.15  99.78  99.82    106.2 134.3   264
R> tail(res, 3)
   Min. X1st.Qu. Median   Mean X3rd.Qu.  Max. Index
1 65.02    93.44  100.2 100.20    106.9 134.0     5
2 71.60    93.34   99.6  99.66    106.4 131.7     4
3 69.44    93.15  100.3 100.10    106.8 135.2     3
R> boxplot(ore.pull(res[, 1:6]),
+   main=sprintf("Boxplot of %d rnorm samples size %d, mean=%d, sd=%d",
+                num.simulations, sample.size, mean.val, std.dev.val))