The ore.indexApply function executes the specified user-defined input function using data that the input function itself generates. It supports task-parallel execution, in which one or more R engines perform the same or different calculations, or tasks. The times argument to the ore.indexApply function specifies the number of times that the input function executes in the database. Each invocation receives its index, an integer from 1 to times, as the first argument to the input function. Any required data must be explicitly generated or loaded within the input function.
The syntax of the ore.indexApply function is the following:
ore.indexApply(times, FUN, ..., FUN.VALUE = NULL, FUN.NAME = NULL, FUN.OWNER = NULL,
parallel = getOption("ore.parallel", NULL))
The ore.indexApply function returns an ore.list object or an ore.frame object.
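To make the calling convention concrete, the following base R sketch mimics it on the client, without a database. The helper local_index_apply is hypothetical and is not part of Oracle R Enterprise: it calls FUN once for each index from 1 to times, forwards any extra arguments supplied through ..., and names each result by its index, much like the ore.list result shown in Example 6-17. Unlike ore.indexApply, it runs the invocations serially in the local R session.

```r
# Hypothetical client-side stand-in for ore.indexApply (not part of ORE).
# Runs serially in the local session; ore.indexApply instead runs each
# invocation in an R engine in the database.
local_index_apply <- function(times, FUN, ...) {
  FUN <- match.fun(FUN)
  # Call FUN(1, ...), FUN(2, ...), ..., FUN(times, ...)
  res <- lapply(seq_len(times), FUN, ...)
  # Name the elements "1", "2", ... by invocation index
  names(res) <- as.character(seq_len(times))
  res
}

# Usage with the same input function as Example 6-17
res <- local_index_apply(5, function(index) paste("IndexApply:", index))
res[["3"]]
# [1] "IndexApply: 3"
```

Extra named arguments reach the input function the same way they do in Example 6-19, for example local_index_apply(2, function(index, k) index * k, k = 10).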
The examples that follow show how to use the ore.indexApply function.
See Also:
"Arguments for Functions that Run Scripts" for descriptions of the arguments to the ore.indexApply function
Example 6-17 invokes ore.indexApply and specifies that it execute the input function five times in parallel. It displays the class of the result, which is ore.list, and then displays the result.
Example 6-17 Using the ore.indexApply Function
res <- ore.indexApply(5,
function(index) {
paste("IndexApply:", index)
},
parallel = TRUE)
class(res)
res
Listing for Example 6-17
R> res <- ore.indexApply(5,
+ function(index) {
+ paste("IndexApply:", index)
+ },
+ parallel = TRUE)
R> class(res)
[1] "ore.list"
attr(,"package")
[1] "OREembed"
R> res
$`1`
[1] "IndexApply: 1"
$`2`
[1] "IndexApply: 2"
$`3`
[1] "IndexApply: 3"
$`4`
[1] "IndexApply: 4"
$`5`
[1] "IndexApply: 5"
Example 6-18 uses the R summary function to compute, in parallel, summary statistics on the first four columns of the iris data set, which are its numeric columns, and then combines the computations into a final result. The first argument to the ore.indexApply function is 4, which specifies the number of columns to summarize in parallel. The user-defined input function takes one argument, index, an integer from 1 to 4 that specifies the column to summarize.
The input function invokes the summary function on the specified column. The summary function returns a named vector of six statistics (Min., 1st Qu., Median, Mean, 3rd Qu., and Max.). The input function converts that vector into a one-row data.frame, uses the statistic names as the column names, and adds the name of the summarized column.
The FUN.VALUE argument to the ore.indexApply function defines the structure of the combined result: six numeric columns for the statistics and one character column for the column name. The ore.indexApply function returns the combined result as an ore.frame object with that structure.
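The per-column computation and the combining step can be tried locally, because the iris data set ships with R's datasets package. In this sketch, which only approximates the database execution, lapply stands in for ore.indexApply and do.call(rbind, ...) stands in for the row stacking that FUN.VALUE describes; the helper name summarize_column is hypothetical.

```r
# Hypothetical local version of the Example 6-18 input function
summarize_column <- function(index) {
  ss <- summary(iris[, index])                    # Min., 1st Qu., ..., Max.
  stats <- data.frame(matrix(ss, 1, length(ss)))  # one row, six columns
  names(stats) <- names(ss)                       # keep the statistic names
  stats$Col <- names(iris)[index]                 # which column was summarized
  stats
}

# Serial local stand-in for ore.indexApply(4, ...): one call per column
parts <- lapply(1:4, summarize_column)

# Stack the four one-row data.frames into a single result
combined <- do.call(rbind, parts)
combined
```

Locally the rows come back in index order; in the database the rows of the ore.frame have no guaranteed order, which is why Example 6-18 prints a warning.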
Example 6-18 Using the ore.indexApply Function and Combining Results
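res <- NULL
# Run the input function four times in parallel, once per numeric column of iris
res <- ore.indexApply(4,
  function(index) {
    # Summary statistics for the column selected by index (1 to 4)
    ss <- summary(iris[, index])
    attr.names <- attr(ss, "names")
    # Convert the named summary vector into a one-row data.frame
    stats <- data.frame(matrix(ss, 1, length(ss)))
    names(stats) <- attr.names
    # Record which column was summarized
    stats$col <- names(iris)[index]
    stats
  },
  # Structure of the combined result: six numeric columns and one character column
  FUN.VALUE=data.frame(Min. = numeric(0),
                       "1st Qu." = numeric(0),
                       Median = numeric(0),
                       Mean = numeric(0),
                       "3rd Qu." = numeric(0),
                       Max. = numeric(0),
                       Col = character(0)),
  parallel = TRUE)
res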
Listing for Example 6-18
R> res <- NULL
R> res <- ore.indexApply(4,
+ function(index) {
+ ss <- summary(iris[, index])
+ attr.names <- attr(ss, "names")
+ stats <- data.frame(matrix(ss, 1, length(ss)))
+ names(stats) <- attr.names
+ stats$col <- names(iris)[index]
+ stats
+ },
+ FUN.VALUE=data.frame(Min. = numeric(0),
+ "1st Qu." = numeric(0),
+ Median = numeric(0),
+ Mean = numeric(0),
+ "3rd Qu." = numeric(0),
+ Max. = numeric(0),
+ Col = character(0)),
+ parallel = TRUE)
R> res
Min. X1st.Qu. Median Mean X3rd.Qu. Max. Col
1 2.0 2.8 3.00 3.057 3.3 4.4 Sepal.Width
2 4.3 5.1 5.80 5.843 6.4 7.9 Sepal.Length
3 0.1 0.3 1.30 1.199 1.8 2.5 Petal.Width
4 1.0 1.6 4.35 3.758 5.1 6.9 Petal.Length
Warning message:
ORE object has no unique key - using random order
You can use the ore.indexApply function in simulations, which can take advantage of high-performance computing hardware like an Oracle Exadata Database Machine. Example 6-19 takes multiple samples from a normal distribution to compare the distribution of the summary statistics across samples. Each simulation occurs in a separate R engine in the database, in parallel, up to the degree of parallelism allowed by the database.
Example 6-19 defines variables for the sample size, the mean and standard deviation of the random numbers, and the number of simulations to perform. The example specifies num.simulations as the first argument to the ore.indexApply function, so the input function executes num.simulations times. For each invocation, the ore.indexApply function passes the invocation index, an integer from 1 to num.simulations, to the user-defined function as the index argument. The input function then sets the random seed based on the index so that each invocation of the input function generates a different set of random numbers.
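The seeding strategy can be checked locally in base R. The sketch below uses a hypothetical helper, draw_sample, that repeats the first steps of the Example 6-19 input function. It shows the two properties the example relies on: rerunning an invocation with the same index reproduces its draws, and different indexes produce different draws. Seeding with consecutive integers is a convenience; it does not by itself guarantee statistically independent random number streams.

```r
# Hypothetical helper mirroring the first steps of the Example 6-19 input function
draw_sample <- function(index, sample.size = 1000, mean = 0, std.dev = 1) {
  set.seed(index)                       # seed derived from the invocation index
  rnorm(sample.size, mean, std.dev)
}

a1       <- draw_sample(1, 5, 100, 10)
a1_again <- draw_sample(1, 5, 100, 10)  # same index: same draws
a2       <- draw_sample(2, 5, 100, 10)  # different index: different draws

identical(a1, a1_again)  # TRUE
identical(a1, a2)        # FALSE
```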
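The input function next uses the rnorm function to produce sample.size normally distributed random values. It invokes the summary function on the vector of random numbers and then prepares a one-row data.frame as the result it returns. The call to ore.indexApply passes the values of sample.size, mean.val, and std.dev.val to the input function as the additional arguments sample.size, mean, and std.dev, and it specifies the FUN.VALUE argument so that the combined results of the simulations are returned as an ore.frame with that structure. The res variable gets the ore.frame returned by the ore.indexApply function.
Before displaying the first and last rows of res, the example sets the ore.warn.order option to FALSE, which suppresses the warning, seen in Example 6-18, that the ORE object has no unique key. To display the distribution of the summary statistics across the samples, the example uses the ore.pull function to bring the first six columns of res to the client as a data.frame, and then invokes the boxplot function on that data.frame.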
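As a rough local analogue of the whole simulation, the sketch below runs a smaller number of simulations serially with lapply, stacks the one-row summaries with do.call(rbind, ...), and asks boxplot for its statistics with plot = FALSE so that no graphics device is needed. The helper name simulate_once and the reduced num.simulations value are assumptions for illustration; in Example 6-19 the invocations run in database R engines and ore.pull brings the combined result to the client.

```r
# Hypothetical local version of the Example 6-19 input function
simulate_once <- function(index, sample.size = 1000, mean = 0, std.dev = 1) {
  set.seed(index)
  x <- rnorm(sample.size, mean, std.dev)
  ss <- summary(x)
  stats <- data.frame(matrix(ss, 1, length(ss)))  # one row of six statistics
  names(stats) <- names(ss)
  stats$Index <- index                            # tag the row with its index
  stats
}

num.simulations <- 100  # reduced from 1000 to keep the local run short

# Extra named arguments are forwarded to simulate_once by lapply
sims <- do.call(rbind, lapply(seq_len(num.simulations), simulate_once,
                              sample.size = 1000, mean = 100, std.dev = 10))

# Statistics that boxplot would draw for each of the six summary columns
bp <- boxplot(sims[, 1:6], plot = FALSE)
bp$stats
```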
Example 6-19 Using the ore.indexApply Function in a Simulation
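res <- NULL
# Simulation parameters
sample.size <- 1000
mean.val <- 100
std.dev.val <- 10
num.simulations <- 1000
# Invoke the input function num.simulations times; each invocation gets its index
res <- ore.indexApply(num.simulations,
  function(index, sample.size = 1000, mean = 0, std.dev = 1) {
    # Seed from the index so that each invocation draws a different sample
    set.seed(index)
    x <- rnorm(sample.size, mean, std.dev)
    ss <- summary(x)
    attr.names <- attr(ss, "names")
    # One-row data.frame of summary statistics, tagged with the index
    stats <- data.frame(matrix(ss, 1, length(ss)))
    names(stats) <- attr.names
    stats$index <- index
    stats
  },
  FUN.VALUE=data.frame(Min. = numeric(0),
                       "1st Qu." = numeric(0),
                       Median = numeric(0),
                       Mean = numeric(0),
                       "3rd Qu." = numeric(0),
                       Max. = numeric(0),
                       Index = numeric(0)),
  parallel = TRUE,
  # Additional arguments passed through to the input function
  sample.size = sample.size,
  mean = mean.val, std.dev = std.dev.val)
# Suppress the warning about the ore.frame having no unique key
options("ore.warn.order" = FALSE)
head(res, 3)
tail(res, 3)
# Pull the six statistic columns to the client and plot their distributions
boxplot(ore.pull(res[, 1:6]),
  main=sprintf("Boxplot of %d rnorm samples size %d, mean=%d, sd=%d",
               num.simulations, sample.size, mean.val, std.dev.val))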
Listing for Example 6-19
R> res <- ore.indexApply(num.simulations,
+ function(index, sample.size = 1000, mean = 0, std.dev = 1) {
+ set.seed(index)
+ x <- rnorm(sample.size, mean, std.dev)
+ ss <- summary(x)
+ attr.names <- attr(ss, "names")
+ stats <- data.frame(matrix(ss, 1, length(ss)))
+ names(stats) <- attr.names
+ stats$index <- index
+ stats
+ },
+ FUN.VALUE=data.frame(Min. = numeric(0),
+ "1st Qu." = numeric(0),
+ Median = numeric(0),
+ Mean = numeric(0),
+ "3rd Qu." = numeric(0),
+ Max. = numeric(0),
+ Index = numeric(0)),
+ parallel = TRUE,
+ sample.size = sample.size,
+ mean = mean.val, std.dev = std.dev.val)
R> options("ore.warn.order" = FALSE)
R> head(res, 3)
Min. X1st.Qu. Median Mean X3rd.Qu. Max. Index
1 67.56 93.11 99.42 99.30 105.8 128.0 847
2 67.73 94.19 99.86 100.10 106.3 130.7 258
3 65.58 93.15 99.78 99.82 106.2 134.3 264
R> tail(res, 3)
Min. X1st.Qu. Median Mean X3rd.Qu. Max. Index
1 65.02 93.44 100.2 100.20 106.9 134.0 5
2 71.60 93.34 99.6 99.66 106.4 131.7 4
3 69.44 93.15 100.3 100.10 106.8 135.2 3
R> boxplot(ore.pull(res[, 1:6]),
+ main=sprintf("Boxplot of %d rnorm samples size %d, mean=%d, sd=%d",
+ num.simulations, sample.size, mean.val, std.dev.val))
Figure 6-2 Display of the boxplot Function in Example 6-19
