6.2.7.2 Column-Parallel Use Case

Example 6-18 uses the R summary function to compute in parallel summary statistics on the first four numeric columns of the iris data set. The example combines the computations into a final result. The first argument to the ore.indexApply function is 4, which specifies the number of columns to summarize in parallel. The user-defined input function takes one argument, index, which will be a value between 1 and 4 and which specifies the column to summarize.

The example invokes the summary function on the specified column. The summary invocation returns a single row, which contains the summary statistics for the column. The example converts the result of the summary invocation into a data.frame and adds the column name to it.

The example next uses the FUN.VALUE argument to the ore.indexApply function to define the structure of the result of the function. The result is then returned as an ore.frame object with that structure.

Example 6-18 Using the ore.indexApply Function and Combining Results

res <- NULL
res <- ore.indexApply(4,
      function(index) {
        ss <- summary(iris[, index])
        attr.names <- attr(ss, "names")
        stats <- data.frame(matrix(ss, 1, length(ss)))
        names(stats) <- attr.names
        stats$col <- names(iris)[index]
        stats
      },
      FUN.VALUE=data.frame(Min. = numeric(0),
        "1st Qu." = numeric(0),
        Median = numeric(0),
        Mean = numeric(0),
        "3rd Qu." = numeric(0),
        Max. = numeric(0),
        Col = character(0)), 
      parallel = TRUE)
res
Listing for Example 6-18
R> res <- NULL
R> res <- ore.indexApply(4,
+       function(index) {
+         ss <- summary(iris[, index])
+         attr.names <- attr(ss, "names")
+         stats <- data.frame(matrix(ss, 1, length(ss)))
+         names(stats) <- attr.names
+         stats$col <- names(iris)[index]
+         stats
+       },
+       FUN.VALUE=data.frame(Min. = numeric(0),
+         "1st Qu." = numeric(0),
+         Median = numeric(0),
+         Mean = numeric(0),
+         "3rd Qu." = numeric(0),
+         Max. = numeric(0),
+         Col = character(0)),
+       parallel = TRUE)
R> res
  Min. X1st.Qu. Median  Mean X3rd.Qu. Max.          Col
1  2.0      2.8   3.00 3.057      3.3  4.4  Sepal.Width
2  4.3      5.1   5.80 5.843      6.4  7.9 Sepal.Length
3  0.1      0.3   1.30 1.199      1.8  2.5  Petal.Width
4  1.0      1.6   4.35 3.758      5.1  6.9 Petal.Length
Warning message:
ORE object has no unique key - using random order