6.2.7 ore.indexApply関数の使用方法

ore.indexApply関数は、入力関数によって生成されたデータを使用して指定されたユーザー定義の入力関数を実行します。これは、1つ以上のRエンジンが同じまたは異なる計算(タスク)を実行するタスク・パラレル実行をサポートします。ore.indexApply関数に対するtimes引数には、データベース内で入力関数を実行する回数を指定します。必要なすべてのデータは、入力関数内で明示的に生成またはロードされる必要があります。

ore.indexApply関数の構文は次のとおりです。

ore.indexApply(times, FUN, ..., FUN.VALUE = NULL, FUN.NAME = NULL, FUN.OWNER = NULL,
               parallel = getOption("ore.parallel", NULL))

ore.indexApply関数は、ore.listオブジェクトまたはore.frameオブジェクトを返します。

ore.indexApply関数の使用例は、次の各項で説明します。

6.2.7.1 ore.indexApply関数の簡単な使用例

例6-17では、ore.indexApplyを呼び出し、入力関数を並行して5回実行することを指定します。結果のクラスであるore.listを表示した後に、結果を表示します。

例6-17 ore.indexApply関数の使用方法

res <- ore.indexApply(5,
      function(index) {
        paste("IndexApply:", index)
      },
      parallel = TRUE)
class(res)
res

例6-17のリスト

R> res <- ore.indexApply(5,
+       function(index) {
+         paste("IndexApply:", index)
+       },
+       parallel = TRUE)
R> class(res)
[1] "ore.list"
attr(,"package")
[1] "OREembed"
R> res
$`1`
[1] "IndexApply: 1"
 
$`2`
[1] "IndexApply: 2"
 
$`3`
[1] "IndexApply: 3"
 
$`4`
[1] "IndexApply: 4"
 
$`5`
[1] "IndexApply: 5"

6.2.7.2 列並行の使用例

例6-18では、Rのsummary関数を使用して、irisデータセットの最初の4つの数値列でサマリー統計を並行して計算します。この例では、計算を最終結果に結合します。ore.indexApply関数の最初の引数は4で、これは、並行してまとめる列の数を指定します。ユーザー定義の入力関数は1つの引数indexを取り、これは、まとめる列を指定する1から4の値です。

この例では、summary関数を指定した列で呼び出します。summaryの呼出しでは、列のサマリー統計が含まれている単一の行が返されます。この例では、summary呼出しの結果をdata.frameに変換し、そこに列名を追加します。

次に、ore.indexApply関数に対してFUN.VALUE引数を使用して、関数の結果の構造を定義します。結果はその後、その構造とともにore.frameオブジェクトとして返されます。

例6-18 ore.indexApply関数の使用方法および結果の結合

res <- NULL
res <- ore.indexApply(4,
      function(index) {
        ss <- summary(iris[, index])
        attr.names <- attr(ss, "names")
        stats <- data.frame(matrix(ss, 1, length(ss)))
        names(stats) <- attr.names
        stats$col <- names(iris)[index]
        stats
      },
      FUN.VALUE=data.frame(Min. = numeric(0),
        "1st Qu." = numeric(0),
        Median = numeric(0),
        Mean = numeric(0),
        "3rd Qu." = numeric(0),
        Max. = numeric(0),
        Col = character(0)), 
      parallel = TRUE)
res

例6-18のリスト

R> res <- NULL
R> res <- ore.indexApply(4,
+       function(index) {
+         ss <- summary(iris[, index])
+         attr.names <- attr(ss, "names")
+         stats <- data.frame(matrix(ss, 1, length(ss)))
+         names(stats) <- attr.names
+         stats$col <- names(iris)[index]
+         stats
+       },
+       FUN.VALUE=data.frame(Min. = numeric(0),
+         "1st Qu." = numeric(0),
+         Median = numeric(0),
+         Mean = numeric(0),
+         "3rd Qu." = numeric(0),
+         Max. = numeric(0),
+         Col = character(0)),
+       parallel = TRUE)
R> res
  Min. X1st.Qu. Median  Mean X3rd.Qu. Max.          Col
1  2.0      2.8   3.00 3.057      3.3  4.4  Sepal.Width
2  4.3      5.1   5.80 5.843      6.4  7.9 Sepal.Length
3  0.1      0.3   1.30 1.199      1.8  2.5  Petal.Width
4  1.0      1.6   4.35 3.758      5.1  6.9 Petal.Length
Warning message:
ORE object has no unique key - using random order

6.2.7.3 シミュレーションの使用例

ore.indexApply関数をシミュレーションで使用しすることで、Oracle Exadataデータベース・マシンなどの高パフォーマンスのコンピューティング・ハードウェアを利用できます。例6-19は、ランダムな正規分布の複数のサンプルを使用してサマリー統計の分布を比較します。各シミュレーションは、データベースの別個のRエンジンで、データベースで許可された並列度まで並列に実行されます。

例6-19では、サンプル・サイズの変数、乱数値の平均および標準偏差および実行するシミュレーションの数を定義します。この例では、num.simulationsをore.indexApply関数の最初の引数として指定します。ore.indexApply関数は、num.simulationsをindex引数としてユーザー定義の関数に渡します。この入力関数はその後、各入力関数の呼出しで異なる乱数値のセットが生成されるように、索引に基づいて乱数シードを設定します。

次に、入力関数は、rnorm関数を使用してsample.sizeランダムな標準値を生成します。乱数のベクターでsummary関数を呼び出し、返される結果としてdata.frameを準備します。ore.indexApply関数には、シミュレーションの結合された結果を構成するore.frameを返すように、FUN.VALUE引数を指定します。res変数は、ore.indexApply関数によって返されるore.frameを取得します。

サンプルの分布を取得するために、この例では、ore.pull関数を使用した結果であるdata.frameでboxplot関数を呼び出し、resから選択した列をクライアントに渡します。

例6-19 シミュレーションでのore.indexApply関数の使用方法

res <- NULL
sample.size = 1000
mean.val = 100
std.dev.val = 10
num.simulations = 1000
 
res <- ore.indexApply(num.simulations,
      function(index, sample.size = 1000, mean = 0, std.dev = 1) {
        set.seed(index)
        x <- rnorm(sample.size, mean, std.dev)
        ss <- summary(x)
        attr.names <- attr(ss, "names")
        stats <- data.frame(matrix(ss, 1, length(ss)))
        names(stats) <- attr.names
        stats$index <- index
        stats
      },
      FUN.VALUE=data.frame(Min. = numeric(0),
        "1st Qu." = numeric(0),
        Median = numeric(0),
        Mean = numeric(0),
        "3rd Qu." = numeric(0),
        Max. = numeric(0),
        Index = numeric(0)),
      parallel = TRUE,
      sample.size = sample.size,
      mean = mean.val, std.dev = std.dev.val)
options("ore.warn.order" = FALSE)
head(res, 3)
tail(res, 3)
boxplot(ore.pull(res[, 1:6]),
  main=sprintf("Boxplot of %d rnorm samples size %d, mean=%d, sd=%d",
               num.simulations, sample.size, mean.val, std.dev.val))

例6-19のリスト

R> res <- ore.indexApply(num.simulations,
+       function(index, sample.size = 1000, mean = 0, std.dev = 1) {
+         set.seed(index)
+         x <- rnorm(sample.size, mean, std.dev)
+         ss <- summary(x)
+         attr.names <- attr(ss, "names")
+         stats <- data.frame(matrix(ss, 1, length(ss)))
+         names(stats) <- attr.names
+         stats$index <- index
+         stats
+       },
+       FUN.VALUE=data.frame(Min. = numeric(0),
+         "1st Qu." = numeric(0),
+         Median = numeric(0),
+         Mean = numeric(0),
+         "3rd Qu." = numeric(0),
+         Max. = numeric(0),
+         Index = numeric(0)),
+       parallel = TRUE,
+       sample.size = sample.size,
+       mean = mean.val, std.dev = std.dev.val)
R> options("ore.warn.order" = FALSE)
R> head(res, 3)
   Min. X1st.Qu. Median   Mean X3rd.Qu.  Max. Index
1 67.56    93.11  99.42  99.30    105.8 128.0   847
2 67.73    94.19  99.86 100.10    106.3 130.7   258
3 65.58    93.15  99.78  99.82    106.2 134.3   264
R> tail(res, 3)
   Min. X1st.Qu. Median   Mean X3rd.Qu.  Max. Index
1 65.02    93.44  100.2 100.20    106.9 134.0     5
2 71.60    93.34   99.6  99.66    106.4 131.7     4
3 69.44    93.15  100.3 100.10    106.8 135.2     3
R> boxplot(ore.pull(res[, 1:6]),
+   main=sprintf("Boxplot of %d rnorm samples size %d, mean=%d, sd=%d",
+                num.simulations, sample.size, mean.val, std.dev.val))

図6-2 例6-19でのboxplot関数の表示

「図6-2 例6-19でのboxplot関数の表示」の説明