In analyzing large data sets, a typical operation is to randomly partition the data set into subsets. You can analyze the partitions by using Oracle R Enterprise embedded R execution, as shown in Example 3-16. The example creates a data.frame object with the symbol myData in the local R session and adds a column named partition that contains randomly assigned partition numbers from 1 to k. It pushes the data set to database memory as the object MYDATA. The example then invokes the embedded R execution function ore.groupApply, which partitions the data based on the partition column and applies the lm function to each partition.
Example 3-16 Randomly Partitioning Data
N <- 200
k <- 5
myData <- data.frame(a=1:N,b=round(runif(N),2))
myData$partition <- sample(rep(1:k, each = N/k,
                               length.out = N), replace = TRUE)
MYDATA <- ore.push(myData)
head(MYDATA)
results <- ore.groupApply(MYDATA, MYDATA$partition,
                          function(y) {lm(b~a,y)}, parallel = TRUE)
length(results)
results[[1]]
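Because the partition labels in the example are drawn with replace = TRUE, the resulting partitions generally differ in size. The following is a minimal base R sketch, not part of the original example, that contrasts that assignment with a balanced one obtained by permuting the label vector. The names withReplacement and balanced and the seed value are assumptions chosen for illustration.

```r
set.seed(42)  # arbitrary seed, used only to make this sketch reproducible
N <- 200
k <- 5

# Assignment as in the example: labels are drawn with replacement,
# so the number of rows per partition varies from run to run.
withReplacement <- sample(rep(1:k, each = N/k, length.out = N),
                          replace = TRUE)

# Alternative (assumption, not the documented method): permute a balanced
# label vector without replacement, so each partition gets exactly N/k rows.
balanced <- sample(rep(1:k, length.out = N))

table(withReplacement)  # counts per partition, generally unequal
table(balanced)         # N/k rows in every partition
```

Whether balanced partitions matter depends on the analysis; for per-partition model fitting, roughly equal sizes tend to make the fits more comparable.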
Listing for Example 3-16
R> N <- 200
R> k <- 5
R> myData <- data.frame(a=1:N,b=round(runif(N),2))
R> myData$partition <- sample(rep(1:k, each = N/k,
+                               length.out = N), replace = TRUE)
R> MYDATA <- ore.push(myData)
R> head(MYDATA)
  a    b partition
1 1 0.89         2
2 2 0.31         4
3 3 0.39         5
4 4 0.66         3
5 5 0.01         1
6 6 0.12         4
R> results <- ore.groupApply(MYDATA, MYDATA$partition,
+                           function(y) {lm(b~a,y)}, parallel = TRUE)
R> length(results)
[1] 5
R> results[[1]]

Call:
lm(formula = b ~ a, data = y)

Coefficients:
(Intercept)            a
   0.388795     0.001015
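To check the per-partition logic without a database connection, the same group-and-apply pattern can be approximated in a local R session. This is a rough sketch that uses base R split and lapply in place of ore.groupApply; it runs serially in local memory and is not a description of how embedded R execution works internally. The names parts, localResults, and coefTable and the seed value are assumptions chosen for illustration.

```r
set.seed(1)  # arbitrary seed, used only to make this sketch reproducible
N <- 200
k <- 5
myData <- data.frame(a = 1:N, b = round(runif(N), 2))
myData$partition <- sample(rep(1:k, each = N/k, length.out = N),
                           replace = TRUE)

# split() stands in for the partitioning step: one data.frame per label
parts <- split(myData, myData$partition)

# lapply() applies the same model function to each partition locally
localResults <- lapply(parts, function(y) lm(b ~ a, data = y))

# One fitted model per partition label that actually occurs in the data
length(localResults)

# Collect intercept and slope for each partition into a matrix
coefTable <- t(sapply(localResults, coef))
coefTable
```

Because b is generated independently of a, the fitted slopes should be close to zero in every partition, which is consistent with the small coefficient for a in the listing above.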