Tags: r, multidimensional-array, foreach, doparallel

Getting foreach to store its return value in a 3d array


I have a function which returns a (numVals x N) array/matrix. This function needs to be evaluated K times. My goal is to store all results in a multidimensional array containing doubles with shape c(numVals, N, K).

I'm having trouble finding appropriate arguments for .combine (or other parameters of foreach) so that its return value is in the correct format. I realize I could just reshape the 2D result returned by foreach afterwards, but I am running into memory limitations, and I'm not sure I can reshape in place without any memory overhead.

The solution I'm looking for is either foreach (or a similar function compatible with %dopar%) returning a 3D array directly, or a way to reshape the result into the correct format without creating another object with a memory footprint as large as results.

Here's a code snippet:

library(doParallel)
library(doRNG)
registerDoParallel(cores = 3)
registerDoRNG(12345)

run_tasks <- function(k, N, numVals) {
  return(matrix(runif(numVals * N), numVals, N))
}

K <- 10000
N <- 40
numVals <- 10

# Run the simulation
results <-
  foreach(k = 1:K, .combine = rbind) %dorng% run_tasks(k, N, numVals)

# Desired output format
# results <- array(NA, c(numVals, N, K))

Solution

  • By default, foreach returns the results in a list, so don't use .combine at all and make an array afterwards.

    > results <- foreach(k = 1:K) %dorng% run_tasks(k, N, numVals)
    > A <- array(unlist(results), dim=c(numVals, N, K))
    > dim(A)
    [1]  10  40 10000
    > all.equal(A[,,1], results[[1]])
    [1] TRUE
    

    I'm not a frequent foreach user, but I think .combine is just a convenience argument: instead of .combine=rbind you could also call do.call('rbind', results) on the returned list. The combining happens after the parallel workers have returned their results, so using .combine should not yield significant speed gains.
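
    Since the question mentions memory limits, here is a minimal base R sketch of two alternatives to array(unlist(results), ...). It is an illustration under stated assumptions, not a benchmark: lapply() stands in for the foreach loop so the code runs without a parallel backend, K is reduced to 25, and set.seed() replaces registerDoRNG(). The first option uses vapply() with a matrix template, which returns an array of dimension c(numVals, N, K). The second binds the matrices column-wise (the equivalent of .combine = cbind) and then assigns dim; because R stores arrays in column-major order, a numVals x (N * K) matrix has the same element order as the desired array. Note that this does not hold for .combine = rbind.

```r
# Base R sketch; lapply() stands in for the foreach loop (assumption).
set.seed(12345)

numVals <- 10
N <- 40
K <- 25  # kept small for illustration; the question uses 10000

run_tasks <- function(k, N, numVals) {
  matrix(runif(numVals * N), numVals, N)
}

# Stand-in for: results <- foreach(k = 1:K) %dorng% run_tasks(k, N, numVals)
results <- lapply(seq_len(K), run_tasks, N = N, numVals = numVals)

# Reference: the unlist() approach from the answer
A <- array(unlist(results), dim = c(numVals, N, K))

# Option 1: vapply() with a matrix template returns a 3D array,
# one slice per list element
B <- vapply(results, identity, matrix(0, numVals, N))

# Option 2: bind column-wise, then relabel the dimensions.
# Stand-in for: foreach(k = 1:K, .combine = cbind) %dorng% ...
C <- do.call(cbind, results)  # numVals x (N * K) matrix
dim(C) <- c(numVals, N, K)    # same column-major layout, new shape

# Both options reproduce the reference array
stopifnot(identical(A, B), identical(A, C))
```

    Whether assigning dim avoids a second full-size copy depends on R's copy semantics (for example, whether another reference to the object exists), so it is worth checking with tracemem() or object sizes on the real data.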