Tags: r, foreach, parallel-processing, domc

Need help with a combine function in a parallel simulation study using doMC


I want to ask for some help on writing a combine function for foreach(). Consider the function below:

library(mvtnorm)
library(doMC)

mySimFunc <- function(){
  myNum <- runif(1)                        # one uniform draw
  myVec <- rnorm(10)                       # vector of 10 standard normal draws
  myMat <- rmvnorm(5, rep(0, 3), diag(3))  # 5 x 3 multivariate normal sample
  myListRslt <- list("myNum" = myNum, "myVec" = myVec, "myMat" = myMat)
  return(myListRslt)
}

Now I'd like to run the code above 1000 times using foreach() %dopar%, and in each iteration I'd like to:

  1. return myNum as is
  2. get average of myVec and return it
  3. get colMeans() of myMat and return it.

I'd like foreach() %dopar% to return a final list including:

  1. a vector of length 1000 containing the myNum value from each iteration
  2. a vector of length 1000 containing the average of myVec from each iteration
  3. a matrix with 1000 rows, where each row contains the colMeans of myMat from that iteration

My ideal solution

My ideal solution would be to find a way to make foreach() act exactly like for, so that I could simply define:

myNumRslt <- NULL
myVecRslt <- NULL
myMatRslt <- NULL

# and then simply aggregate the result of each iteration into the variables above:
foreach(i = 1:1000) %dopar%{
   rslt <- mySimFunc()
   myNumRslt <- c(myNumRslt, rslt$myNum)
   myVecRslt <- c(myVecRslt, mean(rslt$myVec))
   myMatRslt.tmp <- colMeans(rslt$myMat)
   myMatRslt <- rbind(myMatRslt, myMatRslt.tmp)
}
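
To make the target concrete, the plain for version of the same aggregation (serial, so not what I actually want to run, but it produces exactly the final objects described above) would be:

# Serial reference version: produces the exact output structure I am after
myNumRslt <- NULL
myVecRslt <- NULL
myMatRslt <- NULL
for (i in 1:1000) {
  rslt <- mySimFunc()
  myNumRslt <- c(myNumRslt, rslt$myNum)
  myVecRslt <- c(myVecRslt, mean(rslt$myVec))
  myMatRslt <- rbind(myMatRslt, colMeans(rslt$myMat))
}
# length(myNumRslt) == 1000; length(myVecRslt) == 1000; dim(myMatRslt) == c(1000, 3)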

But unfortunately it seems that this isn't possible with foreach(), so I think the only solution is to write a combine function that performs an aggregation similar to the one above.

Challenge

1) How could I write a combine function that returns what I explained above?

2) When we use %dopar% (say, with the doMC package), does doMC distribute each iteration to a CPU core, or does it go further and split each iteration into smaller pieces and distribute those as well?

3) Is there a better (more efficient) way than using doMC and foreach()? Any ideas? In this question Brian mentioned a brilliant way to deal with lists containing numeric values. In my case, I have numeric values as well as vectors and matrices, and I don't know how to extend Brian's idea to cover them.

Thanks very much for your help.


Solution

  • Edit: Cleaned up, generalizable solution using .combine:

    # Modify the simulation function so each iteration already returns the
    # aggregated quantities (scalar, mean of the vector, column means of the matrix)
    mySimFunc2 <- function(){
      myNum <- runif(1)
      myVec <- mean(rnorm(10))
      myMat <- colMeans(rmvnorm(5, rep(0, 3), diag(3)))
      myListRslt <- list("myNum" = myNum, "myVec" = myVec, "myMat" = myMat)
      return(myListRslt)
    }
    
    # .combine function: with .multicombine = TRUE, each argument is the list
    # returned by one iteration
    MyComb1 <- function(...) {
      lst <- list(...)
      vec      <- sapply(lst, function(x) x[[1]])
      vecavg   <- sapply(lst, function(x) x[[2]])
      colmeans <- t(sapply(lst, function(x) x[[3]]))
      final <- list(vec, vecavg, colmeans)
      names(final) <- c("vec", "vecavg", "colmeans")
      return(final)
    }
    
    library(doParallel)
    cl <- makeCluster(3)   # number of worker processes
    registerDoParallel(cl)
    
    simRslt <- foreach(i = 1:1000, .export = c("mySimFunc2", "MyComb1"),
                       .combine = MyComb1, .multicombine = TRUE,
                       .maxcombine = 1000, .packages = "mvtnorm") %dopar% { mySimFunc2() }
    
    stopCluster(cl)   # shut down the workers when finished
    

    You should now have a list output, simRslt, containing the three desired objects, which I've named vec, vecavg, and colmeans. Note that you must set .maxcombine to at least the number of iterations whenever there are more than 100 iterations (the default for .maxcombine): otherwise foreach calls MyComb1 more than once, and on the later calls its first argument is the already-combined result rather than a raw iteration result, which breaks this particular combine function.
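
    If you'd prefer not to worry about .maxcombine at all, an alternative is to drop the custom .combine, let foreach return its default list of per-iteration results, and aggregate afterwards on the master. The sketch below assumes the doParallel backend above is still registered (i.e., run it before stopCluster); rawRslt and final2 are just illustrative names:

    # No custom .combine: foreach returns a list with one element per iteration
    rawRslt <- foreach(i = 1:1000, .export = "mySimFunc2",
                       .packages = "mvtnorm") %dopar% { mySimFunc2() }
    
    # Aggregate on the master process
    final2 <- list(
      vec      = sapply(rawRslt, `[[`, "myNum"),
      vecavg   = sapply(rawRslt, `[[`, "myVec"),
      colmeans = t(sapply(rawRslt, `[[`, "myMat"))
    )

    Both versions yield the same final structure; the post-processing version is a bit simpler and does not depend on how many times foreach invokes the combine function.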

    As a side note, it does not really make sense to parallelize a task this small, since the parallelization overhead outweighs the gain, although I'm guessing the real task is more complex.