Tags: r, foreach, parallel-processing, cluster-computing, domc

Allow foreach workers to register and distribute sub-tasks to other workers


I have R code in which several foreach workers perform tasks in parallel, using foreach and doMC. I want each of those workers to recruit some new workers of its own and hand the parallelizable part of its code to them.

The current code looks like:

require(doMC)
require(foreach)
registerDoMC(cores = 8)

foreach (i = (1:8)) %dopar% {
<<some code here>>
    for (j in c(1:4))  {
    <<some other code here>>
    }
}

Ideally, the code would look like this:

require(doMC)
require(foreach)
registerDoMC(cores = 8)

foreach (i = (1:8)) %dopar% {
<<some code here>>
    foreach (j = (1:4)) %dopar% {
    <<some other code here>>
    }
}

I saw an example of multi-paradigm parallelism using doSNOW and doMC here (https://www.rmetrics.org/files/Meielisalp2009/Presentations/Lewis.pdf#page=17). However, I do not know whether it does what I want.

Also, it seems that nested foreach is not applicable because it requires merging the two loops (see here), which I would rather avoid: the second loop only helps the first one with a portion of its code. Please correct me if I am wrong.

Thanks.


Solution

  • There's no particular problem with having a foreach loop inside of a foreach loop. Here's an example of a doMC loop inside a doSNOW loop:

    library(doSNOW)
    hosts <- c('host-1', 'host-2')
    cl <- makeSOCKcluster(hosts)
    registerDoSNOW(cl)
    r <- foreach(i=1:4, .packages='doMC') %dopar% {
      registerDoMC(2)
      foreach(j=1:8, .combine='c') %dopar% {
        i * j
      }
    }
    stopCluster(cl)
    

    It seems natural to me to use doMC for the inner loop, but you can do it any way you want. You could also use doSNOW for both loops, but then you would need to create and stop the inner snow cluster inside the outer foreach loop.
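
    As a rough sketch of that doSNOW-in-doSNOW variant (it assumes both the outer and the inner workers can be started on 'localhost', and the names outer_cl and inner_cl are only illustrative, not anything from the packages):

    ```r
    library(doSNOW)

    # Outer cluster: two workers on the local machine (an assumption for
    # this sketch; use real host names for a multi-node run).
    outer_cl <- makeSOCKcluster(c('localhost', 'localhost'))
    registerDoSNOW(outer_cl)

    r <- foreach(i = 1:2, .packages = 'doSNOW') %dopar% {
      # Each outer worker creates and registers its own inner cluster ...
      inner_cl <- makeSOCKcluster(c('localhost', 'localhost'))
      registerDoSNOW(inner_cl)
      res <- foreach(j = 1:3, .combine = 'c') %dopar% {
        i * j
      }
      # ... and shuts it down again before returning its result.
      stopCluster(inner_cl)
      res
    }

    stopCluster(outer_cl)
    ```

    Note that every outer task pays the cost of starting and stopping a cluster, which is one reason the forking-based doMC tends to be more convenient for the inner loop.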

    Here's an example of using doMC inside a doMC loop:

    library(doMC)
    registerDoMC(2)
    r <- foreach(i=1:2, .packages='doMC') %dopar% {
      ppid <- Sys.getpid()
      registerDoMC(2)
      foreach(j=1:2) %dopar% {
        c(ppid, Sys.getpid())
      }
    }
    

    The results demonstrate that a total of six processes are forked by the doMC package, although only four execute the body of the inner loop:

    > r
    [[1]]
    [[1]][[1]]
    [1] 14946 14949
    
    [[1]][[2]]
    [1] 14946 14951
    
    
    [[2]]
    [[2]][[1]]
    [1] 14947 14948
    
    [[2]][[2]]
    [1] 14947 14950
    
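
    One way to check that count yourself is to tally the distinct PIDs in the result. This is only a sketch: it reruns the same loop, so the actual PIDs will differ from the ones shown above, and the variable names pids, outer_pids and inner_pids are mine:

    ```r
    library(doMC)

    registerDoMC(2)
    r <- foreach(i = 1:2) %dopar% {
      ppid <- Sys.getpid()
      registerDoMC(2)
      foreach(j = 1:2) %dopar% {
        c(ppid, Sys.getpid())
      }
    }

    # Flatten the nested list into a matrix with one row per inner task:
    # column 1 is the outer worker's PID, column 2 the inner worker's PID.
    pids <- do.call(rbind, unlist(r, recursive = FALSE))
    outer_pids <- unique(pids[, 1])
    inner_pids <- unique(pids[, 2])

    # Total number of distinct forked processes seen in the result.
    length(outer_pids) + length(inner_pids)
    ```

    With two outer workers that each fork two inner workers, this should normally come to six, assuming the operating system does not reuse a PID during the run.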

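    Of course, you need to be careful not to start too many processes on a single node: the number of outer workers multiplied by the number of inner workers should not exceed the cores available. I found this kind of nesting a bit awkward, which led to the development of the foreach nesting operator (%:%).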
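
    To keep the process count in check, one option is to derive both worker counts from the number of cores. The following is a minimal sketch, not a recommendation for any particular machine; the names total_cores, outer_workers and inner_workers are hypothetical, and the choice of at most two outer workers is arbitrary:

    ```r
    library(doMC)

    # detectCores() can return NA on some platforms, so fall back to 1.
    total_cores <- parallel::detectCores()
    if (is.na(total_cores)) total_cores <- 1L

    # Split the budget so that outer_workers * inner_workers <= total_cores.
    outer_workers <- min(2L, total_cores)
    inner_workers <- max(1L, total_cores %/% outer_workers)

    registerDoMC(outer_workers)
    r <- foreach(i = 1:outer_workers) %dopar% {
      # Each outer worker gets its share of the remaining cores.
      registerDoMC(inner_workers)
      foreach(j = 1:4, .combine = 'c') %dopar% {
        i * j
      }
    }
    ```

    On a shared node you would probably want to start from a smaller budget than detectCores() reports.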