
foreach %dopar% does not start workers


I have the following piece of code that I would like to run with the doMC engine:

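# Simulates one five-round contest between two teams.
# Returns 1 if team A wins, 2 if team B wins, 0 on a tie.
who_wins<-function(probs_a,probs_b,delta_order=0,delta_down=0){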
  team_a<-runif(5,0,1)
  team_b<-runif(5,0,1)
  sya<-syb<-0
  for(i in 1:5){
    for(j in 1:2){
      if(j==1){
        if(sya<syb){
          team_a[i]<-(1-delta_down)*team_a[i]
        } 
        team_a[i]<-(1-(i-1)*delta_order)*team_a[i]
        sya<-sya+(team_a[i]<probs_a[i])
      }
      else{
        if(syb<sya){
          team_b[i]<-(1-delta_down)*team_b[i]
        } 
        team_b[i]<-(1-(i-1)*delta_order)*team_b[i]
        syb<-syb+(team_b[i]<probs_b[i])
      }
    }
  }
  if(sya>syb){
    return(1)
  }
  else if(sya<syb){
    return(2)
  }
  else {
    return(0)
  }
}

library(doMC)
registerDoMC(8)

probs_a<-seq(.6,.8,length.out=5)
probs_b<-probs_a[5:1]
nsim<-20000

results<-foreach(icount(nsim), .combine=c) %dopar% {
    return(who_wins(probs_a,probs_b))
}

The problem is that a couple of seconds after the first worker starts, the engine tries to launch the remaining ones. I see a spike on all processors, but the workers all die quickly, even the first one. Then a new process is launched and the rest of the code runs on this lone worker.

I have tried other pieces of code and the engine works perfectly. But with this specific routine, it doesn't.

Can anybody tell me what is happening? Thanks in advance.


Solution

  • If I add a Sys.sleep(0.01) inside your loop, I see all 8 processes “busy” with it. After they are done, the main process remains busy for some time. I assume that the overhead of collecting the data from the individual processes and combining it into a single result is on a similar scale to the actual benefit from the parallelized computation. If you simply change the “computation” to return(1), you will see that this takes about as long as your computation, so the time is not spent on the workload but on assembling the result.

    Neither .inorder=FALSE nor the use of doParallel instead of doMC changes this. However, I would consider this a problem in the foreach package, as mclapply has significantly less overhead:

    library(parallel)  # provides mclapply()

    result <- unlist(mclapply(seq_len(nsim), function(i) {
      who_wins(probs_a, probs_b)
    }, mc.cores = 8))
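
If you would rather stay with foreach, one common way to reduce per-task overhead is to hand each worker a chunk of simulations instead of a single one, so foreach only has to dispatch and combine a handful of results. The following is a minimal sketch of that idea, not something tested against the timings above. It assumes a Unix-like system (doMC relies on forking), and the names n_workers, chunks and idx are made up for illustration. The short who_wins() here is a stand-in that matches the question's function only for its default delta_order = delta_down = 0.

```r
library(foreach)
library(doMC)      # Unix-alikes only (relies on forking)
library(parallel)  # for splitIndices()

# Stand-in for the question's who_wins() with its default
# delta_order = delta_down = 0; swap in the full function as needed.
who_wins <- function(probs_a, probs_b) {
  sya <- sum(runif(5) < probs_a)  # successes for team A
  syb <- sum(runif(5) < probs_b)  # successes for team B
  if (sya > syb) 1 else if (sya < syb) 2 else 0
}

n_workers <- 8  # assumption: same worker count as in the question
registerDoMC(n_workers)

probs_a <- seq(.6, .8, length.out = 5)
probs_b <- rev(probs_a)
nsim <- 20000

# One index vector per worker, so foreach handles 8 tasks instead of 20000.
chunks <- splitIndices(nsim, n_workers)

results <- foreach(idx = chunks, .combine = c) %dopar% {
  # Each task runs its whole share of the simulations in a plain loop.
  vapply(idx, function(i) who_wins(probs_a, probs_b), numeric(1))
}

length(results)  # should equal nsim
```

With this layout the .combine step only concatenates n_workers vectors, which is where the per-task version appears to spend most of its time. Whether it closes the gap to mclapply on your machine is something you would need to measure, for example with system.time().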