r parallel-processing parallel-foreach mclapply

R error with mclapply in a foreach loop


Based on another post, I wrote the following script:

library(parallel)
library(doParallel)

cl<-makeCluster(2,outfile='')
registerDoParallel(cl)

foreach(i=1:5, .packages='parallel') %dopar% {
    system.time(mclapply(1:10, function(x){rnorm(1e5)},mc.cores=2))
}

stopCluster(cl)

It worked initially but now throws these errors:

Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Error in unserialize(socklist[[n]]) : error reading from connection
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted

Any idea what's going on? Can one even put mclapply in a foreach loop?

Edit: I also want to say this is on a single 8-core machine, not a cluster.


Solution

  • I was able to reproduce your problem on my Linux machine using only the "parallel" package in R 3.2.3:

    library(parallel)
    cl <- makeCluster(2)
    clusterEvalQ(cl, library(parallel))
    fun <- function(i) {
      mclapply(1:10, function(x) rnorm(1e5), mc.cores=2)
      0
    }
    clusterApplyLB(cl, 1:5, fun)
    

    From my debugging sessions, it appears that the socket connections between the master and the workers can get corrupted. A worker then dies when it hits an error trying to "unserialize" data from the corrupted connection.

    Interestingly, I could get this example to work by using the "multicore" package instead of "parallel". I installed multicore 0.1-8 from RForge.net using the command:

    install.packages('multicore', repos='http://www.rforge.net/')
    

    Then, I loaded "multicore" instead of "parallel" on the workers:

    clusterEvalQ(cl, library(multicore))
    

    Then the example worked fine. You could change your foreach loop to use the .packages='multicore' option.

    That's as far as I've tracked it down. My guess is that the child processes forked by "mclapply" in "parallel" are somehow corrupting the socket connection that they've inherited, but I haven't looked at the code to see if that theory is plausible.

    I guess your choices are:

    1. Don't use "mclapply" in a "doParallel" foreach loop
    2. Use "mclapply" from "multicore 0.1-8" instead of "parallel"
    3. Report this issue to R-Core

    You'll have to do additional work to report this to R-Core, but hopefully my example will help.
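
    For the first option, one way to avoid "mclapply" altogether is to flatten both levels of parallelism into a single foreach using the "%:%" nesting operator, so that only the cluster workers do the work and nothing is forked inside them. This is only a minimal sketch under some assumptions: the worker count of 4 and the "mean" summary are illustrative choices of mine, not part of the original script, and I have not verified it against the failing setup above.

    ```r
    library(foreach)
    library(doParallel)

    # Assumption: 4 socket workers stand in for the original
    # 2 workers x 2 forked cores. Adjust to your machine.
    cl <- makeCluster(4)
    registerDoParallel(cl)

    # %:% merges the outer and inner loops into one stream of tasks,
    # so no worker ever calls mclapply (and therefore never forks).
    res <- foreach(i = 1:5) %:%
      foreach(x = 1:10, .combine = "c") %dopar% {
        # Illustrative payload: return a small summary instead of 1e5 numbers
        mean(rnorm(1e5))
      }

    stopCluster(cl)
    ```

    Here "res" is a list with one element per value of "i", each holding the ten inner results combined by "c".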