
doParallel, cluster vs cores


What is the difference between cluster and cores in registerDoParallel when using doParallel package?

Is my understanding correct that on a single machine these are interchangeable, and that I will get the same results for:

cl <- makeCluster(4)
registerDoParallel(cl)    

and

registerDoParallel(cores = 4)

The only difference I see is that a cluster created with makeCluster() has to be stopped explicitly using stopCluster().


Solution

  • The behavior of doParallel::registerDoParallel(<numeric>) depends on the operating system; see print(doParallel::registerDoParallel) for details.

    On Windows machines,

    doParallel::registerDoParallel(4)
    

    effectively does

    cl <- makeCluster(4)
    doParallel::registerDoParallel(cl)
    

    i.e. it sets up four ("PSOCK") workers that run in background R sessions. %dopar% will then essentially use the parallel::parLapply() machinery. With this setup, you do have to make sure that the global variables are exported to, and the required packages attached on, each of the workers.
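
    As a rough sketch of what that means in practice, the example below registers an explicit PSOCK cluster and uses foreach's .packages argument to attach a package on each worker. The files vector is a made-up global for illustration; foreach normally exports variables referenced directly in the loop body on its own, while objects it cannot detect can be listed in .export.

    ```r
    library(doParallel)  # also attaches foreach and parallel

    cl <- makeCluster(2)        # two background PSOCK R sessions
    registerDoParallel(cl)

    files <- c("a.csv", "b.txt", "c.rds")  # hypothetical global variable

    # 'tools' is not attached in a fresh worker session, so request it via
    # .packages; 'files' is referenced in the loop body and gets exported.
    exts <- foreach(i = seq_along(files), .combine = c,
                    .packages = "tools") %dopar% {
      file_ext(files[i])
    }

    stopCluster(cl)             # explicit clusters must be shut down by hand
    ```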

    However, on non-Windows machines,

    doParallel::registerDoParallel(4)
    

    results in %dopar% using the parallel::mclapply() machinery, which in turn relies on forked processes. Because the workers are forked from the main R session, they see its global variables and attached packages, so you don't have to export or attach them yourself.
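
To check which of the two code paths is active on a given machine, a minimal sketch along the following lines may help. It registers with cores = 2 and then asks foreach which backend it is using; I would expect getDoParName() to report "doParallelMC" when forking is used and "doParallelSNOW" when a cluster is used, but treat those names as something to verify on your own installation. The offset variable is only an illustrative global.

```r
library(doParallel)  # also attaches foreach and parallel

registerDoParallel(cores = 2)

# Ask foreach what was actually registered on this OS
backend <- getDoParName()
workers <- getDoParWorkers()
message("backend: ", backend, ", workers: ", workers)

offset <- 100  # hypothetical global used inside the loop

res <- foreach(i = 1:3, .combine = c) %dopar% {
  offset + i
}

# Shuts down the cluster that registerDoParallel() creates implicitly
# (the Windows case); harmless when forked workers were used instead.
stopImplicitCluster()
```

The loop itself is written the same way on both platforms; only the worker type behind %dopar% differs.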