What is the difference between cluster and cores in registerDoParallel when using the doParallel package?
Is my understanding correct that on a single machine these are interchangeable, and that I will get the same results for:
cl <- makeCluster(4)
registerDoParallel(cl)
and
registerDoParallel(cores = 4)
The only difference I see is that a cluster created with makeCluster() has to be stopped explicitly using stopCluster().
The behavior of doParallel::registerDoParallel(<numeric>) depends on the operating system; see print(doParallel::registerDoParallel) for details.
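As a quick check of which path was taken on a given machine, a minimal sketch (assuming doParallel is installed; the worker count of 2 is arbitrary) is to register with a number and then ask foreach which backend it ended up with:

```r
library(doParallel)  # also attaches foreach and parallel

# Register with a plain number; what happens next depends on the OS
registerDoParallel(cores = 2)

# Ask foreach which backend is registered and how many workers it has
backend <- getDoParName()
workers <- getDoParWorkers()

cat("OS type:", .Platform$OS.type, "\n")
cat("backend:", backend, "\n")
cat("workers:", workers, "\n")

# On Windows an implicit cluster was created and should be shut down;
# elsewhere there should be nothing to stop
stopImplicitCluster()
```

I would expect the backend name to differ between Windows and other systems, which is the distinction described next.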
On Windows machines, doParallel::registerDoParallel(4) effectively does
cl <- makeCluster(4)
doParallel::registerDoParallel(cl)
i.e. it sets up four ("PSOCK") workers that run in background R sessions. %dopar% will then basically use the parallel::parLapply() machinery. With this setup, you do have to worry about global variables and packages being attached on each of the workers.
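To illustrate what that means in practice, here is a small sketch with an explicit cluster. The variable name scale_factor and the use of the tools package are only for illustration. As far as I know, foreach exports variables that are referenced directly in the loop body on its own, but packages have to be requested through the .packages argument, because each worker starts as a fresh R session:

```r
library(doParallel)

cl <- makeCluster(2)      # two PSOCK workers, each a fresh R session
registerDoParallel(cl)

scale_factor <- 10        # global used directly in the loop body

res <- foreach(i = 1:3, .combine = c, .packages = "tools") %dopar% {
  # file_ext() lives in 'tools', which is not attached on a new worker,
  # hence the .packages argument above
  ext <- file_ext(paste0("file", i, ".csv"))
  paste(i * scale_factor, ext)
}

print(res)

stopCluster(cl)           # explicit clusters must be stopped by hand
```

Globals that are only reached indirectly, for example from inside a helper function, may still need to be listed in .export.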
However, on non-Windows machines, doParallel::registerDoParallel(4) makes %dopar% use the parallel::mclapply() machinery, which in turn relies on forked processes. Since forking is used, you don't have to worry about globals and packages: each forked worker starts as a copy of the main R session.
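A rough sketch of the forked case follows. The helper add_offset() and the variable offset are hypothetical names, and the guard simply skips the loop on Windows, where forking is not available:

```r
library(doParallel)

offset <- 100
add_offset <- function(x) x + offset   # hypothetical helper relying on a global

if (.Platform$OS.type != "windows") {
  registerDoParallel(cores = 2)        # fork-based backend on Unix-alikes

  # No .export or .packages needed: the forked children see the helper
  # and the global because they are copies of this session
  res <- foreach(i = 1:3, .combine = c) %dopar% add_offset(i)
  print(res)
}
```

The same loop run against a PSOCK cluster would likely need offset to be exported explicitly, since the helper looks it up in the worker's own global environment.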