Tags: r, parallel-processing, zeromq, parallel-foreach

Using clustermq R package as a parallel backend for foreach


I've started using the clustermq package as a parallel backend for a drake pipeline and have been very impressed by the performance improvements I've observed. I'm interested in evaluating clustermq / rzmq outside of drake, but I can't seem to get the foreach example from the User Guide (in the subsection titled "As parallel foreach backend") to work. What am I missing here?

In the example below on my 4-core machine, I would expect the following code to run in close to 5 seconds, yet it takes close to 20 seconds. When I use similar code for some heavy processing, I observe only one core doing significant work.

library(foreach)
(n_cores <- parallel::detectCores())
#> [1] 4
clustermq::register_dopar_cmq(n_jobs = n_cores)
system.time(foreach(i = seq_len(n_cores)) %dopar% Sys.sleep(5))
#> Submitting 4 worker jobs (ID: 6856) ...
#>    user  system elapsed 
#>   0.118   0.022  20.187

Solution

  • The author of the clustermq package kindly responded to a GitHub issue that I posted about this. In short, the option clustermq.scheduler controls which type of scheduling clustermq employs. In my case the option was unset, so clustermq defaulted to local (i.e. sequential) scheduling. To run in parallel on your local machine, set the clustermq.scheduler option to "multicore". Putting it all together:

    library(foreach)
    (n_cores <- parallel::detectCores())
    #> [1] 4
    clustermq::register_dopar_cmq(n_jobs = n_cores)
    options(clustermq.scheduler = "multicore")
    system.time(foreach(i = seq_len(n_cores)) %dopar% Sys.sleep(5))
    #> Submitting 4 worker jobs (ID: 7206) ...
    #> Running 4 calculations (1 objs/0 Mb common; 1 calls/chunk) ...
    #> Master: [5.6s 3.6% CPU]; Worker: [avg 1.3% CPU, max 2475916.0 Mb]
    #>    user  system elapsed 
    #>   0.188   0.065   5.693
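
    Rather than calling options() in every session, you could also set the scheduler in your startup file. This is just a sketch of that approach using R's standard ~/.Rprofile mechanism; the option name comes from the clustermq documentation, and n_jobs should be adapted to your machine:

        # ~/.Rprofile -- make multicore the default scheduler for clustermq,
        # so foreach() %dopar% calls run in parallel without extra setup
        options(clustermq.scheduler = "multicore")

    Note that the option needs to be set before the foreach call submits its workers; setting it in .Rprofile guarantees that.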