Tags: r, sockets, parallel-processing, non-linear-regression, r-4.0.0

Negative binomial regression in R using brm causing error when using multiple cores


I am fitting a negative binomial regression using the brm function from the brms package. As this takes quite some time, I would like to use multiple cores, as suggested in the documentation.

bfit_s <- brm(
  dep_var ~ ind_var +
    var1 +
    var2 +
    (1 | some_level1) + (1 | some_level2),
  data = my_df,
  family = negbinomial(link = "log", link_shape = "log"),
  cores = 4,
  control = list(adapt_delta = 0.999)
)

However, I am running into an error saying that the connection of all four workers failed:

Compiling the C++ model

Start sampling
starting worker pid=11603 on localhost:11447 at 14:13:56.193
starting worker pid=11601 on localhost:11447 at 14:13:56.193
starting worker pid=11602 on localhost:11447 at 14:13:56.198
starting worker pid=11604 on localhost:11447 at 14:13:56.201
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> -> slaveLoop -> makeSOCKmaster
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> -> slaveLoop -> makeSOCKmaster
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> -> slaveLoop -> makeSOCKmaster
Execution halted
Execution halted
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> -> slaveLoop -> makeSOCKmaster
Execution halted

The traceback says Error in makePSOCKcluster(names = spec, ...) : Cluster setup failed. 4 of 4 workers failed to connect.

I tried to understand the problem and read several related questions on SO, but couldn't figure out why the workers can't connect. I'm using macOS Mojave, and I am not requesting more cores than my machine has. Any suggestions on how I could get this to run on multiple cores?


Edit: As sjp pointed out in his answer, there is an issue with RStudio. I thought I would share the code that solves the problem right here in my question, so anyone stumbling across it can fix it without clicking (and reading) any further.

The problem lies in the parallel package as of R 4.0.0, but a user on the Stan forums has provided a workaround. If you can initialize clusters with setup_strategy = "sequential" like this:

cl <- parallel::makeCluster(2, setup_strategy = "sequential") 

then you can add a short snippet to your ~/.Rprofile to make this the default setting:

## WORKAROUND: https://github.com/rstudio/rstudio/issues/6692
## Revert to 'sequential' setup of PSOCK cluster in RStudio Console on macOS and R 4.0.0
if (Sys.getenv("RSTUDIO") == "1" && !nzchar(Sys.getenv("RSTUDIO_TERM")) && 
    Sys.info()["sysname"] == "Darwin" && getRversion() == "4.0.0") {
  parallel:::setDefaultClusterOptions(setup_strategy = "sequential")
}
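
Before editing ~/.Rprofile, it may be worth checking that a sequentially started cluster actually works on your machine. The following is a minimal sketch, assuming R 4.0.0 or later (where the setup_strategy option exists); the two-worker count and the squaring task are arbitrary choices for illustration and are not part of the original workaround.

```r
library(parallel)

# Start a two-worker PSOCK cluster, connecting to the workers one at a time
# instead of all at once (the "parallel" default in R 4.0.0).
cl <- makeCluster(2, setup_strategy = "sequential")

# Run a trivial task on the workers to confirm that they respond.
res <- parLapply(cl, 1:4, function(i) i^2)
print(unlist(res))

# Shut the workers down when finished.
stopCluster(cl)
```

If this runs without the "workers failed to connect" error, the same setting applied through the ~/.Rprofile snippet should let brm start its workers with cores = 4.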

Solution

  • This is a known issue with how RStudio on macOS sets up PSOCK clusters under R 4.0.0. See the related posts on GitHub and the Stan forums.

    GitHub: https://github.com/rstudio/rstudio/issues/6692

    Stan forums: https://discourse.mc-stan.org/t/r-4-0-0-and-cran-macos-binaries/13989/13