Here is a minimal example showing the issue:
mod.r:
#' @export
run_sqrt <- function (x) {
  sqrt(x)
}
mwe.r:
box::use(
  ./mod[...],
  parallel,
  dp = doParallel,
  foreach[foreach, `%dopar%`],
)
cl <- parallel$makeCluster(2L)
dp$registerDoParallel(cl)
foreach(i = 1 : 5) %dopar% {
  run_sqrt(i)
}
parallel$stopCluster(cl)
This raises the error:
Error in { : task 1 failed - "could not find function "run_sqrt""
I found the following suggestion in "How to use `foreach` and `%dopar%` with an `R6` class in R?":
parallel::clusterExport(cluster, setdiff(ls(), "cluster"))
But it didn't work.
As you found, this is a limitation of the ‘parallel’ package: it only knows about names defined in the current environment. A name that was attached via ./mod[...] is not defined there, so it never reaches the workers.
There are several solutions for this. The following list is roughly in order of (my personal) preference, from most preferred to least preferred.
Use explicitly qualified module access instead of attaching. So: change ./mod[...] to ./mod inside box::use(), and fully qualify the name inside foreach:
foreach(i = 1 : 5) %dopar% {
  mod$run_sqrt(i)
}
Due to how parallel searches names, this will only work if the above code is executed in the global environment.
Import ./mod inside the foreach body instead of at the beginning of your script. However, note that there is currently an open bug regarding this solution.
Use parallel::clusterExport; this solution works if the correct names are provided, in this case run_sqrt. To make the minimal example work, add the following line before the foreach call:
parallel$clusterExport(cl, "run_sqrt", envir = environment())
Your version didn’t work because ls() won’t list run_sqrt: since the name is attached, it does not exist in the local scope. The same issue would exist with attached packages instead of modules. Furthermore, for reasons I do not understand, clusterExport by default searches names in the global environment only; you need to explicitly provide the current environment via envir = environment().
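Both points can be reproduced with nothing but the ‘parallel’ package. The following is a minimal sketch rather than a description of how ‘box’ works internally: the environments `attached` and `scope` are hypothetical stand-ins, where `attached` plays the role of the place attached names live and `scope` plays the role of the local scope that inherits from it.

```r
library(parallel)

# Hypothetical stand-in for the environment that holds attached names.
attached <- new.env()
attached$run_sqrt <- function(x) sqrt(x)

# Hypothetical stand-in for the local scope; it inherits from `attached`.
scope <- new.env(parent = attached)

# ls() only lists names defined directly in `scope`, so run_sqrt is missing,
# even though regular name lookup finds it through the parent environment.
listed <- ls(envir = scope)
visible <- exists("run_sqrt", envir = scope)

cl <- makeCluster(2L)

# With the default `envir = .GlobalEnv`, the name cannot be found.
default_ok <- tryCatch({
  clusterExport(cl, "run_sqrt")
  TRUE
}, error = function(e) FALSE)

# Passing the environment from which the name is visible makes the export work.
clusterExport(cl, "run_sqrt", envir = scope)

# Evaluate a call on every worker to confirm the function arrived.
results <- clusterEvalQ(cl, run_sqrt(16))

stopCluster(cl)
```

Here `listed` is empty while `visible` is TRUE, which is why building the export list from ls() misses the function, and `default_ok` is FALSE because the default export looks in the global environment only.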