I am trying to use future parallelization via furrr or future.apply in combination with the rsample package to bootstrap the estimates of a model. Sequential estimation works. Parallel estimation with the parallel package works. Parallel estimation with future does not work.
Any ideas why?
library(rsample)
library(broom)
library(purrr)
library(furrr)
library(future.apply)
library(parallel)
plan(multisession)
splits <- rsample::bootstraps(mtcars, times = 1000)
fit <- \(d) tidy(lm(mpg ~ wt, data = d))
splits$estimates <- map(splits$splits, fit)
splits$estimates <- lapply(splits$splits, fit)
## Summarize results if needed
# int_pctl(splits, estimates)
parallel package works
splits$estimates <- mclapply(splits$splits, fit, mc.cores = 4)
future does not work
# does not work
splits$estimates <- future_map(splits$splits, ~ tidy(lm(mpg ~ wt, data = .x)))
#> Error:
#> ℹ In index: 1.
#> Caused by error in `as.data.frame.default()`:
#> ! cannot coerce class ‘c("boot_split", "rsplit")’ to a data.frame
# does not work
splits$estimates <- future_lapply(splits$splits, \(d) tidy(lm(mpg ~ wt, data = d)))
#> Error in as.data.frame.default(data): cannot coerce class ‘c("boot_split", "rsplit")’ to a data.frame
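Ok, so it is a bit tricky.
The error occurs because the workers started by future_map do not have the rsample package loaded, so lm() cannot convert the rsplit object to a data frame there.
It works when the dependency is specified explicitly: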
splits$estimates <- future_map(
  splits$splits,
  ~ tidy(lm(mpg ~ wt, data = .x)),
  .options = furrr_options(packages = "rsample")
)
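future fails to detect this dependency automatically because rsample is never referenced by name in the code. It is only needed implicitly, through the as.data.frame() method for rsplit objects that lm() dispatches to.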
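A quick way to check this explanation is to ask a worker whether the rsample namespace is loaded. This is only a sketch: the worker count and the variable name on_worker are arbitrary choices, not part of the original code. If the explanation holds, a fresh multisession worker should report FALSE.

```r
library(future)

plan(multisession, workers = 2)  # worker count is an arbitrary choice

# Evaluate the check on a worker rather than in the main session
on_worker <- value(future("rsample" %in% loadedNamespaces()))
on_worker

plan(sequential)  # shut the workers down again
```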
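As an alternative, you could use the analysis function from the rsample package. Because analysis is referenced explicitly, the dependency on rsample becomes visible to future: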
splits$estimates <- future_map(splits$splits, ~ tidy(lm(mpg ~ wt, data = analysis(.x))))
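For the future_lapply variant from the question, the same two fixes should carry over. The sketch below assumes that the future.packages argument of future_lapply plays the role of furrr_options(packages = ...). The number of resamples, the worker count, and the names est_pkg and est_analysis are illustrative choices.

```r
library(rsample)
library(broom)
library(future.apply)

plan(multisession, workers = 2)  # worker count is an arbitrary choice

# Fewer resamples than in the question, just to keep the example quick
splits <- bootstraps(mtcars, times = 20)

# Option 1: attach rsample on the workers so that its as.data.frame()
# method for rsplit objects is available when lm() needs it
est_pkg <- future_lapply(
  splits$splits,
  \(d) tidy(lm(mpg ~ wt, data = d)),
  future.packages = "rsample"
)

# Option 2: extract the resampled data frame explicitly with analysis()
est_analysis <- future_lapply(
  splits$splits,
  \(d) tidy(lm(mpg ~ wt, data = rsample::analysis(d)))
)

plan(sequential)  # shut the workers down again
```

Option 2 is arguably the more robust choice, since the code no longer relies on an implicit conversion that only works when rsample happens to be loaded.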