
`future` and `rsample` for parallel bootstrapping


I am trying to use future parallelization via furrr or future.apply in combination with the rsample package to bootstrap the estimates of a model. Sequential estimation works. Parallel estimation with the parallel package works. Parallel estimation with future does not work.

Any ideas why?

library(rsample)
library(broom)
library(purrr)
library(furrr)
library(future.apply)
library(parallel)
plan(multisession)

splits <- rsample::bootstraps(mtcars, times = 1000)

Sequential works

fit <- \(d) tidy(lm(mpg ~ wt, data = d))
# Either call works (purrr or base R); the second simply overwrites the first
splits$estimates <- map(splits$splits, fit)
splits$estimates <- lapply(splits$splits, fit)

## Summarize results if needed
# int_pctl(splits, estimates)

Parallel with parallel package works

splits$estimates <- mclapply(splits$splits, fit, mc.cores = 4)

Parallel with future does not work

# does not work
splits$estimates <- future_map(splits$splits, ~ tidy(lm(mpg ~ wt, data = .x)))
#> Error:
#> ℹ In index: 1.
#> Caused by error in `as.data.frame.default()`:
#> ! cannot coerce class ‘c("boot_split", "rsplit")’ to a data.frame
# does not work
splits$estimates <- future_lapply(splits$splits, \(d) tidy(lm(mpg ~ wt, data = d)))
#> Error in as.data.frame.default(data): cannot coerce class ‘c("boot_split", "rsplit")’ to a data.frame

Solution

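  • Ok, so it is a bit tricky.

    The error occurs because the future worker sessions do not have rsample loaded. lm() converts its data argument with as.data.frame(), and the method that handles rsplit objects is only registered once rsample is loaded, so on the workers the call falls through to as.data.frame.default().

    It works once the dependency is declared explicitly: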

    splits$estimates <- future_map(
        splits$splits,
        ~ tidy(lm(mpg ~ wt, data = .x)),
        .options = furrr_options(packages = "rsample")
    )

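    future fails to detect this dependency automatically because rsample is never referenced by name in the mapped code; it is only needed implicitly, through S3 method dispatch on the rsplit class.

    As an alternative, you could use the analysis() function from the rsample package, which extracts the resampled data frame explicitly, so the dependency is visible in the code: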

    splits$estimates <- future_map(splits$splits, ~ tidy(lm(mpg ~ wt, data = analysis(.x))))
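
Putting the pieces together, here is a minimal end-to-end sketch of the analysis() variant. The names boots, fit_split, has_method and ci are made up for illustration, and the worker count of 2 and the seed value are arbitrary assumptions. It also checks that as.data.frame() really has an rsplit method registered in the main session, which is the piece the workers were missing, and passes seed = TRUE so the workers get parallel-safe random number streams.

```r
library(rsample)
library(broom)
library(future)
library(furrr)

# Assumption: two background R sessions; adjust to your machine
plan(multisession, workers = 2)

set.seed(123)  # arbitrary seed, only to make the resamples reproducible
boots <- bootstraps(mtcars, times = 1000)

# With rsample loaded, as.data.frame() has a registered method for rsplit objects
has_method <- !is.null(getS3method("as.data.frame", "rsplit", optional = TRUE))

# Hypothetical helper: calling rsample::analysis() by name makes the
# dependency explicit, so nothing relies on implicit S3 dispatch on the workers
fit_split <- function(split) {
  tidy(lm(mpg ~ wt, data = rsample::analysis(split)))
}

boots$estimates <- future_map(
  boots$splits,
  fit_split,
  .options = furrr_options(seed = TRUE)
)

# Percentile intervals for the intercept and the wt coefficient
ci <- int_pctl(boots, estimates)
print(ci)

plan(sequential)  # shut the workers down again
```

Namespacing the call as rsample::analysis() is a design choice rather than a requirement: a plain analysis() call is also detected, but the prefix keeps the helper working even if rsample is not attached where the function is defined.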