I have the following code, that performs bootstrapping and calculates the confidence interval.
library(resample)
ibrary(broom)
library(dplyr)
library(purrr)
library(tibble)
lm_est <- function(split, ...) {
lm(mpg ~ disp + hp, data = analysis(split)) %>%
tidy()
}
set.seed(52156)
car_rs <-
bootstraps(mtcars, 500, apparent = TRUE) %>%
mutate(results = map(splits, lm_est))
int_pctl(car_rs, results) # this is important
It produces
> int_pctl(car_rs, results)
# A tibble: 3 × 6
term .lower .estimate .upper .alpha .method
<chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 (Intercept) 27.7 31.0 34.1 0.05 percentile
2 disp -0.0431 -0.0295 -0.0123 0.05 percentile
3 hp -0.0643 -0.0281 -0.00930 0.05 percentile
But it runs very slowly. How can I speed it up with parallelization? Note that the output of the parallelization needs to be able to be processed by int_pctl().
I tried this but failed:
library(parallel)
# set the number of cores to use for parallelization
cores <- detectCores() - 1
cl <- makeCluster(cores)
# use mcmapply to parallelize the bootstrapping process
car_rs$results <- mcmapply(lm_est, car_rs$splits, mc.cores = cores, mc.preschedule = TRUE)
stopCluster(cl)
There are parallel versions of purrr::map*()
functions in the furrr package that you can use.
library(rsample)
library(broom)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(purrr)
library(tibble)
library(furrr) #<- added
#> Loading required package: future
plan(multisession, workers = parallel::detectCores()) #<- added
lm_est <- function(split, ...) {
library(broom) #<- added to load inside of remote workers
lm(mpg ~ disp + hp, data = analysis(split)) %>%
tidy()
}
set.seed(52156)
car_rs <-
bootstraps(mtcars, 1500, apparent = TRUE) %>%
mutate(results = future_map(splits, lm_est)) #<- changed
int_pctl(car_rs, results) # this is important
#> # A tibble: 3 × 6
#> term .lower .estimate .upper .alpha .method
#> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 (Intercept) 27.7 30.8 33.6 0.05 percentile
#> 2 disp -0.0443 -0.0298 -0.0146 0.05 percentile
#> 3 hp -0.0584 -0.0267 -0.00718 0.05 percentile
Created on 2023-01-26 by the reprex package (v2.0.1)