
Using tidy evaluation with furrr


I want to make the following function run in parallel using the furrr package instead of the purrr package.

library(furrr)
library(tidyverse)

input <- list(element1 = tibble::tibble(a = c(1, 2), b = c(2, 2)),
              element2 = tibble::tibble(a = c(1, 2), b = c(4, 4))
)

multiplier <- function(data, var1, var2){
  purrr::map_df(.x = data,
                .f = ~ .x %>% 
                  dplyr::mutate(product = {{var1}} * {{var2}})
  )
}

multiplier(input, a, b)

However, when I simply convert it to the furrr equivalent, I get an error:

multiplier_parallel <- function(data, var1, var2){
  furrr::future_map_dfr(.x = data,
                        .f = ~ .x %>% 
                          dplyr::mutate(product = {{var1}} * {{var2}})
  )
}

future::plan(future::multisession)

multiplier_parallel(input, a, b)
Error in get(name, envir = env, inherits = FALSE) : 
Identified global objects via static code inspection (structure(function (..., .x = ..1, .y = ..2, . = 
..1); .x %>% dplyr::mutate(product = {; {; var1; }; } * {; {; var2; }; }), class = 
c("rlang_lambda_function", "function"))). Object 'a' not found 

I assume the reason is that the future package tries to identify every variable that needs to be exported to the workers. In this case, it treats the column name "a" as a global variable but cannot find it, hence the error.
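The guess can be narrowed down without furrr at all. The following is a minimal sketch using only base R and rlang (the function name show_capture is hypothetical) of what the get() call in the error message does: reading var1 forces the argument's promise and evaluates a in the caller's environment, where no such object exists, whereas rlang::enquo(), which {{ }} is shorthand for, only captures the expression.

```r
# A bare column name passed as an argument is stored as an unevaluated promise.
show_capture <- function(var1) {
  # enquo() reads the promise's expression without evaluating it
  captured <- rlang::as_label(rlang::enquo(var1))
  # get() returns the variable's value, which forces the promise and
  # evaluates `a` in the caller's environment (here: the global environment)
  forced <- tryCatch(get("var1", envir = environment(), inherits = FALSE),
                     error = function(e) conditionMessage(e))
  list(captured = captured, forced = forced)
}

res <- show_capture(a)
res$captured  # "a"
res$forced    # an "object 'a' not found" message, assuming no global `a` exists
```

So the column names themselves are not what future looks up; it finds var1 and var2 in the lambda, and fetching their values is what triggers the evaluation of a and b.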

When I hard-code the column names in the call, it works, but then the function no longer works with any other column names:

multiplier_parallel <- function(data, var1, var2){
  furrr::future_map_dfr(.x = data,
                        .f = ~ .x %>% 
                          dplyr::mutate(product = a * b)
  )
}

multiplier_parallel(input, a, b)

So far I have tried several things, including supplying the names through future_options(), but none of them worked. Is there any way to make this work? My actual function is quite a bit more complex, but I assume the principle is the same. It would be great if someone could help!


Solution

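  • future tries to determine automatically which global variables your code uses. Because of tidy evaluation, it identifies var1 and var2 as globals, and looking up their values forces R to evaluate a and b, which exist only as columns inside the data frames. You can disable this detection with furrr_options(globals = FALSE) (future_options(globals = FALSE) in furrr versions before 0.2.0).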

    # Forked workers inherit the main session's state; on Windows, multicore falls back to sequential
    future::plan(future::multicore)
    
    input <- list(element1 = tibble::tibble(a = c(1, 2), b = c(2, 2)),
                  element2 = tibble::tibble(a = c(1, 2), b = c(4, 4))
    )
    
    multiplier_parallel <- function(data, var1, var2){
          furrr::future_map_dfr(.x = data,
                                .f = ~ .x %>% 
                                      dplyr::mutate(product = {{var1}} * {{var2}}),
                                .options = furrr::furrr_options(globals = FALSE)
          )
    }
    
    multiplier_parallel(input, a, b)
    # A tibble: 4 x 3
          a     b product
      <dbl> <dbl>   <dbl>
    1     1     2       2
    2     2     2       4
    3     1     4       4
    4     2     4       8
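An alternative that keeps globals detection switched on is to resolve the column names to strings in the main session, before any future is created, and refer to them through dplyr's .data pronoun. This is a different technique from disabling globals detection, offered as a sketch under the assumption that furrr, dplyr, tibble and rlang are installed; the two-worker multisession plan is an arbitrary choice for illustration.

```r
library(furrr)  # also attaches future

# Portable backend: separate background R sessions
future::plan(future::multisession, workers = 2)

input <- list(element1 = tibble::tibble(a = c(1, 2), b = c(2, 2)),
              element2 = tibble::tibble(a = c(1, 2), b = c(4, 4))
)

multiplier_parallel <- function(data, var1, var2) {
  # Turn the bare column names into strings here, in the main session.
  # Afterwards var1/var2 are ordinary character values that future can
  # detect and export without ever evaluating `a` or `b`.
  var1 <- rlang::as_name(rlang::enquo(var1))
  var2 <- rlang::as_name(rlang::enquo(var2))

  furrr::future_map_dfr(.x = data,
                        .f = function(df) {
                          # .data[[...]] looks a column up by name inside the data mask
                          dplyr::mutate(df, product = .data[[var1]] * .data[[var2]])
                        }
  )
}

res <- multiplier_parallel(input, a, b)
res

# Shut down the background workers
future::plan(future::sequential)
```

The trade-off is that rlang::as_name() only accepts bare names (or strings), so a call such as multiplier_parallel(input, a + 1, b) would fail, whereas the {{ }} version accepts arbitrary expressions.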