
Include a '2nd-level' user function in carrier::crate that can be used in furrr::future_map


I followed this article to create a crated function that calls a user function, which itself calls another user function. I supplied both user functions to crate(). This seems to work fine, as both user functions show up in the printout of the crated function. But when I use the crated function in future_map(), the inner user function cannot be found.

library(furrr)

future::plan(multisession, workers = 3)

inner_foo <- function(x){
  x^2
}

outer_foo <- function(x){
  inner_foo(x)
}

outer_crate <- carrier::crate(
  function(x) outer_foo(x),
  outer_foo = outer_foo,
  inner_foo = inner_foo
)

# this works
outer_crate(3)

# shows that inner_foo is packaged with outer_crate
outer_crate

# fails: inner_foo() cannot be found on the workers
future_map(1:3, outer_crate,
           .options = furrr_options(globals = FALSE))

# works if inner_foo is manually supplied
future_map(1:3, outer_crate,
           .options = furrr_options(globals = "inner_foo"))

This problem only occurs when parallel workers are set up with plan(); with the default sequential plan, it works as expected.
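To see why only the parallel case fails, the crate can be mimicked by hand in base R. This is a minimal sketch that assumes a crate behaves like an environment holding the supplied objects plus a wrapper function that closes over that environment; `crate_env`, `wrapper` and `top` are hypothetical names, not carrier's internals. Removing inner_foo() from the environment where it was defined stands in for a worker whose global environment does not contain it.

```r
# Environment in which the functions are defined (the global env at top level)
top <- environment()

inner_foo <- function(x) x^2
outer_foo <- function(x) inner_foo(x)

# Hypothetical stand-in for carrier::crate(): an environment holding both
# functions, and a wrapper whose enclosing environment is that environment
crate_env <- new.env(parent = baseenv())
crate_env$inner_foo <- inner_foo
crate_env$outer_foo <- outer_foo
wrapper <- function(x) outer_foo(x)
environment(wrapper) <- crate_env

# The copy of outer_foo stored in the "crate" still closes over `top`,
# so that is where it will look for inner_foo, not in crate_env
print(identical(environment(crate_env$outer_foo), top))

# Simulate a worker whose global environment lacks inner_foo
rm(inner_foo)
res <- tryCatch(wrapper(3), error = function(e) conditionMessage(e))
print(res)  # error message saying inner_foo could not be found
```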


Solution

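  • This is a known issue. It happens because the inner_foo() function lives in R's global environment. Although crate() stores a copy of inner_foo() next to outer_foo(), outer_foo() does not look it up in the crate: by R's lexical scoping rules it searches its own enclosing environment, which is still the global environment. The global environment is special, because its contents are not carried along when exporting objects to parallel workers. In the main R session the lookup succeeds, which is why the problem disappears without parallel workers.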

    The solution is to have inner_foo() live in the environment of the function that uses it, i.e. the environment of outer_foo().

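    There are two ways to achieve this. The first is to define inner_foo() inside a local() block, so that it lives in the environment of the function returned by that block: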

    outer_foo <- local({
      inner_foo <- function(x){
        x^2
      }
      
      function(x){
        inner_foo(x)
      }
    })
    

    The second approach is to "fix up" outer_foo() after both functions have been created, by giving it a new environment that contains inner_foo():

    inner_foo <- function(x){
      x^2
    }
    
    outer_foo <- function(x){
      inner_foo(x)
    }
    
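    # Give outer_foo() a private environment whose parent is its old one ...
    environment(outer_foo) <- new.env(parent = environment(outer_foo))
    # ... and store inner_foo() there, so it is found (and exported) with outer_foo()
    environment(outer_foo)$inner_foo <- inner_foo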
    

    Ideally, the Futureverse would do this automagically, but it's a complex problem with its own issues, so that's still on the roadmap.
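The second approach can be wrapped in a small helper so it does not have to be repeated for every function. `with_helpers()` below is a hypothetical helper, not part of carrier, future or furrr; it is only a sketch that automates the two `environment()` assignments from the second approach. As in the question, removing inner_foo() from the environment where it was defined stands in for a parallel worker.

```r
# Hypothetical helper (not part of carrier/future/furrr): give `fun` a private
# enclosing environment that holds the named helper functions passed in `...`
with_helpers <- function(fun, ...) {
  helpers <- list(...)
  stopifnot(is.function(fun))
  if (length(helpers) > 0 &&
      (is.null(names(helpers)) || any(names(helpers) == ""))) {
    stop("all helpers must be named")
  }
  # New environment whose parent is fun's original enclosing environment
  env <- list2env(helpers, envir = new.env(parent = environment(fun)))
  environment(fun) <- env
  fun
}

top <- environment()
inner_foo <- function(x) x^2
outer_foo <- function(x) inner_foo(x)
outer_foo <- with_helpers(outer_foo, inner_foo = inner_foo)

# Simulate a worker whose global environment lacks inner_foo
rm(inner_foo)
result <- outer_foo(3)  # inner_foo is found in outer_foo's own environment
print(result)
```

The returned outer_foo() can then be passed to carrier::crate() as in the question. Since inner_foo() now travels inside outer_foo()'s own (non-global) environment, listing it separately in crate() should no longer be necessary.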