r, r-future, furrr

R furrr: authenticate an API on each future process before running the computation


I am running a parallel computation using furrr in R. The computation requires access to a web API, so authentication has to take place first. When I run in parallel, each process needs to authenticate. In the example below I have 6 processes, so I would need to authenticate once on each of these six processes and then run the calculations. I don't know how to do that using furrr, so I end up authenticating on every call, which is really inefficient.

Below is a simple example for illustrative purposes. It does not run as is because I can't share the api.configure function, but hopefully you get the idea.

Thanks

library(tidyverse)
library(furrr)
# multiprocess is deprecated in the future package; multisession replaces it
plan(multisession, workers = 6)

testdf <- starwars %>%
  select(-films, -vehicles, -starships) %>%
  future_pmap_dfr(.f = function(...) {
    # Authentication runs again for every row, which is the inefficiency
    api.configure(username = "username", password = "password")
    currentrow <- tibble(...)
    tibble(name = currentrow$name, height = currentrow$height)
  })

Solution

  • The way I solved this was to ask the developer of the API to add a variable to the API package that indicates whether the connection is open. The authentication call is then wrapped in an if clause that checks this variable. Each future process authenticates once, when its connection is not yet open, and every later call on that process skips the authentication because the check finds the connection already open.
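The guard described above can be sketched as follows. This is only an illustration under stated assumptions: the real API package is not public, so `make_api()`, `api$configure()` and `api$is_connected()` are hypothetical stand-ins for `api.configure` and for the connection check the developer added. The extra `pid` and `did_auth` columns are there only to show where authentication actually ran.

```r
library(dplyr)
library(furrr)

# Hypothetical stand-in for the API package: a flag plus two functions.
# In the real package the flag lives inside the package itself.
make_api <- function() {
  connected <- FALSE
  list(
    configure = function(username, password) connected <<- TRUE,
    is_connected = function() connected
  )
}
api <- make_api()

plan(multisession, workers = 2)

testdf <- starwars %>%
  select(name, height) %>%
  future_pmap_dfr(function(...) {
    did_auth <- FALSE
    # Authenticate only if this process has no open connection yet
    if (!api$is_connected()) {
      api$configure(username = "username", password = "password")
      did_auth <- TRUE
    }
    currentrow <- tibble(...)
    tibble(
      name = currentrow$name,
      height = currentrow$height,
      pid = Sys.getpid(),   # worker process that handled this row
      did_auth = did_auth   # TRUE only on the rows that triggered authentication
    )
  })

plan(sequential)  # shut the workers down again
```

With furrr's default scheduling each worker should receive a single chunk of rows, so `sum(testdf$did_auth)` is expected to match the number of workers rather than the number of rows. One difference from the real solution: in this sketch the flag travels with the exported `api` object, so it only lasts for one chunk, whereas a variable inside the installed API package stays set for the lifetime of the worker process.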