
What is furrr's "black magic?"


I use the R package furrr for most of my parallelization needs, and I basically never have issues with exporting things from my global environment to the workers. Today I did, and I have no idea why. The package documentation seems to describe the process by which global variables are sent to the workers as "black magic." What is the black magic?

The furrr::future_options documentation says:

Global variables and packages: By default, the future package will perform black magic to look up the global variables and packages that your furrr call requires, and it will export these to each worker. However, it is not always perfect, and can be refined with the globals and packages arguments.

As a secondary question: is there an elegant way to tell it to do its black magic, but also to export something it missed? Or, are the choices a) all black magic, or b) hard code everything in the .options argument?


Solution

  • This doesn't totally answer the question, but I think it points you in the right direction. From the "Globals" section of this intro vignette:

    It does this with help of the globals package, which uses static-code inspection to identify global variables. If a global variable is identified, it is captured and made available to the evaluating process.

    There's also this "Common Issues with Solutions" vignette (which @michael helpfully linked above) that discusses some common "gotchas" that result from the static code inspection done by the globals package.

    I found my way here because my future_map() code was failing to find the variables I referenced inside a glue() call. That vignette explained exactly why this happens.

    As to why your code sometimes worked and sometimes didn't, it's hard to say. But as you can see, there is enough complexity under the hood that I'm not surprised a seemingly unrelated change broke something. (For me, the change was cleaning up my code and using glue instead of paste.)
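
To make the glue case concrete, here is a minimal sketch that calls the globals package directly rather than going through furrr. The names `n_items`, `f_missed`, and `f_found` are made up for illustration, and the "mention the variable as a bare symbol" trick is a workaround I believe the "Common Issues with Solutions" vignette suggests, not something furrr requires. Nothing here runs glue; the function bodies are only inspected.

```r
library(globals)

# Hypothetical global that the worker function needs.
n_items <- 3

# Version 1: n_items appears only inside the glue template string,
# so a static walk over the code never sees it as a symbol.
f_missed <- function(i) {
  glue::glue("item {i} of {n_items}")
}

# Version 2: n_items is also mentioned as a bare symbol.
f_found <- function(i) {
  n_items  # no-op reference; only here so the code walker sees the name
  glue::glue("item {i} of {n_items}")
}

# Inspect the function bodies without evaluating them.
globals_missed <- findGlobals(body(f_missed))
globals_found <- findGlobals(body(f_found))

# Was n_items detected in each version?
print("n_items" %in% globals_missed)
print("n_items" %in% globals_found)
```

The first check should come back `FALSE` and the second `TRUE`, which is the same blind spot that bit my `future_map()` call. For the secondary question, this suggests a middle road: keep the automatic lookup, and mention whatever it missed as a bare symbol inside the function you pass to `future_map()`. Listing names explicitly through the `globals` argument of the options object is the other route, but as far as I understand it that replaces the automatic lookup rather than adding to it.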