Correct usage of drake::expose_imports() - Where to place call - Is it recursive?

Summary

I've noticed hints/suggestions/warnings in the drake docs suggesting use of expose_imports to ensure that changes in imported packages are tracked reproducibly, but the docs are relatively brief on the correct usage of this.

Example

I've now witnessed an example of the behaviour expose_imports is designed to correct in my own usage of drake, and I'd like to start using it.

In my case, the dependency that wasn't tracked was forcats. Version 0.4.0 had a bug in fct_collapse (used by one of my functions) that assigned incorrect groups to the output factor.

Version 0.4.0.9000 resolved this bug, and I updated to it some time ago, but I noticed that targets that must have run against the old version were not invalidated.

Question

I'm guessing that this is a problem that expose_imports might mitigate, but I don't really understand how / where to use it.

If I make scoped calls to my.package in my drake plans like so:

plan <- drake::drake_plan(
  mtc = mtcars,
  mtc_xformed = my.package::transform_mtc(mtc)
)

And my.package::transform_mtc() has some dependency on another package (e.g. forcats), then:

Where should I be calling expose_imports?
- In the prework argument of make?
- In the top level of a file in my.package/R/?

Should I be calling
- expose_imports("my.package"), or
- expose_imports("forcats")?

Some clarification of this would be awesome.

Solution

expose_imports() is mostly for packages you update/reinstall a lot. For example, say you write a package to implement a new statistical method, and the package is still under active development. Meanwhile, you are also writing a journal article about the method, and you have a reproducible drake pipeline to run simulation studies and compile the manuscript. Here, it is important to refresh the paper when you make changes to the package. In the project archetype here, your R/packages.R file would look something like this:

library(drake)
library(tidyverse)
library(yourCustomPackage)
expose_imports(yourCustomPackage)

Then, the plan can use functions from yourCustomPackage.

plan <- drake_plan(
  analysis = custom_method(...) # from yourCustomPackage
  # ...
)

Now, drake will invalidate targets in response to changes in custom_method(), along with any nested dependency functions of custom_method() in yourCustomPackage, and the dependencies of those dependencies in yourCustomPackage, etc. (Check vis_drake_graph() to see for yourself.)
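
As a rough sketch of how that check could look: this assumes a recent drake release in which outdated() and vis_drake_graph() accept the plan directly, and yourCustomPackage and custom_method() are placeholders for your own package and function, not real ones.

```r
library(drake)
library(yourCustomPackage)  # placeholder: your own package under active development

# Make the package's functions visible to drake as imports.
# A string also works if you set character_only = TRUE:
#   expose_imports("yourCustomPackage", character_only = TRUE)
expose_imports(yourCustomPackage)

plan <- drake_plan(
  analysis = custom_method()  # placeholder function from yourCustomPackage
)

make(plan)

# After editing and reinstalling yourCustomPackage, restart R, rerun the
# library() and expose_imports() lines above, then check what drake sees:
outdated(plan)         # targets drake considers out of date
vis_drake_graph(plan)  # needs the visNetwork package; shows imports and targets
```

Note that the call names the package whose functions you want tracked, which here is your own package rather than one of its third-party dependencies.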

expose_imports() is usually something I only recommend for packages directly related to the content of your research. It is not something I usually recommend for utilities like forcats. For those packages, I recommend renv to prevent unexpected changes from happening to begin with. In your case, I would update forcats, lock it down with renv, invalidate the targets you know depend on forcats, and trust that future changes to forcats are unlikely to be necessary.
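
A minimal sketch of that renv workflow, assuming it is run interactively from the project root and that mtc_xformed (the target name from the question) is the target that depends on forcats. Adjust the install step to whichever forcats version contains the fix you need.

```r
# Run from the project root, once.
renv::init()              # set up a project-local library and renv.lock

renv::install("forcats")  # install a forcats version that contains the fix
renv::snapshot()          # record the installed versions in renv.lock

# Invalidate the targets known to depend on forcats, so that the next
# make() rebuilds them against the locked-down version.
drake::clean(mtc_xformed)

# On another machine, or after the library changes, reinstall exactly
# what renv.lock records:
# renv::restore()
```

With the versions pinned in renv.lock, a forcats upgrade becomes a deliberate step, and you can clean the affected targets at the same time.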

Scoped calls like my.package::transform_mtc(mtc) tell drake to track transform_mtc(), but not any unscoped dependency functions called from inside transform_mtc(). This is a one-foot-in, one-foot-out behavior that I no longer agree with. Next chance I get, I will make drake stop tracking these calls.