
R drake: file_out() with a file name stored in a variable


I am using drake to create multiple output files, and I want to specify their paths with a variable. Something like:

outpath <- "data"
outfile <- file.path(outpath, "mydata.csv")
write.csv(df, outfile)

But file_out() does not seem to work with anything other than a literal character string.

To give a small code example:

Code setup

library(drake)

outpath <- "data"
# for reproducibility only
if (!dir.exists(outpath)) dir.create(outpath)

make_data <- function() data.frame(x = 1:10, y = rnorm(10))

Working Code

Specifying the file directly:

p0 <- drake_plan(
  df = make_data(),
  write.csv(df, file_out("data/mydata0.csv"))
)
make(p0)
#> target file "data/mydata0.csv"

Failing Code

Using file.path() to construct the output file:

p1 <- drake_plan(
  df = make_data(),
  write.csv(df, file_out(file.path(outpath, "mydata1.csv")))
)
make(p1)
#> target file "mydata1.csv"
#> Error: The file does not exist: mydata1.csv
#> In addition: Warning message:
#> File "mydata1.csv" was built or processed,
#> but the file itself does not exist. 

I guess drake only picks up the literal string as the target, not the result of file.path(...). For example, this fails as well:

p2 <- drake_plan(
  df = make_data(),
  outfile = file.path(outpath, "mydata1.csv"),
  write.csv(df, file_out(outfile))
)
#> Error: found an empty file_out() in command: write.csv(df, file_out(outfile))

Any idea how to fix that?


Solution

  • Sorry I am so late to this thread. I can more easily find questions with the drake-r-package tag.

    Thanks to @Alexis for providing the link to the relevant thread. Wildcards can really help here.

    All your targets, input files, and output files need to be explicitly named in advance. This is so drake can figure out all the dependency relationships without evaluating any code in your plan. Since drake is responsible for figuring out which targets to build when, I am probably not going to relax this requirement in future development.
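To see why this requirement produces the errors in the question, here is a toy sketch of static code analysis in base R. It is not drake's actual implementation: the hypothetical helper find_file_out() walks a quoted command without evaluating it and keeps only the string literals found inside file_out(). That is enough to reproduce both the "mydata1.csv" target from p1 and the empty file_out() from p2.

```r
# Toy illustration only (NOT drake's real code): collect the string
# literals inside file_out() by walking the parsed command. Nothing is
# evaluated, so file.path() is never called and variables are never looked up.
find_file_out <- function(expr) {
  found <- character(0)
  walk <- function(e, inside = FALSE) {
    if (is.character(e)) {
      # A string literal: record it only if we are inside file_out().
      if (inside) found <<- c(found, e)
    } else if (is.call(e)) {
      is_fo <- identical(e[[1]], as.name("file_out"))
      # Recurse into the call's arguments (element 1 is the function name).
      for (arg in as.list(e)[-1]) walk(arg, inside || is_fo)
    }
    # Symbols such as `outfile` or `outpath` carry no literal, so they are ignored.
  }
  walk(expr)
  found
}

cmd0 <- quote(write.csv(df, file_out("data/mydata0.csv")))
cmd1 <- quote(write.csv(df, file_out(file.path(outpath, "mydata1.csv"))))
cmd2 <- quote(write.csv(df, file_out(outfile)))

find_file_out(cmd0)  # returns "data/mydata0.csv"
find_file_out(cmd1)  # returns "mydata1.csv": only the literal is seen
find_file_out(cmd2)  # returns character(0): the "empty file_out()" case
```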

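    For what it's worth, tidy evaluation may also help: !! inserts the value of file into the command when the plan is created, so file_out() sees a literal string.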

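    library(drake) # version 5.3.0
    # Treat quoted strings in commands as ordinary strings;
    # files are declared only through file_in() / file_out().
    pkgconfig::set_config("drake::strings_in_dots" = "literals")
    file <- file.path("dir", "mydata1.csv")
    drake_plan(
      df = make_data(),
      # !!file is replaced by "dir/mydata1.csv" when the plan is created
      output = write.csv(df, file_out(!!file))
    )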
    #> # A tibble: 2 x 2
    #>   target         command                                       
    #> * <chr>          <chr>                                         
    #> 1 df             make_data()                                   
    #> 2 output         "write.csv(df, file_out(\"dir/mydata1.csv\"))"
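The same mechanism can be shown in base R without drake or rlang. The sketch below uses bquote() as a stand-in for !!: .() splices the value of a variable into an unevaluated call, so the path is already a literal string by the time anything analyzes the command. It only illustrates the idea; inside drake_plan() you would still use !! as above. The variable name outfile is borrowed from the question.

```r
# Compute the path first, then splice its value into the command.
outpath <- "data"
outfile <- file.path(outpath, "mydata1.csv")

# .(outfile) is replaced by the string "data/mydata1.csv";
# the rest of the call stays unevaluated.
cmd <- bquote(write.csv(df, file_out(.(outfile))))
cmd
# write.csv(df, file_out("data/mydata1.csv"))
```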
    

    EDIT: metaprogramming

    I recently added a lengthy section of the manual on metaprogramming. If you want more flexible and automated ways to generate workflow plan data frames, you may have to abandon the drake_plan() function and do more involved tidy evaluation. The discussion on the issue tracker is also relevant.
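Here is a minimal sketch of generating a plan data frame without drake_plan(), covering the question's case of several output files. It assumes that your drake version accepts a plain data frame with character target and command columns (the tibble printed above has that shape). The target names output1 to output3 are illustrative only. Because each path is written into the command text as a literal, file_out() can see every file name up front.

```r
# Hypothetical example: one output target per file, with the paths
# computed from a variable and written into the commands as literals.
outpath <- "data"
outfiles <- file.path(outpath, paste0("mydata", 1:3, ".csv"))

plan <- data.frame(
  target = c("df", paste0("output", 1:3)),
  command = c(
    "make_data()",
    # encodeString(quote = '"') wraps each path in double quotes, escaping as needed
    sprintf("write.csv(df, file_out(%s))", encodeString(outfiles, quote = '"'))
  ),
  stringsAsFactors = FALSE
)
plan
# Assumption: with drake loaded, make_data() defined, and data/ existing,
# make(plan) should then build df and the three CSV files.
```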