I am using drake to create multiple output files, and I want to specify the output path with a variable. Something like:
outpath <- "data"
outfile <- file.path(outpath, "mydata.csv")
write.csv(df, outfile)
But file_out() doesn't seem to work with arguments other than literal strings.
To give a small code example:
library(drake)
outpath <- "data"
# for reproducibility only
if (!dir.exists(outpath)) dir.create(outpath)
make_data <- function() data.frame(x = 1:10, y = rnorm(10))
Directly specifying the file:
p0 <- drake_plan(
  df = make_data(),
  write.csv(df, file_out("data/mydata0.csv"))
)
make(p0)
#> target file "data/mydata0.csv"
Using file.path() to construct the outfile:
p1 <- drake_plan(
  df = make_data(),
  write.csv(df, file_out(file.path(outpath, "mydata1.csv")))
)
make(p1)
#> target file "mydata1.csv"
#> Error: The file does not exist: mydata1.csv
#> In addition: Warning message:
#> File "mydata1.csv" was built or processed,
#> but the file itself does not exist.
I guess drake only picks up the literal string as a target and not the result of file.path(...). For example, this fails as well:
p2 <- drake_plan(
  df = make_data(),
  outfile = file.path(outpath, "mydata1.csv"),
  write.csv(df, file_out(outfile))
)
#> Error: found an empty file_out() in command: write.csv(df, file_out(outfile))
Any idea how to fix that?
Sorry I am so late to this thread. I can more easily find questions with the drake-r-package tag.
Thanks to @Alexis for providing the link to the relevant thread. Wildcards can really help here.
All your targets, input files, and output files need to be explicitly named in advance. This is so drake can figure out all the dependency relationships without evaluating any code in your plan. Since drake is responsible for figuring out which targets to build when, I am probably not going to relax this requirement in future development.
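To illustrate what "without evaluating any code" means, here is a minimal base R sketch. It is not drake's actual implementation, and find_file_out_args() is a hypothetical helper written only for this illustration. It walks an unevaluated command and collects whatever is written inside file_out(). A literal path is visible to this kind of inspection, while file.path(...) is just an unevaluated call whose result is unknown until the code runs.

```r
# Hypothetical helper, NOT part of drake: collect the arguments of every
# file_out() call found in an unevaluated expression.
find_file_out_args <- function(expr) {
  found <- list()
  walk <- function(e) {
    if (is.call(e)) {
      # Record the arguments when the function being called is file_out
      if (identical(e[[1]], as.name("file_out"))) {
        found <<- c(found, as.list(e)[-1])
      }
      # Recurse into the function name and every argument of the call
      for (part in as.list(e)) walk(part)
    }
  }
  walk(expr)
  found
}

# quote() captures the commands without running them
literal  <- quote(write.csv(df, file_out("data/mydata0.csv")))
computed <- quote(write.csv(df, file_out(file.path(outpath, "mydata1.csv"))))

args_literal  <- find_file_out_args(literal)
args_computed <- find_file_out_args(computed)

is.character(args_literal[[1]])  # TRUE: the path can be read off directly
is.call(args_computed[[1]])      # TRUE: only a call; its value is not known yet
```

Nothing here ever looks up outpath, which is why a path built at run time cannot be registered as an output file ahead of time.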
For what it's worth, tidy evaluation may also help.
library(drake) # version 5.3.0
pkgconfig::set_config("drake::strings_in_dots" = "literals")
file <- file.path("dir", "mydata1.csv")
drake_plan(
  df = make_data(),
  output = write.csv(df, file_out(!!file))
)
#> # A tibble: 2 x 2
#>   target command
#> * <chr>  <chr>
#> 1 df     make_data()
#> 2 output "write.csv(df, file_out(\"dir/mydata1.csv\"))"
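The effect of !! above is that the value of file is spliced into the command before the plan is stored, so the command ends up containing a literal string. The same idea can be sketched in base R with bquote(), where .() plays the role that !! plays in tidy evaluation. This is only an analogy for the mechanism and does not show how drake_plan() works internally.

```r
# Build the path first, outside of the command
file <- file.path("dir", "mydata1.csv")

# bquote() leaves the expression unevaluated, except for what is wrapped in .()
cmd <- bquote(write.csv(df, file_out(.(file))))

# The argument of file_out() is now a plain character string, not a symbol
is.character(cmd[[3]][[2]])

# The command as text, comparable to the 'command' column in the plan above
deparse(cmd)
```

The resulting text should match the command shown for the output target in the printed plan.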
I recently added a lengthy section of the manual on metaprogramming. If you want more flexible and automated ways to generate workflow plan data frames, you may have to abandon the drake_plan() function and do more involved tidy evaluation. The discussion on the issue tracker is also relevant.
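As a rough sketch of generating a plan without drake_plan(), the snippet below builds the data frame by hand in base R. It assumes, as in the printed plan above, that a plan is a data frame with character columns target and command; whether a given drake version accepts such a hand-built plan, and how it expects file paths to be quoted, is an assumption to verify against the manual. The names stems, files, and the output_ prefix are made up for this example.

```r
outpath <- "data"
stems <- c("mydata0", "mydata1")

# Every output path is computed here, before any plan exists
files <- file.path(outpath, paste0(stems, ".csv"))

# Each command contains its path as a literal string inside file_out()
plan <- rbind(
  data.frame(
    target = "df",
    command = "make_data()",
    stringsAsFactors = FALSE
  ),
  data.frame(
    target = paste0("output_", stems),
    command = sprintf("write.csv(df, file_out(\"%s\"))", files),
    stringsAsFactors = FALSE
  )
)

print(plan)
```

Because the paths are resolved before the plan is assembled, every output file is explicitly named in advance, which is the requirement described above.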