How to combine and filter dynamic file targets in R drake?


I create a set of files in my drake plan. I want to copy a subset of these files to another location.

The following code almost achieves that. However, drake's dependency tracking of file changes is lost after taking the subset of file targets that I want to copy.

How can I combine/subset dynamic file targets without losing drake's dependency tracking?

copy_file <- function(file) {
  file_copy <- paste0(file, "_copy")
  file.copy(from = file, to = file_copy, overwrite = TRUE)
  file_copy
}

herb_1_a <- "parsley"
plan <- drake::drake_plan(
  file_1 = target(
    {
      writeLines(herb_1_a, "file_1_a") # herb_1_a changes before the second make()
      writeLines("sage", "file_1_b")
      c("file_1_a", "file_1_b")
    },
    format = "file"
  ),

  file_2 = target(
    {
      writeLines("rosemary", "file_2_a")
      writeLines("thyme", "file_2_b")
      c("file_2_a", "file_2_b")
    },
    format = "file"
  ),

  # Keep only the files whose names end in "_a"
  files_to_copy = stringr::str_subset(
    c(file_1, file_2),
    "_a$"
  ),

  file_copies = target(
    copy_file(files_to_copy),
    dynamic = map(files_to_copy),
    format = "file"
  )
)

drake::make(plan)
#> ▶ target file_2
#> ▶ target file_1
#> ▶ target files_to_copy
#> ▶ dynamic file_copies
#> > subtarget file_copies_5e57e9ee
#> > subtarget file_copies_ae26ecf9
#> ■ finalize file_copies
readLines("file_1_a")
#> [1] "parsley"
readLines("file_1_a_copy")
#> [1] "parsley"
herb_1_a <- 'banana'
drake::make(plan)
#> ▶ target file_1
#> ▶ target files_to_copy
readLines("file_1_a")
#> [1] "banana"
readLines("file_1_a_copy") # I want this to be "banana"
#> [1] "parsley"

Created on 2020-09-24 by the reprex package (v0.3.0)


Solution

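  • I think the fix is to create a dynamically mapped target of dynamic files right before the copying step. In the original plan, files_to_copy is an ordinary target that stores only the file paths, and those paths do not change when the file contents do, so file_copies is never invalidated. In other words, files_to_copy should be a dynamic target of dynamic files. Sketch: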

    plan <- drake::drake_plan(
      file_1 = target(
        {
          writeLines(herb_1_a, "file_1_a") # herb_1_a changes before the second make()
          writeLines("sage", "file_1_b")
          c("file_1_a", "file_1_b")
        },
        format = "file"
      ),
      
      file_2 = target(
        {
          writeLines("rosemary", "file_2_a")
          writeLines("thyme", "file_2_b")
          c("file_2_a", "file_2_b")
        },
        format = "file"
      ),
      
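      # Ordinary target: just the paths of the files whose names end in "_a"
      files_to_copy_group = stringr::str_subset(
        c(file_1, file_2),
        "_a$"
      ),

      # Re-declare each selected path as a dynamic file target
      files_to_copy = target(
        files_to_copy_group,
        dynamic = map(files_to_copy_group),
        format = "file"
      ),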
      
      file_copies = target(
        copy_file(files_to_copy),
        dynamic = map(files_to_copy),
        format = "file"
      )
    )
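
The reasoning behind the sketch above can be illustrated without drake. The following base R snippet is only an analogy, not drake's actual implementation: it assumes that an ordinary target is compared by its stored value (here, a vector of paths), while a format = "file" target is compared by a hash of the file contents. grep() stands in for str_subset(), and tools::md5sum() stands in for whatever hashing drake uses internally.

```r
# Analogy only: base R, no drake involved.
path <- file.path(tempdir(), "file_1_a")

# First "run": write the file, then record what each kind of target would see.
writeLines("parsley", path)
paths_before <- grep("_a$", path, value = TRUE)  # value of an ordinary target
hash_before <- unname(tools::md5sum(path))       # assumed view of a file target

# Second "run": the file contents change, the path does not.
writeLines("banana", path)
paths_after <- grep("_a$", path, value = TRUE)
hash_after <- unname(tools::md5sum(path))

identical(paths_before, paths_after)  # TRUE: the path vector is unchanged
identical(hash_before, hash_after)    # FALSE: the content hash changed
```

If that assumption holds, the extra files_to_copy step with format = "file" is what lets a change in file_1_a propagate to file_copies.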