Reading the drake documentation, the only way I found to define the order of targets is with file_in() and file_out():
file_in() marks individual files (and whole directories) that your targets depend on.
file_out() marks individual files (and whole directories) that your targets create.
It is not possible, however, to use either of them with dynamic targets.
So how can I define the order in which dynamic targets should run?
I also tried make(plan, targets = c("ftp_list", "download.dbc", "dbc_list", "generate_parquet")), but it didn't work.
In the code below, for example, I have four targets. I'd like them to run in this order: ftp_list, download.dbc, dbc_list, generate_parquet.
Any idea how I can link dynamic targets without using file_in and file_out (not allowed in this case)? Thanks!
Code just as example:
URL <- "ftp://ftp.url"
LOCAL_PATH <- getwd()

plan <- drake_plan(
  # First 10 file names listed on the FTP server.
  ftp_list = obtain_filenames_from_url(
    url_ = URL,
    remove_extension_from_filename_ = FALSE,
    full_names = TRUE
  )[seq_len(10)],
  # Download each file in its own dynamic sub-target.
  download.dbc = target(
    download_dbc(ftp_list, local_path = paste0(LOCAL_PATH, "/")),
    dynamic = map(ftp_list)
  ),
  # List the downloaded .dbc files (pattern is a regular expression).
  dbc_list = target(
    list.files(LOCAL_PATH, full.names = TRUE, pattern = "\\.dbc$")
  ),
  # Convert each .dbc file to parquet.
  generate_parquet = target(
    convert_dbc(dbc_list, delete_dbc_after_conversion = TRUE),
    dynamic = map(dbc_list)
  )
)
plan graph output:
file_in() and file_out() are only necessary when you actually need to work with files, directories, or URLs. drake targets are R objects, and target order is determined by how targets are mentioned in commands. drake reads your commands and functions with static code analysis to resolve target order. In the plan below, targets a, b, and c are in an arbitrary order, but drake runs them in the correct order because of how the symbols are mentioned.
library(drake)
plan <- drake_plan(
c = head(b),
a = mtcars[, seq_len(3)],
b = tail(a)
)
plot(plan)
make(plan)
#> target a
#> target b
#> target c
readd(c) # Targets are R objects
#>                 mpg cyl  disp
#> Porsche 914-2  26.0   4 120.3
#> Lotus Europa   30.4   4  95.1
#> Ford Pantera L 15.8   8 351.0
#> Ferrari Dino   19.7   6 145.0
#> Maserati Bora  15.0   8 301.0
#> Volvo 142E     21.4   4 121.0
Created on 2020-02-07 by the reprex package (v0.3.0)
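To make the idea concrete, here is a minimal base R sketch of how an execution order can be derived purely from the symbols mentioned in each command. It is an illustration only, not drake's actual implementation: all.vars() stands in for drake's static code analysis, and the names commands, deps, and run_order are made up for this example.

```r
# Commands from the plan above, stored as unevaluated expressions
# and deliberately listed in an arbitrary order.
commands <- list(
  c = quote(head(b)),
  a = quote(mtcars[, seq_len(3)]),
  b = quote(tail(a))
)

# For each command, keep only the symbols that name other targets.
deps <- lapply(commands, function(cmd) {
  intersect(all.vars(cmd), names(commands))
})

# Repeatedly schedule the targets whose dependencies are already scheduled.
run_order <- character(0)
while (length(run_order) < length(commands)) {
  is_ready <- vapply(deps, function(d) all(d %in% run_order), logical(1))
  ready <- setdiff(names(deps)[is_ready], run_order)
  if (length(ready) == 0) stop("circular dependency between targets")
  run_order <- c(run_order, ready)
}

print(run_order)
#> [1] "a" "b" "c"
```

The same reasoning shows why a target that never mentions another target in its command has no guaranteed position relative to it: with no shared symbol there is no dependency edge to order them by.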
Here are some things that could help your current plan.
- Use file_in() on ftp://ftp.url to detect when ftp_list should update.
- Write a function (e.g. get_dbc()) to download some files (part of the ftp_list) and read them into memory.
- Set format = "fst" on the target, and drake will automatically store those data frames in fst files.

Sketch:
get_dbc_data_frame <- function(ftp_list_entry, local_path) {
  # 1. Download the files from the ftp_list_entry into local_path.
  # 2. Read them into memory.
  # 3. Return a data frame.
}

plan <- drake_plan(
  ftp_list = obtain_filenames_from_url(
    url_ = file_in("ftp://ftp.url"),
    remove_extension_from_filename_ = FALSE,
    full_names = TRUE
  )[seq_len(10)],
  dbc_data = target(
    get_dbc_data_frame(ftp_list, local_path = paste0(getwd(), "/")),
    format = "fst", # Tell drake to store the data frame as an fst file.
    dynamic = map(ftp_list)
  )
)