I would like to use R objects (e.g., cleaned data) generated in one git-versioned R project in another git-versioned R project.
Specifically, I have multiple git-versioned R projects (that hold drake
plans) that do various things for my thesis experiments (e.g., generate materials, import and clean data, generate reports/articles).
The experiment-specific projects should ideally be:
At the moment I can see three ways of doing this:
here
it seems that this would introduce poor reproducibility.Is there any other way of doing this?
Edit: I tried @landau's suggestion of using a single drake
plan, which worked well for a while, until (similar to @vrognas' case) I ended up with too many sub-projects (e.g., conference presentations and manuscripts) that relied on the same objects. Therefore, I added some clarifications above to my intentions with the question.
My first recommendation is to use a single drake
plan to unite the stages of the overall project that need to share data. drake
is designed to handle a lot of moving parts this way, and it will be more seamless when it comes to drake
's decisions about what to rerun downstream. But if you really do need different plans in different sub-projects that share data, you can track each shared dataset as a file_out()
file in one plan and track it with file_in()
in another plan.
upstream_plan <- drake_plan(
export_file = write_csv(dataset, file_out("exported_data/dataset.csv"))
)
downstream_plan <- drake_plan(
dataset = read_csv(file_in("../upstream_project/exported_data/dataset.csv"))
)