
How should I use {targets} when I have multiple data files?


I have ~50 data files (subjects) that I process individually before I combine them in a data.frame for modelling. I'm unsure how to best use {targets} for this.

I tried using dynamic branching, but I'm unsure how to keep track of subject IDs with this approach. In my current approach I have all data in a named list where the first-level names are subject IDs, but with {targets} the branch names are arbitrary.

I know this is not really a specific question, but I'm hoping to be pointed towards an appropriate solution instead of getting a "correct" answer to the wrong question.


Solution

  • This is the pattern that I normally use (tar_files() comes from the {tarchetypes} package):

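      # _targets.R
      library(targets)
      library(tarchetypes) # provides tar_files()

      tar_option_set(packages = c("dplyr")) # makes %>% and mutate() available inside targets

      list(
        # Track every file in the folder; each path becomes its own branch
        tar_files(
          file_paths,
          list.files("file_paths_folder", full.names = TRUE)
        ),
        # Process one file per branch
        tar_target(
          processed_files,
          file_paths %>%
            readxl::read_excel() %>% # can be anything: read csv, parquet etc.
            janitor::clean_names() %>% # start processing
            mutate(across(c(a, b, c), ~ as.Date(.x, format = "%Y-%m-%d"))), # can be really complex operations
          pattern = map(file_paths)
        )
      )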
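
    To keep track of subject IDs with this pattern, one option is to store the ID as a column inside each branch rather than relying on branch names. The sketch below assumes the subject ID can be read off the file name (for example "sub-01.csv") and that the files are CSVs; read_subject() and the subject_data target are hypothetical names, not part of {targets}.

```r
# _targets.R (sketch, assuming one CSV file per subject)
library(targets)
library(tarchetypes)

# Hypothetical helper: read one file and tag every row with a subject ID
# derived from the file name without its extension.
read_subject <- function(path) {
  data <- read.csv(path)
  data$subject_id <- tools::file_path_sans_ext(basename(path))
  data
}

list(
  # One branch per file in the folder
  tar_files(file_paths, list.files("file_paths_folder", full.names = TRUE)),
  # Each branch returns a data frame that carries its own subject_id column
  tar_target(subject_data, read_subject(file_paths), pattern = map(file_paths))
)
```

    As far as I know, with the default iteration = "vector" the branches are combined with vctrs::vec_c(), which stacks data frames row-wise, so a downstream target that uses subject_data should see one data frame with a subject_id column, ready for modelling.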