
Microsoft R / Tidyverse: If a data file fails to load, create an empty tibble/table in its place


I am really not sure how to phrase this concisely. My question is: is it possible to add error handling so that, if a data file (such as a CSV) fails to load as a table/tibble, a blank version of it is created instead?

Here is what I mean:

My normal csv load looks like this:

Monday2 <- paste0("my_file_location/my_file_name",Monday,".csv")
leads1 <- tibble(read.csv(Monday2))

Tuesday2 <- paste0("my_file_location/My_file_name",Tuesday,".csv")
leads2 <- tibble(read.csv(Tuesday2))

Wednesday2 <- paste0("my_file_location/my_file_name",Wednesday,".csv")
leads3 <- tibble(read.csv(Wednesday2))

If for some reason my CSV fails to load (for example, the file doesn't exist or I entered the name incorrectly), can a blank version of it be created instead?
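If the only failure you expect is a missing or misnamed file, a minimal sketch is to check for the file with base R's `file.exists()` before reading it. The value of `Monday` below is a hypothetical date suffix modelled on the file name in the error message further down, and `Column1` to `Column3` are the question's placeholder names. Note that `file.exists()` does not protect against a file that exists but cannot be parsed.

```r
library(tibble)

# Hypothetical date suffix; in the real script Monday is set elsewhere
Monday <- "_2022-03-27"
Monday2 <- paste0("my_file_location/my_file_name", Monday, ".csv")

# Read the file only if it is present; otherwise use a one-row blank placeholder
# with the same (placeholder) columns as a real file
if (file.exists(Monday2)) {
  leads1 <- as_tibble(read.csv(Monday2, colClasses = "character"))
} else {
  leads1 <- tibble(Column1 = "", Column2 = "", Column3 = "")
}
```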

My idea for the blank tibble would look like this:

Leads21 <- tibble("Column1"= "", "Column2"= "", "Column3"= "")
Leads22 <- tibble("Column1"= "", "Column2"= "", "Column3"= "")
Leads23 <- tibble("Column1"= "", "Column2"= "", "Column3"= "")

This blank tibble would have exactly the same columns as a properly loaded file. I have 5 files that I bind each Friday in an automated process, and if a file fails to load I can catch it downstream (one of the columns is the file name/date), but I don't want the whole process to fail.
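Rather than writing out Leads21, Leads22 and Leads23 by hand, a small helper can build the placeholder and fill in the file name/date column at the same time, so the blank row can be spotted downstream. This is a sketch: `blank_leads()` and the `filename` column name are hypothetical, and the real column names should replace Column1 to Column3.

```r
library(tibble)

# Hypothetical helper: one-row placeholder with the same columns as a loaded file,
# plus the file name/date so the missing file can be identified later
blank_leads <- function(filename) {
  tibble(Column1 = "", Column2 = "", Column3 = "", filename = filename)
}

Leads21 <- blank_leads("Monday")
Leads22 <- blank_leads("Tuesday")
Leads23 <- blank_leads("Wednesday")
```

Keeping one row of empty strings (rather than a zero-row tibble built from `character()` columns) is deliberate here: a zero-row placeholder would vanish in `bind_rows()`, taking the file name with it.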

A typical 'failed to load' error looks like this:

In file(file, "rt") : cannot open file 'my_file_location/My_file_name_2022-03-27.csv': No such file or directory

The bind of all 5 files then fails with an error message like:

### Join a full week's worth of leads into 1 file
Leads <- bind_rows(leads1,leads2,leads3, leads4, leads5) 
Error in list2(...) : object 'leads1' not found

This then causes the rest of my code to fail or behave incorrectly. If I can bind an empty tibble instead, my code can finish running and I can check for missing files at the end. Ultimately, a missing file matters less than processing the files that do exist, so stopping my code to locate and fix the failed load is not important.
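Once every file (real or placeholder) has been bound, the missing files can be listed at the end. A minimal sketch, assuming each placeholder is a one-row tibble whose data columns are empty strings and which carries a `filename` column; the small `Leads` tibble is made-up example data. A genuine row with an empty `Column1` would be misreported by this check, so a dedicated flag column may be safer in practice.

```r
library(dplyr)
library(tibble)

# Made-up combined data: Tuesday's file failed and was replaced by a blank row
Leads <- bind_rows(
  tibble(Column1 = "a", filename = "Monday"),
  tibble(Column1 = "", filename = "Tuesday"),
  tibble(Column1 = "b", filename = "Wednesday")
)

# Placeholder rows have an empty Column1; report their file names...
missing_files <- Leads %>% filter(Column1 == "") %>% pull(filename)
if (length(missing_files) > 0) {
  message("Files that failed to load: ", paste(missing_files, collapse = ", "))
}

# ...then drop them before further processing
Leads <- Leads %>% filter(Column1 != "")
```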

My background is in Microsoft Access VBA, and I keep trying to write something like:

If tibble leads1 exists, use it. If tibble leads1 does not exist, use Leads21.

I'm not sure how to do this in R. I have been trying to read up on the try() wrapper, but I don't understand how to use it in my case.
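The closest literal translation of the VBA-style idea is `if (exists("leads1")) leads1 else Leads21`, but `exists()` will also find a stale `leads1` left over from an earlier run in the same session. Catching the read error itself avoids that. A minimal sketch with `try()`: on failure it returns an object of class `"try-error"` instead of stopping, which `inherits()` can test for. `read_leads()` and the placeholder column names are hypothetical.

```r
library(tibble)

# Hypothetical helper: read one CSV, or return a one-row blank placeholder if reading fails.
# Column1..Column3 are placeholder names; replace them with the real ones.
read_leads <- function(path, filename) {
  # try() returns the value on success, or an object of class "try-error" on failure;
  # silent = TRUE stops the error message from being printed
  result <- try(read.csv(path, colClasses = "character"), silent = TRUE)

  if (inherits(result, "try-error")) {
    out <- tibble(Column1 = "", Column2 = "", Column3 = "")
  } else {
    out <- as_tibble(result)
  }
  out$filename <- filename  # tag the rows with the file name/date
  out
}
```

Usage would be along the lines of `leads1 <- read_leads(Monday2, Monday)`. For a missing file, `read.csv()` still prints the "cannot open file" warning; only the error is caught.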


Solution

  • Here is how I ended up solving this. I'm not sure it is the most elegant approach, but it does the job.

    I create the blank tibbles first, one for each file that I am loading (this method scales poorly), and then read each .csv file. If a .csv file fails to load, its blank tibble is not replaced and is still available for the bind at the end.

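    library(tibble)
    library(dplyr)   # for bind_rows() and %>%

    # Placeholder created first; it is only overwritten if read.csv() succeeds
    leads1 <- tibble("Column1" = "", "Column2" = "")
    leads1$filename <- Monday
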
    Monday2 <- paste0("my_file_location/File_name_part1",Monday,".csv")
    leads1 <- tibble(read.csv(Monday2, colClasses = 'character'))
    leads1$filename <- Monday
    
    # (the same steps are repeated for leads2 to leads5)
    Leads <- bind_rows(leads1, leads2, leads3, leads4, leads5) %>% distinct(Column1, .keep_all = TRUE)
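This approach relies on R carrying on after the failed `read.csv()` line, which is what happens when the lines are sent to the console. If the script is run with `source()` or `Rscript`, the first error stops the whole script and the bind is never reached. A sketch that catches the error explicitly with `tryCatch()` and also avoids one hand-written block per file, assuming the five date suffixes are known up front (`days`, `folder` and `read_or_blank()` are hypothetical):

```r
library(dplyr)
library(tibble)

# Hypothetical inputs: the five date suffixes and the folder holding the CSVs
days <- c("_2022-03-21", "_2022-03-22", "_2022-03-23", "_2022-03-24", "_2022-03-25")
folder <- "my_file_location"

# Read one day's file; on any read error return a one-row blank placeholder instead
read_or_blank <- function(day) {
  path <- file.path(folder, paste0("my_file_name", day, ".csv"))
  leads <- tryCatch(
    as_tibble(read.csv(path, colClasses = "character")),
    error = function(e) tibble(Column1 = "", Column2 = "")
  )
  leads$filename <- day  # tag every row (real or placeholder) with its date
  leads
}

# One tibble per day, stacked into a single tibble
Leads <- bind_rows(lapply(days, read_or_blank))
```

Adding a sixth file then only means adding one entry to `days`. Piping the result straight into `distinct(Column1, .keep_all = TRUE)` keeps only the first of several blank placeholder rows, so if more than one file can be missing, it is worth listing the missing files before that step.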