Tags: r, for-loop, dplyr, variable-names, difftime

For loop with dplyr pipeline: problem using dynamic and date variables correctly


I have the code and example data below, and two issues with them:

  1. The new variable created with mutate is named "New_var" in the resulting data frames, rather than the character string (e.g., df1_timediff) that I assigned to New_var within the for loop.
    Based on answers to similar questions, I have tried using eval, as.name, and as.character, both when defining New_var and within the pipeline, but with no luck. When I check the class of New_var, R tells me it is "character".

  2. I would like New_var to be the time difference, in months, between the current entry and the first entry for the corresponding participant. I have used similar code before; however, New_var does not come out as expected: the values returned are not the number of months between entries. The Submitted_i variables are of class Date, so I'm confused about why this happens.
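A minimal illustration of the first issue, using a small made-up tibble (`toy`, `x` and the name "toy_timediff" are hypothetical). `mutate()` takes the left-hand side of `=` literally, so the column ends up named `New_var`. Injecting the symbol with rlang's `!!` and writing `:=` instead of `=` makes dplyr use the value stored in `New_var`.

```r
library(dplyr)

# Hypothetical toy data (not the question's data frames)
toy <- tibble(ResultsID = c(1, 1, 2), x = c(10, 20, 30))

# A symbol whose *value* ("toy_timediff") should become the column name
New_var <- as.name("toy_timediff")

# The left-hand side of `=` is taken literally: the column is called "New_var"
literal <- toy %>% mutate(New_var = x * 2)
names(literal)   # "ResultsID" "x" "New_var"

# `!!` injects the symbol stored in New_var; `:=` allows injection on the left
injected <- toy %>% mutate(!!New_var := x * 2)
names(injected)  # "ResultsID" "x" "toy_timediff"
```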

Code

names.dfs <- c("df1", "df2", "df3")

for (i in names.dfs){

  Submitted_i <- as.name(paste0('Submitted_', i))
  New_var <- as.name(paste0(i,'_timediff'))
  
  df_i <-  get(i)
  
  df_i <- df_i %>%
        arrange(eval(Submitted_i)) %>% # Order by date
        group_by(ResultsID) %>% 
        mutate(New_var = (time_length(difftime(eval(Submitted_i), eval(Submitted_i)[1],"months")))) 
               
  assign(paste0(i),df_i)

  }
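The difftime/time_length combination used in the loop can be checked on its own. This is a sketch with two hypothetical dates. It shows that the unit belongs in `time_length()`, which defaults to seconds, and not in `difftime()`, whose third positional argument is `tz` and which has no "months" unit.

```r
library(lubridate)

# Hypothetical dates: 2018-01-01 and 2018-03-01 are 59 days apart
d <- as.Date(c("2018-01-01", "2018-03-01"))

# difftime() supports secs/mins/hours/days/weeks but not months, and its third
# positional argument is tz, so the unit has to go to time_length() instead
gap <- difftime(d, d[1], units = "days")

time_length(gap)            # default unit is seconds: 0 5097600
time_length(gap, "days")    # 0 59
time_length(gap, "months")  # 0 and about 1.94 (lubridate's average month)
```

lubridate converts to months using an average month length (365.25 / 12 days), so the result is fractional rather than a calendar-month count.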

Example Data


df1 <- structure(list(ResultsID = c(1, 2, 3, 4, 2, 4, 1, 5, 3, 3), RepeatNo = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Submitted_df1 = structure(c(17509, 
17509, 17514, 17484, 17929, 17484, 17502, 17528, 17497, 17488
), class = "Date")), row.names = c(NA, 10L), class = "data.frame")
  
df2 <- structure(list(ResultsID = c(1, 5, 1, 3, 2, 4, 5), RepeatNo = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L), Submitted_df2 = structure(c(16856, 16858, 
16869, 16861, 16875, 16888, 16891), class = "Date")), row.names = c(NA, 
7L), class = "data.frame")
  
df3 <- structure(list(ResultsID = c(1, 2, 3, 1, 2, 4, 4, 5, 3), RepeatNo = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Submitted_df3 = structure(c(17913, 
17930, 17919, 17931, 17921, 17912, 17916, 17931, 17915), class = "Date")), row.names = c(NA, 
-9L), groups = structure(list(.rows = structure(list(1L, 2L, 
    3L, 4L, 5L, 6L, 7L, 8L, 9L), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), row.names = c(NA, -9L), class = c("tbl_df", 
"tbl", "data.frame")), class = c("rowwise_df", "tbl_df", "tbl", 
"data.frame"))


Solution

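  • Your second issue is a problem with your brackets. In your code, "months" is passed as the third argument of difftime (which is tz), not as the unit argument of time_length, so time_length falls back to its default unit, seconds. difftime itself has no "months" unit, which is why the conversion has to happen in time_length. When you also apply the comment from Martin Gal ({{New_var}} := instead of New_var = inside mutate, which addresses your first issue), it works fine: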

    library(lubridate)
    library(dplyr)
    
    names.dfs <- c("df1", "df2", "df3")
    
    for (i in names.dfs){
    
      Submitted_i <- as.name(paste0('Submitted_', i))
      New_var <-  as.name(paste0(i,'_timediff'))
    
      df_i <-  get(i)
    
      df_i <- df_i %>%
        arrange(eval(Submitted_i)) %>% # Order by date
        group_by(ResultsID) %>% 
        mutate({{New_var}} := time_length(
                                   difftime(
                                       eval(Submitted_i),
                                       eval(Submitted_i)[1]
                                   ),
                                   "months"
                               ) 
         )
    
      assign(paste0(i),df_i)
    
    }
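As an alternative to `eval()` and `{{ }}`, the same steps can be wrapped in a function that works with column names as strings, using dplyr's `.data[[...]]` pronoun to read the date column and `!!` with `:=` to name the new one. This is a sketch, not the answer's method; `add_timediff` and the toy data are hypothetical, and it assumes the question's column naming scheme ("Submitted_<name>").

```r
library(dplyr)
library(lubridate)

# Hypothetical helper (not from the question or answer): sorts `df` by the
# column "Submitted_<name>" and adds "<name>_timediff", the months elapsed since
# each participant's first submission
add_timediff <- function(df, name) {
  date_col <- paste0("Submitted_", name)   # e.g. "Submitted_df1"
  new_col  <- paste0(name, "_timediff")    # e.g. "df1_timediff"
  df %>%
    ungroup() %>%                          # drop any rowwise/grouped structure
    arrange(.data[[date_col]]) %>%         # order by date
    group_by(ResultsID) %>%
    mutate(!!new_col := time_length(
      difftime(.data[[date_col]], first(.data[[date_col]])),
      "months"
    )) %>%
    ungroup()
}

# Toy data in the same shape as the question's data frames
toy <- data.frame(
  ResultsID     = c(1, 2, 1),
  Submitted_toy = as.Date(c("2018-03-01", "2018-01-15", "2018-01-01"))
)
out <- add_timediff(toy, "toy")
out  # rows sorted by date; toy_timediff is 0, 0 and about 1.94

# Applying it to the question's data frames (assumes df1, df2, df3 exist):
# for (nm in c("df1", "df2", "df3")) assign(nm, add_timediff(get(nm), nm))
```

Because `.data[[...]]` and `!!new_col :=` both accept plain strings, no `as.name()` or `eval()` is needed. The `ungroup()` at the start is there because df3 in the example data is a rowwise data frame.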