I have the below code and example data. I have two issues:
First, the new variable created with mutate() is named "New_var" in the resulting data frames, rather than the character string (e.g., df1_timediff) that I assigned to New_var within the for loop.
Based on answers to similar questions, I have tried using eval(), as.name() and as.character(), both when defining New_var and within the pipeline, but with no luck. When I check the class of New_var, R tells me it is "character".
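A minimal sketch of why the column name stays literal (the toy data frame and its values are hypothetical). mutate() never evaluates the name on the left-hand side of `=`, so eval() or as.name() cannot change it there. With `:=`, the left-hand side can instead be injected with `!!`, either from a symbol or from a plain string:

```r
library(dplyr)

# Hypothetical toy data, for illustration only
toy <- data.frame(ResultsID = c(1, 1, 2), x = c(1, 2, 3))
New_var <- as.name("toy_timediff")  # a symbol, built as in the question

# The left-hand side of `=` is taken literally: the column is called "New_var"
literal <- toy %>% mutate(New_var = x * 2)

# With `:=`, `!!` injects the name stored in New_var
injected <- toy %>% mutate(!!New_var := x * 2)

# A character string works the same way on the left of `:=`
from_string <- toy %>% mutate(!!paste0("toy", "_timediff") := x * 2)

names(literal)   # c("ResultsID", "x", "New_var")
names(injected)  # c("ResultsID", "x", "toy_timediff")
```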
Second, I would like New_var to be the time difference, in months, between the current entry and the first entry for the corresponding participant. I have used similar code before, but here the values in New_var are not what I expect: the time difference returned is not the number of months between entries. The Submitted_i variables are of class Date, so I'm confused about why this happens.
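A small check of the two behaviours involved (the dates below are hypothetical): in base R, the third positional argument of difftime() is `tz`, and lubridate's time_length() uses `unit = "second"` when no unit is given. The unit therefore has to be passed to time_length() itself; lubridate's month is an average month of 30.4375 days.

```r
library(lubridate)

# The formal arguments of difftime(): the third one is `tz`, not `units`
names(formals(difftime))  # c("time1", "time2", "tz", "units")

t0 <- as.Date("2018-01-01")
t1 <- as.Date("2018-07-01")  # 181 days after t0

d <- difftime(t1, t0, units = "days")

# Without a unit, time_length() returns seconds
secs <- time_length(d)

# With "months", it divides by lubridate's average month length
n_months <- time_length(d, "months")
```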
Code
library(dplyr)
library(lubridate)
names.dfs <- c("df1", "df2", "df3")
for (i in names.dfs){
  Submitted_i <- as.name(paste0('Submitted_', i))
  New_var <- as.name(paste0(i,'_timediff'))
  df_i <- get(i)
  df_i <- df_i %>%
    arrange(eval(Submitted_i)) %>% # Order by date
    group_by(ResultsID) %>%
    mutate(New_var = (time_length(difftime(eval(Submitted_i), eval(Submitted_i)[1],"months"))))
  assign(paste0(i),df_i)
}
Example Data
df1 <- structure(list(ResultsID = c(1, 2, 3, 4, 2, 4, 1, 5, 3, 3), RepeatNo = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Submitted_df1 = structure(c(17509,
17509, 17514, 17484, 17929, 17484, 17502, 17528, 17497, 17488
), class = "Date")), row.names = c(NA, 10L), class = "data.frame")
df2 <- structure(list(ResultsID = c(1, 5, 1, 3, 2, 4, 5), RepeatNo = c(0L,
0L, 0L, 0L, 0L, 0L, 0L), Submitted_df2 = structure(c(16856, 16858,
16869, 16861, 16875, 16888, 16891), class = "Date")), row.names = c(NA,
7L), class = "data.frame")
df3 <- structure(list(ResultsID = c(1, 2, 3, 1, 2, 4, 4, 5, 3), RepeatNo = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Submitted_df3 = structure(c(17913,
17930, 17919, 17931, 17921, 17912, 17916, 17931, 17915), class = "Date")), row.names = c(NA,
-9L), groups = structure(list(.rows = structure(list(1L, 2L,
3L, 4L, 5L, 6L, 7L, 8L, 9L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -9L), class = c("tbl_df",
"tbl", "data.frame")), class = c("rowwise_df", "tbl_df", "tbl",
"data.frame"))
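Your second issue comes from misplaced parentheses. In your code, "months" is the third positional argument of difftime(), which is its tz (time zone) argument, not the unit argument of time_length(). time_length() therefore falls back to its default unit, seconds (and difftime() has no "months" unit anyway). Together with the change from Martin Gal's comment, it works fine: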
library(lubridate)
library(dplyr)
names.dfs <- c("df1", "df2", "df3")
for (i in names.dfs){
  Submitted_i <- as.name(paste0('Submitted_', i))
  New_var <- as.name(paste0(i,'_timediff'))
  df_i <- get(i)
  df_i <- df_i %>%
    arrange(eval(Submitted_i)) %>% # Order by date
    group_by(ResultsID) %>%
    # := lets the new column take its name from New_var
    mutate({{New_var}} := time_length(
      difftime(
        eval(Submitted_i),
        eval(Submitted_i)[1]  # first (earliest) date of this participant
      ),
      "months"  # unit for time_length(), no longer passed to difftime()
    )
    )
  assign(paste0(i),df_i)
}
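As an alternative to building symbols and calling eval(), here is a sketch that works with plain strings, using dplyr's `.data` pronoun to read the date column and `!!` with `:=` to name the new one. The helper add_timediff() and the toy data frame df_a are hypothetical. The helper calls ungroup() first because df3 in the question is a rowwise tibble.

```r
library(dplyr)
library(lubridate)

# Hypothetical helper: adds "<name>_timediff", the months since each
# participant's first submission, using the date column "Submitted_<name>"
add_timediff <- function(df, name) {
  date_col <- paste0("Submitted_", name)
  new_col  <- paste0(name, "_timediff")
  df %>%
    ungroup() %>%                   # drop any existing (e.g. rowwise) grouping
    arrange(.data[[date_col]]) %>%  # order by date
    group_by(ResultsID) %>%
    mutate(!!new_col := time_length(
      difftime(.data[[date_col]], first(.data[[date_col]]), units = "days"),
      "months"
    )) %>%
    ungroup()
}

# Hypothetical toy data with the same shape as df1
df_a <- data.frame(
  ResultsID = c(1, 2, 1, 2),
  Submitted_df_a = as.Date(c("2018-01-01", "2018-02-01", "2018-07-01", "2018-02-15"))
)
out <- add_timediff(df_a, "df_a")
```

Applied to the question's data frames, this could look like `for (i in c("df1", "df2", "df3")) assign(i, add_timediff(get(i), i))`.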