I have a dataset that has a bunch of columns for different base variables, and each of those base variables has a year in the name that those values are relevant for. I need to create one column for each base variable that is assigned based on the year of the row. I could just use case_when and list each year and name each base variable+year individually, but that's not efficient and there are multiple base variables with over 20 years listed for each. I've tried a) using a function to write the guts of the case_when function, and b)using case_when in a loop, c) using if in a for loop. The first 2 both result in missings and the last in unevaluated string (the name of the correct variable). I know that I must be missing something pretty fundamental (maybe a different something for each strategy). Reprex below.
dataf <- data.frame(alldate = sample(seq(as.Date('2000/01/01'), as.Date('2005/01/01'), by="day"), 20),
var2000 = rnorm(20),
var2001 = rnorm(20),
var2002 = rnorm(20),
var2003 = rnorm(20),
var2004 = rnorm(20),
var2005 = rnorm(20))
#a) using a function to write the guts of the case_when function (returns NAs)
assign_var_year <- function(yearstart = 2000, yearend = 2005, base_var = "var" ){
for(i in seq_along(yearstart:(1+yearend))-1){
year(alldate) == i+yearstart ~ str_c(base_var, i+yearstart)
}
}
dataf2 <- dataf %>%
mutate(var_test = case_when(
assign_var_year(),
T~NA_real_
))
dataf2$var_test %>% head
#b) using case_when in a loop (returns NAs)
for(i in seq_along(min(year(dataf$alldate)):(1+max(year(dataf$alldate))))-1){
dataf$var_test <- case_when(
year(dataf$alldate) == min(year(dataf$alldate))+i~str_c("var", i+yearstart)
)
}
dataf$var_test %>% head
#c) using if in a for loop (returns name of correct variable)(see also below equivalent to ##this using case_when)
dataf$var_test <- NA_real_
for(i in seq_along(min(year(dataf$alldate)):(1+max(year(dataf$alldate))))-1){
dataf$var_test <- ifelse(
year(dataf$alldate) == min(year(dataf$alldate))+i,eval(str_c("var", i+yearstart)),dataf$var_test
)
}
dataf %>% select(alldate, var_test) %>% head
#same as above with case_when
for(i in 0:5){
dataf <- dataf %>%
mutate(var_test = case_when(
year(alldate) == 2000+i ~ str_c("var", 2000+i),
T~var_test,
))
}
dataf$var_test %>% head
I would approach this problem quite differently. Your data is not tidy, and therefore hard to work with. I would first re-organize the data, after which the problem can solved more easily.
library(tidyverse)
I first put the data in long format, making one column for each variable but spreading the years across rows. The code below generalizes if you have other variables than var
, which will be added as additional columns:
df1 <- dataf |>
pivot_longer(
-alldate,
names_to = c(".value", "year"),
names_pattern = "([a-zA-Z]+)(\\d{4})",
names_transform = list(year = as.numeric)
)
# A tibble: 120 × 3 alldate year var <date> <dbl> <dbl> 1 2004-06-11 2000 1.77 2 2004-06-11 2001 0.460 3 2004-06-11 2002 -0.692 4 2004-06-11 2003 -2.15 5 2004-06-11 2004 0.328 6 2004-06-11 2005 1.49 7 2002-09-19 2000 -0.0568 8 2002-09-19 2001 -0.735 9 2002-09-19 2002 -1.65 10 2002-09-19 2003 0.745 # ℹ 110 more rows # ℹ Use `print(n = ...)` to see more rows
We can then simply filter on matching years:
filter(df1, year(alldate) == year)
# A tibble: 20 × 3 alldate year var <date> <dbl> <dbl> 1 2004-06-11 2004 0.328 2 2002-09-19 2002 -1.65 3 2003-06-30 2003 0.624 4 2001-10-09 2001 -0.502 5 2000-07-24 2000 0.123 6 2004-10-05 2004 1.34 7 2000-04-29 2000 0.102 8 2003-08-22 2003 1.32 9 2001-02-19 2001 -2.15 10 2003-12-02 2003 1.17 11 2002-09-17 2002 2.09 12 2000-11-16 2000 -1.11 13 2001-10-13 2001 0.909 14 2002-12-22 2002 -0.275 15 2004-06-04 2004 -0.722 16 2004-06-19 2004 -0.626 17 2004-01-28 2004 -0.300 18 2004-02-21 2004 -0.313 19 2003-04-14 2003 0.709 20 2004-03-17 2004 1.41