I have a dataframe with columns for State, state abbreviation, age, year, and years 2012-2015 (indicated by columns Y2012, Y2013, Y2014, Y2015. There is a law that came into effect in 2013 in Alabama which made doing something illegal for those ages 19 and under (we are only concerned about age ranges 18-20 years in this toy example). To indicate this in the current example dataframe, for AL in Y2013, the value is 19, ie., the upper bound of threshold. Example dataset:
> ds_ex
# A tibble: 12 × 8
State state_abbr age year Y2012 Y2013 Y2014 Y2015
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Alabama AL 18 2012 NA 19 NA NA
2 Alabama AL 18 2013 NA 19 NA NA
3 Alabama AL 18 2014 NA 19 NA NA
4 Alabama AL 18 2015 NA 19 NA NA
5 Alabama AL 19 2012 NA 19 NA NA
6 Alabama AL 19 2013 NA 19 NA NA
7 Alabama AL 19 2014 NA 19 NA NA
8 Alabama AL 19 2015 NA 19 NA NA
9 Alabama AL 20 2012 NA 19 NA NA
10 Alabama AL 20 2013 NA 19 NA NA
11 Alabama AL 20 2014 NA 19 NA NA
12 Alabama AL 20 2015 NA 19 NA NA
ds_ex =
structure(list(State = c("Alabama", "Alabama", "Alabama", "Alabama",
"Alabama", "Alabama", "Alabama", "Alabama", "Alabama", "Alabama",
"Alabama", "Alabama"), state_abbr = c("AL", "AL", "AL", "AL",
"AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL"), age = c("18",
"18", "18", "18", "19", "19", "19", "19", "20", "20", "20", "20"
), year = c("2012", "2013", "2014", "2015", "2012", "2013", "2014",
"2015", "2012", "2013", "2014", "2015"), Y2012 = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_), Y2013 = c("19", "19", "19", "19", "19", "19",
"19", "19", "19", "19", "19", "19"), Y2014 = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_), Y2015 = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_)), row.names = c(NA,
-12L), class = c("tbl_df", "tbl", "data.frame"))
What I want is an indicator value that is 1 when age is less than or equal to any value that is contained in variables Y2013:Y2015 for that state-year-age row AND the year that is contained in that column (year 2013 as indicated from the value in Y2013 in this example), is less than or equal to the year variable. So for my example dataframe, this is what I want:
ds_ex_ans
# A tibble: 12 × 9
State state_abbr age year Y2012 Y2013 Y2014 Y2015 ind_law
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
1 Alabama AL 18 2012 NA 19 NA NA 0
2 Alabama AL 18 2013 NA 19 NA NA 1
3 Alabama AL 18 2014 NA 19 NA NA 1
4 Alabama AL 18 2015 NA 19 NA NA 1
5 Alabama AL 19 2012 NA 19 NA NA 0
6 Alabama AL 19 2013 NA 19 NA NA 1
7 Alabama AL 19 2014 NA 19 NA NA 1
8 Alabama AL 19 2015 NA 19 NA NA 1
9 Alabama AL 20 2012 NA 19 NA NA 0
10 Alabama AL 20 2013 NA 19 NA NA 0
11 Alabama AL 20 2014 NA 19 NA NA 0
12 Alabama AL 20 2015 NA 19 NA NA 0
But of course I have many more states, years, and ages. I'd be grateful for any help!
library(dplyr)
library(tidyr)
ds_ex %>%
pivot_longer(Y2012:Y2015,
names_prefix="Y",
names_transform=list(name=as.integer)) %>%
mutate(age=as.numeric(age),
year=as.integer(year),
value=as.numeric(value),
ind_law = if_else(year >= name & age<=value,1,0)) %>%
summarise(ind_law=max(ind_law, na.rm=TRUE), .by=c(State,state_abbr,age,year))
# A tibble: 12 × 5
State state_abbr age year ind_law
<chr> <chr> <dbl> <int> <dbl>
1 Alabama AL 18 2012 0
2 Alabama AL 18 2013 1
3 Alabama AL 18 2014 1
4 Alabama AL 18 2015 1
5 Alabama AL 19 2012 0
6 Alabama AL 19 2013 1
7 Alabama AL 19 2014 1
8 Alabama AL 19 2015 1
9 Alabama AL 20 2012 0
10 Alabama AL 20 2013 0
11 Alabama AL 20 2014 0
12 Alabama AL 20 2015 0