rdataframedplyracross

How to match 1 variable in dataframe with potential values in many other variables to compute indicator using tidyverse R?


I have a dataframe with columns for State, state abbreviation, age, year, and years 2012-2015 (indicated by columns Y2012, Y2013, Y2014, Y2015. There is a law that came into effect in 2013 in Alabama which made doing something illegal for those ages 19 and under (we are only concerned about age ranges 18-20 years in this toy example). To indicate this in the current example dataframe, for AL in Y2013, the value is 19, ie., the upper bound of threshold. Example dataset:

> ds_ex
# A tibble: 12 × 8
   State   state_abbr age   year  Y2012 Y2013 Y2014 Y2015
   <chr>   <chr>      <chr> <chr> <chr> <chr> <chr> <chr>
 1 Alabama AL         18    2012  NA    19    NA    NA   
 2 Alabama AL         18    2013  NA    19    NA    NA   
 3 Alabama AL         18    2014  NA    19    NA    NA   
 4 Alabama AL         18    2015  NA    19    NA    NA   
 5 Alabama AL         19    2012  NA    19    NA    NA   
 6 Alabama AL         19    2013  NA    19    NA    NA   
 7 Alabama AL         19    2014  NA    19    NA    NA   
 8 Alabama AL         19    2015  NA    19    NA    NA   
 9 Alabama AL         20    2012  NA    19    NA    NA   
10 Alabama AL         20    2013  NA    19    NA    NA   
11 Alabama AL         20    2014  NA    19    NA    NA   
12 Alabama AL         20    2015  NA    19    NA    NA

ds_ex = 
structure(list(State = c("Alabama", "Alabama", "Alabama", "Alabama", 
"Alabama", "Alabama", "Alabama", "Alabama", "Alabama", "Alabama", 
"Alabama", "Alabama"), state_abbr = c("AL", "AL", "AL", "AL", 
"AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL"), age = c("18", 
"18", "18", "18", "19", "19", "19", "19", "20", "20", "20", "20"
), year = c("2012", "2013", "2014", "2015", "2012", "2013", "2014", 
"2015", "2012", "2013", "2014", "2015"), Y2012 = c(NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_), Y2013 = c("19", "19", "19", "19", "19", "19", 
"19", "19", "19", "19", "19", "19"), Y2014 = c(NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_), Y2015 = c(NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_)), row.names = c(NA, 
-12L), class = c("tbl_df", "tbl", "data.frame"))

What I want is an indicator value that is 1 when age is less than or equal to any value that is contained in variables Y2013:Y2015 for that state-year-age row AND the year that is contained in that column (year 2013 as indicated from the value in Y2013 in this example), is less than or equal to the year variable. So for my example dataframe, this is what I want:

 ds_ex_ans
# A tibble: 12 × 9
   State   state_abbr age   year  Y2012 Y2013 Y2014 Y2015 ind_law
   <chr>   <chr>      <chr> <chr> <chr> <chr> <chr> <chr>   <dbl>
 1 Alabama AL         18    2012  NA    19    NA    NA          0
 2 Alabama AL         18    2013  NA    19    NA    NA          1
 3 Alabama AL         18    2014  NA    19    NA    NA          1
 4 Alabama AL         18    2015  NA    19    NA    NA          1
 5 Alabama AL         19    2012  NA    19    NA    NA          0
 6 Alabama AL         19    2013  NA    19    NA    NA          1
 7 Alabama AL         19    2014  NA    19    NA    NA          1
 8 Alabama AL         19    2015  NA    19    NA    NA          1
 9 Alabama AL         20    2012  NA    19    NA    NA          0
10 Alabama AL         20    2013  NA    19    NA    NA          0
11 Alabama AL         20    2014  NA    19    NA    NA          0
12 Alabama AL         20    2015  NA    19    NA    NA          0

But of course I have many more states, years, and ages. I'd be grateful for any help!


Solution

  • library(dplyr)
    library(tidyr)
    
    ds_ex %>%
      pivot_longer(Y2012:Y2015, 
                   names_prefix="Y", 
                   names_transform=list(name=as.integer)) %>%
      mutate(age=as.numeric(age), 
             year=as.integer(year), 
             value=as.numeric(value),
             ind_law = if_else(year >= name & age<=value,1,0)) %>%
      summarise(ind_law=max(ind_law, na.rm=TRUE), .by=c(State,state_abbr,age,year))
    
    # A tibble: 12 × 5
       State   state_abbr   age  year ind_law
       <chr>   <chr>      <dbl> <int>   <dbl>
     1 Alabama AL            18  2012       0
     2 Alabama AL            18  2013       1
     3 Alabama AL            18  2014       1
     4 Alabama AL            18  2015       1
     5 Alabama AL            19  2012       0
     6 Alabama AL            19  2013       1
     7 Alabama AL            19  2014       1
     8 Alabama AL            19  2015       1
     9 Alabama AL            20  2012       0
    10 Alabama AL            20  2013       0
    11 Alabama AL            20  2014       0
    12 Alabama AL            20  2015       0