rrulesrowwise

Efficient way to implement rule-based value assignment


I'm trying to come up with an elegant, rule-based way to assign codes to rows in a data frame based on combinations of values in columns, using this data:

library(tidyr)
df <- crossing(yr2018=c("M","S","W"),
               yr2019=c("M","S","W"),
                yr2020=c("M","S","W")) %>%
  print(n=27)

# A tibble: 27 × 3
   yr2018 yr2019 yr2020
   <chr>  <chr>  <chr> 
 1 M      M      M     
 2 M      M      S     
 3 M      M      W     
 4 M      S      M     
 5 M      S      S     
 6 M      S      W     
 7 M      W      M     
 8 M      W      S     
 9 M      W      W     
10 S      M      M     
11 S      M      S     
12 S      M      W     
13 S      S      M     
14 S      S      S     
15 S      S      W     
16 S      W      M     
17 S      W      S     
18 S      W      W     
19 W      M      M     
20 W      M      S     
21 W      M      W     
22 W      S      M     
23 W      S      S     
24 W      S      W     
25 W      W      M     
26 W      W      S     
27 W      W      W     
>

What I want to end up with is a column with codes applied with rules such the following:

This feels like a big, ugly if statement, but I'm hoping for something more elegant, especially since my real data is actually 4x5 (625 rows). It also feels like maybe regular expressions, which I struggle with.

I started looking into row-wise functions and found rowwise() as a start to logically reconfigure the data frame, but it looks like the number of functions that can operate that way are limited.

All guidance welcome!


Solution

  • You can use mutate and case_when to efficiently satisfy these conditions. sort in the second logic will organize the letters as you described.

    Since case_when evaluates iteratively, you may be able to parse this down to make it more elegant, but as written it should follow your exact conditions:

    library(dplyr)
    
    df %>%
      rowwise() %>%
      mutate(new_column = case_when(
        yr2018 == yr2019 & yr2019 == yr2020 ~ paste0("CON", yr2018),
        yr2018 == yr2020 ~ paste(sort(c(yr2019, yr2020)), collapse = ""),
        yr2018 != yr2019 & yr2019 != yr2020 & yr2018 != yr2020 ~ "MSW",
        yr2019 == yr2020 & yr2018 != yr2020 ~ paste0("CON", yr2020),
        yr2018 == yr2019 & yr2018 != yr2020 ~ paste0("CON", yr2018)
      )) 
    

    Output:

       yr2018 yr2019 yr2020 new_column
       <chr>  <chr>  <chr>  <chr>     
     1 M      M      M      CONM      
     2 M      M      S      CONM      
     3 M      M      W      CONM      
     4 M      S      M      MS        
     5 M      S      S      CONS      
     6 M      S      W      MSW       
     7 M      W      M      MW        
     8 M      W      S      MSW       
     9 M      W      W      CONW      
    10 S      M      M      CONM      
    11 S      M      S      MS        
    12 S      M      W      MSW       
    13 S      S      M      CONS      
    14 S      S      S      CONS      
    15 S      S      W      CONS      
    16 S      W      M      MSW       
    17 S      W      S      SW        
    18 S      W      W      CONW      
    19 W      M      M      CONM      
    20 W      M      S      MSW       
    21 W      M      W      MW        
    22 W      S      M      MSW       
    23 W      S      S      CONS      
    24 W      S      W      SW        
    25 W      W      M      CONW      
    26 W      W      S      CONW      
    27 W      W      W      CONW