rdplyrmutate

Conditional mutate by group


I have a dataset where I want to select only one row for each individual each year. However, I would like to mutate() a column so that if it says 'yes' for any of that patient's rows, then all the rows say 'yes' for that patient.

This is an example of the dataset I have:

name clinic year date tested
patientx xxy 2022 April yes
patientx xxy 2022 May no
patientxy ggf 2019 Jan no
patientxy ggf 2019 Feb yes
patientxyz ffr 2018 March yes
patientxyz ffr 2019 May no

So where the name, clinic and year are the same, I want the tested column to say 'yes' if any of the other rows for that grouping say 'yes'.

Therefore, this is what I would want the dataset to finally look like:

name clinic year date tested
patientx xxy 2022 April yes
patientx xxy 2022 May yes
patientxy ggf 2019 Jan yes
patientxy ggf 2019 Feb yes
patientxyz ffr 2018 March yes
patientxyz ffr 2019 May no

Solution

  • This is quite straightforward using {dplyr}:

    library(dplyr)
    
    df <- tribble(
      ~ name, ~ clinic, ~ year, ~ date, ~ tested,
      "a",       "xxy",    2022,  "April", "yes",
      "a",       "xxy",    2022,  "May", "no",
      "b",       "ggf",    2019,  "Jan", "no",
      "b",       "ggf",    2019,  "Feb", "yes",
      "c",       "ffr",    2018,  "March", "yes",
      "c",       "ffr",    2019,  "May", "no"
    )
    
    df |> 
      mutate(tested2 = if_else(any(tested == "yes"), "yes", "no"), .by = c(name, year))
    #> # A tibble: 6 × 6
    #>   name  clinic  year date  tested tested2
    #>   <chr> <chr>  <dbl> <chr> <chr>  <chr>  
    #> 1 a     xxy     2022 April yes    yes    
    #> 2 a     xxy     2022 May   no     yes    
    #> 3 b     ggf     2019 Jan   no     yes    
    #> 4 b     ggf     2019 Feb   yes    yes    
    #> 5 c     ffr     2018 March yes    yes    
    #> 6 c     ffr     2019 May   no     no
    

    I would recommend to read this guide before posting future questions. It makes easier to help you.