I have a dataset where I want to select only one row for each individual each year. However, I would like to mutate()
a column so that if it says 'yes' for any of that patient's rows, then all the rows say 'yes' for that patient.
This is an example of the dataset I have:
name | clinic | year | date | tested |
---|---|---|---|---|
patientx | xxy | 2022 | April | yes |
patientx | xxy | 2022 | May | no |
patientxy | ggf | 2019 | Jan | no |
patientxy | ggf | 2019 | Feb | yes |
patientxyz | ffr | 2018 | March | yes |
patientxyz | ffr | 2019 | May | no |
So where the name, clinic and year are the same, I want the tested column to say 'yes' if any of the other rows for that grouping say 'yes'.
Therefore, this is what I would want the dataset to finally look like:
name | clinic | year | date | tested |
---|---|---|---|---|
patientx | xxy | 2022 | April | yes |
patientx | xxy | 2022 | May | yes |
patientxy | ggf | 2019 | Jan | yes |
patientxy | ggf | 2019 | Feb | yes |
patientxyz | ffr | 2018 | March | yes |
patientxyz | ffr | 2019 | May | no |
This is quite straightforward using {dplyr}
:
library(dplyr)
df <- tribble(
~ name, ~ clinic, ~ year, ~ date, ~ tested,
"a", "xxy", 2022, "April", "yes",
"a", "xxy", 2022, "May", "no",
"b", "ggf", 2019, "Jan", "no",
"b", "ggf", 2019, "Feb", "yes",
"c", "ffr", 2018, "March", "yes",
"c", "ffr", 2019, "May", "no"
)
df |>
mutate(tested2 = if_else(any(tested == "yes"), "yes", "no"), .by = c(name, year))
#> # A tibble: 6 × 6
#> name clinic year date tested tested2
#> <chr> <chr> <dbl> <chr> <chr> <chr>
#> 1 a xxy 2022 April yes yes
#> 2 a xxy 2022 May no yes
#> 3 b ggf 2019 Jan no yes
#> 4 b ggf 2019 Feb yes yes
#> 5 c ffr 2018 March yes yes
#> 6 c ffr 2019 May no no
I would recommend to read this guide before posting future questions. It makes easier to help you.