I have some (medical) longitudinal data on the detection (or not) of bronchiectasis. I wish to count the number of participants with and without bronchiectasis at their latest follow up/check. I then wish to store the number of patients who do have bronchiectasis and the number who don't. This is so I can then insert that number/value into a qmd report easily. (I have other outcomes where I need this method).
Is there a simpler/more accurate method of what I'm doing
Example df
id | date_of_followup | bronchiectasis |
---|---|---|
1 | 01/01/2022 | N |
1 | 01/12/2022 | Y |
2 | 02/07/2022 | Y |
2 | 5/6/2023 | Y |
3 | 01/05/2021 | N |
3 | 3/12/2022 | N |
This is my current code
df1 <- df %>%
group_by(id) %>%
filter(date_of_followup == max(date_of_followup )) %>%
filter(bronchiectasis=="Y")
bronchiectasisyes <- sum(df1$bronchiectasis == "Y")
df2 <- df%>%
group_by(id) %>%
filter(date_of_followup == max(date_of_followup )) %>%
filter(bronchiectasis=="N")
bronchiectasisno <- sum(df2$bronchiectasis == "N")
This gets me the desired outcome, but perhaps is a bit messy and I suspect there is a more elegant/robust method
Fixing the date, filtering for max-per-id, then summarizing:
library(dplyr)
df %>%
mutate(date_of_followup = as.Date(date_of_followup, format="%m/%d/%Y")) %>%
slice_max(date_of_followup, by = id) %>%
count(bronchiectasis)
# bronchiectasis n
# 1 N 1
# 2 Y 2
To confirm this is working, see the interim step:
df %>%
mutate(date_of_followup = as.Date(date_of_followup, format="%m/%d/%Y")) %>%
slice_max(date_of_followup, by = id)
# id date_of_followup bronchiectasis
# 1 1 2022-01-12 Y
# 2 2 2023-05-06 Y
# 3 3 2022-03-12 N
(I'm assuming m/d/Y format for your dates ... fix as appropriate.)