rdplyrlongitudinal

Filtering longitudinal data by last recorded event, and counting number of rows that meet criteria (and storing that value)


I have some (medical) longitudinal data on the detection (or not) of bronchiectasis. I wish to count the number of participants with and without bronchiectasis at their latest follow up/check. I then wish to store the number of patients who do have bronchiectasis and the number who don't. This is so I can then insert that number/value into a qmd report easily. (I have other outcomes where I need this method).

Is there a simpler/more accurate method of what I'm doing

Example df

id date_of_followup bronchiectasis
1 01/01/2022 N
1 01/12/2022 Y
2 02/07/2022 Y
2 5/6/2023 Y
3 01/05/2021 N
3 3/12/2022 N

This is my current code

df1 <- df %>%
    group_by(id) %>%
    filter(date_of_followup == max(date_of_followup )) %>%
    filter(bronchiectasis=="Y")

bronchiectasisyes <- sum(df1$bronchiectasis == "Y")

df2 <- df%>%
    group_by(id) %>%
    filter(date_of_followup == max(date_of_followup )) %>%
    filter(bronchiectasis=="N")

bronchiectasisno <- sum(df2$bronchiectasis == "N")

This gets me the desired outcome, but perhaps is a bit messy and I suspect there is a more elegant/robust method


Solution

  • Fixing the date, filtering for max-per-id, then summarizing:

    library(dplyr)
    df %>%
      mutate(date_of_followup = as.Date(date_of_followup, format="%m/%d/%Y")) %>%
      slice_max(date_of_followup, by = id) %>%
      count(bronchiectasis)
    #   bronchiectasis n
    # 1              N 1
    # 2              Y 2
    

    To confirm this is working, see the interim step:

    df %>%
      mutate(date_of_followup = as.Date(date_of_followup, format="%m/%d/%Y")) %>%
      slice_max(date_of_followup, by = id)
    #   id date_of_followup bronchiectasis
    # 1  1       2022-01-12              Y
    # 2  2       2023-05-06              Y
    # 3  3       2022-03-12              N
    

    (I'm assuming m/d/Y format for your dates ... fix as appropriate.)