r repeat longitudinal

How to filter repeated data based on a repeated ID and a time interval


I have longitudinal patient data in R. I would like to subset the patients in the patid column based on this condition: three or more occurrences within a one-year period (one year = any 12-month period). Code to reproduce the table:

df <- structure(list(patid = c("1", "1", "1", "1", "2", "2", "3", "3", 
"3", "4", "4", "4", "4"), observation_date = c("07/07/2016", 
"07/08/2016", "07/11/2016", "07/07/2019", "07/05/2015", "02/12/2016", 
"07/05/2015", "07/06/2015", "16/06/2015", "07/05/2015", "02/12/2016", 
"18/12/2016", "15/01/2017")), class = "data.frame", row.names = c(NA, 
-13L))

Table1:

patid observation_date
1 07/07/2016
1 07/08/2016
1 07/11/2016
1 07/07/2019
2 07/05/2015
2 02/12/2016
3 07/05/2015
3 07/06/2015
3 16/06/2015
4 07/05/2015
4 02/12/2016
4 18/12/2016
4 15/01/2017

The expected table would look like this (a list of the patids that meet the criterion: 3 or more observations within a 12-month interval):

patid
1
3
4

Solution

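  • The answer above was helpful. I figured I could also do this:

    library(dplyr)

    # First, parse the day/month/year strings as dates and sort within each patient.
    # Three observations fall within one year when a date and the date two rows
    # later (same patient) are at most 365 days apart.
    multi_pts <- df %>% 
      mutate(observation_date = as.Date(observation_date, format = "%d/%m/%Y")) %>%
      arrange(patid, observation_date) %>% 
      group_by(patid) %>% 
      mutate(span = as.numeric(lead(observation_date, 2) - observation_date)) %>%
      filter(span <= 365)
    # Then keep one row per patient that has at least one such window
    count_multi_pts <- 
      multi_pts %>%  
      ungroup() %>% 
      distinct(patid)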
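
For a cross-check without dplyr, here is a minimal base R sketch of the same condition. It assumes that "one year" means 365 days and that the dates are day/month/year strings, as in the question's data. The helper name `has_three_within` is made up for this illustration and is not part of any package. The idea is that, once a patient's dates are sorted, three observations fall inside one window exactly when some date and the date two positions later are at most 365 days apart.

```r
# Data from the question; dates are day/month/year strings.
df <- data.frame(
  patid = c("1", "1", "1", "1", "2", "2", "3", "3", "3", "4", "4", "4", "4"),
  observation_date = c("07/07/2016", "07/08/2016", "07/11/2016", "07/07/2019",
                       "07/05/2015", "02/12/2016", "07/05/2015", "07/06/2015",
                       "16/06/2015", "07/05/2015", "02/12/2016", "18/12/2016",
                       "15/01/2017"),
  stringsAsFactors = FALSE
)
df$observation_date <- as.Date(df$observation_date, format = "%d/%m/%Y")

# Hypothetical helper: TRUE if any three observations lie within `window_days`.
has_three_within <- function(dates, window_days = 365) {
  dates <- sort(dates)
  n <- length(dates)
  if (n < 3) return(FALSE)
  # Days between each date and the date two positions later.
  span <- as.numeric(dates[3:n] - dates[1:(n - 2)])
  any(span <= window_days)
}

# Apply the check per patient and keep the ids that qualify.
flags <- vapply(split(df$observation_date, df$patid), has_three_within, logical(1))
result <- data.frame(patid = names(flags)[flags], stringsAsFactors = FALSE)
print(result)
```

On the sample data this keeps patients 1, 3 and 4, matching the expected table. If a calendar-based 12-month window is needed instead of a fixed 365 days (for example, to handle leap years), the comparison inside the helper would have to change accordingly.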