Extract rows for the first week of data that meets criteria in R

I have longitudinal lab data where each person in a dataset has a line for every day they measured their A1C and uploaded it via an app. I need help identifying and extracting the first 'valid' week of data where the person uploaded their measurement on least 5 of 7 days in a given week. However, if they had data for 6 days or 7, I want to extract all possible measures for that week.

An example of the data would be this:

  person_id       date a1c
    1000492 2019-09-11 5.5
    1000492 2019-09-22 5.7
    1000492 2019-09-23 5.5
    1000492 2019-09-24 5.6
    1000492 2019-10-15 5.5
    1000492 2019-10-16 5.6
    1000492 2019-10-18 5.5
    1000492 2019-10-19 5.7
    1000492 2019-10-20 5.5
    1000492 2019-11-14 5.6
    1000492 2019-11-15 5.5 
    1000492 2019-12-16 5.6
    1000499 2019-09-11 5.5
    1000499 2019-09-12 5.7
    1000499 2019-09-26 5.5
    1000499 2019-09-27 5.6
    1000499 2019-09-28 5.5
    1000499 2019-09-29 5.7
    1000499 2019-09-30 5.5
    1000499 2019-10-01 5.6
    1000499 2019-10-02 5.5

So ideally, I would get a resulting dataframe of something like this:

  person_id       date a1c
    1000492 2019-10-15 5.5
    1000492 2019-10-16 5.6
    1000492 2019-10-18 5.5
    1000492 2019-10-19 5.7
    1000492 2019-10-20 5.5
    1000499 2019-09-26 5.5
    1000499 2019-09-27 5.6
    1000499 2019-09-28 5.5
    1000499 2019-09-29 5.7
    1000499 2019-09-30 5.5
    1000499 2019-10-01 5.6
    1000499 2019-10-02 5.5

Solution

To get your required result of measurements which are within in the first valid 5-7 day time windows, do this:

library(dplyr)
library(lubridate)

df <- data.frame(
  person_id = c(rep(1000492, 12), rep(1000499, 9)),
  date = as.Date(c( "2019-09-11", "2019-09-22", "2019-09-23", "2019-09-24","2019-10-15", "2019-10-16", "2019-10-18", "2019-10-19", "2019-10-20","2019-11-14", "2019-11-15", "2019-12-16","2019-09-11", "2019-09-12", "2019-09-26", "2019-09-27", "2019-09-28","2019-09-29", "2019-09-30", "2019-10-01", "2019-10-02")),
  a1c = c(5.5, 5.7, 5.5, 5.6, 5.5, 5.6, 5.5, 5.7, 5.5, 5.6, 5.5, 5.6, 5.5, 5.7, 5.5, 5.6, 5.5, 5.7, 5.5, 5.6, 5.5
  )
)    

res <- df %>%
  arrange(person_id, date) %>%
  group_by(person_id) %>%
  mutate(
    window_start = date,
    window_end = date + days(6),
    measurements_in_window = sapply(date, function(ws) sum(date >= ws & date <= ws + days(6)))
  ) %>%
  filter(measurements_in_window >= 5) %>%
  filter(date == min(date)) %>%
  inner_join(df, by = "person_id") %>%
  filter(date.y >= window_start & date.y <= window_end) %>% mutate(date = date.y, a1c = a1c.y) %>% select(person_id, date, a1c)

Output:

> res
# A tibble: 12 × 3
   person_id date         a1c
       <dbl> <date>     <dbl>
 1   1000492 2019-10-15   5.5
 2   1000492 2019-10-16   5.6
 3   1000492 2019-10-18   5.5
 4   1000492 2019-10-19   5.7
 5   1000492 2019-10-20   5.5
 6   1000499 2019-09-26   5.5
 7   1000499 2019-09-27   5.6
 8   1000499 2019-09-28   5.5
 9   1000499 2019-09-29   5.7
10   1000499 2019-09-30   5.5
11   1000499 2019-10-01   5.6
12   1000499 2019-10-02   5.5