rrunner

R rolling window function - best maximum of any day stretch


I have a coding question on a large dataset. I'm trying to use the 'runner' function because I think this solution required a rolling window function and also it's a large dataset so I'm doubtful that a for loop would work.

Here is a recreation of the relevant parts of that dataset.

id <- c("1111", "1111", "1111", "1111", "2222", "2222", "2222", "2222", "3333", "3333")

date <- c("2022-01-01", "2022-02-01", "2022-03-01", "2022-04-01", "2022-01-01", "2022-02-01", "2022-03-01", "2022-04-01", "2022-01-01", "2022-02-01")

drug <- c("Drug A", "Drug B", "Drug A", "Drug A", "Drug B", "Drug A", "Drug A", "Drug B", "Drug A", "Drug B")

day_supply <- c(1, 2, 3, 5, 8, 13, 21, 34, 21, 13)

drug_table <- tibble(id, date, drug, day_supply)

drug_table$date <- as.Date(drug_table$date)

For the research question, continuous treatment in a drug is defined as having a 180-day window where the prescription refill gap is <= 7 in any 180-day stretch of time.

I'm working out some of the logic here in this code:

drug_table %>%
  mutate(drug_will_be_used_until = date + day_supply) %>%
  mutate(refill_gap_at_time_of_new_prescription_since_last_prescription = date - lag(drug_will_be_used_until))

As a first step, I'm calculating the number of days that have passed without a prescription, at the time a new prescription is generated. But what about the rolling window function part? I found this library called 'runner' that I think is probably useful for this problem. Here is what my code looks like when trying to use runner.

library(runner)

drug_table %>%
  group_by(id) %>% 
  mutate(drug_will_be_used_until = date + day_supply) %>%
  mutate(refill_gap_at_time_of_new_prescription_since_last_prescription = date - lag(drug_will_be_used_until)) %>%
  arrange(id, date) %>%
  filter(refill_gap_at_time_of_new_prescription_since_last_prescription != "NA days") %>%
  group_by(id) %>% 
  mutate(max_refill_gap_in_best_180_day_stretch = runner::runner(refill_gap_at_time_of_new_prescription_since_last_prescription, min, k = 180, idx = date)) 

I think this answer is kind of close, but I don't think it quite answers my question. With this code, I'm finding the maximum refill gap in any 180-day stretch... but what I really want to figure out is what is the maximum refill gap in any 180-day stretch, where that maximum refill gap is the smallest maximum of any 180-day stretch... The exact research question, that I think that logic would answer, is: Is there any 180-day period where the refill gap is not more than 7-days?

I'm trying to create a new reference table that looks like this:

id       max_refill_gap_in_best_180_day_stretch       180_day_continuous_use_indicator 
1111                     6                                            1 
2222                     7                                            1
3333                     8                                            0

(This table doesn't correspond to the dataframe I created, above.)

The binary indicator should say 0 if patient id did not have 180 days of continuous treatment in a drug, and the binary indicator should say 1 if they did have 180 days of continuous treatment in a drug.








Updating here the question based on feedback from the comments:

I created a new dataframe. (The method here is a little bit inelegant.)

id <- rep(c("1111", "1111", "1111", "1111", "1111"), times = 73)

date <- seq(as.Date("2018-01-01"), as.Date("2018-12-31"), by="days")

# drug <- rep(c("Drug A", "Drug B", "Drug A", "Drug A", "Drug A"), times = 73)

day_supply <- rep(c(1, 2, 3, 4, 5), times = 73)

drug_table <- tibble(id, date, day_supply)

drug_table$date <- as.Date(drug_table$date)

drug_table <- drug_table %>%
 slice(which(row_number() %% 6 == 1))

drug_table <- drug_table %>%
  filter(date != "2018-01-13") %>%
  filter(date != "2018-01-19")

drug_table %>%
  mutate(drug_will_be_used_until = date + day_supply) %>%
  mutate(refill_gap_at_time_of_new_prescription_since_last_prescription = date - lag(drug_will_be_used_until)) -> drug_table


######

id <- rep(c("2222", "2222", "2222", "2222", "2222"), times = 73)

date <- seq(as.Date("2018-01-01"), as.Date("2018-12-31"), by="days")

# drug <- rep(c("Drug A", "Drug B", "Drug A", "Drug A", "Drug A"), times = 73)

day_supply <- rep(c(1, 2, 3, 4, 5), times = 73)

drug_table2 <- tibble(id, date, day_supply)

drug_table2$date <- as.Date(drug_table2$date)

drug_table2 <- drug_table2 %>%
 slice(which(row_number() %% 6 == 1))

drug_table2 <- drug_table2 %>%
  filter(date != "2018-01-13") %>%
  filter(date != "2018-01-19") %>%
  filter(date != "2018-02-12") %>%
  filter(date != "2018-02-18") %>%
  filter(date != "2018-03-14") %>%
  filter(date != "2018-03-20") %>%
  filter(date != "2018-04-13") %>%
  filter(date != "2018-04-19") %>%
  filter(date != "2018-05-13") %>%
  filter(date != "2018-05-19") %>%
  filter(date != "2018-06-12") %>%
  filter(date != "2018-06-18") %>%
  filter(date != "2018-06-12") %>%
  filter(date != "2018-06-18") %>%
  filter(date != "2018-07-12") %>%
  filter(date != "2018-07-18") %>%
  filter(date != "2018-07-12") %>%
  filter(date != "2018-07-18") %>%
  filter(date != "2018-08-11") %>%
  filter(date != "2018-08-17") %>%
  filter(date != "2018-09-10") %>%
  filter(date != "2018-09-16") %>%
  filter(date != "2018-10-10") %>%
  filter(date != "2018-10-16") %>%
  filter(date != "2018-11-09") %>%
  filter(date != "2018-11-15") 

drug_table2 %>%
  mutate(drug_will_be_used_until = date + day_supply) %>%
  mutate(refill_gap_at_time_of_new_prescription_since_last_prescription = date - lag(drug_will_be_used_until)) -> drug_table2


drug_table_df <- bind_rows(drug_table, 
                           drug_table2)

#####

id2 <- c("1111", "2222")
max_refill_gap_in_best_180_day_stretch <- c(5, 16)
the_180_day_continuous_use_indicator <- c(1, 1)

drug_table_output <- tibble(id2, max_refill_gap_in_best_180_day_stretch, the_180_day_continuous_use_indicator)

In the first case, I have the id "1111" with a dataframe that has one refill gap of 16 days early on. But starting at the end of January, any stretch of 180 days is associated with a maximum refill gap of 5 days.

In the second case, I have the id "2222" but less frequent visits mean that there's a refill gap of 16 days that occurs pretty much every month. So in any 180 day stretch, there's going to be a refill gap of 16 days.

enter image description here

So we can say with patient 1111, there exists a 180 day stretch where the maximum refill gap is under 7 days. But with patient 2222, there doesn't exist any 180 day period where the maximum refill gap is under 7 days.


Solution

  • It sounds like you're looking for the "minimum maximum" gap with 180 day windows, i.e. first finding the largest gap in each 180 day window, and then identifying the smallest such summary value across the whole data.

    Here's an approach with dplyr and slider:

    library(dplyr); library(slider)
    drug_table_df %>%
      group_by(id) %>%
      mutate(longest_streak_180 = slider::slide_index_dbl(
        as.numeric(refill_gap_at_time_of_new_prescription_since_last_prescription),
        date, ~max(.x, na.rm = TRUE), .before = days(179), .complete = TRUE)) %>% 
      filter(!is.na(longest_streak_180)) %>%
      summarize(best_180 = min(longest_streak_180[longest_streak_180>0])) %>%
      mutate(the_180_indicator = 1 * (best_180 < 7))
    
    
    # A tibble: 2 × 3
      id    best_180 the_180_indicator
      <chr>    <dbl>             <dbl>
    1 1111         5                 1
    2 2222        16                 0