rdplyrdata.tableoverlaprolling-sum

How do I calculate a rolling sum with variable levels of depletion in R?


I'm trying to use R to measure how many days supply of a prescription an individual already has on-hand when they make a refill, taking into account all previous prescriptions. For example, if I had this table...

  member rx_id  fill_date    to_date days_supply
1      A     1 2018-10-01 2018-10-02           2
2      B     1 2016-11-07 2016-11-10           4
3      B     2 2016-11-07 2016-12-04          28
4      B     3 2016-11-08 2016-11-09           2
5      B     4 2016-11-10 2016-12-03          24

I'd expect the following output

  member rx_id  fill_date    to_date days_supply_on_hand
1      A     1 2018-10-01 2018-10-02                   0
2      B     1 2016-11-07 2016-11-10                   0
3      B     2 2016-11-07 2016-12-04                   4
4      B     3 2016-11-08 2016-11-09                  30
5      B     4 2016-11-10 2016-12-03                  26

For member B, when the second script is filled on the same day as the first script, the individual already has 4 days worth of RX on hand. When the third script is filled, the individual has 3 days left from the first script and 27 left from the second (30 total). When the fourth script is filled, the third script is depleted, but there is 1 day left from the first script and 25 from the third script (26 total).

I know how to do rolling totals in both dplyr and data.table, but I can't figure out how to take into account variable levels of depletion based on previous records on an individual by individual basis. Below is code to remake the original table, thanks in advance for any suggestions!

structure(list(member = structure(c(1L, 2L, 2L, 2L, 2L), .Label = 
c("A", 
"B"), class = "factor"), rx_id = c(1, 1, 2, 3, 4), fill_date = 
structure(c(17805, 
17112, 17112, 17113, 17115), class = "Date"), to_date = 
structure(c(17806, 
17115, 17139, 17114, 17138), class = "Date"), days_supply = c(2, 
4, 28, 2, 24)), .Names = c("member", "rx_id", "fill_date", 
"to_date", 
"days_supply"), row.names = c(NA, -5L), class = "data.frame")

Solution

  • library(data.table)
    dt = as.data.table(your_df) # or setDT to convert in place
    
    # merge on relevant days, then compute sum of supply - days elapsed
    dt[dt, on = .(member, fill_date <= fill_date, to_date >= fill_date, rx_id < rx_id), by = .EACHI,
       sum(days_supply, na.rm = T) - sum(i.fill_date - x.fill_date, na.rm = T)]
    #   member  fill_date    to_date rx_id      V1
    #1:      A 2018-10-01 2018-10-01     1  0 days
    #2:      B 2016-11-07 2016-11-07     1  0 days
    #3:      B 2016-11-07 2016-11-07     2  4 days
    #4:      B 2016-11-08 2016-11-08     3 30 days
    #5:      B 2016-11-10 2016-11-10     4 26 days