I want to determine the running total of a column within a specified date window.
id | date | value |
---|---|---|
7 | 2023-01-01 | 1 |
7 | 2023-01-03 | 1 |
7 | 2023-01-04 | 3 |
7 | 2023-01-05 | 2 |
7 | 2023-01-06 | 1 |
7 | 2023-01-07 | 5 |
7 | 2023-01-10 | 3 |
7 | 2023-01-14 | 2 |
15 | 2023-01-01 | 1 |
15 | 2023-01-02 | 1 |
15 | 2023-01-04 | 2 |
15 | 2023-01-07 | 2 |
15 | 2023-01-12 | 1 |
15 | 2023-01-13 | 1 |
15 | 2023-01-14 | 10 |
Given the table above, I need to determine the max rolling sum for any 7-day window for each id. (first window would 1/1 to 1/7, second 1/2-1/8, etc...)
In other words, for id 7, that's 14 for the 1/4-1/10 windows (3 + 2 + 1 + 5 + 3). For 15, that's 12 for the 1/8-1/14 window (1 + 1 + 10).
I was using the runner library, but I can't quite get what I should be getting.
Edit: This worked if the max is not in the first few windows:
library(runner)
df %>%
group_by(id) %>%
mutate(rsum = runner(value, k=7, idx=date, f=function(x) sum(x), na_pad=T)) %>%
filter(!is.na(rsum)) %>%
summrise(max_rsum = max(rsum))
rollapplyr
takes as arguments the value, vector of widths (how many values to sum at each point) and the function to apply (sum
). Run that for each group and then take the maximum.
library(dplyr, exclude = c("filter", "lag"))
library(zoo)
df %>%
mutate(date = as.Date(date)) %>% # can omit if already Date class
group_by(id) %>%
mutate(max = rollapplyr(value, 1:n() - findInterval(date - 7, date), sum)) %>%
slice_max(max) %>%
ungroup %>%
select(-value)
giving:
# A tibble: 2 × 3
# Groups: id [2]
id date max
<int> <date> <int>
1 7 2023-01-10 14
2 15 2023-01-14 12
The input in reproducible form
df <- structure(list(id = c(7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 15L, 15L,
15L, 15L, 15L, 15L, 15L), date = c("2023-01-01", "2023-01-03",
"2023-01-04", "2023-01-05", "2023-01-06", "2023-01-07", "2023-01-10",
"2023-01-14", "2023-01-01", "2023-01-02", "2023-01-04", "2023-01-07",
"2023-01-12", "2023-01-13", "2023-01-14"), value = c(1L, 1L,
3L, 2L, 1L, 5L, 3L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 10L)),
class = "data.frame", row.names = c(NA, -15L))