rrunnerrolling-sum

Running, Rolling Sum in a specified window in R


I want to determine the running total of a column within a specified date window.

id date value
7 2023-01-01 1
7 2023-01-03 1
7 2023-01-04 3
7 2023-01-05 2
7 2023-01-06 1
7 2023-01-07 5
7 2023-01-10 3
7 2023-01-14 2
15 2023-01-01 1
15 2023-01-02 1
15 2023-01-04 2
15 2023-01-07 2
15 2023-01-12 1
15 2023-01-13 1
15 2023-01-14 10

Given the table above, I need to determine the max rolling sum for any 7-day window for each id. (first window would 1/1 to 1/7, second 1/2-1/8, etc...)

In other words, for id 7, that's 14 for the 1/4-1/10 windows (3 + 2 + 1 + 5 + 3). For 15, that's 12 for the 1/8-1/14 window (1 + 1 + 10).

I was using the runner library, but I can't quite get what I should be getting.

Edit: This worked if the max is not in the first few windows:

library(runner)

df %>%
group_by(id) %>%
mutate(rsum = runner(value, k=7, idx=date, f=function(x) sum(x), na_pad=T)) %>%
filter(!is.na(rsum)) %>%
summrise(max_rsum = max(rsum))

Solution

  • rollapplyr takes as arguments the value, vector of widths (how many values to sum at each point) and the function to apply (sum). Run that for each group and then take the maximum.

    library(dplyr, exclude = c("filter", "lag"))
    library(zoo)
    
    df %>%
      mutate(date = as.Date(date)) %>% # can omit if already Date class
      group_by(id) %>%
      mutate(max = rollapplyr(value, 1:n() - findInterval(date - 7, date), sum)) %>%
      slice_max(max) %>%
      ungroup %>%
      select(-value)
    

    giving:

    # A tibble: 2 × 3
    # Groups:   id [2]
         id date         max
      <int> <date>     <int>
    1     7 2023-01-10    14
    2    15 2023-01-14    12
    

    Note

    The input in reproducible form

    df <- structure(list(id = c(7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 15L, 15L, 
    15L, 15L, 15L, 15L, 15L), date = c("2023-01-01", "2023-01-03", 
    "2023-01-04", "2023-01-05", "2023-01-06", "2023-01-07", "2023-01-10", 
    "2023-01-14", "2023-01-01", "2023-01-02", "2023-01-04", "2023-01-07", 
    "2023-01-12", "2023-01-13", "2023-01-14"), value = c(1L, 1L, 
    3L, 2L, 1L, 5L, 3L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 10L)), 
    class = "data.frame", row.names = c(NA, -15L))