rdplyrdata.tabletibbletime

Rolling sum within group boundaries


I'm trying to calculate a rolling sum based on the following toydata:

structure(list(Tag = c("1", "1", "1",  "1", "2", "2", "2", "2",  "2",
"2"), ID = c("A", "A", "A",  "B", "J", "J", "J", "A", "A", "A" ),
correctvis = c(1, 0, 1, 1, 1, 0, 1, 0, 1, 0)), row.names = c(NA, 
-10L), groups = structure(list(ID = "A", Tag = "1", 
.rows = structure(list(1:10), ptype = integer(0), class = c("vctrs_list_of", 
 "vctrs_vctr", "list"))), row.names = 1L, class = c("tbl_df",  "tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
 "tbl_df", "tbl", "data.frame"))

I have already done so using both data.table and dplyr:

test <- as.data.table(df)[,sums2 := frollsum(correctvis, 7), by = c("ID","Tag")]

test <- df %>%
  group_by(ID, Tag) %>%
  mutate(sums = roll_sum(correctvis, 7, align = "right", fill = NA)) 

Both result in Identical results.

Note: In reality my Tag/ID lists continue for much much longer, which is why I use a window of 7 instead of something smaller.

The problem: Despite using group_by and by= the window that is used by roll_sum and froll_sum exceeds the boundaries of the groups. That is: I want to start counting as if all values correctvis before each grouping is 0 (for that grouping). The following code DOES seem to respect groupings (based on the tibbletime package):

rolling_sum <-  rollify(.f = sum, window = 7)
df <- df %>%
 group_by(ID, Tag) %>%
 mutate(sums2 = rolling_sum(correctvis))

However, this code will not work, as in some cases I have fewer than 7 observations per a certain grouping, resulting in the error:

Cannot roll apply with a window larger than the length of the data

My question:

Or

But I cannot get this to work.


Solution

  • Here is an option using data.table::frollmean:

    library(data.table)
    k <- 7L
    setDT(df)[, if (.N > k) frollmean(correctvis, c(1L:k, rep(k, .N - k)), adaptive=TRUE) 
            else frollmean(correctvis, seq_len(.N), adaptive=TRUE), 
        .(ID, Tag, rleid(ID, Tag))]
    

    output:

        ID Tag rleid        V1
     1:  A   1     1 1.0000000
     2:  A   1     1 0.5000000
     3:  A   1     1 0.6666667
     4:  B   1     2 1.0000000
     5:  J   2     3 1.0000000
     6:  J   2     3 0.5000000
     7:  J   2     3 0.6666667
     8:  A   2     4 0.0000000
     9:  A   2     4 0.5000000
    10:  A   2     4 0.3333333