I want to calculate a sum over a sliding window on grouped data.
As I would like to stick to official functions if possible I started with rollapplyr like this:
library(tidyverse)
library(reshape2)
library(zoo)
data = data.frame(Count=seq(1,10,1),
group=c("A","B","A","A","B","B","B","B","A","A"))
window_size <- 3
data_rolling <- data %>%
arrange(group) %>%
group_by(group) %>%
mutate(Rolling_Count = rollapplyr(Count, width=window_size, FUN=sum, fill = NA)) %>%
ungroup()
for the first entries which are smaller than width (in this case 3) it gets filled with NA as defined, but I would actually like to have the sum of the possible data there like this:
Count group Rolling_Count expected_Result
1 A NA 1
3 A NA 4
4 A 8 8
9 A 16 16
10 A 23 23
2 B NA 2
5 B NA 7
6 B 13 13
7 B 18 18
8 B 21 21
I know that I can replace the width=window_size
with something like this:
c(rep(1:window_size,1),rep(window_size:window_size,(n()-window_size)))
to get what I want but this is really slow. In addition this approach would assume that n() is greater than window_size.
So: Is there already an R/zoo function which can handle grouped data like above and in addition data with less than window_size entries and is faster to the above approach?
Thanks for any hints!
A solution based on data.table
and RcppRoll
that should be much more performant.
It's not as clean as I would like -- there's actually a partial
argument in RcppRoll::roll_sum()
that hasn't been implemented yet that would theoretically solve this cleanly, but it doesn't seem like that will be worked anytime soon-- see GH Issue #18 .
At any rate, until someone implements a rolling sum in R that allows what you need here, adding in a cumsum
on the first n - 1
rows seems to be a sensible solution.
library(data.table)
library(RcppRoll)
data = data.frame(Count=seq(1,10,1),
group=c("A","B","A","A","B","B","B","B","A","A"))
## Convert to a `data.table` by reference
setDT(data)
window_size <- 3
## Add a counter row so that we can go back and fill in rows
## 1 & 2 of each group
data[,Group_RowNumber := seq_len(.N), keyby = .(group)]
## Do a rolling window -- this won't fill in the first 2 rows
data[,Rolling_Count := RcppRoll::roll_sum(Count,
n = window_size,
align = "right",
fill = NA), keyby = .(group)]
## Go back and fill in the ones we missed
data[Group_RowNumber < window_size, Rolling_Count := cumsum(Count), by = .(group)]
data
# Count group Group_RowNumber Rolling_Count
# 1: 1 A 1 1
# 2: 3 A 2 4
# 3: 4 A 3 8
# 4: 9 A 4 16
# 5: 10 A 5 23
# 6: 2 B 1 2
# 7: 5 B 2 7
# 8: 6 B 3 13
# 9: 7 B 4 18
# 10: 8 B 5 21