I need to calculate a variation of a rolling average by group (group_by) in R with dplyr.
The variation is a "descending" rolling average: within each group, the average over data points 1 to N, then 2 to N, then 3 to N, and so on up to N to N of the column.
A simple example:
library(dplyr)

the_df <- tibble(id = c(rep('x', 11), rep('y', 11)), a = rep(10:0, 2), b = rep(seq(100, 0, by = -10), 2))
the_df <- the_df %>%
  group_by(id) %>%
  mutate(lead_time = row_number(),
         inverse_lead_time = 11 - lead_time)
the_df %>%
  arrange(id, lead_time) %>%
  group_by(id) %>%
  mutate_at(vars(a, b), list(run_avg = ~ zoo::rollmean(., lead_time, fill = NA, align = 'left')))
The align='left' is included so the rolling average runs to the end of the group.
But I'm not getting the output I expect.
The output of the last statement is:
In this case, I expect the column "a_run_avg" to change steadily with lead_time (for this data it should decrease, because a decreases within each group).
To reiterate, what I want is that at lead_time = 5 (for example) the running average is the average of all values of a from lead_time = 5 through the last row of the group.
The rolling average for lead_time = 6 would include all values of a from lead_time = 6 through the last row of the group.
The value of a_run_avg when lead_time = 1 should be 5. It's the average of all values in the a column for the group.
At the last row of the group, a_run_avg should equal that row's own value of a, i.e., it is a rolling mean of only one value.
How can the example code be modified to achieve my descending rolling average?
1) rollapply. rollmean does not accept vector widths, but rollapply does.
library(dplyr)
library(zoo)
the_df %>%
  mutate(
    across(
      .cols = any_of(c("a", "b")),
      .fns = ~ rollapply(.x, n():1, mean, align = "left"),
      .names = "avg_{.col}"
    ), .by = id)
giving
# A tibble: 22 × 5
   id        a     b avg_a avg_b
   <chr> <int> <dbl> <dbl> <dbl>
 1 x        10   100   5      50
 2 x         9    90   4.5    45
 3 x         8    80   4      40
 4 x         7    70   3.5    35
 5 x         6    60   3      30
 6 x         5    50   2.5    25
 7 x         4    40   2      20
 8 x         3    30   1.5    15
 9 x         2    20   1      10
10 x         1    10   0.5     5
# ℹ 12 more rows
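To see why a width of n():1 yields these tail averages, here is a minimal sketch on a single plain vector standing in for one group. The names tail_means and via_rollapply are illustrative only and do not come from the question. It compares rollapply with a direct computation of the mean of a[i], ..., a[N] for every i.

```r
library(zoo)

a <- 10:0          # one group's worth of column a
n <- length(a)

# Direct definition: for each start position i, average a[i], ..., a[n]
tail_means <- sapply(seq_len(n), function(i) mean(a[i:n]))

# Same idea via rollapply: widths n, n-1, ..., 1, each window left-aligned,
# so the window starting at position i always ends at position n
via_rollapply <- rollapply(a, n:1, mean, align = "left")

# Should report TRUE if the two computations agree
all.equal(tail_means, as.numeric(via_rollapply))
```

Because each window is left-aligned and its width shrinks by one per row, no window runs past the end of the vector, so no fill value should be needed.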
2) cummean. dplyr has a cummean function, so this would also work:
library(dplyr)
the_df %>%
  mutate(
    across(
      .cols = any_of(c("a", "b")),
      .fns = ~ rev(cummean(rev(.x))),
      .names = "avg_{.col}"
    ), .by = id)
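The reverse, cumulative mean, reverse idea can also be sketched in base R, which may help show what rev(cummean(rev(.x))) is doing. The helper name tail_mean below is hypothetical and not part of dplyr or zoo.

```r
a <- 10:0  # one group's worth of column a

# Hypothetical helper: mean of x[i], ..., x[N] for every position i
tail_mean <- function(x) {
  # Running sum taken from the end of x, divided by the number of values
  # it covers so far, then flipped back into the original order
  rev(cumsum(rev(x)) / seq_along(x))
}

tail_mean(a)
# 5.0 4.5 4.0 3.5 3.0 2.5 2.0 1.5 1.0 0.5 0.0
```

This makes a single pass over the data, whereas a window-based approach recomputes a mean for each row, which could matter for long groups.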
The input used:
library(tibble)
the_df <- tibble(id = c(rep('x', 11), rep('y', 11)),
                 a = rep(10:0, 2), b = rep(seq(100, 0, by = -10), 2))