Below is my attempt at a minimal reproducible example. Briefly explained, I am using rollApply from the rowr package to calculate a function over a rolling window, and using data from two columns simultaneously. If possible, I would like to skip n steps between each time the function is calculated on a new window. I will try to make it clear what I mean in the example below.
Here is the example data:
library(dplyr)  # provides tibble(), mutate() and %>%

df1 <- tibble(
  x = c(1:9),
  y = c(1:9),
  Date = as.Date(c("2015-08-08", "2015-08-15", "2015-08-22",
                   "2015-08-29", "2015-09-05", "2015-09-12", "2015-09-19",
                   "2015-09-26", "2015-10-03"))
)
Here are the example functions:
calc_ex <- function(y){
  sum(y[,1] + y[,2])
}
roll_calc_ex <- function(y){
  vec <- c(rep(NA, 2), rowr::rollApply(y, calc_ex, window = 3, minimum = 3))
  y <- y %>%
    mutate(estimate = vec)
  return(y)
}
Applying the function roll_calc_ex() to df1, I get the following output:
> roll_calc_ex(df1)
# A tibble: 9 x 4
x y Date estimate
<int> <int> <date> <int>
1 1 1 2015-08-08 NA
2 2 2 2015-08-15 NA
3 3 3 2015-08-22 12
4 4 4 2015-08-29 18
5 5 5 2015-09-05 24
6 6 6 2015-09-12 30
7 7 7 2015-09-19 36
8 8 8 2015-09-26 42
9 9 9 2015-10-03 48
Ideally, I would like to have a rolling window that skips n steps, say n = 2, to produce the following output:
# A tibble: 9 x 4
x y Date estimate
<int> <int> <date> <int>
1 1 1 2015-08-08 NA
2 2 2 2015-08-15 NA
3 3 3 2015-08-22 12
4 4 4 2015-08-29 NA
5 5 5 2015-09-05 NA
6 6 6 2015-09-12 30
7 7 7 2015-09-19 NA
8 8 8 2015-09-26 NA
9 9 9 2015-10-03 48
Alternatively, instead of returning NA for every row skipped, the number from the previous calculation could be filled in (something I am planning to do later anyway using fill() from tidyverse).
If this is possible to solve using, for example, rollapply() from the zoo package, that would also be interesting to hear. I am only using rowr::rollApply() because I need to apply the function to two columns simultaneously. I know it is possible to use runner() from the runner package, but in my more complicated problem I need to run parallel computations. I am using the furrr package for parallelization, and my code works well with rollApply() but not with runner(). The problem I have with runner() is explained here: "Problem with parallelization using furrr [and runner::runner()] in R".
Thanks to anyone that took the time to read this post. Any help will be much appreciated.
1) The rowr package was removed from CRAN, but we can use rollapplyr (like rollapply, but the r on the end means it defaults to right alignment) from zoo. It has a by.column= argument to specify whether processing is performed column by column (TRUE) or all columns are passed at once (FALSE), and a by= argument which causes skipping.
library(dplyr)
library(zoo)
mutate(df1, roll =
rollapplyr(cbind(x, y), 3, calc_ex, fill = NA, by.column = FALSE, by = 2)
)
giving:
x y Date roll
1 1 1 2015-08-08 NA
2 2 2 2015-08-15 NA
3 3 3 2015-08-22 12
4 4 4 2015-08-29 NA
5 5 5 2015-09-05 24
6 6 6 2015-09-12 NA
7 7 7 2015-09-19 36
8 8 8 2015-09-26 NA
9 9 9 2015-10-03 48
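The target output in the question has estimates at rows 3, 6 and 9, i.e. one evaluation every third row. If exact control over which rows are evaluated is needed, a minimal base R sketch is shown below. The helper name roll_every and its arguments width and step are hypothetical, made up for this illustration, and a plain data frame is used so that no packages are required. It calls the function only on the windows that are kept, which may matter if the real function is expensive.

```r
# Example data from the question (x and y columns only)
df1 <- data.frame(x = 1:9, y = 1:9)

calc_ex <- function(m) sum(m[, 1] + m[, 2])

# Hypothetical helper: apply f to a right-aligned window of `width` rows,
# but only at every `step`-th row, starting at the first complete window.
roll_every <- function(data, width, step, f) {
  n <- nrow(data)
  out <- rep(NA_real_, n)
  if (n < width) return(out)          # no complete window available
  for (i in seq(width, n, by = step)) {
    out[i] <- f(data[(i - width + 1):i, , drop = FALSE])
  }
  out
}

df1$estimate <- roll_every(df1[, c("x", "y")], width = 3, step = 3, f = calc_ex)
df1$estimate
# NA NA 12 NA NA 30 NA NA 48
```

Here step = 3 corresponds to skipping n = 2 rows between evaluations.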
2) Using complex arithmetic would also work:
f <- function(v) calc_ex(cbind(Re(v), Im(v)))
mutate(df1, roll = rollapplyr(x + y * 1i, 3, f, fill = NA, by = 2))
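To see why the complex trick works, here is a small stand-alone sketch with made-up numbers (the values are not from the question). Each complex element carries one x value in its real part and one y value in its imaginary part, so a single vector can be rolled over and then split back into two columns inside the function.

```r
x <- c(1, 2, 3)
y <- c(10, 20, 30)

v <- x + y * 1i                   # pack both columns into one complex vector

window <- cbind(Re(v), Im(v))     # unpack into a two-column matrix again
total <- sum(window[, 1] + window[, 2])
total
# 66
```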
3) and if we look into calc_ex, we see that it just sums x + y over the window, so it could be written as follows (although this does not generalize to other functions):
mutate(df1, roll = rollapplyr(x + y, 3, sum, fill = NA, by = 2))
4) We could also consider using zoo objects rather than data frames:
z <- read.zoo(df1, index = "Date")
merge(z, roll = rollapplyr(z, 3, calc_ex, by.column = FALSE, by = 2))
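The question also mentions filling the skipped rows with the previous estimate instead of NA. A minimal base R sketch of that carry-forward step follows. The helper name locf is hypothetical, and the input vector is the target column from the question. As far as I know, zoo::na.locf(v, na.rm = FALSE) and tidyr::fill() do the same job if those packages are already loaded.

```r
# Hypothetical helper: last observation carried forward, leading NAs are kept
locf <- function(v) {
  idx <- cumsum(!is.na(v))         # position of the most recent non-NA value
  c(NA, v[!is.na(v)])[idx + 1]     # idx 0 maps to the leading NA
}

estimate <- c(NA, NA, 12, NA, NA, 30, NA, NA, 48)
filled <- locf(estimate)
filled
# NA NA 12 12 12 30 30 30 48
```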