Making rollApply() skip n steps - R


Below is my attempt at a minimal reproducible example. Briefly explained, I am using rollApply from the rowr package to calculate a function over a rolling window, and using data from two columns simultaneously. If possible, I would like to skip n steps between each time the function is calculated on a new window. I will try to make it clear what I mean in the example below.

Here is the example data:

library(dplyr)  # provides tibble(), mutate() and %>%

df1 <- tibble(
  x = c(1:9), 
  y = c(1:9), 
  Date = as.Date(c("2015-08-08", "2015-08-15", "2015-08-22", 
                   "2015-08-29","2015-09-05", "2015-09-12", "2015-09-19", 
                   "2015-09-26", "2015-10-03"))
)

Here are the example functions:

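# Sum of both columns over the rows of the current window
calc_ex <- function(y){
  sum(y[,1] + y[,2])
}

roll_calc_ex <- function(y){
  # Pad with two NAs because the first complete window ends at row 3
  vec <- c(rep(NA, 2), rowr::rollApply(y, calc_ex, window = 3, minimum = 3))

  y %>%
    mutate(estimate = vec)
}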

Applying the function roll_calc_ex() to df1, I get the following output:

> roll_calc_ex(df1)
# A tibble: 9 x 4
       x     y Date       estimate
   <int> <int> <date>        <int>
 1     1     1 2015-08-08       NA
 2     2     2 2015-08-15       NA
 3     3     3 2015-08-22       12
 4     4     4 2015-08-29       18
 5     5     5 2015-09-05       24
 6     6     6 2015-09-12       30
 7     7     7 2015-09-19       36
 8     8     8 2015-09-26       42
 9     9     9 2015-10-03       48

Ideally, I would like to have a rolling window that skips n steps, say n = 2, to produce the following output:

# A tibble: 9 x 4
       x     y Date       estimate
   <int> <int> <date>        <int>
 1     1     1 2015-08-08       NA
 2     2     2 2015-08-15       NA
 3     3     3 2015-08-22       12
 4     4     4 2015-08-29       NA
 5     5     5 2015-09-05       NA
 6     6     6 2015-09-12       30
 7     7     7 2015-09-19       NA
 8     8     8 2015-09-26       NA
 9     9     9 2015-10-03       48

Alternatively, instead of returning NA for every skipped row, the value from the previous calculation could be filled in (something I am planning to do later anyway using fill() from the tidyverse).

If this can be solved using, for example, rollapply() from the zoo package, that would also be interesting to hear. I am only using rowr::rollApply() because I need to apply the function to two columns simultaneously. I know it is possible to use runner() from the runner package, but my more complicated problem requires parallel computation. I am using the furrr package for parallelization, and my code works well with rollApply() but not with runner(). The problem I have with runner() is explained in a separate question: "Problem with parallelization using furrr [and runner::runner()] in R".

Thanks to anyone that took the time to read this post. Any help will be much appreciated.


Solution

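  • 1) The rowr package was removed from CRAN, but we can use rollapplyr from zoo (like rollapply, but the r on the end means it defaults to right alignment). Its by.column= argument specifies whether processing is performed column by column (TRUE) or all columns are passed at once (FALSE), and its by= argument causes skipping by evaluating the function only at every by-th point. The code below uses by = 2, which evaluates every second window; by = 3 leaves two rows between evaluations, as in the desired output in the question.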

    library(dplyr)
    library(zoo)
    
    mutate(df1, roll = 
      rollapplyr(cbind(x, y), 3, calc_ex, fill = NA, by.column = FALSE, by = 2)
    )
    

    giving:

      x y       Date roll
    1 1 1 2015-08-08   NA
    2 2 2 2015-08-15   NA
    3 3 3 2015-08-22   12
    4 4 4 2015-08-29   NA
    5 5 5 2015-09-05   24
    6 6 6 2015-09-12   NA
    7 7 7 2015-09-19   36
    8 8 8 2015-09-26   NA
    9 9 9 2015-10-03   48
    

    2) Packing the two columns into a single complex vector would also work, because rollapplyr then only has one column to roll over:

    f <- function(v) calc_ex(cbind(Re(v), Im(v)))
    mutate(df1, roll = rollapplyr(x + y * 1i, 3, f, fill = NA, by = 2))
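
To see why the complex trick works, here is a minimal base R sketch. The small x and y vectors are made up for illustration and are not taken from the question. The real part carries one column and the imaginary part the other, so the window function can rebuild the two-column matrix that calc_ex expects.

```r
# Illustrative inputs (hypothetical, not the question's data)
x <- 1:3
y <- 4:6

# Pack both columns into one complex vector: x in Re(), y in Im()
v <- x + y * 1i

# Unpack again into a two-column matrix, as f() does inside the window
m <- cbind(Re(v), Im(v))

# Window function from the question
calc_ex <- function(m) sum(m[, 1] + m[, 2])

packed_result <- calc_ex(m)   # 21
direct_result <- sum(x + y)   # 21, computed without packing
```

Since a complex number has only two parts, this approach is limited to two numeric columns.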
    

    3) Looking inside calc_ex, the same result could be written as follows (although this does not generalize to other functions):

    mutate(df1, roll = rollapplyr(x + y, 3, sum, fill = NA, by = 2))
    

    4) We could also consider using zoo objects rather than data frames:

    z <- read.zoo(df1, index = "Date")
    merge(z, roll = rollapplyr(z, 3, calc_ex, by.column = FALSE, by = 2))
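
A note on the step size: the question asks for two skipped rows between evaluations. As a hedged sketch, assuming by= counts from the first complete window (which is how I read ?rollapply), by = 3 should reproduce the desired output, and zoo's na.locf() can carry the last computed value forward as an alternative to fill(). The plain data frame and the roll_filled column name are my own choices for illustration.

```r
library(zoo)

# Example data from the question, rebuilt as a plain data frame
df1 <- data.frame(
  x = 1:9,
  y = 1:9,
  Date = as.Date("2015-08-08") + 7 * (0:8)  # weekly dates, as in the question
)

# Window function from the question: sum of both columns over the window
calc_ex <- function(m) sum(m[, 1] + m[, 2])

# by = 3: evaluate every third window, leaving two rows in between as NA
df1$roll <- rollapplyr(cbind(df1$x, df1$y), 3, calc_ex,
                       fill = NA, by.column = FALSE, by = 3)

# Optional: carry the last computed value forward; leading NAs are kept
df1$roll_filled <- na.locf(df1$roll, na.rm = FALSE)

# Expected non-NA values in roll: 12, 30 and 48 at rows 3, 6 and 9
print(df1)
```

Setting by = n + 1 in the same way should skip n rows for other values of n.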