rregressionzoorolling-computationrollapply

Rolling regression across multiple columns with NAs in R


I want to regress each column of y against x with width 12 and calculate Beta, Intercept, and R-2 values. I want a minimum of 12 observations and for cases where there isn't overlapping data, for these to be NAs. For my output, I want three matrices (dataframes) with column headers y1,y2,y3, containing all Beta values, all Intercept values, and all R-2 values respectively. The output matrices should each be of size 3x53.

In practice, my y-matrix has 2000+ columns, so I'd like to avoid looping if possible.

Example:
y <- data.frame(y1 = c(NA,NA,NA, rnorm(50)),
                y2 = c(NA, NA, NA, NA, NA, rnorm(48)),
                y3 = rnorm(53))
x <- data.frame(x = c(NA, NA,NA, NA, NA, rnorm(48)))

my attempt using rollapply:

model_output <- rollapply(data = cbind.data.frame(y, x),
                          width = 12, FUN = function(a,b) {lm(a ~ b)} ,
                          by.column = FALSE, align = "right",
                          fill = NA)

Solution

  • Using the data in the Note at the end from the question define a roll function which performs the linear regression and then applies a passed in function fun. Then use it to define Intercept, Beta and R2.

    library (zoo)
    
    fm <- lm(as.matrix(y) ~ x, x)
    
    roll <- function(fun) rollapplyr(1:nrow(y), 12, FUN = 
       \(ix) fun(update(fm, subset = ix)), fill = NA, by.column = FALSE) 
    
    Intercept <- roll(\(x) coef(x)[1, ])
    Beta <- roll(\(x) coef(x)[2, ])
    R2 <- roll(\(x) sapply(summary(x), function(z) z$r.squared))
    

    Note

    y <- data.frame(y1 = c(NA,NA,NA, rnorm(50)),
                    y2 = c(NA, NA, NA, NA, NA, rnorm(48)),
                    y3 = rnorm(53))
    x <- data.frame(x = c(NA, NA,NA, NA, NA, rnorm(48)))