r covariance

R fastest bivariate regression slope coefficient


I have tried a number of linear regression routines, and though the standard ones are all good (e.g., lm.fit is really nicely fast), fastLm from RcppEigen (https://www.rdocumentation.org/packages/RcppEigen/versions/0.3.4.0.2/topics/fastLm) takes the cake. It is truly amazingly fast. Thanks Doug, Dirk, Romain, and Yixuan.

Alas, there is a note in its documentation that the special case of a bivariate regression could be done even faster. If I just need the slope, should I use the native cov(x,y)/var(x) in R, or should I write this in Rcpp, or ...?


Solution

  • It seems that .lm.fit wins for this case (n = 100), although that could depend on the size of your data set. I don't know whether you could do even better with something Rcpp-ish. If you want to do lots of really small regressions, various kinds of function-calling overhead are going to become important. (fastR has a fast covariance calculator, but it says it is competitive in/intended for the high-dimensional case.)

    simfun <- function(n = 100) {
       data.frame(y = rnorm(n), x = rnorm(n))
    }
    
    set.seed(101)
    dd <- simfun()
    
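    # slope from the 2x2 covariance matrix of (x, y): cov(x, y) / var(x)
    mylm <- function(x, y) { v <- var(cbind(x,y)); v[2,1]/v[1,1] }
    # same slope from separate cov() and var() calls
    mylm2  <- function(x, y) { cov(x,y) / var(x) }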
    
    library(RcppEigen)
    
    with(dd,
      bench::mark(
        lm.fit(cbind(1, x), y)$coefficients[2],
        .lm.fit(cbind(1, x), y)$coefficients[2],
        fastLmPure(cbind(1, x), y)$coefficients[2],
        mylm(x, y),
        mylm2(x, y),
        check = FALSE  # skip bench::mark's check that all results are equal
      )
    )
    
      expression     min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
      <bch:expr> <bch:t> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
    1 lm.fit(cb… 24.68µs 26.45µs    36555.    7.34KB     14.6  9996     4    273.4ms
    2 .lm.fit(c…  5.86µs  6.67µs   144591.    4.88KB     14.5  9999     1     69.2ms
    3 fastLmPur… 12.85µs 13.99µs    70384.    3.27KB     14.1  9998     2      142ms
    4 mylm(x, y) 10.05µs 11.16µs    86832.    1.61KB     26.1  9997     3    115.1ms
    5 mylm2(x, … 22.68µs 24.32µs    40539.        0B     20.3  9995     5    246.6ms
    # ℹ 4 more variables: result <list>, memory <list>, time <list>, gc <list>
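
One further candidate, not included in the benchmark above, is to compute the slope directly from centered sums rather than through cov() and var(). The sketch below is only an illustration: slope_direct is a hypothetical name (not from any package), it has not been timed here, and it assumes x and y are plain numeric vectors of equal length with no missing values.

```r
# Hypothetical helper (not from any package): OLS slope of y on x,
# computed as sum((x - mean(x)) * y) / sum((x - mean(x))^2).
slope_direct <- function(x, y) {
  xc <- x - mean(x)            # center x once
  # sum(xc * (y - mean(y))) equals sum(xc * y) because sum(xc) is zero
  sum(xc * y) / sum(xc * xc)
}

set.seed(101)
x <- rnorm(100)
y <- rnorm(100)

# Compare against the approaches from the benchmark (up to floating-point error)
all.equal(slope_direct(x, y), cov(x, y) / var(x))
all.equal(slope_direct(x, y), unname(.lm.fit(cbind(1, x), y)$coefficients[2]))
```

If you want to know whether this beats .lm.fit on your data, add slope_direct(x, y) as another expression in the bench::mark call above; as with the other candidates, the ranking may change with n.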