I have tried a number of linear regressions, and though the standard ones are all good (e.g., lm.fit
is really nicely fast), the fastLm
in https://www.rdocumentation.org/packages/RcppEigen/versions/0.3.4.0.2/topics/fastLm takes the cake. It is truly amazingly fast. Thanks Doug, Dirk, Romain, and Yixuan.
Alas, there is a note in its documentation that the special form of a bivariate regression could be done even faster. If I just need the slope, should I use the native cov(x, y)/var(x)
in R, should I write this in Rcpp, or ...?
It seems like .lm.fit
wins for this case, although that could depend on the size of your data set. I don't know whether you could do even better with something Rcpp-ish: if you want to run lots of really small regressions, various kinds of function-calling overhead start to matter. (fastR
has a fast covariance calculator, but it is described as competitive in/intended for the high-dimensional case.)
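Before the benchmark, it may help to see that the cov/var idea can also be written as a single pass over centered sums in base R, avoiding the extra dispatch of cov() and var() on small vectors. This is a sketch of my own (the name slope is made up, not from any package):

```r
# Slope of y ~ x via centered cross-products; algebraically identical
# to cov(x, y) / var(x) because the (n - 1) denominators cancel.
slope <- function(x, y) {
  xc <- x - mean(x)
  sum(xc * (y - mean(y))) / sum(xc^2)
}
```
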
simfun <- function(n = 100) {
data.frame(y = rnorm(n), x = rnorm(n))
}
set.seed(101)
dd <- simfun()
mylm <- function(x, y) { v <- var(cbind(x, y)); v[2, 1] / v[1, 1] }  # slope from the 2x2 covariance matrix
mylm2 <- function(x, y) { cov(x, y) / var(x) }                       # direct cov/var ratio
library(RcppEigen)
with(dd,
bench::mark(
lm.fit(cbind(1, x), y)$coefficients[2],
.lm.fit(cbind(1, x), y)$coefficients[2],
fastLmPure(cbind(1, x), y)$coefficients[2],
mylm(x, y),
mylm2(x, y),
check = FALSE
)
)
expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time
<bch:expr> <bch:t> <bch:t> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm>
1 lm.fit(cb… 24.68µs 26.45µs 36555. 7.34KB 14.6 9996 4 273.4ms
2 .lm.fit(c… 5.86µs 6.67µs 144591. 4.88KB 14.5 9999 1 69.2ms
3 fastLmPur… 12.85µs 13.99µs 70384. 3.27KB 14.1 9998 2 142ms
4 mylm(x, y) 10.05µs 11.16µs 86832. 1.61KB 26.1 9997 3 115.1ms
5 mylm2(x, … 22.68µs 24.32µs 40539. 0B 20.3 9995 5 246.6ms
# ℹ 4 more variables: result <list>, memory <list>, time <list>, gc <list>
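As for the Rcpp route the question asks about: a hedged sketch, assuming Rcpp (and a compiler) is available, and with slope_cpp a name I made up. It does one pass over centered sums, so the only remaining per-call cost is the .Call dispatch; whether that beats .lm.fit in practice would need benchmarking on your data sizes.

```r
# Compile a one-pass bivariate slope at load time (Rcpp assumed installed).
Rcpp::cppFunction('
double slope_cpp(NumericVector x, NumericVector y) {
  double mx = mean(x), my = mean(y);   // Rcpp sugar means
  double sxy = 0.0, sxx = 0.0;
  int n = x.size();
  for (int i = 0; i < n; i++) {
    double dx = x[i] - mx;
    sxy += dx * (y[i] - my);           // accumulate centered cross-product
    sxx += dx * dx;                    // accumulate centered sum of squares
  }
  return sxy / sxx;                    // = cov(x, y) / var(x)
}')
```
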