I am running a simple multivariate regression on a panel/time-series dataset in two ways: with lm()
and with the underlying matrix formula $(X'X)^{-1} X'Y$.
I expected to get the same coefficient values from both methods. However, I get completely different estimates.
Here is the R code:
return = matrix(ret.ff.zoo, ncol = 50) # Y: response matrix with 50 columns
data = cbind(df$EQ, df$EFF, df$SIZE, df$MOM, df$MSCR, df$SY, df$UMP) # X: matrix of regressors
# First method: matrix formula
BETA = solve(crossprod(data)) %*% crossprod(data, return)
# Second method: lm()
OLS <- lm(return ~ data)
I am not sure why the estimates are different between the two methods.
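Your example isn't reproducible, but lm() fits an intercept by default, while your matrix formula has no
column of ones, so the two methods are fitting different models. If you try it with some dummy data, the
matrix formula and lm() produce the same results when you take out the intercept: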
set.seed(1)
x <- matrix(rnorm(1000),ncol=5)
y <- rnorm(200)
solve(t(x) %*% x) %*% t(x) %*% y
              [,1]
[1,] -0.0826496646
[2,] -0.0165735273
[3,] -0.0009412659
[4,]  0.0070475728
[5,] -0.0642452777
lm(y ~ x + 0)
Call:
lm(formula = y ~ x + 0)
Coefficients:
        x1          x2          x3          x4          x5
-0.0826497  -0.0165735  -0.0009413   0.0070476  -0.0642453
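If you want to keep the intercept instead, the other way round also works: add a column of ones to the regressor matrix before applying the formula. The following is a minimal sketch on dummy data, not your actual dataset. The names X1, beta_matrix and beta_lm are made up for illustration, and the two-column y only stands in for your 50-column response matrix.

```r
set.seed(1)
x <- matrix(rnorm(1000), ncol = 5)   # 200 x 5 regressor matrix (dummy data)
y <- matrix(rnorm(400), ncol = 2)    # 200 x 2 response matrix (dummy data)

# Assumption: the model should include an intercept, as lm() does by default.
X1 <- cbind(1, x)                    # prepend a column of ones

# Solve (X'X) b = X'y directly instead of forming the inverse explicitly
beta_matrix <- solve(crossprod(X1), crossprod(X1, y))

# lm() with a matrix response fits one regression per column of y
beta_lm <- coef(lm(y ~ x))

# Compare the values only; the row and column names differ between the two
print(all.equal(unname(beta_matrix), unname(beta_lm)))
```

The first row of beta_matrix holds the intercepts and the remaining rows hold the slopes, one column per response. Passing both cross-products to solve() avoids computing the inverse as a separate step, which tends to be better behaved numerically than solve(crossprod(X1)) %*% crossprod(X1, y).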