I am trying to find the solution of a linear model using optim. I could use lm, or even matrix multiplication to get the explicit solution, but the end goal is to use different objective functions. Since I can't replicate the lm solution, there must be something wrong with my code, and I cannot figure out what. Note that my linear model does not have an intercept term.
Here is an example:
RSS_f <- function(x){
  ## residual sum of squares for a no-intercept linear model;
  ## column 1 of dat is the response, the remaining columns are predictors
  f_hat <- as.matrix(dat[, 2:ncol(dat)]) %*% as.matrix(x)
  error <- dat[, 1] - f_hat
  return(sum(error^2))
}
x0 <- rep(0, 6)
r <- optim(x0, RSS_f)
Here is the sample data I have been using:
dat=structure(list(y = c(-0.015592301115793, -0.257703905025368,
0.132944891819098, 0.144585602249198, -0.0834638586276937, -0.132506252754974,
-0.0671452738850035, 0.017093324992248), x1 = c(0.0762815118005244,
0.0684222303897641, 0.0126545677701762, -0.0283951453807663,
0.0682085328963522, 0.030322515917991, -0.0288021662870674, 0.0213481986485853
), x2 = c(0.121096462766523, 0.173507656658378, 0.0830579904733009,
-0.0131531465094037, 0.14514027572694, -0.00235172620017821,
-0.116782685211159, 0.128253294500769), x3 = c(0.00271738492481077,
0.199849189671841, -0.149623094678353, -0.0375499460631867, 0.035188928292329,
0.0421848373709954, -0.212940391685557, -0.280765720194465),
x4 = c(-0.661111149054194, -1.62711136938006, -0.16384009438204,
1.24217835189415, -1.01181978837279, -0.968527314213574,
-0.271489340669151, -0.602168687364268), x5 = c(1.6840335520254,
0.0085051611667053, 0.803743907332177, -1.86248825400901,
1.67581124074128, -0.0919360291035565, -0.210122962144954,
1.96152949931911), x6 = c(0.251747455313378, -0.353511278663921,
-0.155085549001921, -0.376159415130184, -0.0614334625077317,
0.759475827597367, 0.10074325959879, 0.512321974431873)), row.names = 63:56, class = "data.frame")
I see one obvious problem here, and encountered several non-obvious problems as I moved forward.
Point (1), the obvious one: lm() fits an intercept by default, while your objective function does not include one, so the two are solving different problems. Here is a slightly reorganized objective function (the only important difference is cbind(1, ...) to add an intercept column to the model matrix, addressing point (1)):
RSS_f <- function(beta, dat){
  ## inefficient: X and y should be computed once, outside this
  ## function, since optim() re-evaluates it many times ... but
  X <- cbind(1, as.matrix(dat[,-1]))  ## prepend the intercept column
  y <- dat[,1]
  f_hat <- X %*% beta
  return(sum((y - f_hat)^2))
}
x0 <- rep(0,7) ## need one additional element in the vector for the intercept
r <- optim(x0, RSS_f, dat=dat, control = list(maxit = 1e5))
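With the intercept in place you can compare the optim() solution against lm() directly (a quick check; as discussed below, the match still won't be exact):
r$par                        ## optim() estimates (intercept first)
coef(lm(y ~ ., data = dat))  ## lm() reference fit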
Points (2) and (3) below are related to @ZheyuanLi's answer (i.e., picking a good starting point will mitigate them).
Point (2): if we run this optim() call without control = list(maxit = 1e5), it fails to converge, but silently. You have to look at r$convergence, see that the value is 1 and not 0 (zero means "converged"), and then go look at ?optim to see that r$convergence == 1 means the iteration limit maxit was reached.
Point (3): if you look at the coefficients, you'll see that they still don't quite match lm. This data set has 8 observations and 7 coefficients to estimate (the intercept plus 6 predictors), which makes the OLS fit pretty ambitious (although not impossible). I tried a few other options (e.g. method = "BFGS" in optim()), which didn't help much; see the sketch below.
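For example, one of the variants I tried (BFGS falls back on a numerical gradient here, since no gradient function is supplied):
r_bfgs <- optim(x0, RSS_f, dat = dat, method = "BFGS",
                control = list(maxit = 1e5))
r_bfgs$par  ## still not a great match to coef(lm(y ~ ., data = dat))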
You can get an idea of the stability of the OLS problem by looking at the ratio of the largest to the smallest eigenvalue of crossprod(X) (i.e., t(X) %*% X); in this case the ratio is about 2e5 (bad news).
X <- cbind(1, as.matrix(dat[,-1]))
eigen(crossprod(X))$values
[1] 2.034880e+01 6.082053e+00 2.730513e+00 9.748323e-01 8.148341e-02
[6] 1.358231e-02 9.186567e-05
Alternatively, compute the condition number directly: Matrix::condest(crossprod(X))$est gives about 2.75e5 (again, a large value is bad).
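(Base R's kappa() is an alternative if you don't want the Matrix package; with exact = TRUE it computes the 2-norm condition number, which for a symmetric matrix like this one is exactly the eigenvalue ratio above.)
kappa(crossprod(X), exact = TRUE)  ## ~2e5, consistent with condest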
For comparison, if I generate the predictors randomly (m <- matrix(rnorm(48), nrow = 8)), add an intercept column, and compute the condition number of crossprod(m), the values are typically on the order of 500 (1000 random replicates ranged from about 20 to 30,000); a sketch of that simulation follows. So part of the problem is simply 'not much data for a model with this many predictors', but your example is even worse than a randomly sampled one.
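Here is the simulation sketch (the seed and the choice of summary are mine, not essential to the point):
set.seed(101)  ## arbitrary seed (assumption)
cond_num <- function() {
  m <- cbind(1, matrix(rnorm(48), nrow = 8))  ## 6 random predictors + intercept
  ev <- eigen(crossprod(m))$values
  max(ev) / min(ev)  ## eigenvalue ratio = condition number of crossprod(m)
}
vals <- replicate(1000, cond_num())
summary(vals)  ## typical values on the order of 500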
R's built-in method for fitting linear models (QR with a custom pivoting strategy) works robustly even on such ill-posed problems.
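For intuition, a sketch of the QR route (lm() itself goes through lm.fit(), which calls a modified LINPACK routine with column pivoting, so this is only an approximation of what it does):
qr.coef(qr(X), y)  ## X, y as defined above; matches coef(lm(y ~ ., data = dat))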
I don't think removing the intercept will change the ill-conditioning problem very much, although I haven't checked.
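If you want to check, the same eigenvalue ratio diagnostic applies to the no-intercept model matrix:
X_ni <- as.matrix(dat[, -1])  ## model matrix without the intercept column
ev <- eigen(crossprod(X_ni))$values
max(ev) / min(ev)  ## compare with the ~2e5 ratio above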