I've seen a couple of questions asking about computing MSE in R here and here, but they do not answer my question. I'm trying a glm
with the Gamma distribution (so, Gamma regression), and trying to compute the MSE for the model two different ways. Here's a MWE:
set.seed(39756934)
x = rnorm(100)
y = rnorm(100) + 0.5 * x + 3 # Need the +3 for strictly positive y values.
my_data = data.frame(x, y)
my_mod = glm(y~x, data=my_data, family=Gamma()) # Default link is "inverse"
mean(my_mod$residuals^2)
0.01388825
mean((my_data$y - predict(my_mod))^2)
6.954654
Using the log link gives different numbers, but the two methods still do not agree.
As far as I know, these two methods should give the same results, but they're obviously not. What's going on? Is there a bug somewhere? Or is one of these methods of computing the MSE simply wrong for Gamma regression?
I don't think this question belongs on CrossValidated.SE, because it's more of a code/programming thing - unless the answer is that one of these methods of computing the MSE is invalid for a Gamma regression. But I don't know that in advance. This question would certainly be closed on CrossValidated.SE.
Specify type = "response" because one could either consider the residuals and predictions on the linear predictor or the response scale.
mean(resid(my_mod, type = "response")^2)
## [1] 0.7419272
mean((my_data$y - predict(my_mod, type = "response"))^2)
## [1] 0.7419272
There are several different possibilities for type=
as shown here but only "response" is available in both
p.type <- c("link", "response", "terms")
r.type <- c("deviance", "pearson", "working", "response", "partial")
sapply(r.type, \(x) mean(resid(my_mod, type = x)^2))
## deviance pearson working response partial
## 0.10746632 0.09651097 0.01388825 0.74192724 0.01685553
sapply(p.type, \(x) mean((my_data$y - predict(my_mod, type = x))^2))
## link response terms
## 6.9546540 0.7419272 8.8680964