rlinear-regressionmsegamma-distribution

Two Ways of Computing MSE for Gamma Regression Model in R Do Not Agree


I've seen a couple of questions asking about computing MSE in R here and here, but they do not answer my question. I'm trying a glm with the Gamma distribution (so, Gamma regression), and trying to compute the MSE for the model two different ways. Here's a MWE:

set.seed(39756934)                           
x = rnorm(100)
y = rnorm(100) + 0.5 * x + 3  # Need the +3 for strictly positive y values.
my_data = data.frame(x, y)
my_mod = glm(y~x, data=my_data, family=Gamma())  # Default link is "inverse"
mean(my_mod$residuals^2)
    0.01388825
mean((my_data$y - predict(my_mod))^2)
    6.954654

Using the log link gives different numbers, but the two methods still do not agree.

As far as I know, these two methods should give the same results, but they're obviously not. What's going on? Is there a bug somewhere? Or is one of these methods of computing the MSE simply wrong for Gamma regression?

I don't think this question belongs on CrossValidated.SE, because it's more of a code/programming thing - unless the answer is that one of these methods of computing the MSE is invalid for a Gamma regression. But I don't know that in advance. This question would certainly be closed on CrossValidated.SE.


Solution

  • Specify type = "response" because one could either consider the residuals and predictions on the linear predictor or the response scale.

    mean(resid(my_mod, type = "response")^2)
    ## [1] 0.7419272
    
    mean((my_data$y - predict(my_mod, type = "response"))^2)
    ## [1] 0.7419272
    

    There are several different possibilities for type= as shown here but only "response" is available in both

    p.type <- c("link", "response", "terms")
    r.type <- c("deviance", "pearson", "working", "response", "partial")
     
    sapply(r.type, \(x) mean(resid(my_mod, type = x)^2))
    ##   deviance    pearson    working   response    partial 
    ## 0.10746632 0.09651097 0.01388825 0.74192724 0.01685553 
    
    
    sapply(p.type, \(x) mean((my_data$y - predict(my_mod, type = x))^2))
    ##      link  response     terms 
    ## 6.9546540 0.7419272 8.8680964