rstatisticsglmgaussiangamma

Manually get the responses from GLM with gamma distribution and a GLM with inverse guassian distribution


I've been trying to manually get the response values given by the predict.glm function from the stats package in R. However, I'm unable to do so. I only know how to manually get the value with a binomial distribution. I would really appreciate some help. I created two small models (one with Gamma family and one with inverse Gaussian family).

library(stats)
library(dplyr)

data("USArrests")

#Gamma distribution model
model_gam <- glm(Rape~Murder + Assault + UrbanPop, data=USArrests, family=Gamma)

print(summary(model_gam))
responses_gam <- model_gam %>% predict(USArrests[1,], type="response")
print(responses_gam)

#Trying to manually get responses for gamma model
paste(coef(model_gam), names(coef(model_gam)), sep="*", collapse="+")
# "0.108221470842499*(Intercept)+-0.00122165587689519*Murder+-9.47425665022909e-05*Assault+-0.000467789606041651*UrbanPop"
print(USArrests[1,])
#Murder: 13.2, Assault: 236, UrbanPop: 58
x = 0.108221470842499 - 0.00122165587689519 * 13.2 - 9.47425665022909e-05 * 236 - 0.000467789606041651 * 58
# This is wrong. Do I have to include the dispersion? (which is 0.10609)
print (exp(x)/(1+exp(x)))
# result should be (from predict function): 26.02872 
# exp(x)/(1+exp(x)) gives: 0.510649


# Gaussian distribution model

model_gaus <- glm(Rape~Murder + Assault + UrbanPop, data=USArrests, family=inverse.gaussian(link="log"))

responses_gaus <- model_gaus %>% predict(USArrests[1,], type="response")
print(summary(model_gaus))
print(responses_gaus)

#Trying to manually get responses for gaussian model
paste(coef(model_gaus), names(coef(model_gaus)), sep="*", collapse="+")
# "0.108221470842499*(Intercept)+-0.00122165587689519*Murder+-9.47425665022909e-05*Assault+-0.000467789606041651*UrbanPop"
x =  1.70049202188329-0.0326196928618521* 13.2 -0.00234379099421488*236-0.00991369000675323*58
# Dispersion in this case is 0.004390825
print(exp(x)/(1+exp(x)))
# result should be (from predict function): 26.02872 
# exp(x)/(1+exp(x)) it is: 0.5353866

Solution

  • built-in predict()

    predict(model_gaus)["Alabama"] ## 3.259201
    

    by hand

    cat(paste(round(coef(model_gaus),5), names(coef(model_gaus)), sep="*", collapse="+"),"\n")
    ## 1.70049*(Intercept)+0.03262*Murder+0.00234*Assault+0.00991*UrbanPop
    USArrests["Albama",]
    ##         Murder Assault UrbanPop Rape
    ## Alabama   13.2     236       58 21.2
    

    The value of the intercept is always 1, so we have

    1.70049*1+0.03262*13.2+0.00234*236+0.00991*58 
    ## [1] 3.258094
    

    (close enough, since I rounded some things)

    You don't need to do anything with the dispersion or the inverse-link function, since the Gaussian model uses an identity link.

    using the model matrix

    Mathematically, the regression equation is defined as X %*% beta where beta is the vector of coefficients and X is the model matrix (for your example, it's a column of ones for the intercept plus your predictors; for models with categorical predictors or more complex terms like splines, it's a little more complicated). You can extract the model matrix from the matrix with model.matrix():

    Xg <- model.matrix(model_gaus)
    drop(Xg["Alabama",] %*% coef(model_gaus))
    

    For the Gamma model, you would use exactly the same procedure, but at the end you would transform the linear expression you computed (the linear predictor) by 1/x (the inverse link function for the Gamma). (Note that you need predict(..., type = "response") to get the inverse-transformed prediction; otherwise [default type = "link"] R will give you just the plain linear expression.] If you used a log link instead you would exponentiate. More generally,

    invlinkfun <- family(fitted_model)$linkinv
    X <- model.matrix(fitted_model)
    beta <- coef(fitted_model)
    invlinkfun(X %*% beta)
    

    The inverse Gaussian model uses a 1/mu^2 link by default; inverse.gaussian()$linkinv is function(eta) { 1/sqrt(eta) }