rnon-linear-regressionpoissoncoefficientsexp

Resurrecting coefficients from simulated data in Poisson regression


I am trying to understand how to resurrect model estimates from simulated data in a poisson regressions. There are other similar posts on interpreting coefficients on StackExchange/CrossValidated (https://stats.stackexchange.com/questions/11096/how-to-interpret-coefficients-in-a-poisson-regression, https://stats.stackexchange.com/questions/128926/how-to-interpret-parameter-estimates-in-poisson-glm-results), but I think my question is different (although admittedly related). I am trying to resurrect known relationships in order to understand what is happening with the model. I am posting here instead of CrossValidated because I am thinking that it is less of statistical interpretation and more of how I would get a known / simulated relationship back via code.

Here are some simulated data y and x with known relationships to some response resp

set.seed(707)
x<-rnorm(10000,mean=5,sd=1)
y<-rnorm(10000,mean=5,sd=1)        
resp<-(0.5*x+0.7*y-0.1*x*y) # where I define some relationships

With a linear regression, it is very straight forward:

summary(lm(resp~y+x+y:x))

The output shows the exact linear relationship between x, y, and the interaction.

Coefficients:
              Estimate Std. Error    t value Pr(>|t|)    
(Intercept)  1.592e-14  1.927e-15  8.260e+00   <2e-16 ***
y            7.000e-01  3.795e-16  1.845e+15   <2e-16 ***
x            5.000e-01  3.800e-16  1.316e+15   <2e-16 ***
y:x         -1.000e-01  7.489e-17 -1.335e+15   <2e-16 ***

Now, if I am interested in a poisson regression, I need integers, I just round, but keep the relationship between predictors and response:

resp<-round((0.5*x+0.7*y-0.1*x*y),0)
glm1<-glm(resp~y+x+y:x,family=poisson())
summary(glm1)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.419925   0.138906   3.023   0.0025 ** 
y            0.163919   0.026646   6.152 7.66e-10 ***
x            0.056689   0.027375   2.071   0.0384 *  
y:x         -0.011020   0.005261  -2.095   0.0362 *  

It is my understanding that one needs to exponentiate the results to understand them, because of the link function. But here, neither the exponentiated estimate nor intercept + estimate get me back to the original values.

> exp(0.419925+0.163919)
[1] 1.792917
> exp(0.163919)
[1] 1.178119

How do I interpret these values as related to the original 0.7*y relationship?

Now if I put that same linear equation into the exponential function, I get the values directly - no need to use exp():

resp<-round(exp(0.5*x+0.7*y-0.1*x*y),0)
summary(glm(resp~y+x+y:x,family=poisson()))
Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.002970   0.045422   0.065    0.948    
y            0.699539   0.008542  81.894   <2e-16 ***
x            0.499476   0.008912  56.047   <2e-16 ***
y:x         -0.099922   0.001690 -59.121   <2e-16 ***

Can someone explain to me what I am misinterpreting here, and how I might find the original values of a known relationship without first using the exp() function, as above?


Solution

  • You're neglecting the fact that the Poisson GLM uses a log link (exponential inverse-link) by default (or rather, you're not using that information consistently). You should either generate your 'data' with an exponential inverse-link:

    resp <- round(exp(0.5*x+0.7*y-0.1*x*y))
    

    or fit the model with an identity link (family=poisson(link="identity")). (I wouldn't recommend the latter, as it is rarely a sensible model.)

    For what it's worth, it's harder to simulate Poisson data that will exactly match a specified set of parameters, because (unlike the Gaussian where you can reduce the variance to arbitrarily small values) you can't generate real Poisson data with arbitrarily little noise. (Your round() statement produces integers, but not Poisson-distributed outcomes.)