rregressionglmpoissonmodel-fitting

Resolving discrepancy between poisson glm fits and regular quadratic fit in R (ggplot2)?


I ran a poison model over some of my count data (only one example is shown here). I tried to do a quadratic curve fit via model (graph 1, below) and a regular fit using an inbuilt function in ggplot2. I am not sure why they are so drastically different. I see this across several of my graphs (I tested to check if it was Poisson distributed). I am wondering if my predict function is doing something wonky?

library(investr)
library(ggplot2)

y.test <- c(3.09601,3.546579, 12.115740,  2.226694,  1.180938,  4.420249,  2.001162,  3.788012, 21.170732,  7.494421 , 5.602522 , 3.300510, 11.404264 ,23.115029,
            19.371686, 25.444904, 17.094280  ,1.368615 ,19.343291 , 9.724363 , 8.086256 ,13.021972 ,10.740431 , 2.768960 ,14.494745 ,19.040086 , 7.072040,  8.748415,
            10.012655, 14.759963 , 6.669221,  9.179184, 14.069743 ,12.132714,  8.517986, 18.095548,  9.076304,  9.197501,  7.972339 , 3.111373, 10.802117, 16.874861,
            2.977454 ,15.195754,  5.433059 , 8.569472, 24.479745 , 3.756167  ,7.028482 , 7.412065 , 6.298529 , 3.585942 , 4.706638 , 9.002232,  5.276891)

x.test <- c(1:55)

df.test <- data.frame(x.test, y.test)

mod <- glm(y.test ~ x.test + I(x.test^2), data = df.test, family = poisson)

predicted.spp <- data.frame(predFit(mod, interval='confidence', level=.95))
df.test$predicted.mean <- predicted.spp$fit    
df.test$predicted.upr <-  predicted.spp$lwr
df.test$predicted.lwr <-  predicted.spp$upr

ggplot(df.test, aes(x = x.test, y = y.test)) + geom_point() +
  geom_line(aes(y=predicted.mean), colour="blue") + 
  geom_ribbon(aes(ymin=predicted.lwr, ymax=predicted.upr), alpha=0.8) + 
  stat_poly_eq(formula = my.formula, aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")),  parse = TRUE, size = 2.5, col = "black") 

enter image description here

my.formula = y~x + I(x^2)
ggplot(df.test, aes(x = x.test, y = y.test)) + geom_point() +
geom_smooth(method="lm", formula = my.formula, color = "black" ) + 
stat_poly_eq(formula = my.formula, aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")),  parse = TRUE, size = 2.5, col = "black") 

enter image description here


Solution

  • By default predFit (and most GLM prediction machinery) returns results on the link scale, in this case the log scale. In your glm example you need predFit(mod, interval='confidence', type = "response" level=.95). (If you wanted you could also exponentiate the results predFit gives you yourself.)

    You could also use

    geom_smooth(method="glm", formula = my.formula, 
              method.args = list(family = "poisson"), color = "black" )