I ran a poison model over some of my count data (only one example is shown here). I tried to do a quadratic curve fit via model (graph 1, below) and a regular fit using an inbuilt function in ggplot2. I am not sure why they are so drastically different. I see this across several of my graphs (I tested to check if it was Poisson distributed). I am wondering if my predict function is doing something wonky?
library(investr)
library(ggplot2)
y.test <- c(3.09601,3.546579, 12.115740, 2.226694, 1.180938, 4.420249, 2.001162, 3.788012, 21.170732, 7.494421 , 5.602522 , 3.300510, 11.404264 ,23.115029,
19.371686, 25.444904, 17.094280 ,1.368615 ,19.343291 , 9.724363 , 8.086256 ,13.021972 ,10.740431 , 2.768960 ,14.494745 ,19.040086 , 7.072040, 8.748415,
10.012655, 14.759963 , 6.669221, 9.179184, 14.069743 ,12.132714, 8.517986, 18.095548, 9.076304, 9.197501, 7.972339 , 3.111373, 10.802117, 16.874861,
2.977454 ,15.195754, 5.433059 , 8.569472, 24.479745 , 3.756167 ,7.028482 , 7.412065 , 6.298529 , 3.585942 , 4.706638 , 9.002232, 5.276891)
x.test <- c(1:55)
df.test <- data.frame(x.test, y.test)
mod <- glm(y.test ~ x.test + I(x.test^2), data = df.test, family = poisson)
predicted.spp <- data.frame(predFit(mod, interval='confidence', level=.95))
df.test$predicted.mean <- predicted.spp$fit
df.test$predicted.upr <- predicted.spp$lwr
df.test$predicted.lwr <- predicted.spp$upr
ggplot(df.test, aes(x = x.test, y = y.test)) + geom_point() +
geom_line(aes(y=predicted.mean), colour="blue") +
geom_ribbon(aes(ymin=predicted.lwr, ymax=predicted.upr), alpha=0.8) +
stat_poly_eq(formula = my.formula, aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")), parse = TRUE, size = 2.5, col = "black")
my.formula = y~x + I(x^2)
ggplot(df.test, aes(x = x.test, y = y.test)) + geom_point() +
geom_smooth(method="lm", formula = my.formula, color = "black" ) +
stat_poly_eq(formula = my.formula, aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")), parse = TRUE, size = 2.5, col = "black")
By default predFit
(and most GLM prediction machinery) returns results on the link scale, in this case the log scale. In your glm
example you need predFit(mod, interval='confidence', type = "response" level=.95)
. (If you wanted you could also exponentiate the results predFit
gives you yourself.)
You could also use
geom_smooth(method="glm", formula = my.formula,
method.args = list(family = "poisson"), color = "black" )