r glm poisson sjplot

Account for offset variables in plot_model() of sjPlot package


I am fitting a Poisson generalized linear model with an offset variable (in my case, the response is the number of seconds roe deer display vigilant behavior, and the offset is the total number of seconds they stayed at the plot).

This is a reproducible example that uses the offset in a similar way to my case, modified from Zuur's Mixed Effect Models for Ecology:

library(glmmTMB)
Owls$NCalls <- Owls$SiblingNegotiation

library(lme4)
Formula <- formula(NCalls ~ offset(logBroodSize) + SexParent * FoodTreatment + SexParent * ArrivalTime)
fit1 <- glm(Formula, data = Owls, family = poisson(link="log"))

(Sorry for the awkward shift between packages. I am more used to lme4, so I decided to work in lme4 while using the dataset from glmmTMB.)

So now, I want to plot the results using the plot_model() function in the sjPlot package:

library(sjPlot)
plot_model(fit1, type="pred")

However, the plotted result does not account for the logBroodSize variable, and instead plots the offset separately. In order to make the results more meaningful, I want to have the y-axis as NCalls/LBroodSize for each regression plot.

This would be really useful in my case, as I eventually want to represent the vigilant behavior of roe deer as a proportion (number of seconds vigilant / total number of seconds at the plot).

Is there any way to do so?

I've tried putting the offset variable directly into the response, following an answer I saw before (whose source I can no longer find), but it did not work: it still triggers the warning that the values are non-integer.

Formula.2 <- formula(NCalls/logBroodSize ~ SexParent * FoodTreatment + SexParent * ArrivalTime)
fit2 <- glm(Formula.2, data = Owls, family = poisson(link="log"))

I also thought about transforming my data so that it fits a Gaussian distribution. However, since my data contains a lot of 0 values, a transformation may not be ideal.

Thank you!


Solution

  • Offsets are not statistical "variables". They are handled as known, fixed values and do not get the same statistical treatment as random variables. In a very real sense, having a log-offset on the RHS is the same as having BroodSize as a denominator on the LHS. In both cases BroodSize is not estimated; it is taken as known without error.
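
    The equivalence described above can be checked numerically. The following is a minimal sketch on simulated data, not on the Owls dataset: the names `exposure`, `x` and `y` are made up for illustration, and the `weights = exposure` argument is an addition that is not part of the code in this answer. Under those assumptions, the offset model and the rate model should give the same coefficient estimates.

    ```r
    # Simulated data (hypothetical; 'exposure' plays the role of BroodSize)
    set.seed(1)
    n <- 200
    exposure <- sample(1:7, n, replace = TRUE)
    x <- runif(n)                                   # a single made-up covariate
    y <- rpois(n, lambda = exposure * exp(0.3 + 0.8 * x))

    # (a) counts on the LHS, log(exposure) as an offset on the RHS
    fit_offset <- glm(y ~ x + offset(log(exposure)),
                      family = poisson(link = "log"))

    # (b) rate on the LHS, exposure as prior weights; quasipoisson avoids the
    #     non-integer warning that family = poisson would raise here
    fit_rate <- glm(y / exposure ~ x, weights = exposure,
                    family = quasipoisson(link = "log"))

    # Put the two sets of point estimates side by side
    cbind(offset = coef(fit_offset), rate = coef(fit_rate))
    ```

    The two columns should agree up to the convergence tolerance of the fitting algorithm. Without the `weights` argument the rate model solves slightly different estimating equations, so its estimates will generally differ somewhat from those of the offset model.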

    I just tried my suggestion of a quasipoisson model and it appears to allow the plotting with a dependent variable on the proper scale. I also changed the way that the formula object was created, although I personally try to leave the formula creation inside the glm call so that the environments get established correctly:

    Formula <- NCalls/BroodSize ~ # tilde operator is an infix version of `formula`.
                     SexParent * FoodTreatment + SexParent * ArrivalTime
    fit1 <- glm(Formula, data = Owls, family = quasipoisson(link="log"))
    plot_model(fit1, type="pred")
    #$SexParent
    #
    #$FoodTreatment
    #
    #$ArrivalTime
    
    png();  plot_model(fit1, type="pred")$ArrivalTime ; dev.off()
    

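
    If you would rather keep the original offset model, another option is to predict from it with the offset fixed at 0 (that is, log(1)), which puts the predictions on a per-unit rate scale. The sketch below uses base `predict()` and base graphics rather than `plot_model()`, and it runs on simulated data; `exposure`, `logExposure`, `x` and `y` are hypothetical names standing in for BroodSize, logBroodSize, a covariate and NCalls.

    ```r
    # Simulated data (hypothetical; not the Owls dataset)
    set.seed(2)
    n <- 200
    d <- data.frame(
      exposure = sample(1:7, n, replace = TRUE),   # plays the role of BroodSize
      x        = runif(n, 0, 10)                   # made-up covariate
    )
    d$logExposure <- log(d$exposure)
    d$y <- rpois(n, lambda = d$exposure * exp(1 - 0.1 * d$x))

    fit <- glm(y ~ x + offset(logExposure), data = d,
               family = poisson(link = "log"))

    # Prediction grid with the offset fixed at log(1) = 0, so that predictions
    # are expected counts per unit of exposure
    grid <- data.frame(x = seq(0, 10, length.out = 50), logExposure = 0)
    pr <- predict(fit, newdata = grid, type = "link", se.fit = TRUE)
    grid$rate  <- exp(pr$fit)                       # back-transform from log scale
    grid$lower <- exp(pr$fit - 1.96 * pr$se.fit)    # approximate 95% band
    grid$upper <- exp(pr$fit + 1.96 * pr$se.fit)

    # Observed rates with the fitted rate curve and its band
    plot(d$x, d$y / d$exposure, col = "grey60",
         xlab = "x", ylab = "y / exposure")
    lines(grid$x, grid$rate)
    lines(grid$x, grid$lower, lty = 2)
    lines(grid$x, grid$upper, lty = 2)
    ```

    For the Owls model, the analogous step would be to set `logBroodSize = 0` in the `newdata` passed to `predict()`, and to hold the remaining predictors at chosen reference values.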