rregressionvisualizationlme4predict

Visualising GLMM predictions with interaction of categorical and continuous variables


I am working in R with a GLMM with a mixture of continuous and categorical variables with some interactions. I have used the dredge and model.avg functions in MuMIn to obtain effect estimates for each variable. My problem is in how best to plot the results. I want to make a figure showing the effect of one variable (forest) on my data where the trendline reflects the forest parameter estimate, but I can't figure out how to hold the categorical variables and interaction variables at their 'average' so that the trendline only reflects the effect of forest.

Here's the model and plot set-up:

#load packages and document
cuckoo<-read.table("http://www.acsu.buffalo.edu/~ciaranwi/home_range.txt", 
header=T,sep="\t")
require(lme4)
require(MuMIn)
as.factor (cuckoo$ID)
as.factor (cuckoo$Sex)
as.factor(cuckoo$MS_bin)
options(na.action = "na.fail")

# create global model and fit
fm<- lmer(log(KD_95)~ MS_bin + Forest + NDVI + Sex + Precip + MS_bin*Forest 
+ MS_bin*NDVI  + MS_bin*Sex + MS_bin*Precip + Argos + Sample + (1|ID), data 
= cuckoo, REML = FALSE)

# dredge but always include argos and sample
KD95<-dredge(fm,fixed=c("Argos","Sample"))

# model averaging 
avgmod<-model.avg(KD95, fit=TRUE)
summary(avgmod)

#plot data
plot(cuckoo$Forest, (log(cuckoo$KD_95)),
 xlab = "Mean percentage of forest cover",
 ylab = expression(paste(plain("Log of Kernel density estimate, 95%    
utilisation, km"^{2}))),
 pch = c(15,17)[as.numeric(cuckoo$MS_bin)],  
 main = "Forest cover",
 col="black", 
 ylim=c(14,23))
legend(80,22, c("Breeding","Nonbreeding"), pch=c(15, 17),  cex=0.7)

Then I get stuck with how to include a trendline. So far I have:

#parameter estimates from model.avg
argos_est<- -1.6
MS_est<- -1.77
samp_est<-0.01
forest_est<--0.02
sex_est<-0.0653
precip_est<-0.0004
ndvi_est<--0.00003
model_intercept<-22.7

#calculate mean values for parameters
argos_mean<-mean(cuckoo$Argos)
samp_mean<-mean(cuckoo$Sample)
forest_mean<-mean(cuckoo$Forest)
ndvi_mean<-mean(cuckoo$NDVI)
precip_mean<-mean(cuckoo$Precip)

#calculate the intercept and add trend line
intercept<-(model_intercept + (forest_est*cuckoo$Forest) +    
(argos_est*argos_mean) + (samp_est * samp_mean) + (ndvi_est*ndvi_mean) +  
(precip_est*precip_mean) )

abline(intercept, forest_est)

But this doesn't consider the interactions or the categorical variables and the intercept looks way too high. Any ideas?


Solution

  • In terms of process, you can make your coding much easier by taking advantage of the fact that R stores lots of information about the model in the model object and has functions to get information out of the model object. For example, coef(avgmod) will give you the model coefficients and predict(avgmod) will give you the model's predictions for each observation in the data frame you used to fit the model.

    To visualize predictions for specific combinations of data values we're interested in, create a new data frame that has the means of the variables we want to hold constant, along with a range of values for variables that we want to vary (like Forest). expand.grid creates a data frame with all combinations of the values listed below.

    pred.data = expand.grid(Argos=mean(cuckoo$Argos), Sample=mean(cuckoo$Sample), 
                            Precip=mean(cuckoo$Precip), NDVI=mean(cuckoo$NDVI), 
                            Sex="M", Forest=seq(0,100,10), MS_bin=unique(cuckoo$MS_bin), 
                            ID=unique(cuckoo$ID))
    

    Now we use the predict function to add predictions for log(KD_95) to this data frame. predict takes care of calculating the model predictions for whatever data you feed it (assuming you give it a data frame that includes all the variables in your model).

    pred.data$lgKD_95_pred = predict(avgmod, newdata=pred.data)
    

    Now we plot the results. geom_point plots the points, as in your original plot, then geom_line adds the predictions for each level of MS_bin (and Sex="M").

    library(ggplot2)
    
    ggplot() +
      geom_point(data=cuckoo, aes(Forest, log(KD_95), shape=factor(MS_bin), 
                 colour=factor(MS_bin), size=3)) +
      geom_line(data=pred.data, aes(Forest, lKD_95_pred, colour=factor(MS_bin)))
    

    Here's the result:

    enter image description here

    UPDATE: To plot regression lines for both male and female, just include Sex="F" in pred.data and add Sex as an aesthetic in the plot. In the example below, I use different shapes to mark Sex when plotting the points and different line types to mark Sex for the regression lines.

    pred.data = expand.grid(Argos=mean(cuckoo$Argos), Sample=mean(cuckoo$Sample), 
                            Precip=mean(cuckoo$Precip), NDVI=mean(cuckoo$NDVI), 
                            Sex=unique(cuckoo$Sex), Forest=seq(0,100,10), MS_bin=unique(cuckoo$MS_bin), 
                            ID=unique(cuckoo$ID))
    
    pred.data$lgKD_95_pred = predict(avgmod, newdata=pred.data)
    
    ggplot() +
      geom_point(data=cuckoo, aes(Forest, log(KD_95), shape=Sex, 
                                  colour=factor(MS_bin)), size=3) +
      geom_line(data=pred.data, aes(Forest, lgKD_95_pred, linetype=Sex, 
                                    colour=factor(MS_bin))) 
    

    enter image description here