rggplot2predictgeom-ribbon

R - ggplot - geom_ribbon() makes spikey / jagged plot for groups


I'm working with a growth curve model (generated with lmer) and would like to add confidence intervals to the model plot using geom_ribbon(). (The full R code can be downloaded here and the data set is here.)

#Fit Model
GCA.elogit<-lmer(elog~(ot1+ot2+ot3)*Treatment*Experiment + (ot1|Subject), 
                 control=lmerControl(optimizer="bobyqa"), data=GCA, weights=1/wts, REML=F)
#Add the fitted values to the data frame
GCA2<-data.frame(GCA, GCA_Full=fitted(GCA.elogit))

I got lower and upper bounds for the confidence interval using library(merTools) and the predictInterval() function and added these bounds to the data frame.

#Use Predict to get confidence intervals (upr and lwr), add to new data frame
library(merTools)
prediction<-predictInterval(GCA.elogit, GCA2, level=0.8)
GCA3 <- cbind(GCA2, prediction)

But when I try to use these lower and upper bounds in a model plot with geom_ribbon(), I get some funny behavior. I can use stat_summary() to draw the bounds using dashed lines, but geom_ribbon doesn't fill in this space like it should. Instead, it creates a shaded period with spikes for each of the data points. 1: https://i.sstatic.net/ae8Ma.jpg:

#plot model -- Filtered for Example -- Creates jagged lines
mplot2<-filter(GCA3, Experiment == "Blocking", Time <= 10)%>% 
  ggplot(aes(Time, GCA_Full, group=Treatment, fill=Treatment, color=Treatment))+
  stat_summary(aes(y=GCA_Full), fun=mean, geom="line")+ #fit
  stat_summary(aes(y=upr), fun=mean, geom="line", linetype="dashed")+ #dashed upper bounds
  stat_summary(aes(y=lwr), fun=mean, geom="line", linetype="dashed")+ #dashed lower bounds
  geom_ribbon(aes(ymin=lwr, ymax=upr), na.rm=TRUE, color=NA , alpha = .1) + #ribbon should fit between upr and lwr dashed lines
  theme_bw(base_size=10)+facet_wrap(~Experiment)
labels<-labs(x="Item", y="Accuracy (elogit)")
mplot2+labels+scale_x_continuous(breaks=seq(0,10, 1))

The funny thing is that if I look at an individual subject (or group by subject like I did below) it works as expected. https://i.sstatic.net/2m8mS.jpg

#plot model color -- Filter includes subject -- Ribbon works as expected
mplot2<-filter(GCA3, Experiment == "Blocking", Time <= 10, as.numeric(Subject)<= 1002) %>% 
  ggplot(aes(Time, GCA_Full, group=Subject, fill=Subject, color=Subject))+
  stat_summary(aes(y=GCA_Full), fun=mean, geom="line")+ #fit
  stat_summary(aes(y=upr), fun=mean, geom="line", linetype="dashed")+ #dashed upper bounds
  stat_summary(aes(y=lwr), fun=mean, geom="line", linetype="dashed")+ #dashed lower bounds
  geom_ribbon(aes(ymin=lwr, ymax=upr), na.rm=TRUE, color=NA , alpha = .1) + #ribbon should fit between upr and lwr dashed lines
  theme_bw(base_size=10)+facet_wrap(~Experiment)
labels<-labs(x="Item", y="Accuracy (elogit)")
mplot2+labels+scale_x_continuous(breaks=seq(0,10, 1))

I'm at a complete loss here and I feel like I've tried 1000 small fixes. Any help would be greatly appreciated!


Solution

  • Your overall issue here is that you are using stat_summary() to calculate the means for your growth curve line and for its upper and lower bounds, but you are not doing the same for geom_ribbon(). So instead, geom_ribbon() is trying to plot every observation in the original data set, apparently in ascending order within each time point, while stat_summary() is instead only plotting the respective means (one for each time point). The reason why it appears correct when grouping by subjects is because each subject only has one observation to produce only one data point at each time point, while conversely each treatment group includes all observations, and thus all data points, for each subject that belong to it.

    What you can do is first create a new data frame with the calculations of the means for the upper and lower limits at each time point. The following code first filters for the sample you used as an example (which you can remove for your full data set), then groups by the factor variables to be plotted (Time on the x-axis and Treatment to be grouped), then calculates the means for GCA_Full and its upper and lower limits (note that it is easier to keep the same variable names of GCA_Full, upr, and lwr, respectively, for plotting).

    GCA3_means <- GCA3 %>%
      filter(Experiment == "Blocking", Time <= 10) %>%
      group_by(Time, Treatment) %>%
      summarize(GCA_Full = mean(GCA_Full),
                upr = mean(upr),
                lwr = mean(lwr))
    

    Now that those means are calculated, you can call the newly created data frame into the data argument in your original geom_ribbon() call as follows.

    #plot model -- Filtered for Example -- Creates jagged lines
    mplot4<-filter(GCA3, Experiment == "Blocking", Time <= 10)%>% 
      ggplot(aes(Time, GCA_Full, group=Treatment, fill=Treatment, color=Treatment))+
      stat_summary(aes(y=GCA_Full), fun=mean, geom="line")+ #fit
      stat_summary(aes(y=upr), fun=mean, geom="line", linetype="dashed")+ #dashed upper bounds
      stat_summary(aes(y=lwr), fun=mean, geom="line", linetype="dashed")+ #dashed lower bounds
      geom_ribbon(data = GCA3_means, aes(ymin=lwr, ymax=upr), na.rm=TRUE, color=NA , alpha = .1) + #ribbon should fit between upr and lwr dashed lines
      theme_bw(base_size=10)+facet_wrap(~Experiment)
    labels<-labs(x="Item", y="Accuracy (elogit)")
    mplot4+labels+scale_x_continuous(breaks=seq(0,10, 1))
    

    Now, geom_ribbon() will plot the means for the upper and lower limits at each time point rather than each observation to produce the following plot.

    Plot with corrected CIs

    All the best, Ty