I fitted a generalized linear mixed model to my dataset using the lme4 package in R. The model formula and output are below:
Because I found a significant interaction between "period" and "Adjunct", I tried to look deeper by extracting their effects and plotting the interaction with the package effects. I tried to obtain the effects using the scripts below:
var.adSem <- data.frame(Effect(c("period", "AdjunctSem"), m7))
head(var.adSem)
var.adSem $ period <- droplevels(var.adSem $ period)
levels(var.adSem $ period)
var.adSem $ AdjunctSem <- droplevels(var.adSem $ AdjunctSem)
levels(var.adSem $ AdjunctSem)
And these are the effects I extracted:
And I tried to plot the effects using the scripts below:
perc <- function(x) {paste(100 * x, "%")}
Plot1<-(ggplot(var.adSem,aes(x=factor(period, level=c('period_1', 'period_2','period_3')),fit, shape=AdjunctSem))
+ geom_hline(yintercept=.5, linetype="dashed", alpha=.5)
+ geom_errorbar(aes(ymin=fit-se,ymax=fit+se),width=.2,position = position_dodge(0.3))
# + annotate("rect", xmin = c(4.5, 6.5), xmax = c(5.5, 9.5),
# ymin = -Inf, ymax = Inf,
# alpha = 0.2, fill = c("#6ac2ee"))
+ geom_point(size=4, position = position_dodge(0.3))
+ labs(y="Probability of the ba-construction", x="period")
+ theme(#plot.title = element_text(size=20, face="bold", vjust=2),
plot.title = element_blank(),
axis.title.x = element_text(size=14, vjust=-0.5),
axis.text.x = element_text(angle=90, hjust=1, vjust=0.5),
axis.title.y = element_text(size=14, vjust=1.5),
legend.position = "top")
+ scale_y_continuous(label=function(x){paste(100*x,"%")},
breaks=seq(0,1,.1),
limits=c(0,1)))
And the resulting plot looks like this:
It seems from the effect plot that there are no obvious interactions at all. However, the model output clearly indicates a significant interaction (p<0.001). Is it because I made a mistake somewhere in the scripts? Please click this link if you would like access to the dataset.
You are using a binomial model, which means the significant differences that are reported in your model summary are differences in log odds. However, these have been converted to probabilities in var.adSem
.
If you have large log odds, the equivalent probabilities are close to one. Large negative log odds will be close to 0. The relationship is not linear, so that if you view log odds as probabilities, the high log odds will all appear compressed as probabilities close to 1, whereas the low log odds will appear compressed as probabilities close to 0.
We can illustrate this by taking a vector representing log odds and seeing what happens when we convert them to probabilities. First, the log odds:
library(ggplot2)
df <- data.frame(x = seq(-10, 10), log_odds = seq(-10, 10))
ggplot(df, aes(x, log_odds)) + geom_line() + geom_point()
Now we convert the log odds to probabilities using plogis
:
ggplot(df, aes(x, plogis(log_odds))) + geom_line() + geom_point()
Note that even though there is a constant change in log odds across the range, the change in probabilities becomes difficult to see above log odds of 5 or below log odds of -5, because the probabilities are close to 0 or close to 1.
With your own data, this means if you want to demonstrate the difference between groups, you need to show the results as log odds. You can transform the y axis so that it still shows probabilities, though the probabilities are so close to 0 and 1 that percentages become an unconventional choice:
ggplot(var.adSem, aes(x = period, y = qlogis(fit), shape = AdjunctSem)) +
geom_hline(yintercept = qlogis(0.5), linetype = "dashed", alpha = 0.5) +
geom_errorbar(aes(ymin = qlogis(lower), ymax = qlogis(upper)), width = 0.2,
position = position_dodge(0.3)) +
geom_point(size = 4, position = position_dodge(0.3)) +
labs(x = "period") +
theme(plot.title = element_blank(),
axis.title.x = element_text(size = 14, vjust = -0.5),
axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1),
axis.title.y = element_text(size = 14, vjust = 1.5),
legend.position = "top") +
scale_y_continuous("Probability of the ba-construction",
labels = ~scales::percent(plogis(.x)),
breaks = qlogis(c(1e-9, 1e-6, 1e-3, 0.5,
1 - 1e-3, 1 - 1e-6, 1 - 1e-9)))
Created on 2023-08-06 with reprex v2.0.2
Data obtained from OCR of table in question
var.adSem <- structure(list(period = c("period_2", "period_1", "period_3",
"period_2", "period_1", "period_3", "period_2", "period_1", "period_3"
), AdjunctSem = c("adverbial", "adverbial", "adverbial", "complement",
"complement", "complement", "withoutAdjunct", "withoutadjunct",
"withoutadjunct"), fit = c(3.51860434e-05, 2.737536e-07, 0.9999899965756,
1.97090038e-05, 4.566828e-07, 0.9999909905965, 3.5261392e-05,
2.81571e-07, 0.9999885020845), se = c(3.46701653e-05, 5.871505e-07,
2.82682182e-05, 1.96541803e-05, 9.787531e-07, 2.56165218e-05,
3.48405297e-05, 6.051007e-07, 3.28484263e-05), lower = c(5.100729398e-06,
4.089689e-09, 0.997462272508378, 2.79139626e-06, 6.844419e-09,
0.997634501991402, 5.084413615e-06, 4.171972e-09, 0.996901835438947
), upper = c(0.00024267862, 1.832405e-05, 0.99999996067, 0.00013914363,
3.047052e-05, 0.99999996577, 0.00024450078, 1.900319e-05, 0.99999995746
)), class = "data.frame", row.names = c("1", "2", "3", "4", "5",
"6", "7", "8", "9"))