I have run a GLMM using `glmer` (from `lme4`) in R. The fixed-effect estimates are very different from (much smaller than) what I expected based on the estimated marginal means.
GLMM OUTPUT
```
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
 Family: inverse.gaussian  ( identity )
Formula: latency ~ Con_shape * Cue + (1 + Con_shape | subject)
   Data: SN_P_PMT_match_data_rt
Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))

     AIC      BIC   logLik deviance df.resid 
 81984.2  82071.9 -40979.1  81958.2     6269 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.1154 -0.6553 -0.1732  0.4628  4.9629 

Random effects:
 Groups   Name               Variance  Std.Dev. Corr       
 subject  (Intercept)        1.718e+03 41.44726            
          Con_shapeS vs. F   4.076e+03 63.84257  0.02      
          Con_shapeS vs. Str 4.896e+03 69.97401  0.18 -0.57
 Residual                    7.191e-05  0.00848            
Number of obs: 6282, groups:  subject, 42

Fixed effects:
                                 Estimate Std. Error t value Pr(>|z|)    
(Intercept)                       796.370     14.132  56.354  < 2e-16 ***
Con_shapeS vs. F                   16.932     13.337   1.270    0.204    
Con_shapeS vs. Str                 95.656     15.344   6.234 4.55e-10 ***
CueSad vs Neu                      -3.186      4.045  -0.788    0.431    
Con_shapeS vs. F:CueSad vs Neu     -7.216     10.676  -0.676    0.499    
Con_shapeS vs. Str:CueSad vs Neu   -1.303     11.727  -0.111    0.911    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr) C_Sv.F C_Sv.S CSdvsN C_vFvN
Cn_shpSvs.F  0.088                            
Cn_shpSvs.S  0.028 -0.271                     
CueSadvsNeu -0.014  0.016  0.001              
C_Sv.F:CSvN -0.096 -0.099 -0.055  0.019       
C_Sv.S:CSvN  0.061  0.129  0.001  0.130 -0.548
```
EMM
```
 Con_shape Cue      emmean       SE  df asymp.LCL asymp.UCL
 self      sad     728.7943 13.14691 Inf  703.0269  754.5618
 friend    sad     785.5311 13.31949 Inf  759.4254  811.6369
 stranger  sad     811.8101 13.48155 Inf  785.3868  838.2334
 self      neutral 733.1200 13.98504 Inf  705.7098  760.5302
 friend    neutral 781.1784 14.08376 Inf  753.5748  808.7821
 stranger  neutral 810.0302 14.53561 Inf  781.5409  838.5194
```
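For completeness, the table above is the kind of output produced by a call along these lines; the exact `emmeans` call is not shown, so treat this as an assumption (`model` is the fitted glmerMod object from the code further down):

```
library(emmeans)
# Assumed call: estimated marginal means for every Con_shape x Cue cell.
emmeans(model, ~ Con_shape * Cue)
```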
From looking here I understand that fixed effects and EMMs may differ, but I believe I have set the contrast coding so that the coefficients should align with the EMMs. Because of this, I am worried that there is a problem with the model. To check, I ran the same GLMM in Jamovi (which uses the GAMLj package), and there the coefficients are what I would expect and do align with the EMMs.
I am therefore wondering what is causing this difference.
The variables are coded like this in R:
```
# Contrast coding - categorical variables (non-orthogonal)
contrasts(data$Con_shape) <- cbind("S vs. F"   = c(-.5, .5, 0),
                                   "S vs. Str" = c(-.5, 0, .5))
         S vs. F S vs. Str
self        -0.5      -0.5
friend       0.5       0.0
stranger     0.0       0.5

contrasts(data$Cue) <- cbind("Sad vs Neu" = c(-.5, .5))
        Sad vs Neu
sad           -0.5
neutral        0.5
```
This is the GLMM code:
```
# Run glmer
model <- glmer(latency ~ Con_shape * Cue +
                 (1 + Con_shape | subject),
               data = data,
               family = inverse.gaussian(link = "identity"),
               control = glmerControl(optimizer = "bobyqa",
                                      optCtrl = list(maxfun = 2e5)))
```
I am wondering whether the difference comes from using scaled contrast coding (-0.5, 0.5, 0) as opposed to unscaled coding (-1, 1, 0). I tried unscaled contrast coding; the EMM plots then showed significant effects where I expected them, but the coefficients were again very different from the EMMs (and the fit was singular).
Unscaled contrasts:

```
contrasts(SN_P_PMT_data$Con_shape) <- cbind("S vs. F"   = c(-1, 1, 0),
                                            "S vs. Str" = c(-1, 0, 1))
contrasts(SN_P_PMT_data$Cue) <- cbind("Sad vs Neu" = c(-1, 1))
```
GLMM output
```
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
 Family: inverse.gaussian  ( identity )
Formula: latency ~ 1 + Con_shape * Cue + (1 + Con_shape | subject)
   Data: SN_P_PMT_match_data_rt
Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))

     AIC      BIC   logLik deviance df.resid 
 82171.8  82259.5 -41072.9  82145.8     6269 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.0716 -0.6663 -0.1754  0.4809  5.3294 

Random effects:
 Groups   Name               Variance  Std.Dev.  Corr     
 subject  (Intercept)        1.734e+03 41.642073          
          Con_shapeS vs. F   5.894e+00  2.427823 1.00     
          Con_shapeS vs. Str 8.270e+02 28.758130 0.19 0.19
 Residual                    7.405e-05  0.008605          
Number of obs: 6282, groups:  subject, 42

Fixed effects:
                                 Estimate Std. Error t value Pr(>|z|)    
(Intercept)                       786.600     13.085  60.113  < 2e-16 ***
Con_shapeS vs. F                   10.198      3.057   3.336 0.000851 ***
Con_shapeS vs. Str                 43.327      8.685   4.989 6.07e-07 ***
CueSad vs Neu                      -1.804      2.118  -0.852 0.394476    
Con_shapeS vs. F:CueSad vs Neu     -1.788      2.972  -0.602 0.547312    
Con_shapeS vs. Str:CueSad vs Neu   -0.504      3.137  -0.161 0.872377    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr) C_Sv.F C_Sv.S CSdvsN C_vFvN
Cn_shpSvs.F  0.220                            
Cn_shpSvs.S  0.127 -0.150                     
CueSadvsNeu -0.004  0.001 -0.007              
C_Sv.F:CSvN  0.001 -0.009  0.013  0.009       
C_Sv.S:CSvN  0.013  0.012 -0.006  0.150 -0.574

optimizer (bobyqa) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')
```
I think you might have misunderstood the linked question. Setting the contrasts as specified there will make the parameter estimates correspond to specific contrasts among the levels of the factors, not to the estimated means of particular levels.

The short version is that @rvlenth is correct when he says you shouldn't worry about how to torture `lm` into giving you the contrasts you want; let `emmeans`, or some other machinery for doing contrasts (e.g. the `marginaleffects` package), do it for you.
Example:

```
mtcars <- transform(mtcars, cyl = factor(cyl), am = factor(am))
m <- lm(mpg ~ cyl*am, data = mtcars)
printCoefmat(coef(summary(m)), digits = 3)

            Estimate Std. Error t value Pr(>|t|)    
(Intercept)    22.90       1.75   13.08  6.1e-13 ***
cyl6           -3.77       2.32   -1.63  0.11515    
cyl8           -7.85       1.96   -4.01  0.00045 ***
am1             5.18       2.05    2.52  0.01818 *  
cyl6:am1       -3.73       3.09   -1.21  0.23855    
cyl8:am1       -4.83       3.09   -1.56  0.13107    
```
These values are, by default, the differences of each level from the baseline level. You could change the contrasts to get them to estimate something else (such as the difference of all but the last level from the grand mean).
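For instance, here is a minimal sketch (not part of the original example) using sum-to-zero coding, together with a way to read off exactly which combination of level means a given coding makes each coefficient estimate:

```
# Refit with sum-to-zero ("deviation") contrasts: each main-effect coefficient
# is now the deviation of one level (averaged evenly over the other factor)
# from the grand mean of the cell means.
m_sum <- lm(mpg ~ cyl * am, data = mtcars,
            contrasts = list(cyl = contr.sum, am = contr.sum))
printCoefmat(coef(summary(m_sum)), digits = 3)

# To see what a coding makes the coefficients estimate, invert the coding
# matrix (intercept column plus the contrast columns): each row gives the
# weights applied to the level means.
solve(cbind("(Intercept)" = 1, contr.sum(3)))
```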
Using `emmeans` the way you did gives you the estimated marginal means of each `cyl`/`am` combination; while these are great for prediction, or for understanding the model results, they're not what you want to use for inference about differences between levels.
```
emmeans(m, ~cyl*am)

 cyl am emmean    SE df lower.CL upper.CL
 4   0    22.9 1.750 26     19.3     26.5
 6   0    19.1 1.520 26     16.0     22.2
 8   0    15.1 0.875 26     13.3     16.8
 4   1    28.1 1.070 26     25.9     30.3
 6   1    20.6 1.750 26     17.0     24.2
 8   1    15.4 2.140 26     11.0     19.8
```
In this particular case you get these values as the coefficients from `lm` by using the model formula `mpg ~ 0 + cyl:am`, but in general (as with setting contrasts) it's easier to let `emmeans` do it for you.
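As a quick check (a sketch, not part of the original answer), that cell-means parameterization can be verified directly:

```
# With no intercept and only the cyl:am interaction, each coefficient is the
# mean mpg of one cyl/am cell, matching the emmeans table above.
m_cells <- lm(mpg ~ 0 + cyl:am, data = mtcars)
coef(m_cells)

# The same cell means computed straight from the data.
with(mtcars, tapply(mpg, list(cyl = cyl, am = am), mean))
```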
If you want to do pairwise comparisons you should do something like this:
```
contrast(emmeans(m, ~cyl), "pairwise")

NOTE: Results may be misleading due to involvement in interactions
 contrast    estimate   SE df t.ratio p.value
 cyl4 - cyl6     5.64 1.55 26   3.646  0.0032
 cyl4 - cyl8    10.26 1.55 26   6.632  <.0001
 cyl6 - cyl8     4.62 1.64 26   2.822  0.0237

Results are averaged over the levels of: am
P value adjustment: tukey method for comparing a family of 3 estimates
```
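The NOTE above is `emmeans` warning that `cyl` is involved in an interaction; one way to respect that (a sketch, not in the original answer) is to compare `cyl` levels separately within each level of `am`:

```
# Pairwise cyl comparisons conditioned on am, so the interaction is not
# averaged over.
contrast(emmeans(m, ~ cyl | am), "pairwise")
```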
Pairwise contrasts aren't the only option: see the relevant vignette, or `help("contrast-methods")`, for more details.
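For example, a rough equivalent using the `marginaleffects` package mentioned earlier (a sketch; defaults and argument names may differ across package versions):

```
library(marginaleffects)
# Average contrasts between cyl levels; by default each non-reference level is
# compared with the reference level (cyl = 4 here).
avg_comparisons(m, variables = "cyl")
```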