I have data with three factor variables:
1.) Item_Type: Factor w/ 4 levels "control","Ar", "Eng", "Fr"
2.) Time_of_Testing: Factor w/ 3 levels "Pretest","Posttest", "Delayedtest"
3.) Accuracy: Factor w/ 2 levels "0", "1"
Item_Type and Time_of_Testing are both within-subject variables.
The main hypothesis is that at Pretest there will be no Item_Type effect. After the reading treatment, however, the mean for every Item_Type should grow from Pretest to Posttest, except for the control items. This growth should not appear in control items because they are the only items that received no form of treatment; they appear only in the testing phases, to serve as a baseline against which the experimental Item_Types can be compared.
I fitted my glmer model, then ran the post hoc comparisons:
FINAL_ACC <- glmer(Accuracy ~ Time_of_Testing*Item_Type + (1|subject) + (1|Item), family = "binomial", data=Task_df, na.action= na.exclude)
emmeans(FINAL_ACC, pairwise ~ Time_of_Testing*Item_Type, adjust= "bonferroni", type= "responce")
However, the post hoc results show that control items (whose mean accuracy is in fact the lowest of all Item_Types) have the highest least-squares means. Moreover, the only significant Pretest-vs-Posttest difference among Item_Types is found for control items (even though the actual mean Pretest-to-Posttest gain is +0.37 for all other items and only +0.1 for control items).
I think the problem is that the model assumes no difference between items in Pretest and Posttest, and that's why the fitted values for control items are misleading. How do you think I should solve this issue? Would you advise me to use orthogonal contrasts, since I did not specify any when I ran the emmeans test? Or would you advise adding another variable named treatment, coded 0 for control items and 1 for all other items?
I really appreciate your kind help and advice.
As suggested in a comment, you need a model that predicts the same thing for all baseline cases. I suggest this: first, create a new factor and collapse Item_Type for the baseline cases:
Task_df$Phase <- with(Task_df,
    factor((Time_of_Testing == "Pretest") + 2*(Time_of_Testing != "Pretest"),
           labels = c("baseline", "experimental")))
levels(Task_df$Item_Type) <- c(levels(Task_df$Item_Type), "pretest")  # add the level first
Task_df$Item_Type[Task_df$Phase == "baseline"] <- "pretest"
The second step ensures that all baseline cases have the same factor level for Item_Type. (Note that "pretest" must be added to the levels of Item_Type before the assignment; otherwise those cases become NA.)
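To see what this recoding does, here is a minimal base-R sketch on made-up data (the toy columns, levels, and sizes are assumptions for illustration, not your data):

```r
## Toy data frame (not the real Task_df): two times, two item types, 3 subjects
toy <- expand.grid(
  Time_of_Testing = c("Pretest", "Posttest"),
  Item_Type       = c("control", "Ar"),
  subject         = 1:3
)
## Include "pretest" among the Item_Type levels so the reassignment below works
toy$Item_Type <- factor(toy$Item_Type, levels = c("control", "Ar", "pretest"))

## Same construction as above: Pretest rows -> "baseline", all others -> "experimental"
toy$Phase <- with(toy,
    factor((Time_of_Testing == "Pretest") + 2 * (Time_of_Testing != "Pretest"),
           labels = c("baseline", "experimental")))

## Collapse Item_Type for all baseline rows into the single level "pretest"
toy$Item_Type[toy$Phase == "baseline"] <- "pretest"

table(toy$Phase, toy$Item_Type)
```

Every baseline row now carries the same Item_Type level, so the model cannot estimate spurious item-type differences at pretest.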
Now, fit a nested model:
ULTIMATE_ACC <- glmer(Accuracy ~ Phase + Phase:(Time_of_Testing * Item_Type)
+ (1|subject) + (1|Item),
family = "binomial", data=Task_df, na.action= na.exclude)
Then do your EMMs:
emm <- emmeans(ULTIMATE_ACC, ~ Time_of_Testing * Item_Type, type = "response")
emm ### display estimates
pairs(emm, adjust = "bonferroni") ### pairwise comparisons
Note that it is "response", not "responce".
If you don't specify adjust, you'll get the Tukey adjustment, which is less conservative. You might also consider, e.g., pairs(emm, by = "Time_of_Testing") to do separate sets of pairwise comparisons for each level of one factor.
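Since the hypothesis is specifically about Pretest-to-Posttest growth within each Item_Type, pairs(emm, by = "Item_Type") is often the most targeted comparison set. A self-contained sketch of the mechanics, using an ordinary glm on fake data (the data and the simple fixed-effects model here are illustrative assumptions, not your analysis — the emmeans calls work the same way on a glmer fit):

```r
library(emmeans)

set.seed(42)
## Fake balanced data: 40 replicates per cell of a 2 x 2 design
toy <- expand.grid(Time_of_Testing = c("Pretest", "Posttest"),
                   Item_Type       = c("control", "Ar"),
                   rep             = 1:40)
toy$Accuracy <- rbinom(nrow(toy), 1, 0.5)

## Plain fixed-effects logistic model standing in for the glmer fit
fit  <- glm(Accuracy ~ Time_of_Testing * Item_Type, family = binomial, data = toy)
emm2 <- emmeans(fit, ~ Time_of_Testing * Item_Type, type = "response")

## One time comparison per item type: the "growth" contrasts of interest
pairs(emm2, by = "Item_Type", adjust = "bonferroni")
```

With by =, the multiplicity adjustment is applied within each by group rather than across all comparisons at once.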