rstatisticsmanovaemmeans

estimated marginal means of a MANCOVA in R


I have constructed a model that accounted for a covariate. There are two dependent variables ("A","B") and two independent variables ("C", "D") and one continuous covariate ("E"). I ran a MANCOVA as follows:

x<-cbind(A,B) #combining dependent variables
y<-cbind(C,D) #combining independent variables
fit<-manova(x~y+E)
summary(fit, test="Pillai")

This all worked perfectly and I found that there was an effect of the covariate on the dependent variables. Therefore I want to account for the covariance with an estimated marginal mean, using the emmeans package. However, when I try to run the following code I receive this error:

library(emmeans)
emmeans(fit,~y+E)

>Error in eval(expr, envir, enclos) : object 'spc.l$Ghopper.Start..g.' not 
found
>Error in ref_grid(object, ...) : Perhaps a 'data' or 'params' argument is 
needed

Here is my data:

 structure(list(ï..insect = c(105L, 106L, 107L, 108L, 110L, 112L, 
113L, 114L, 115L, 116L, 117L, 118L, 119L, 120L, 121L, 122L, 123L, 
125L, 126L, 127L, 128L), C = structure(c(1L, 2L, 1L, 2L, 2L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L
), .Label = c("Pair A 7p:35c-35p:7c", "Pair B 7p:35c-28p:14c"
), class = "factor"), D = structure(c(1L, 1L, 2L, 2L, 1L, 2L, 
1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L), .Label = c("F", 
"M"), class = "factor"), E = c(0.357, 0.259, 0.128, 0.104, 0.248, 
0.111, 0.218, 0.213, 0.13, 0.123, 0.335, 0.22, 0.247, 0.295, 
0.297, 0.219, 0.132, 0.194, 0.207, 0.266, 0.234), A = c(0.025333333, 
0.041666665, 0.043833332, 0.046333331, 0.108499995, 0.051999997, 
0.101833329, 0.06083333, 0.059499998, 0.056166664, 0.017833333, 
0.053666664, 0.066333331, 0.025499998, 0.073666664, 0.149333324, 
0.044666665, 0.047499998, 0.051833331, 0.020499999, 0.062499997
), B = c(0.050666667, 0.020333321, 0.023166668, 0.029666645, 
0.032499992, 0.028999981, 0.029166671, 0.024166656, 0.025500002, 
0.020833325, 0.021166667, 0.038333304, 0.023666669, 0.022499981, 
0.040333336, 0.121666569, 0.023333335, 0.017500002, 0.01816666, 
0.018500001, 0.024499989)), .Names = c("ï..insect", "C", "D", 
"E", "A", "B"), class = "data.frame", row.names = c(NA, -21L))

I am sure that there is a simple fix to this problem, but I am a bit lost and there are few questions posted on stackexchange about emmeans!


Solution

  • Here is something that works, computationally:

    R> fit = manova(x ~ cbind(C, D) + E, data = dat)
    R> ref_grid(fit)
    'emmGrid' object with variables:
        C = 1.5238
        D = 1.2857
        E = 0.21605
        rep.meas = multivariate response levels: A, B
    R> emmeans(fit, ~ C + D + E)
           C        D         E     emmean          SE df   lower.CL   upper.CL
     1.52381 1.285714 0.2160476 0.04438095 0.005377854 17 0.03303467 0.05572723
    
    Results are averaged over the levels of: rep.meas 
    Confidence level used: 0.95
    

    I'll say more later about these results. So, replacing y with cbind(C, D) in the model call will make it work (computationally). Using y directly, I get an error message:

    R> fit0 = manova(x ~ y+E, data = dat)
    R> ref_grid(fit0)
    Error in model.matrix(trms, m, contrasts.arg = object$contrasts)[, nm,  : 
      subscript out of bounds
    

    This is not the same error message shown in the OP, and I can only guess that the scoping is different somehow. But what is happening here is that C and D are actually factors, not numeric predictors. cbind(C,D) converts these to a numeric matrix with two columns. There is some technical reason for the error that I need to investigate and correct.

    But the important thing is that neither fit nor fit0 is model you want to use for post-hoc comparisons, because after all, C and D are factors. The reference grid, and hence the EMMs, for fit are based on the average values of the numeric recodings of C and D, and the average E. That's why there is only one row of emmeans output.

    What I think is needed is to take the cbind out of the call:

    R> fit1 = manova(x ~ C + D + E, data = dat)
    R> summary(fit1)
              Df  Pillai approx F num Df den Df  Pr(>F)
    C          1 0.07083   0.6098      2     16 0.55559
    D          1 0.03150   0.2602      2     16 0.77408
    E          1 0.37794   4.8606      2     16 0.02242
    Residuals 17                                       
    R> ref_grid(fit1)
    'emmGrid' object with variables:
        C = Pair A 7p:35c-35p:7c, Pair B 7p:35c-28p:14c
        D = F, M
        E = 0.21605
        rep.meas = multivariate response levels: A, B
    

    Since E is numeric and reduced to only its mean, you need not include it in the emmeans call:

    R> emmeans(fit1, ~ C + D)
     C                     D     emmean          SE df    lower.CL   upper.CL
     Pair A 7p:35c-35p:7c  F 0.05021337 0.011726789 17  0.02547201 0.07495473
     Pair B 7p:35c-28p:14c F 0.05576090 0.008808415 17  0.03717677 0.07434503
     Pair A 7p:35c-35p:7c  M 0.01962942 0.016299910 17 -0.01476039 0.05401922
     Pair B 7p:35c-28p:14c M 0.02517695 0.019816637 17 -0.01663250 0.06698640
    
    Results are averaged over the levels of: rep.meas 
    Confidence level used: 0.95
    

    These results are averaged over the two repeated measures in the dependent variable. You might want to separate them out, or average over the levels of some other factor. Since neither C nor D is significant, I'll just get the EMMs and comparison thereof for rep.meas:

    R> emmeans(fit1, pairwise ~ rep.meas)
    $emmeans
     rep.meas     emmean          SE df   lower.CL   upper.CL
     A        0.04551606 0.008547982 17 0.02748139 0.06355073
     B        0.02987426 0.006885079 17 0.01534801 0.04440050
    
    Results are averaged over the levels of: C, D 
    Confidence level used: 0.95 
    
    $contrasts
     contrast   estimate          SE df t.ratio p.value
     A - B    0.01564181 0.005645167 17   2.771  0.0131
    
    Results are averaged over the levels of: C, D
    

    Since E is significant and it is a covariate, it might be interesting to see if there is a different slope for each level of rep.meas:

    R> emtrends(fit1, pairwise ~ rep.meas, var = "E")
    $emtrends
     rep.meas    E.trend        SE df   lower.CL   upper.CL
     A        -0.3472133 0.1737591 17 -0.7138129 0.01938632
     B         0.0213882 0.1399564 17 -0.2738940 0.31667042
    
    Results are averaged over the levels of: C, D 
    Confidence level used: 0.95 
    
    $contrasts
     contrast   estimate        SE df t.ratio p.value
     A - B    -0.3686015 0.1147521 17  -3.212  0.0051
    
    Results are averaged over the levels of: C, D
    

    Also, try this to get a nice graph of the predictions (result not shown):

    emmip(fit1, rep.meas ~ E|C*D, at=list(E = c(.1,.35)))