r manova

Error in summary.manova - residuals have rank order deficiency


I am trying to carry out a MANOVA. There are 7 dependent variables and a categorical independent variable representing 6 groups.

The data are available here: http://pastebin.com/fqXNjWtr

Click "download" above the text. I read the file into R like this (the downloaded file should have the same name for you; I'm on macOS):

> df <- read.csv("~/downloads/fqXNjWtr.txt", stringsAsFactors = FALSE)
> str(df)

'data.frame':   244 obs. of  8 variables:
 $ var1              : num  0.3 0 0.312 0 0.643 ...
 $ var2              : num  0 0.125 0 0.375 0.0714 ...
 $ var3              : num  0 0.0625 0.0625 0 0.0714 ...
 $ var4              : num  0.2 0.3125 0.0625 0.0625 0 ...
 $ var5              : num  0.1 0.25 0.438 0.188 0 ...
 $ var6              : num  0.2 0.0625 0.125 0.0625 0.0714 ...
 $ var7              : num  0.2 0.188 0 0.312 0.143 ...
 $ cluster_assignment: int  1 4 2 6 1 4 3 3 4 6 ...

I am then creating the dependent variable, DV:

> df$DV <- as.matrix(df[, 1:7])

I am then carrying out the MANOVA:

> mv_out <- manova(DV ~ cluster_assignment, data = df)
> mv_out
Call:
   manova(DV ~ cluster_assignment, data = df)

Terms:
                cluster_assignment Residuals
resp 1                    5.160838  6.738524
resp 2                    3.384101  3.622020
resp 3                    0.000200  3.365565
resp 4                    0.065469  2.743549
resp 5                    0.889180  8.019733
resp 6                    0.442187  5.884827
resp 7                    3.133188  7.736993
Deg. of Freedom                  1       242

Residual standard errors: 0.1668686 0.1223398 0.1179292 0.1064752 0.1820423 0.1559406 0.1788045
Estimated effects may be unbalanced

When I then try the summary() function, I get this error:

> summary(mv_out)
Error in summary.manova(mv_out) : residuals have rank 6 < 7

Based on other posts, this error suggests either that there are too few observations for the number of variables or that some of the variables are collinear. Neither seems to be the case with these data:

> cor(df[, 1:7])

            var1         var2        var3         var4        var5        var6       var7
var1  1.00000000 -0.417605243 -0.05274197 -0.118358341 -0.25617705  0.06089533 -0.4360312
var2 -0.41760524  1.000000000 -0.07181878  0.008873035 -0.29523300 -0.33954011  0.1958746
var3 -0.05274197 -0.071818782  1.00000000  0.131137673 -0.11624079 -0.14408909 -0.2951076
var4 -0.11835834  0.008873035  0.13113767  1.000000000 -0.14361455 -0.24308229 -0.1491373
var5 -0.25617705 -0.295233000 -0.11624079 -0.143614554  1.00000000 -0.03180183 -0.2383027
var6  0.06089533 -0.339540114 -0.14408909 -0.243082287 -0.03180183  1.00000000 -0.3215075
var7 -0.43603124  0.195874568 -0.29510761 -0.149137349 -0.23830275 -0.32150753  1.0000000

I'm puzzled about what may be going on.


Solution

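  • The error arises because every row of df$DV sums to 1 (check rowSums(df$DV)), so the seven responses are exactly linearly dependent and the residuals have rank 6. That fails the rank check in summary.manova at its default tol = 1e-7. You can suppress the check by setting the 'tol' argument (see ?summary.manova), but the result is probably not what you intended: note the negative approx F and the missing p-value below.

    summary(mv_out, tol = 0)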
                           Df Pillai approx F num Df den Df Pr(>F)
    df$cluster_assignment   1 1.2106  -193.79      7    236       
    Residuals             242
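
  • Rather than disabling the check, it may be cleaner to remove the dependence by dropping one of the seven responses. The following is a minimal sketch of that idea, not a definitive analysis. It uses simulated data as a stand-in for the pastebin file, and the names props, cluster, fit_all and fit_six are made up for illustration. It also converts the grouping variable to a factor, on the assumption that the six clusters are meant as categories; in the output above, Df = 1 indicates that the integer cluster_assignment was treated as a numeric predictor.

```r
# Simulated stand-in for the real data (assumption): 244 rows of 7
# proportions that sum to 1, plus a 6-level grouping variable.
set.seed(1)
n <- 244
raw <- matrix(rexp(n * 7), nrow = n)
props <- raw / rowSums(raw)            # force each row to sum to 1
colnames(props) <- paste0("var", 1:7)
cluster <- factor(sample(1:6, n, replace = TRUE))  # factor, not integer

# Diagnosis: constant row sums mean the columns are linearly dependent,
# even though no single pairwise correlation has to be near +/-1.
stopifnot(all(abs(rowSums(props) - 1) < 1e-12))

fit_all <- manova(props ~ cluster)
# Roughly the check that summary.manova performs on the residual
# sums-of-squares-and-cross-products matrix.
res_rank <- qr(crossprod(residuals(fit_all)), tol = 1e-7)$rank
res_rank                                # expected to be 6 rather than 7

# One possible fix: drop a response so the remaining six are independent.
fit_six <- manova(props[, -7] ~ cluster)
out <- summary(fit_six)                 # default test is Pillai's trace
out
```

    With the real data, the equivalent would be something like manova(as.matrix(df[, 1:6]) ~ factor(cluster_assignment), data = df). Because the dropped column is a linear function of the other six, the choice of which column to drop should not change the test statistic.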