I am rusty with my stats knowledge, please correct me if I use the wrong terminology or misunderstand anything.
I am using adonis2() from the vegan package to run a PERMANOVA with the following call:
nmds.div<- adonis2(nmds.dist ~ Season*Area, data = Type0, permutations = 999, method="bray")
Season has three levels (March, May, Sept) and Area has two levels (Pacific, Atlantic). The dependent variable is a Bray-Curtis distance matrix computed from OTU read counts. I want to see the interaction term between Season and Area, but this is what I get:
Df SumOfSqs R2 F Pr(>F)
Season 2 6.4903 0.27066 8.9066 0.001 ***
Residual 48 17.4889 0.72934
Total 50 23.9792 1.00000
When I run the same call with Cruise and Layer3, the output table is complete and includes the interaction term (Cruise:Layer3) with its p-value. Cruise has three levels (KS17, KS14 and HO15) and Layer3 has two levels (euphotic, aphotic).
nmds.div<- adonis2(nmds.dist ~ Cruise*Layer3, data = Type0, permutations = 999, method="bray")
Df SumOfSqs R2 F Pr(>F)
Cruise 2 6.4903090 0.27066356 9.787264 0.001
Layer3 1 0.4029121 0.01680253 1.215168 0.311
Cruise:Layer3 2 2.1654176 0.09030381 3.265409 0.002
Residual 45 14.9206109 0.62223010 NA NA
Total 50 23.9792496 1.00000000 NA NA
Cell counts for Season by Area, produced by:
table(Type0$Season, Type0$Area)
Pacific Atlantic
Mar 16 0
May 27 0
Sept 0 8
So my question is: why does the same code work for Cruise*Layer3 but not for Season*Area? Are there restrictions on the independent variables?
I think the short answer is that your two factors are perfectly collinear (completely confounded): every "Sept" sample came from the "Atlantic", and every "Mar" and "May" sample came from the "Pacific".
In other words, once Season is in the model, Area provides no additional information, so adonis2() drops the Area main effect and the Season:Area interaction.
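One way to check for this kind of redundancy before fitting anything is to compare the number of columns in the design matrix with its rank. The following is only a sketch in base R: it rebuilds the two factors from the cell counts in your table (the variable names are my own), and it does not claim to reproduce what adonis2() does internally.

```r
# Rebuild the two factors from the cell counts in the question
# (16 Mar, 27 May, 8 Sept; only the Sept samples are Atlantic).
season <- factor(rep(c("Mar", "May", "Sept"), times = c(16, 27, 8)))
area   <- factor(ifelse(season == "Sept", "Atlantic", "Pacific"),
                 levels = c("Pacific", "Atlantic"))

# Design matrix for the full model Season * Area
X <- model.matrix(~ season * area)

ncol(X)      # 6 columns: intercept, 2 Season, 1 Area, 2 interaction
qr(X)$rank   # 3: only the intercept and the two Season columns are independent
```

A rank of 3 out of 6 columns means that three terms cannot be estimated separately from Season. It is also consistent with the residual Df of 48 in your output, since 51 samples minus 3 independent columns leaves 48.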
To see what I mean, here are two examples of simulated data. The first has cell counts that match your data. Here only Season appears in the result; Area and the Season:Area interaction were dropped.
# fake data 1
library(vegan)  # provides adonis2()
nmds <- sample(1:1000, 51, replace = TRUE)
season <- factor(c(rep(1, 16), rep(2, 27), rep(3, 8)),
labels= c("Mar", "May", "Sept"))
area <- factor(c(rep(1,43), rep(2,8)), labels = c("Pacific", "Atlantic"))
Type0 <- data.frame(nmds = nmds, Season =season, Area=area)
# cell counts
> table(Type0$Season, Type0$Area)
Pacific Atlantic
Mar 16 0
May 27 0
Sept 0 8
nmds.div1 <- adonis2(nmds ~ Season*Area, data = Type0,
permutations = 999, method="bray")
> nmds.div1
adonis2(formula = nmds ~ Season * Area, data = Type0, permutations = 999, method = "bray")
Df SumOfSqs R2 F Pr(>F)
Season 2 0.1720 0.02919 0.7216 0.583
Residual 48 5.7204 0.97081
Total 50 5.8924 1.00000
In this second example, I assign Area at random, which gives nonzero counts in every cell of the table. In this scenario the factors are no longer redundant, and adonis2() returns estimates for both factors and the interaction.
# fake data 2
nmds <- sample(1:1000, 51, replace = TRUE)
season <- factor(c(rep(1, 16), rep(2, 27), rep(3, 8)),
labels= c("Mar", "May", "Sept"))
set.seed(1)
area <- factor(sample(1:2, 51, replace = TRUE), labels = c("Pacific", "Atlantic"))
Type0 <- data.frame(nmds = nmds, Season =season, Area=area)
# cell counts
> table(Type0$Season, Type0$Area)
Pacific Atlantic
Mar 11 5
May 14 13
Sept 2 6
nmds.div2 <- adonis2(nmds ~ Season*Area, data = Type0,
permutations = 999, method="bray")
> nmds.div2
adonis2(formula = nmds ~ Season * Area, data = Type0, permutations = 999, method = "bray")
Df SumOfSqs R2 F Pr(>F)
Season 2 0.2721 0.04736 1.1661 0.313
Area 1 0.1721 0.02995 1.4747 0.233
Season:Area 2 0.0515 0.00895 0.2205 0.948
Residual 45 5.2510 0.91374
Total 50 5.7467 1.00000
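The difference between the two examples can also be seen in the design matrix alone, with no response data. As a rough sketch in base R (the object names are hypothetical), the block below expands the cell counts from the second table into one row per sample and checks that the Season * Area design has full rank.

```r
# Cell counts copied from the second simulated table
# (rows: Season, columns: Area).
counts <- matrix(c(11,  5,
                   14, 13,
                    2,  6),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(Season = c("Mar", "May", "Sept"),
                                 Area   = c("Pacific", "Atlantic")))

# One row per Season x Area cell, then repeat each row Freq times
cells  <- as.data.frame(as.table(counts))
design <- cells[rep(seq_len(nrow(cells)), cells$Freq), c("Season", "Area")]

X <- model.matrix(~ Season * Area, data = design)

nrow(design)               # 51 samples
qr(X)$rank                 # 6: every column is independent
nrow(design) - qr(X)$rank  # 45, matching the residual Df above
```

With no empty cells, all six columns are independent, so the Area and Season:Area terms can be estimated alongside Season.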