rpcarandom-forestveganmanova

Labeling the centroids of a PCoA based on betadisper() multivariate dispersions in R


I've used the function betadisper() in the vegan package to generate multivariate dispersions and plot those data in a PCoA. In this example I'll be looking at the difference between the sexes in a singular species.

Load the original data. For our purposes this can legit be anything here. The data I'm using isn't special. Its feature measurements are from a bioacoustic dataset. I am walking through my process:

my_original_data = read.csv("mydata.csv", as.is = T, check.names = F)
#Just extract the numeric/quantitative data.
myData=my_original_data[, 13:107]

Based on previous research, we used an unsupervised randomForest to determine similarity within our original feature measurements:

require(randomForest)
full_urf = randomForest(myData, proximity=T, scale=TRUE, ntree=4999,importance = TRUE)

A index was then generated using the proximity matrix:

urf_dist_full = as.dist(1-full_urf$proximity)

An permutational MANOVA was run on the resulting index using the vegan package. The use of the pMANOVA was well researched and is the correct test for my purposes:

mod=adonis(formula = urf_dist_full ~ Sex * Age * Variant, data = my_original_data, permutations = 999, method = "euclidean") 

my_original_data had qualitative factors, Sex, Age and Variant. I could have extracted them, but it seemed cleaner to keep them within the original dataset.

After running a few homogeneity tests, I want to plot the multivariate dispersions. To do this I have been using the betadisper function:

Sex=betadisper(urf_dist_full,my_original_data$Sex)
plot(Sex, main="Sex Multivariate Dispersions")

That plots this beauty:

Sex Multivariate Dispersions

How can I label the centroids as Male and Female? I also want to run this plot for the Variant category, but that has five factors rather than two, which really warrants labeling.

I've seen the boxplot() variant of this, but I like how the PCoA also shows clustering.


Solution

  • You can add labels to centroids like this:

    ordilabel(scores(Sex, "centroids"))
    

    where Sex is your betadisper result. If you do not want to use the original names of your centroids, you can change the names with:

    ordilabel(scores(Sex, "centroids"), labels=c("A","B"))