rpcavegan

How to specify eigenvectors in PCoA for a dissimilarity matrix including for mixed/ordinal variables?


I have a dataset that contains ordinal variables on a number of species, and would like to visualise this using Principal coordinates analysis (PCoA). When I consider the data as continuous (numeric), it's straight forward to use vegan::vegdist to create a Gower dissimilarity index, ape::pcoa to compute the principal coordinate decomposition, and biplot to visual the variables:

library(ape)
library(vegan)
library(FD)

df <- data.frame(a = sample.int(4, 20, replace=TRUE),
b = sample.int(4, 20, replace=TRUE),
c = sample.int(4, 20, replace=TRUE),
d = sample.int(4, 20, replace=TRUE),
e = sample.int(4, 20, replace=TRUE))
rownames(df) <- paste0("species_", letters[1:20])

df.distance <- vegdist(df, "bray")
res <- pcoa(df.distance)
#biplot(res)
biplot(res, df)

enter image description here

However, because the variables are ordinal, vegdist can't account for this, so I used FD::gowdis to calculate Gowers dissimilarity for mixed variables instead.

df.ordinal <- df
df.ordinal$a <- factor(df.ordinal$a,levels=1:4,labels = c("low","medium","high","veryhigh"),ordered=T)
df.ordinal$b <- factor(df.ordinal$b,levels=1:4,labels = c("low","medium","high","veryhigh"),ordered=T)
df.ordinal$c <- factor(df.ordinal$c,levels=1:4,labels = c("low","medium","high","veryhigh"),ordered=T)
df.ordinal$d <- factor(df.ordinal$d,levels=1:4,labels = c("low","medium","high","veryhigh"),ordered=T)
df.ordinal$e <- factor(df.ordinal$e,levels=1:4,labels = c("low","medium","high","veryhigh"),ordered=T)

df.distance.gower <- gowdis(df.ordinal, ord="podani")
res <- pcoa(df.distance.gower)
biplot(res)

enter image description here

Not surprisingly the ordination is different when accounting for ordinal data, but I'm unable to visualise the overlying variables as eigenvectors:

> biplot(res.ordinal, df.ordinal)
Error in cov(Y, points.stand) : 
  is.numeric(x) || is.logical(x) is not TRUE```

Presumably this is because the variables are now ordinal data, not continuous as in the vegdist example.

Is there an approach that can visualise the eigenvectors/loadings with a mixed dataset, or is there a theoretical reason this can't be applied to a PCoA?


Solution

  • I am not quite sure what you asked. You do get eigenvectors in PCoA, but these eigenvectors only concern sampling units (rows). PCoA is based on dissimilarities (which are treated like distances) among sampling units, and these dissimilarities have no information about variables (columns) that generated the dissimilarities. For any dissimilarity matrix among SUs, there is an infinite number of possible column variables that generate these dissimilarities (we even cannot say how many column variables there are – nor what they are). This means that basically you cannot have information of columns in PCoA based on dissimilarities of sampling units (rows).

    Still you sometimes get this column information from software. This is based on having access to the data before calculating dissimilarities, and then treating these dissimilarities as Euclidean distances from these data (that they generally are not). Well, if they really happen to be Euclidean distances, you should not use PCoA, but PCA (principal components analysis). However, if your dissimilarity in PCoA is Euclidean, then you can have both row and column eigenvector scores/loadings. A consequence of this is that if your dissimilarities can be expressed as Euclidean distances of transformed data, then that transformed data gives you the column eigenvector solution ("loadings").

    In PCoA applications we normally ignore this incompatability, incongruence and impossiblity and pretend that input data can be used to project observed variables onto eigenvector solution of sampling units (rows) even with semimetric or nonmetric dissimilarities. This is not strictly correct, but often works quite nicely, and we hope this will free us from our sins. It won't, but cross your fingers. So the point is to find a transformation of the data that will be as similar as possible to the implicit transformation in the dissimilarity index. For ordinal data, you should have corresponding continuous numerical transformation where the ordinal levels are not equidistant, but at the values defined by that transformation. If you cannot get those (I don't know your software: these values can be got, but the software may not give the values),

    Please note, I wrote about adding column scores (eigenvector loadings), because PCoA does not know these and cannot know these, but you must add them after the analysis. This adding will be strictly correct if you had Euclidean distances in PCoA, or if your distances can be expressed as Euclidean distanes of transformed data, and these transformed data are used to add the information of variables (columns). In this case you should have used PCA of transformed data initially. In general in PCoA the column scores are never exactly correct, but they are just auxilliary information that may not be too bad in many cases and can often be useful and informative (even when strictly not correct).