rveganrdaprimer3

What is the difference between R vegan species-scores, and Primer Spearman rank correlation of species to axis?


In constrained ordination analysis, like CAP or dbRDA, researchers often want to know how much of the dissimilarity is attributed to specific species. In Primer PERMANOVA, Spearman rank or Pearson correlations of species to the axis is an option to that provides an estimate of the species that characterise the variation between assemblages of species when using CAP or RDA. In R, vegan provides a different measurement, known as species scores, which can be calculated but not without careful consideration of the potential shortcomings https://github.com/vegandevs/vegan/issues/254#issuecomment-334071917. and vegan dbrda species scores are empty despite community matrix provided, when using capscale.

I would like to better understand how the correlations and species scores are calculated in Primer PERMANOVA. Firstly, is there a real difference in what these methods aim to show? What are the benefits and shortcomings of the using Spearman or Pearson correlations over R- vegan's species scores? Does the method of calculating the species-to-axis correlations in Primer suffer from similar problems as detailed in the above links for species scores in capscale or dbrda in R? In Primer, what are the variables used from the community matrix and axis to calculate the correlations between them? Are these raw or the transformed data? And finally, if the correlation method is a better estimate of the relative amount by which species cause differences between assemblages than species scores in R, should this be considered as an alternative to R vegan' species scores?


Solution

  • I have never seen PRIMER and I cannot comment on differences between vegan and PRIMER, but I can explain how we work in vegan.

    If you think about species scores of fitted environmental variables as arrows, there are two separate aspects: direction and length. First about direction of the arrow.

    In general, the arrows are not parallel to the axes, but they point to the direction to which the species or the environmental variable changes most rapidly. The directions of arrows can be found from linear model lm(y ~ Ax1 + Ax2). If y is a species, this gives the arrow for the species score, and if y is an environmental variable, this gives a fitted vector. Correlating species with axes implies two separate models lm(y ~ Ax1) and lm(y ~ Ax2). The vegan model defines a linear trend surface, and the axis model defines two separate linear trend surfaces with each having steepest gradient along one axis. The following example shows how the linear model related to species scores in PCA in vegan:

    library(vegan)
    data(varespec)
    pc <- rda(varespec)
    biplot(pc) # species scores as biplot arrows
    plot(envfit(pc ~ Pleuschr + Cladarbu + Cladrang + Cladstel, varespec))
    ordisurf(pc ~ Cladstel, varespec, knots = 1, add = TRUE)
    

    The envfit function adds arrows that point to the same direction as species scores, and ordisurf adds linear (knots = 1) trend surface to Cladstel. The isoclines of the linear trend surface are equally spaced and perpendicular to the arrow. Projecting sampling units to the arrow gives the predicted species abundance in this two-dimensional solution. The interpretation of species scores is similar in RDA, but there you must remember to use linear combination scores (display="lc"), and in CCA you must remember to use weighted regression (envfit and ordisurf take care of that automatically, but with lm or other non-vegan tools you need explicit weights).

    This approach is not easily changed to use rank correlations. For ranks, you need to project points (sampling units) to a univariate sequence. Often people project on axes (that is, they correlate axes with species). However, better correlation will be found if we find a line through origin that gives the best rank correlation when sampling units are projected onto it for ranks (if a unique line, or a sector containing lines, exists). This would be similar to our approach of finding the direction of steepest change in linear trend surface. This is easily done with Euclidean space, like all vegan ordination spaces are, but not with ranks of projections.

    The assumption of linear trend surface is pretty simplistic. It is the model for species in PCA and RDA, and it is the model for constraints in RDA and there it shows how the analysis sees your data (remember "lc" scores!). However, with other variables and with other ordination methods, more complicated response models are often more adequate. These can be fit[ted] using ordisurf with knots > 1.

    Then about lengths of the arrows, or distances of species scores from the origin. The envfit() function finds the correct direction, but it scales the arrow length by correlation coefficient. In PCA and RDA, we have several alternative scaling options: see the long description of scaling and correlation in ?scores.cca. The default (correlation = FALSE) scaling makes arrows proportional to absolute change in species abundance. That is, abundant species can change a lot and can have long arrows, but scarce species can change only a bit and have always short arrows. It is absolute change, not relative change. With correlation = TRUE, the arrow lengths are proportional to relative change and will be similar to scaling by correlations used in envfit. Again, study the manual for details (?scores.cca).