
Should I use vegdist or dist for hclust?


I am trying to run a hierarchical cluster analysis on binary data. For the distance matrix I used the vegdist() function from the vegan package:

dist.jac <- vegdist(final_df, method="jaccard", binary = TRUE)

Next, I used hclust(), but I'm not sure whether I should pass it the dissimilarity matrix produced by vegdist() directly:

jac <- hclust(dist.jac, method = "ward.D")

or try something like this (as I've seen done):

jac <- hclust(dist(dist.jac), method = "ward.D")

because hclust() requires "a dissimilarity structure as produced by dist". On the other hand, the vegan documentation says that vegdist "should provide a drop-in replacement for dist and return a distance object of the same type" and that "the function is an alternative to dist". Applying dist(dist.jac) does not make much sense to me, but I have seen it done, so I would be very grateful if someone could explain which one I should use.


Solution

  • Calling dist(dist.jac) computes distances (Euclidean by default) between the rows of the Jaccard dissimilarity matrix, so you get dissimilarities of dissimilarities of the data instead of dissimilarities of the data. This rarely makes sense.

    As the vegan documentation says, vegdist is a drop-in replacement for dist: it returns an object of class "dist", so hclust(dist.jac, method = "ward.D") is the call to use. It does not matter which of the two functions you use. If a dissimilarity is available in only one of them, use that one. If it is available in both, use either. In that case, I would personally prefer dist, as it does not require any contributed packages.
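
    The difference can be checked on a small example. The sketch below is only an illustration: toy_df and its values are made up and stand in for final_df. It uses base R's dist(method = "binary"), which for 0/1 data is the Jaccard dissimilarity, so it runs without vegan. The comparison against vegdist at the end is skipped if vegan is not installed.

```r
# Hypothetical 0/1 data standing in for final_df (5 sites x 6 species).
toy_df <- matrix(
  c(1, 1, 0, 0, 1, 0,
    1, 1, 0, 0, 0, 0,
    0, 0, 1, 1, 0, 1,
    0, 0, 1, 1, 1, 1,
    1, 0, 0, 0, 1, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("site", 1:5), paste0("sp", 1:6))
)

# Jaccard dissimilarity on binary data; the result is already a "dist" object.
d_jac <- dist(toy_df, method = "binary")
print(inherits(d_jac, "dist"))

# Pass the dissimilarities straight to hclust().
hc_ok <- hclust(d_jac, method = "ward.D")

# What dist(d_jac) does: Euclidean distances between the rows of the
# full dissimilarity matrix, i.e. dissimilarities of dissimilarities.
d_of_d <- dist(d_jac)
hc_odd <- hclust(d_of_d, method = "ward.D")

# The two inputs to hclust() are not the same numbers.
print(isTRUE(all.equal(as.vector(d_jac), as.vector(d_of_d))))

# Optional comparison with vegan, only if the package is installed.
if (requireNamespace("vegan", quietly = TRUE)) {
  d_veg <- vegan::vegdist(toy_df, method = "jaccard", binary = TRUE)
  print(all.equal(as.vector(d_jac), as.vector(d_veg)))
}
```

    Plotting both trees with plot(hc_ok) and plot(hc_odd) is a quick way to see whether the extra dist() call changes the clustering for your own data.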