I'm using hclust to perform a cluster analysis of plant species cover data across sampling sites.
My study recorded percent cover of 55 species at 100 sites. Cover at each site was scored in cover classes 0-4, where 0 is absent, 1 is 1-25% cover ... and 4 is 76-100% cover.
I'm using Euclidean distance to measure dissimilarity in species cover between sites, and I want to know which plant species are driving the grouping at each branch of the dendrogram. See the sample df & code below; each row represents a site.
In the simplified example, I can see that sp1 is driving the association of sites 3 & 4. In my very large dataset, how could I determine which species is/are driving the associations at different levels of my dendrogram?
Please let me know if I can clarify. Thanks for your help!
library(tidyverse)
site <- c(1:10)
sp1 <- c(0,1,4,4,3,3,2,1,0,2)
sp2 <- c(4,3,0,0,2,2,3,2,1,3)
sp3 <- c(3,2,1,1,2,2,3,2,1,3)
sp4 <- c(2,4,1,0,1,2,3,4,3,1)
df <- data.frame(site, sp1, sp2, sp3, sp4)
species <- select(df, sp1:sp4)
dend <- species %>%
  dist(method = "euclidean") %>%
  hclust(method = "ward.D") %>%
  as.dendrogram()
plot(dend, ylab = "Euclidean Distance")
Following up: I ended up assigning the sites in each cluster to an arbitrary Association group, and then running an indicator species analysis on those groups using the multipatt function from indicspecies. This allowed me to identify the species that were significantly associated with each group, and so were driving the clustering.
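library(indicspecies)  # provides multipatt(); how() comes from permute, which it loads
# Assign each site to an Association group read off the dendrogram
clusters <- df %>%
  mutate(Association = case_when(
    site %in% c(3, 4) ~ 1,
    site %in% c(2, 8, 9) ~ 2,
    site %in% c(1, 5, 6, 7, 10) ~ 3))
abundance <- clusters[2:5]  # species columns sp1..sp4
association <- clusters$Association
indicator_r.g <- multipatt(abundance, association, func = "r.g", control = how(nperm = 9999))
summary(indicator_r.g)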
Multilevel pattern analysis
---------------------------
Association function: r.g
Significance level (alpha): 0.05
Total number of species: 4
Selected number of species: 4
Number of species associated to 1 group: 3
Number of species associated to 2 groups: 1
List of species associated to each combination:
Group 1 #sps. 1
stat p.value
sp1 0.82 0.0193 *
Group 2 #sps. 1
stat p.value
sp4 0.832 0.0161 *
Group 3 #sps. 1
stat p.value
sp3 0.781 0.0317 *
Group 2+3 #sps. 1
stat p.value
sp2 0.844 0.0293 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
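For a larger dataset, typing the group membership by hand gets tedious. A minimal sketch of one alternative, assuming the same toy data and clustering settings as above: cutree() from base R can cut the hclust tree into a chosen number of groups, and aggregate() then gives the mean cover class of each species per group as a quick descriptive check. The choice of k = 3 is my assumption here, picked only to match the three groups used above; I have not verified that it reproduces exactly the same membership.

# Same toy species data as in the question (base R only)
species <- data.frame(
  sp1 = c(0, 1, 4, 4, 3, 3, 2, 1, 0, 2),
  sp2 = c(4, 3, 0, 0, 2, 2, 3, 2, 1, 3),
  sp3 = c(3, 2, 1, 1, 2, 2, 3, 2, 1, 3),
  sp4 = c(2, 4, 1, 0, 1, 2, 3, 4, 3, 1)
)

# Keep the hclust object itself so the tree can be cut
hc <- hclust(dist(species, method = "euclidean"), method = "ward.D")

# Assumption: k = 3 groups; returns one group label per site (row)
groups <- cutree(hc, k = 3)

# Mean cover class of each species within each group
cluster_means <- aggregate(species, by = list(cluster = groups), FUN = mean)
print(cluster_means)

The groups vector could then be passed to multipatt in place of the hand-typed Association column, and cutting at a different k would let the same analysis be repeated at other levels of the dendrogram.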