I have a dataframe with number of times a bird species was detected at a site (n=110) during two separate visits:
Site Yr Total.ACFL Total.AMCR Total.AMGO Total.AMKE Total.AMRE
<chr> <int> <int> <int> <int> <int> <int>
1 BRY1 2021 0 0 0 0 0
2 BRY2 2021 0 0 3 0 0
3 BRY2 2021 1 0 2 0 0
4 BRY3 2021 0 0 2 0 0
5 BRY3 2021 0 0 7 0 0
6 BRY4 2021 0 0 1 0 0
7 BRY4 2021 0 0 1 0 0
8 BRY5 2021 0 0 4 0 0
9 BRY5 2021 0 0 4 0 0
10 BRY6 2021 0 0 8 0 0
There are a total of 56 species (columns that start with Total.). What I want to do is determine the total species richness (number of unique species detected) at each site. I want to produce a script that does this by determining the number of columns (e.g. Total.ACFL) that contain at least one detection per site. For example, the site BRY2 would have a species richness of 2 since AMGO and ACFL were detected during the two visits.
Any help would be greatly appreciated.
I have not tried anything as of yet (at a bit of a loss, frankly), but I am hoping to be steered in the proper direction. Assuming I might use the vegan package to my advantage.
You can first reshape your dataframe into a "long" format for easier manipulation, then count the number of species having the value > 0.
library(dplyr)
df |>
pivot_longer(starts_with("Total")) |>
summarize(value = sum(value), .by = c(Site, Yr, name)) |>
summarize(count = sum(value > 0), .by = c(Site, Yr))
Another method using reframe()
. This does not require reshaping the dataframe.
df |>
summarize(across(starts_with("Total"), sum), .by = c(Site, Yr)) |>
reframe(count = rowSums(pick(starts_with("Total")) > 0), .by = c(Site, Yr))
They have the same output:
# A tibble: 6 × 3
Site Yr count
<chr> <int> <dbl>
1 BRY1 2021 0
2 BRY2 2021 2
3 BRY3 2021 1
4 BRY4 2021 1
5 BRY5 2021 1
6 BRY6 2021 1