Count the number of columns that have data by group

I have a dataframe with number of times a bird species was detected at a site (n=110) during two separate visits:

   Site     Yr Total.ACFL Total.AMCR Total.AMGO Total.AMKE Total.AMRE
   <chr> <int>      <int>      <int>      <int>      <int>      <int>
 1 BRY1   2021          0          0          0          0          0
 2 BRY2   2021          0          0          3          0          0
 3 BRY2   2021          1          0          2          0          0
 4 BRY3   2021          0          0          2          0          0
 5 BRY3   2021          0          0          7          0          0
 6 BRY4   2021          0          0          1          0          0
 7 BRY4   2021          0          0          1          0          0
 8 BRY5   2021          0          0          4          0          0
 9 BRY5   2021          0          0          4          0          0
10 BRY6   2021          0          0          8          0          0

There are a total of 56 species (columns that start with Total.). What I want to do is determine the total species richness (number of unique species detected) at each site. I want to produce a script that does this by determining the number of columns (e.g. Total.ACFL) that contain at least one detection per site. For example, the site BRY2 would have a species richness of 2 since AMGO and ACFL were detected during the two visits.

Any help would be greatly appreciated.

I have not tried anything as of yet (at a bit of a loss, frankly), but I am hoping to be steered in the proper direction. Assuming I might use the vegan package to my advantage.

Solution

pivot_longer and summarize

You can first reshape your dataframe into a "long" format for easier manipulation, then count the number of species having the value > 0.

library(dplyr)

df |> 
  pivot_longer(starts_with("Total")) |> 
  summarize(value = sum(value), .by = c(Site, Yr, name)) |> 
  summarize(count = sum(value > 0), .by = c(Site, Yr))

summarize and reframe

Another method using reframe(). This does not require reshaping the dataframe.

df |>
  summarize(across(starts_with("Total"), sum), .by = c(Site, Yr)) |>
  reframe(count = rowSums(pick(starts_with("Total")) > 0), .by = c(Site, Yr))

They have the same output:

# A tibble: 6 × 3
  Site     Yr count
  <chr> <int> <dbl>
1 BRY1   2021     0
2 BRY2   2021     2
3 BRY3   2021     1
4 BRY4   2021     1
5 BRY5   2021     1
6 BRY6   2021     1