I need to calculate alpha diversity (Simpson) for a dataset of the fish found at six different sites (12 entries, 6 columns). When I use the diversity() function from the vegan package, I don't get what I expect: it computes a Simpson value for each of the 12 species entries instead of one value per site. I even compared my data with the dune dataset in the vegan package, and I just can't figure out what I'm doing wrong.
This is my code so far:
if (!require(ggplot2)) install.packages('ggplot2')
library(ggplot2)
if (!require(ggpubr)) install.packages('ggpubr')
library(ggpubr)
if (!require(tidyverse)) install.packages('tidyverse')
library(tidyverse)
if (!require(vegan)) install.packages('vegan')
library(vegan)
if (!require(labdsv)) install.packages('labdsv')
library(labdsv)
Abundancia2017<- c(0,36,0,9,0,0,46,0,0,0,25,0)
Abundancia2018<- c(0,46,0,13,5,0,69,2,0,0,123,9)
Abundancia2019<- c(4,20,0,38,2,1,97,0,0,0,12,0)
datafish <- data.frame(Abundancia2017, Abundancia2018, Abundancia2019)
datafish
I run the diversity function and I get the following result:
> diversity(datafish,index = "simpson")
[1] 0.0000000 0.6336025 1.0000000 0.5294444 0.4081633 0.0000000 0.6376379 0.0000000 1.0000000 1.0000000
[11] 0.3789844 0.0000000
Please help me. I've already tried swapping columns and rows in the .csv file, and nothing seems to work.
Vegan follows convention and expects your samples to be in the rows and your species to be in the columns. Your example is transposed relative to this convention. If this reproducible example is representative of your actual data, then you have the data stored incorrectly for {vegan} (and most stats). This problem is very common with people coming from the microbiome world where they store the species (OTUs or ASVs, etc) in the rows and the samples in the columns.
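A quick way to see which orientation you have is to look at the dimensions before and after transposing. This is only a sketch using base R and the example data from the question; the name `transposed` is made up for illustration:

```r
# Example data from the question: 12 species entries, one column per sample
Abundancia2017 <- c(0, 36, 0, 9, 0, 0, 46, 0, 0, 0, 25, 0)
Abundancia2018 <- c(0, 46, 0, 13, 5, 0, 69, 2, 0, 0, 123, 9)
Abundancia2019 <- c(4, 20, 0, 38, 2, 1, 97, 0, 0, 0, 12, 0)
datafish <- data.frame(Abundancia2017, Abundancia2018, Abundancia2019)

# As stored: species in rows, samples in columns
dim(datafish)

# After transposing: samples in rows, species in columns
# (t() on a data frame returns a matrix)
transposed <- t(datafish)
dim(transposed)
rownames(transposed)
```

If the number of rows matches the number of species rather than the number of samples, the data are transposed relative to what vegan expects by default.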
You could just call t() on the data frame to transpose it, but it is generally safer to convert to a data matrix, then transpose, then convert back to a data frame:
library("dplyr")
df <- data.matrix(datafish) %>%
  t() %>%
  as.data.frame()
which gives
> df
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
Abundancia2017 0 36 0 9 0 0 46 0 0 0 25 0
Abundancia2018 0 46 0 13 5 0 69 2 0 0 123 9
Abundancia2019 4 20 0 38 2 1 97 0 0 0 12 0
and which can now be used to compute the diversity:
diversity(df,index = "simpson")
which produces
> diversity(df,index = "simpson")
Abundancia2017 Abundancia2018 Abundancia2019
0.6939655 0.6873992 0.6228696
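As a sanity check, these numbers can be reproduced by hand. The sketch below assumes the Simpson index is defined as 1 - sum(p^2), with p the proportion of each species within a sample, which is how vegan documents index = "simpson". The helper name `simpson_by_hand` is made up for illustration, and only base R is used:

```r
# Example data from the question
Abundancia2017 <- c(0, 36, 0, 9, 0, 0, 46, 0, 0, 0, 25, 0)
Abundancia2018 <- c(0, 46, 0, 13, 5, 0, 69, 2, 0, 0, 123, 9)
Abundancia2019 <- c(4, 20, 0, 38, 2, 1, 97, 0, 0, 0, 12, 0)
datafish <- data.frame(Abundancia2017, Abundancia2018, Abundancia2019)

# Hypothetical helper: Simpson index as 1 - sum(p^2) for one sample
simpson_by_hand <- function(counts) {
  p <- counts / sum(counts)  # proportion of each species in the sample
  1 - sum(p^2)
}

# One value per column, i.e. one per sample in the original layout
by_hand <- sapply(datafish, simpson_by_hand)
round(by_hand, 7)
```

The rounded values should agree with the diversity() output shown above.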
In this case, however, there is a shortcut. diversity() has a MARGIN argument, which you can use to tell it to work row-wise or column-wise. The default (MARGIN = 1) is to work over rows (to get sample diversity values), but in your case you want it to work over the columns, as that is where you have your samples. Hence we want to add MARGIN = 2 to your diversity() call:
# note I use your datafish here
diversity(datafish,index = "simpson", MARGIN = 2)
which gives:
> diversity(datafish,index = "simpson", MARGIN = 2)
Abundancia2017 Abundancia2018 Abundancia2019
0.6939655 0.6873992 0.6228696
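To see why the original call returned 12 values, it may help to mimic the row-wise versus column-wise choice with base R's apply(), whose MARGIN argument follows the same 1 = rows, 2 = columns convention. This is a rough analogy, not vegan's implementation; the `simpson` helper is a hand-rolled stand-in:

```r
# Example data from the question
Abundancia2017 <- c(0, 36, 0, 9, 0, 0, 46, 0, 0, 0, 25, 0)
Abundancia2018 <- c(0, 46, 0, 13, 5, 0, 69, 2, 0, 0, 123, 9)
Abundancia2019 <- c(4, 20, 0, 38, 2, 1, 97, 0, 0, 0, 12, 0)
datafish <- data.frame(Abundancia2017, Abundancia2018, Abundancia2019)

# Hand-rolled stand-in for the Simpson index (not vegan's code).
# Note: an all-zero row gives NaN here, whereas the output in the
# question shows 1 for those rows.
simpson <- function(counts) 1 - sum((counts / sum(counts))^2)

m <- as.matrix(datafish)

over_rows <- apply(m, 1, simpson)       # one value per species entry
over_cols <- apply(m, 2, simpson)       # one value per sample
over_rows_t <- apply(t(m), 1, simpson)  # transpose first, then go row-wise

length(over_rows)
length(over_cols)
all.equal(over_cols, over_rows_t)
```

Working over the columns of the original table and working over the rows of the transposed table are the same computation, which is why both approaches in this answer give identical results.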