I am having difficulty applying the estimateD function from the iNEXT package to my own data. I am working on bees and have a very large dataset of record counts in grid cells covering a particular region. I want to compute Hill diversities for each grid cell by rarefying by size (and also by coverage; neither method works on my data, but here I report the error I get with the base="size" argument).
To use the function, I took a species x sites (= grid cells) matrix and transformed it into a list, as in the function's own example:
library(iNEXT)
data(spider)
iNEXT::estimateD(spider, datatype="abundance", base="size", level=NULL, conf=NULL)
I have data for 656 bee species in 4428 grid cells. When running the function on the full dataset, I get the following error: Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows
But when I subset the list to a much smaller number of grid cells, the function may succeed. Here is a reprex containing 67 different grid cells. I apologize, but it's the smallest subset for which I get the error.
List1=list(col1 = c(4, 2, 1, 1, 1, 2, 3, 2, 2, 2, 1), col2 = c(1, 3,3, 3, 3, 3, 3, 4, 3, 3, 3, 1, 3, 3, 3),
col3 = c(3, 6, 2, 1,7, 7, 5), col4 = c(2, 4, 2, 3, 3, 2, 4, 5, 5, 4, 3, 5, 3),
col5 = c(6,1, 3, 4, 2, 2, 2), col6 = c(4, 8, 1, 1, 4, 1, 8, 9, 5, 2, 9,7, 1, 11, 4, 1, 2),
col7 = c(1, 1, 2, 1, 2, 3, 7, 8, 5, 6, 6, 4, 10, 1, 1, 1), col8 = c(2, 1, 3, 1, 1, 1, 1, 2, 2, 1, 2, 2,2, 2, 1, 2, 1),
col9 = c(2, 4, 4, 3, 3, 3, 2, 2, 5), col10 = c(3,2, 2, 2, 5, 4, 4, 5, 1),
col11 = c(4, 2, 2, 4, 3, 2, 4, 4, 2, 4, 1), col12 = c(1, 1, 3, 1, 3, 2, 1, 5, 6, 2, 5),
col13 = c(2,4, 2, 1, 1, 5, 1, 2, 4, 2, 3, 1, 2, 1, 1),
col14 = c(3, 2, 14,31, 8, 3, 1, 7, 5, 21, 6, 21, 43, 26, 2, 33, 16, 20, 7, 3, 18, 2, 1, 1),
col15 = c(2, 2, 10, 2, 3, 2, 5, 2, 9, 1, 8, 6, 7, 3, 7, 1, 2, 2, 5, 1, 1, 1, 1, 3, 1, 3),
col16 = c(4, 1, 1, 1, 4,3, 1, 1, 3, 1, 1),
col17 = c(4, 8, 1, 1, 1, 1, 1, 2, 1), col18 = c(3,2, 1, 2, 1, 1, 1, 1, 3, 3, 1, 2, 2, 1, 1, 1, 2, 1, 2, 1),
col19 = c(4,4, 4, 9, 2, 7, 6, 2, 9), col20 = c(3, 4, 1, 2, 5, 4, 1, 1, 2),
col21 = c(2, 2, 2, 1, 1, 3, 1, 2, 1, 1, 2, 2, 1, 1), col22 = c(2, 7, 1, 1, 2, 2, 5, 3, 3, 1, 1, 4, 2),
col23 = c(1, 5, 1, 1,3, 2, 1, 1, 1, 2, 2, 2, 3, 1, 2, 1, 2, 2, 1, 1),
col24 = c(7, 1, 1, 1, 1, 1, 3, 3, 1, 1, 1, 1), col25 = c(3, 3, 1, 3, 3,3, 3, 1, 2, 4, 3, 5),
col26 = c(11, 2, 1, 7, 5, 8, 11), col27 = c(3,4, 10, 1, 10, 3, 9), col28 = c(4, 1, 1, 4, 1, 2, 3, 1, 3),
col29 = c(3, 1, 1, 2, 3, 4, 2, 2, 4, 5, 10, 1, 6, 2, 6, 1,6, 8, 11, 1, 1, 1, 1),
col30 = c(5, 1, 2, 2, 3, 3, 3, 1,2, 2, 2, 1, 1, 1, 3, 1, 1),
col31 = c(2, 5, 4, 2, 1, 1, 1,1, 1, 1, 1, 2, 3, 1, 1, 1),
col32 = c(2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2),
col33 = c(4, 1, 2,2, 5, 3, 2, 6, 7, 2, 3, 5),
col34 = c(1, 3, 1, 3, 3, 7, 1,1, 2, 2), col35 = c(1, 7, 6, 2, 7, 12, 2, 2, 3, 3, 2, 7),
col36 = c(6, 1, 3, 1, 14, 3, 2, 4, 1), col37 = c(5, 2, 1,1, 2, 1, 1, 2, 3, 1, 1, 3, 1),
col38 = c(1, 1, 1, 1, 1, 1,1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), col39 = c(4, 2,2, 1, 2, 4, 2, 2, 2),
col40 = c(2, 3, 1, 3, 4, 4, 1, 3, 4,1), col41 = c(2, 1, 2, 2, 2, 4, 5, 6, 6, 13, 7, 10, 3, 8,1, 1, 1, 1, 1),
col42 = c(4, 4, 4, 3, 4, 4, 4, 4, 4, 4, 3),
col43 = c(3, 1, 2, 1, 3, 2, 3, 2, 3, 2), col44 = c(3,1, 1, 1, 5, 3, 1, 3, 2, 1),
col45 = c(3, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 2, 5, 1, 2, 2), col46 = c(1, 5, 4, 1, 1, 2, 1,2, 2, 1, 5, 3, 2, 4, 2, 1, 2, 2),
col47 = c(3, 3, 2, 1, 2, 1, 2, 4, 1, 2, 1), col48 = c(2, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 2, 2, 5, 1, 3, 6),
col49 = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 3, 1),
col50 = c(4, 1, 5, 1, 5, 4, 6, 9, 5, 10, 14, 2, 4, 6, 4), col51 = c(1,3, 2, 3, 5, 4, 2, 1, 3, 2),
col52 = c(1, 3, 2, 3, 3, 2, 3,2, 2, 2, 1, 2, 4), col53 = c(3, 1, 2, 3, 2, 4, 2, 3, 2, 2),
col54 = c(7, 6, 6, 7, 3, 1, 1, 3, 1, 1),
col55 = c(4, 1, 3, 4, 2, 1, 2, 4, 1, 4, 4, 4, 1), col56 = c(2, 3, 3, 1,3, 4, 2, 2, 2, 4),
col57 = c(1, 1, 4, 6, 2, 7, 4, 3, 10,7, 3, 1, 9, 3), col58 = c(5, 5, 1, 1, 3, 3, 3, 4, 2, 2, 2),
col59 = c(8, 8, 2, 3, 2, 2, 2, 1, 3, 1, 2, 2, 2, 1), col60 = c(4,1, 2, 6, 3, 2, 1, 2, 3, 1, 2, 3, 2, 1),
col61 = c(6, 2, 2,2, 4, 2, 2, 5, 5, 1, 2, 6, 2, 1),
col62 = c(7, 5, 8, 3, 1,2, 2, 2, 2, 1, 3, 1, 1, 1, 3, 1, 3, 2, 3, 1, 1, 3, 1, 1),
col63 = c(2, 3, 3, 1, 3, 1, 4, 1, 4, 3, 2, 4), col64 = c(3,2, 3, 2, 2, 5, 2, 3, 6, 1, 6, 5, 2, 6, 1, 3),
col65 = c(1, 2, 2, 1, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1),
col66 = c(3, 2,1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 3, 5, 1, 1, 4, 1,1, 1, 3, 1),
col67 = c(4, 3, 2, 2, 1, 1, 1, 2, 2, 1, 2))
by_size <- iNEXT::estimateD(List1,
datatype = "abundance", base = "size",
level=NULL, conf=NULL)
#Error in data.frame(..., check.names = FALSE) :
#arguments imply differing number of rows: 67, 66
The list I provided doesn't contain zeros, so the grid cells don't all have the same number of species, but the same error appears when every grid cell has the same number of species (with zeros). I dropped the zeros to keep the reprex as small as possible.
Now if we reduce the list by removing one grid cell (or more), the function works:
List2=List1[1:66]
by_size2 <- iNEXT::estimateD(List2,
datatype = "abundance", base = "size",
level=NULL, conf=NULL)
I am just trying to understand why it produces this error. If any of you has already faced this problem, please let me know. I would be delighted to have at least a suggestion on how to proceed, or an explanation of why it's not working.
Thank you very much in advance!
OK, I think there are two things going on in iNEXT::estimateD that contribute to this frustrating behavior. The first is the filtering of duplicates (which I don't understand the purpose of). The second is name matching, which is the thing that @JensÅström mentioned.
In your dataset, the first and last elements have the same abundances once sorted:
all.equal(sort(List1[[1]]), sort(List1[[67]]))
This means that when iNEXT::estimateD filters out duplicates with the line of code tmp <- tmp[!duplicated(tmp), ], the 67th element is removed. I don't yet see why this is desirable, and I feel like it should have an associated warning; there is nothing inherently weird about having multiple samples with the same abundances. But actually, without the name matching code afterwards, this subsetting would happen silently.
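To see in advance which grid cells might be affected, a minimal base-R sketch along these lines may help. It assumes, based only on the behavior described above, that two elements are treated as duplicates when their sorted abundances match; find_duplicate_assemblages is a hypothetical helper (not part of iNEXT), and the toy list and data frame are made up for illustration and do not mirror estimateD's real internals.

```r
# Hypothetical helper (not part of iNEXT): flag list elements whose sorted
# abundance vector repeats that of an earlier element.
find_duplicate_assemblages <- function(x) {
  sorted <- lapply(x, function(v) sort(as.numeric(v)))
  duplicated(sorted)  # TRUE for the 2nd, 3rd, ... copy of a sorted vector
}

# Toy input: cellC holds the same abundances as cellA, in a different order.
toy <- list(cellA = c(4, 2, 1), cellB = c(5, 5), cellC = c(1, 2, 4))
dup <- find_duplicate_assemblages(toy)
names(toy)[dup]

# The same kind of filter applied to a toy data frame: the repeated row
# disappears without any warning.
tmp_toy <- data.frame(value1 = c(7, 10, 7), value2 = c(3, 2, 3))
filtered <- tmp_toy[!duplicated(tmp_toy), ]
nrow(filtered)
```

Running the helper on List1 before calling estimateD should point at the elements that risk being dropped, assuming the duplicate criterion guessed above is the right one.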
# L3 is an MRE
L3<-List1[c(67,1)]
by_size <- iNEXT::estimateD(L3,
datatype = "abundance", base = "size",
level=20, conf=NULL)
# dropping names changes the behavior still. why?
L4<-L3
names(L4)<-NULL
by_size <- iNEXT::estimateD(L4,
datatype = "abundance", base = "size",
level=20, conf=NULL)
duplicated(L3)
duplicated(L4)
# ok, so it's not exactly with the behavior of `duplicated` that things go awry
# (but weird that the 2nd duplicate is dropped...)
The error per se occurs in iNEXT::estimateD's approach to returning the names from your input as a column. The last bit of the function:
nam <- names(x)
if (is.null(nam)) {
  tmp
} else if (ncol(tmp) == 6) {
  tmp <- cbind(site = nam, tmp)
} else {
  tmp <- cbind(site = rep(nam, each = 3), tmp)
}
rownames(tmp) <- NULL
tmp
When you cbind the names (nam) with the output (which has had that duplicate dropped for unknown reasons), you get that error: nam has one more entry than tmp has rows (67 vs. 66).
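The mismatch can be reproduced without iNEXT at all. The sketch below is only an illustration in base R: nam and tmp are toy stand-ins (three site names against a two-row data frame), not the objects estimateD actually builds.

```r
# Toy stand-ins: three site names, but only two rows left after "filtering".
nam <- c("cell1", "cell2", "cell3")
tmp <- data.frame(estimate = c(2.5, 3.1))

# cbind() with a data frame argument ends up calling data.frame(), which
# refuses to combine inputs whose row counts do not line up.
msg <- tryCatch(
  {
    cbind(site = nam, tmp)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg
```

The toy case deliberately uses three names against two rows, because data.frame may recycle a shorter input when the longer length is an exact multiple of it, in which case no error would be raised.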