I have a dataset containing a set of variables and the coordinates describing their distributions in geographic space:
set.seed(123)
#example dataset:
d <- data.frame(var=as.factor(rep(LETTERS[1:5],each=6)),x=runif(30),y=runif(30))
head(d)
var x y
1 A 0.2875775 0.96302423
2 A 0.7883051 0.90229905
3 A 0.4089769 0.69070528
4 A 0.8830174 0.79546742
5 A 0.9404673 0.02461368
6 A 0.0455565 0.47779597
I would like to measure Bhattacharyya's affinity for each combination of variables, as in the following:
library(dplyr)
library(adehabitatHR)
a <- d %>%
  filter(var %in% c("A","B")) %>%
  dplyr::select(x,y)
b <- d %>%
  filter(var %in% c("A","B")) %>%
  dplyr::select(var)
sp_df <- SpatialPointsDataFrame(a, b)
kerneloverlap(sp_df, method='BA')[1,2]
[1] 0.7217199
The final goal is to store these values in a symmetric matrix and use them as a distance metric of sorts between the variables.
Unfortunately, the kerneloverlap() function only works with a SpatialPointsDataFrame object and can only handle two variables at a time, so I have tried baking it into a loop following this post:
distmat <- as.data.frame(matrix(ncol=5,nrow=5))
colnames(distmat) <- levels(d$var)
rownames(distmat) <- levels(d$var)
for (i in seq_along(levels(d$var))) {
  if(i != length(levels(d$var))){
    a <- d %>%
      filter(var %in% c(levels(d$var)[i], levels(d$var)[i+1])) %>%
      dplyr::select(x,y)
    b <- d %>%
      filter(var %in% c(levels(d$var)[i], levels(d$var)[i+1])) %>%
      dplyr::select(var)
    sp_df <- SpatialPointsDataFrame(a, b)
    distmat [i,(i+1)] <- kerneloverlap(sp_df, method='BA')[1,2]
  }
}
However, when I run this it returns the following error:
Error in kernelUD(xy, same4all = TRUE, ...) : At least 5 relocations are required to fit an home range
The kerneloverlap() function needs at least five observations in each distribution, but every variable in the example dataset has 6 observations, so this shouldn't be a problem. I found that the error doesn't occur if var is a character vector rather than a factor, but then of course the rest of the loop doesn't work and the distance matrix stays empty.
I really am stuck and don't know where to go from here, so any suggestion is very much appreciated.
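One possible cause, stated here as an assumption rather than something checked against the adehabitatHR source: subsetting a data frame keeps every level of a factor, including levels that no longer have any rows. If the relocation count is taken per level, the empty levels would count as 0 and fall below the minimum of 5. A character vector has no levels, which would be consistent with the error disappearing in that case. The sketch below uses only base R to show the leftover levels and how droplevels() removes them:

```r
# Illustrative only: base R, no spatial packages needed.
set.seed(123)
d <- data.frame(var = as.factor(rep(LETTERS[1:5], each = 6)),
                x = runif(30), y = runif(30))

# Subsetting rows keeps all five factor levels, even the empty ones.
sub <- d[d$var %in% c("A", "B"), ]
counts_before <- table(sub$var)   # A and B have 6 rows; C, D and E have 0

# droplevels() discards the unused levels, leaving only A and B.
sub$var <- droplevels(sub$var)
counts_after <- table(sub$var)

print(counts_before)
print(counts_after)
```

If this is indeed the cause, calling droplevels() on the filtered var column before building the SpatialPointsDataFrame may be enough to keep var as a factor.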
EDIT
I found a solution to iterate with combn:
combos <- as.data.frame(combn(unique(d$var),2))
distmat <- as.data.frame(matrix(ncol=5,nrow=5))
for (i in 1:ncol(combos)) {
  a <- d %>%
    filter(var %in% c(combos[1:2,i])) %>%
    dplyr::select(x,y)
  b <- d %>%
    filter(var %in% c(combos[1:2,i])) %>%
    dplyr::select(var)
  sp_df <- SpatialPointsDataFrame(a, b)
  kerneloverlap(sp_df, method='BA')[1,2] %>% print()
}
This correctly prints out the values of Bhattacharyya's affinity. However, I am still trying to figure out how to save them in a symmetric matrix whose dimensions equal the number of variables, so that each value ends up in the cell for the right pair. Any ideas? Thanks in advance.
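A minimal sketch of one way to do this in base R: name the rows and columns of the matrix after the variables, then index it by name so each value lands in both mirrored cells. Here pair_affinity() is a hypothetical stand-in for the kerneloverlap(sp_df, method='BA')[1,2] call, used only so the example runs without the spatial packages, and setting the diagonal to 1 is an assumption (a distribution overlapping itself completely).

```r
set.seed(123)
d <- data.frame(var = as.factor(rep(LETTERS[1:5], each = 6)),
                x = runif(30), y = runif(30))

# Hypothetical stand-in for kerneloverlap(sp_df, method = 'BA')[1, 2]:
# it only has to return a single number for a two-variable subset.
pair_affinity <- function(pair_data) {
  means <- tapply(pair_data$x, droplevels(pair_data$var), mean)
  exp(-abs(means[[1]] - means[[2]]))
}

vars <- levels(d$var)
aff <- matrix(NA_real_, nrow = length(vars), ncol = length(vars),
              dimnames = list(vars, vars))
diag(aff) <- 1  # assumption: affinity of a variable with itself is 1

pairs <- combn(vars, 2)  # character matrix, one column per pair
for (k in seq_len(ncol(pairs))) {
  p <- pairs[, k]
  value <- pair_affinity(d[d$var %in% p, ])
  aff[p[1], p[2]] <- value  # upper triangle
  aff[p[2], p[1]] <- value  # mirrored cell in the lower triangle
}

print(round(aff, 3))
```

Indexing by name rather than by position means the order of the pairs does not matter, and no reshaping step is needed afterwards.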
After a lot of trial and error I ended up with this:
Function:
for (i in 1:ncol(combos)) {
  a <- d %>%
    filter(var %in% c(combos[1:2,i])) %>%
    dplyr::select(x,y)
  b <- d %>%
    filter(var %in% c(combos[1:2,i])) %>%
    dplyr::select(var)
  sp_df <- SpatialPointsDataFrame(a, b)
  #append to combos a row with the values for the corresponding pairs:
  combos[3,i] <- round(kerneloverlap(sp_df, method='BA')[1,2],3)
}
Reshape combos dataframe
library(tidyr) #provides pivot_wider()
diff <- as.data.frame(t(combos)) %>%
  pivot_wider(names_from = 2,values_from = 3,values_fill = NA) %>%
  tibble::column_to_rownames('1') %>%
  as.matrix()
NOTE: this last step is problematic. The column names will be missing the first letter and the row names will be missing the last letter, so the matrix is NOT symmetric. I don't know how to solve this, so I had to save the matrix to a csv file and add the missing column and row by hand. Since my original data is not very large, this wasn't too much of a hassle, but I would like to fix it anyway.
Make matrix symmetric
bhatt <- read.csv("bhatt.csv") #cleaned up version of the matrix with only the upper triangle filled up.
bhatt[lower.tri(bhatt,diag=FALSE)] <- t(bhatt)[lower.tri(bhatt,diag=FALSE)]
The values in the matrix still need to be subtracted from 1 to make it a real distance matrix, but that goes beyond the scope of this post. The solution worked for me, but I feel it's way too hacky and could be done better, without resorting to manually fixing the dataset. If anyone knows how, please let me know.
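For reference, the mirroring and the subtraction from 1 can be sketched in a few lines of base R. The affinity values below are placeholders chosen for illustration, not results from the data above, and the three variable names are arbitrary:

```r
# Placeholder affinities in the upper triangle; not results from the post.
vars <- c("A", "B", "C")
bhatt <- matrix(NA_real_, nrow = 3, ncol = 3, dimnames = list(vars, vars))
bhatt[upper.tri(bhatt)] <- c(0.7, 0.5, 0.6)  # cells (A,B), (A,C), (B,C)
diag(bhatt) <- 1                             # assumption: self-affinity is 1

# Copy the upper triangle into the lower triangle.
bhatt[lower.tri(bhatt)] <- t(bhatt)[lower.tri(bhatt)]

# Turn affinity (1 = identical) into a dissimilarity (0 = identical).
bhatt_dist <- as.dist(1 - bhatt)
print(bhatt_dist)
```

The resulting dist object can be passed to functions that expect one, such as hclust(), although whether 1 minus the affinity behaves as a proper metric for a given analysis is a separate question.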