Tags: r, for-loop, dplyr, adehabitathr, pairwise-distance

How can I loop a function through every combination of levels of a factor?


I have a dataset containing a set of variables and the coordinates describing their distributions in geographic space:

set.seed(123)

#example dataset:

d <- data.frame(var=as.factor(rep(LETTERS[1:5],each=6)),x=runif(30),y=runif(30))

head(d)

  var         x          y
1   A 0.2875775 0.96302423
2   A 0.7883051 0.90229905
3   A 0.4089769 0.69070528
4   A 0.8830174 0.79546742
5   A 0.9404673 0.02461368
6   A 0.0455565 0.47779597

I would like to measure Bhattacharyya's affinity for each pair of variables. For A and B, it looks like this:

library(dplyr)
library(adehabitatHR)

a <- d %>%
  filter(var %in% c("A","B")) %>%
  dplyr::select(x,y)
b <- d %>%
  filter(var %in% c("A","B")) %>%
  dplyr::select(var)

sp_df <- SpatialPointsDataFrame(a, b)

kerneloverlap(sp_df, method='BA')[1,2]

[1] 0.7217199
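The [1,2] indexing above implies that kerneloverlap() returns a matrix. The adehabitatHR documentation describes that matrix as holding the index for every pair of animals (here, levels of var) present in the object, so it may be possible to skip the pairwise loop and pass all five levels at once. This is a minimal sketch, assuming adehabitatHR (which attaches sp) is installed. Because the kernel grid is then shared across all 30 points, the values may differ slightly from those of the two-level calls.

```r
library(adehabitatHR)  # also attaches sp, which provides SpatialPointsDataFrame()

set.seed(123)
d <- data.frame(var = as.factor(rep(LETTERS[1:5], each = 6)),
                x = runif(30), y = runif(30))

# One object holding all five levels of var (6 relocations each)
sp_all <- SpatialPointsDataFrame(d[, c("x", "y")], d["var"])

# Bhattacharyya's affinity for every pair of levels in a single call
ba <- kerneloverlap(sp_all, method = "BA")
round(ba, 3)
```

If this works with the real data, the result is already a square matrix with one row and one column per level.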

The final goal is to store these values in a symmetric matrix and use them as a distance metric of sorts between the variables.

Unfortunately, the kerneloverlap() function only works with a SpatialPointsDataFrame object and can only handle two variables at a time, so I have tried baking it into a loop following this post:

distmat <- as.data.frame(matrix(ncol=5,nrow=5))
colnames(distmat) <- levels(d$var)
rownames(distmat) <- levels(d$var)

for (i in seq_along(levels(d$var))) {
  if (i != length(levels(d$var))) {
    a <- d %>%
      filter(var %in% c(levels(d$var)[i], levels(d$var)[i + 1])) %>%
      dplyr::select(x, y)
    b <- d %>%
      filter(var %in% c(levels(d$var)[i], levels(d$var)[i + 1])) %>%
      dplyr::select(var)

    sp_df <- SpatialPointsDataFrame(a, b)

    distmat[i, (i + 1)] <- kerneloverlap(sp_df, method = 'BA')[1, 2]
  }
}

However, when I run this, it fails with:

Error in kernelUD(xy, same4all = TRUE, ...) : At least 5 relocations are required to fit an home range

kerneloverlap() needs at least five observations in each of the two distributions, but every variable in the example dataset has 6 observations, so this shouldn't be a problem. The error does not occur if var is a character vector instead of a factor, but then the rest of the loop doesn't work and the distance matrix stays empty. I'm stuck and don't know where to go from here, so any suggestion is very much appreciated.
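One plausible explanation for the error: subsetting rows of a data frame (with base indexing or dplyr's filter()) keeps every level of a factor, including levels with no rows left. If kernelUD() counts relocations per level of the id column (for example with table(), which is an assumption about adehabitatHR's internals), the three empty levels would count as zero relocations. That would also fit the observation that a character var, which has no levels, avoids the error. A base R sketch of the level bookkeeping, without adehabitatHR:

```r
set.seed(123)
d <- data.frame(var = as.factor(rep(LETTERS[1:5], each = 6)),
                x = runif(30), y = runif(30))

# Subsetting rows does not drop factor levels: C, D and E remain as empty levels
sub <- d[d$var %in% c("A", "B"), ]
counts_before <- table(sub$var)   # A = 6, B = 6, C = 0, D = 0, E = 0
min(counts_before)                # 0: fewer than the 5 relocations required

# droplevels() keeps only the levels actually present in the subset
sub <- droplevels(sub)
counts_after <- table(sub$var)    # A = 6, B = 6
min(counts_after)                 # 6
```

In the dplyr pipeline, the corresponding change would be adding droplevels() after filter() when building b. Separately, the loop pairs level i only with level i + 1, so even once it runs it fills only 4 of the choose(5, 2) = 10 pairs (A/B, B/C, C/D and D/E).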

EDIT

I found a way to iterate over all pairs with combn():

combos <- as.data.frame(combn(unique(d$var), 2))
distmat <- as.data.frame(matrix(ncol = 5, nrow = 5))

for (i in 1:ncol(combos)) {
  a <- d %>%
    filter(var %in% c(combos[1:2, i])) %>%
    dplyr::select(x, y)
  b <- d %>%
    filter(var %in% c(combos[1:2, i])) %>%
    dplyr::select(var)

  sp_df <- SpatialPointsDataFrame(a, b)

  kerneloverlap(sp_df, method = 'BA')[1, 2] %>% print()
}

This correctly prints the values of Bhattacharyya's affinity, but I am still trying to figure out how to store them in a symmetric matrix with one row and one column per variable, so that each value sits at the right pair. Any ideas? Thanks in advance.
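Instead of printing inside the combn() loop, one option is to preallocate a matrix whose row and column names are all the levels and to write each value into both [a, b] and [b, a]. Indexing by level name guarantees each value lands on the right pair and fills both triangles. This is a minimal sketch: pairwise_matrix(), pair_fn and toy_fn are hypothetical names, and toy_fn only stands in for the kerneloverlap() call so the bookkeeping can be checked without adehabitatHR.

```r
# Hypothetical helper: build a symmetric matrix from a function of two level names
pairwise_matrix <- function(lv, pair_fn, diag_value = 1) {
  m <- matrix(NA_real_, nrow = length(lv), ncol = length(lv),
              dimnames = list(lv, lv))
  diag(m) <- diag_value          # theoretical BA affinity of a distribution with itself
  pairs <- combn(lv, 2)          # one pair of level names per column
  for (k in seq_len(ncol(pairs))) {
    a <- pairs[1, k]
    b <- pairs[2, k]
    val <- pair_fn(a, b)
    m[a, b] <- val               # upper triangle
    m[b, a] <- val               # mirrored into the lower triangle
  }
  m
}

# Toy stand-in for the kerneloverlap() call, used only to check the indexing
toy_fn <- function(a, b) abs(match(a, LETTERS) - match(b, LETTERS)) / 10
m <- pairwise_matrix(LETTERS[1:5], toy_fn)

# With the question's data, pair_fn could wrap the original call instead, e.g.
# ba_fn <- function(a, b) {
#   sub <- droplevels(d[d$var %in% c(a, b), ])
#   sp_df <- SpatialPointsDataFrame(sub[, c("x", "y")], sub["var"])
#   kerneloverlap(sp_df, method = "BA")[1, 2]
# }
# bhatt <- pairwise_matrix(levels(d$var), ba_fn)
```

Because the dimnames come from the full set of levels, no row or column can go missing, which is what happens when the matrix is instead reshaped from combn()'s output, where the first level never appears as the second member of a pair and the last level never appears as the first.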


Solution

  • After a lot of trial and error, I ended up with this:

    Loop:

    for (i in 1:ncol(combos)) {
        a <- d %>%
          filter(var %in% c(combos[1:2,i])) %>%
          dplyr::select(x,y)
        b <- d %>%
          filter(var %in% c(combos[1:2,i])) %>%
          dplyr::select(var)
        
        sp_df <- SpatialPointsDataFrame(a, b)
    
        #append to combos a row with the values for the corresponding pairs:
        combos[3,i] <- round(kerneloverlap(sp_df, method='BA')[1,2],3) 
    }
    

    Reshape the combos data frame (pivot_wider() comes from tidyr):

    library(tidyr)

    diff <- as.data.frame(t(combos)) %>%
      pivot_wider(names_from = 2, values_from = 3, values_fill = NA) %>%
      tibble::column_to_rownames('1') %>%
      as.matrix()
    

    NOTE: this last step is problematic. The first variable never appears as the second member of a pair in combos, and the last variable never appears as the first, so the columns are missing the first letter and the rows the last one, and the matrix is NOT symmetric. I don't know how to solve this; I had to save it to a csv file and manually add the missing column and row. Since my original data is not very large, this wasn't much of a hassle, but I would like to fix it anyway.

    Make matrix symmetric

    bhatt <- read.csv("bhatt.csv") #cleaned up version of the matrix with only the upper triangle filled up.
    
    bhatt[lower.tri(bhatt, diag = FALSE)] <- t(bhatt)[lower.tri(bhatt, diag = FALSE)]
    

    This still needs a step that subtracts the values in the matrix from 1 to turn it into a real distance matrix, but that goes beyond the scope of this post. The solution worked for me, but I feel it's too hacky and could be done better, without manually fixing the dataset. If anyone knows how, please let me know.
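For the last step, converting affinities to distances only needs element-wise arithmetic plus as.dist() from base R's stats package. The sketch below uses a made-up 3 x 3 affinity matrix (the values in aff are hypothetical, not results from the question's data). Note that 1 - BA is not guaranteed to satisfy the triangle inequality. The Hellinger distance, sqrt(1 - BA), is a proper metric, so it is shown as an alternative.

```r
# Hypothetical symmetric affinity matrix (values made up for illustration)
lv <- c("A", "B", "C")
aff <- matrix(c(1.0, 0.7, 0.2,
                0.7, 1.0, 0.5,
                0.2, 0.5, 1.0),
              nrow = 3, dimnames = list(lv, lv))

# Linear transform: distance = 1 - affinity
d_lin <- as.dist(1 - aff)

# Hellinger distance: sqrt(1 - affinity); pmax() guards against tiny negative
# values if a numerically estimated affinity slightly exceeds 1
d_hel <- as.dist(sqrt(pmax(1 - aff, 0)))

# A "dist" object can go straight into, e.g., hierarchical clustering
hc <- hclust(d_hel)
```

as.dist() reads only the lower triangle, and the labels are taken from the row names.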