r data.table

Applying a match within a function for a data.table


I have a data.table that contains, for each ID, its X and Y coordinates and several columns holding neighbouring IDs. The neighbouring IDs refer to other observations (rows) in the same data.table.

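The goal is to set a neighbouring ID to NA if that neighbour is too far away (a Chebyshev distance greater than 10 in the code below). To do this, I have to apply my distance function to the row's own X and Y coordinates and to the coordinates of the matching neighbour ID in the data.table.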

#the update function
update_columns <- function(dt, columns_to_update) {
   for (col in columns_to_update) {
      dt[, (col) := ifelse(chebyshev_distance(x,y, dt.nearest_neighbours[match.SD, c("x", "y"), on="id"]) > 10, NA, dt[[col]])]
   }
   return(dt.nearest_neighbours)
}

#the chebyshev distance function
chebyshev_distance <- function(x1, y1, data) {
   pmax(abs(x1-data$x), abs(y1-data$y))
}


#I created a mock data.table for reproducing the problem:

library(data.table)

dt.nearest_neighbours <- data.table(
  id = c(1,2,3,4), #the ID
  x = c(10, 20, 30, 40), #the X coordinate of the ID
  y = c(5, 10,25,5), #the Y coordinate of the ID
  V1 = c(2,3,2,1), #a neighbour of the ID -> the numbers in the V columns refer to other IDs in this dt
  V2 = c(4,1,4,2), #a second neighbour of the ID
  V3 = c(3,1,1,3) #third neighbour of the ID
)

The current code gives the following error:

'match.SD is not found in calling scope and it is not a column name either. When the first argument inside DT[...] is a single symbol (e.g., DT[var]), data.table looks for var in calling scope'.

It seems the match is not done properly. How can I fix this?


Solution

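  • This was solved with the following code. The undefined match.SD is replaced by a join, dt[.(id = get(col)), on = "id"], which looks up the x and y coordinates of each neighbour ID in the current column. The function also returns dt instead of the global dt.nearest_neighbours: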

    update_columns <- function(dt, columns_to_update) {
      for (col in columns_to_update) {
        dt[, (col) := ifelse(chebyshev_distance(x, y, dt[.(id = get(col)), .(id, x, y), on = "id"]) > 10, NA, dt[[col]])]
      }
      return(dt)
    }
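
For comparison, here is a minimal sketch of the same idea written with base match() instead of a join. It is not the code from the solution above, only an alternative under stated assumptions: the names update_columns_match, chebyshev_distance2 and the max_dist parameter are made up for illustration, and the threshold of 10 is taken from the question. Note that set(), like :=, modifies the table by reference, so the example passes a copy().

```r
library(data.table)

# Chebyshev distance between paired points (x1, y1) and (x2, y2)
chebyshev_distance2 <- function(x1, y1, x2, y2) {
  pmax(abs(x1 - x2), abs(y1 - y2))
}

# Hypothetical variant of update_columns(); max_dist is an assumed parameter
update_columns_match <- function(dt, columns_to_update, max_dist = 10) {
  for (col in columns_to_update) {
    idx <- match(dt[[col]], dt$id)   # row position of each neighbour ID (NA if absent)
    d <- chebyshev_distance2(dt$x, dt$y, dt$x[idx], dt$y[idx])
    too_far <- which(d > max_dist)   # which() silently drops NA distances
    if (length(too_far) > 0) {
      set(dt, i = too_far, j = col, value = NA)  # updates dt by reference
    }
  }
  dt
}

# Mock data from the question
dt.nearest_neighbours <- data.table(
  id = c(1, 2, 3, 4),
  x  = c(10, 20, 30, 40),
  y  = c(5, 10, 25, 5),
  V1 = c(2, 3, 2, 1),
  V2 = c(4, 1, 4, 2),
  V3 = c(3, 1, 1, 3)
)

# copy() keeps the original table untouched
res <- update_columns_match(copy(dt.nearest_neighbours), c("V1", "V2", "V3"))
res$V1  # 2 NA NA NA
res$V2  # NA 1 NA NA
res$V3  # NA 1 NA NA
```

Only three neighbours survive in the mock data because the comparison is strict: ID 1 and ID 2 are exactly 10 apart, which is not greater than 10, while every other pair is at least 15 apart.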