I have a data.table that contains, for each ID, the X and Y coordinates and a number of columns holding neighbouring IDs. The neighbouring IDs refer to other observations (rows) in this DT.
The goal is to set a neighbouring ID to NA if its distance from the row's own ID is too large (greater than 10 in the code below). To do this, I have to apply my distance function to the X and Y coordinates of the row and of the matching neighbour ID in the DT.
# the update function
update_columns <- function(dt, columns_to_update) {
  for (col in columns_to_update) {
    dt[, (col) := ifelse(chebyshev_distance(x, y, dt.nearest_neighbours[match.SD, c("x", "y"), on = "id"]) > 10, NA, dt[[col]])]
  }
  return(dt.nearest_neighbours)
}
# the chebyshev distance function
chebyshev_distance <- function(x1, y1, data) {
  pmax(abs(x1 - data$x), abs(y1 - data$y))
}
# I created a mock data.table for reproducing the problem:
library(data.table)
dt.nearest_neighbours <- data.table(
  id = c(1, 2, 3, 4),     # the ID
  x  = c(10, 20, 30, 40), # the X coordinate of the ID
  y  = c(5, 10, 25, 5),   # the Y coordinate of the ID
  V1 = c(2, 3, 2, 1),     # a neighbour of the ID; the numbers in the V columns refer to other IDs in this dt
  V2 = c(4, 1, 4, 2),     # a second neighbour of the ID
  V3 = c(3, 1, 1, 3)      # a third neighbour of the ID
)
The current code gives the following error:
'match.SD is not found in calling scope and it is not a column name either. When the first argument inside DT[...] is a single symbol (e.g., DT[var]), data.table looks for var in calling scope'.
It seems the match is not done properly. How can I fix this?
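The error can be reproduced in isolation. The sketch below uses a small made-up table (`DT` and its values are assumptions for illustration only). `match.SD` is not one of data.table's special symbols (those include `.SD`, `.N`, `.I`, `.GRP` and `.BY`), so when it appears as a bare symbol in `i` it is treated as an ordinary variable name and looked up in the calling scope and among the columns, where it does not exist.

```r
library(data.table)

# Hypothetical toy table, only used to trigger the same error
DT <- data.table(id = 1:3, x = c(10, 20, 30))

# match.SD is neither a variable in scope nor a column of DT,
# so evaluating it as the i argument fails
msg <- tryCatch(
  DT[match.SD, on = "id"],
  error = function(e) conditionMessage(e)
)
print(msg)
```

The lookup key for the join therefore has to be built explicitly from the neighbour column, rather than relying on a symbol such as `match.SD`.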
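This was solved with the following code, which looks up each neighbour's coordinates by joining dt to itself on id, using the neighbour IDs in the current column as the lookup keys: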
update_columns <- function(dt, columns_to_update) {
  for (col in columns_to_update) {
    # join dt to itself: one row of neighbour coordinates per row of dt
    dt[, (col) := ifelse(chebyshev_distance(x, y, dt[.(id = get(col)), .(id, x, y), on = "id"]) > 10, NA, dt[[col]])]
  }
  return(dt)
}
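For completeness, here is a sketch of how the fixed function might be run on the mock data from the question. The use of `copy()` and the name `result` are additions for illustration and are not part of the original answer. Because `:=` modifies a data.table by reference, passing a copy keeps `dt.nearest_neighbours` unchanged.

```r
library(data.table)

# Chebyshev distance between the points (x1, y1) and the rows of `data`
chebyshev_distance <- function(x1, y1, data) {
  pmax(abs(x1 - data$x), abs(y1 - data$y))
}

# Fixed function: fetch each neighbour's coordinates with a self-join on id
update_columns <- function(dt, columns_to_update) {
  for (col in columns_to_update) {
    dt[, (col) := ifelse(
      chebyshev_distance(x, y, dt[.(id = get(col)), .(id, x, y), on = "id"]) > 10,
      NA, dt[[col]]
    )]
  }
  return(dt)
}

# Mock data from the question
dt.nearest_neighbours <- data.table(
  id = c(1, 2, 3, 4),
  x  = c(10, 20, 30, 40),
  y  = c(5, 10, 25, 5),
  V1 = c(2, 3, 2, 1),
  V2 = c(4, 1, 4, 2),
  V3 = c(3, 1, 1, 3)
)

# copy() is an assumption added here so the original table is not modified
result <- update_columns(copy(dt.nearest_neighbours), c("V1", "V2", "V3"))
print(result)
```

Working through the mock data by hand, IDs 1 and 2 are the only pair at a Chebyshev distance of 10 (max(|10 - 20|, |5 - 10|) = 10), which is not greater than the threshold. Every other pair is at least 15 apart, so the only neighbour entries expected to survive are V1 = 2 in row 1 and V2 = 1, V3 = 1 in row 2.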