
For each point in one data set, calculate distance to nearest point in second data set


I'm trying to find, for each point in a SpatialPointsDataFrame, the distance to the closest point in a second SpatialPointsDataFrame (equivalent to the "nearest" tool in ArcGIS for two SpatialPointsDataFrames).

I can do the naive implementation by calculating all pairwise distances using gDistance and taking the min (like answer 1 here), but I have some huge datasets and was looking for something more efficient.
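For reference, the naive approach described above can be sketched roughly as follows. This is a minimal base-R version on made-up coordinates (the matrices `a` and `b` are hypothetical, and `outer` stands in for gDistance), so treat it as an illustration rather than the exact code in use. The full distance matrix has one cell per pair of points, which is why it stops scaling for large inputs.

```r
## Hypothetical example coordinates (two-column matrices: x, y)
set.seed(1)
a <- cbind(x = rnorm(5),    y = rnorm(5))     # points we measure from
b <- cbind(x = rnorm(1000), y = rnorm(1000))  # candidate nearest points

## Full pairwise distance matrix: rows index a, columns index b
dx <- outer(a[, "x"], b[, "x"], "-")
dy <- outer(a[, "y"], b[, "y"], "-")
d  <- sqrt(dx^2 + dy^2)

## Distance to, and index of, the closest point in b for each point in a
nearest_dist <- apply(d, 1, min)
nearest_idx  <- apply(d, 1, which.min)
```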

For example, here's a trick with knearneigh for points in the same dataset.

Cross-posted on r-sig-geo


Solution

  • The SearchTrees package offers one solution. Quoting from its documentation, it "provides an implementation of the QuadTree data structure [which it] uses to implement fast k-Nearest Neighbor [...] lookups in two dimensions."

    Here's how you could use it to quickly find, for each point in a SpatialPoints object B, the two nearest points in a second SpatialPoints object A:

    library(sp)
    library(SearchTrees)
    
    ## Example data
    set.seed(1)
    A <- SpatialPoints(cbind(x=rnorm(100), y=rnorm(100)))
    B <- SpatialPoints(cbind(x=c(-1, 0, 1), y=c(1, 0, -1)))
    
    ## Find indices of the two nearest points in A to each of the points in B
    tree <- createTree(coordinates(A))
    inds <- knnLookup(tree, newdat=coordinates(B), k=2)
    
    ## Show that it worked
    plot(A, pch=1, cex=1.2)
    points(B, col=c("blue", "red", "green"), pch=17, cex=1.5)
    ## Plot two nearest neighbors
    points(A[inds[1,],], pch=16, col=adjustcolor("blue", alpha=0.7))
    points(A[inds[2,],], pch=16, col=adjustcolor("red", alpha=0.7))
    points(A[inds[3,],], pch=16, col=adjustcolor("green", alpha=0.7))
    

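Since the question asks for distances rather than indices, here is one way the lookup might be extended to return them. This is a sketch under a few assumptions: the coordinates are planar (projected), so plain Euclidean distance is meaningful; `k = 1` is used because only the single nearest point is needed; and `as.vector` is applied so the code does not depend on whether the one-column result comes back as a matrix or a vector.

```r
library(sp)
library(SearchTrees)

## Same example data as in the answer
set.seed(1)
A <- SpatialPoints(cbind(x = rnorm(100), y = rnorm(100)))
B <- SpatialPoints(cbind(x = c(-1, 0, 1), y = c(1, 0, -1)))

## Index of the single nearest point in A for each point in B
tree <- createTree(coordinates(A))
nn   <- as.vector(knnLookup(tree, newdat = coordinates(B), k = 1))

## Euclidean distance from each point in B to its nearest point in A
## (assumes planar coordinates, not longitude/latitude)
diffs <- coordinates(B) - coordinates(A)[nn, , drop = FALSE]
d     <- unname(sqrt(rowSums(diffs^2)))
```

Here `d[i]` is the distance from the i-th point of B to `A[nn[i], ]`. For longitude/latitude data a great-circle distance would need to replace the last step.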