rvectornearest-neighborclosest

Finding the closest value in a vector, for each value in the same vector, excluding the element in question in R?


I am trying to find a way to find the closest value in a vector, from elements in the SAME vector, but excluding the row in question. For example, suppose I have dataframe A with one column (column_1):

column_1
1
5
6
2
3
0
5
2
1
9

I want to add a second column which, for every element in column_1 finds the closest value in THAT SAME vector excluding the row in question. Desired output is below:

column_1    column_2
1           1
5           5
6           5
2           2
3           2
0           1
5           5
2           2
1           1
9           6

I have seen people discuss how to do this where the closest value for each element in a vector (a) is identified from another vector (b) via the following:

which(abs(a-b)==min(a-b))

Does anyone know how to modify the above, or do this in some other way, so that I can look within the same vector and exclude the row in question (example: the third row in column_1 is closest to 5 not 6, since I exclude its own row from the search vector. However, the fourth row in column_1 is closest to 2 since even when excluding the fourth row, there is another 2 value in the 8th row)


Solution

  • You could sort the vector, then check only the number before and number after, and replace the original vector:

    y <- sort(x)
    z <- c(-Inf, y, Inf)
    b <- cbind(head(z, -2), tail(z, -2)) 
    x[order(x)] <- b[cbind(seq_along(y), max.col(-abs(b - y)))]
    x
    [1] 1 5 5 2 2 1 5 2 1 6
    

    Notice that this method has the lowest complexity as compared to the above solutions. Ie it is the fastest:

    onyambu <- function(x){
      y <- sort(x)
      z <- c(-Inf, y, Inf)
      b <- cbind(head(z, -2), tail(z, -2)) 
      x[order(x)] <- b[cbind(seq_along(y), max.col(-abs(b - y)))]
      x
    }
    
    gregor <- function(x){
      dist = outer(x, x, FUN = \(a, b) abs(a - b))
      diag(dist) = Inf
      mins = max.col(-dist, ties.method = 'first')
      x[mins]
    }
    x <- rnorm(1000)
    
    microbenchmark::microbenchmark(onyambu(x), gregor(x), check = 'equal', times = 1)
    Unit: milliseconds
           expr       min        lq      mean    median        uq       max neval
     onyambu(x)    5.7839    5.7839    5.7839    5.7839    5.7839    5.7839     1
      gregor(x) 3021.9425 3021.9425 3021.9425 3021.9425 3021.9425 3021.9425     1