algorithmnearest-neighborapproximate-nn-searching

Two sets of high dimensional points: Find the nearest neighbour in the other set


I have 2 sets: A and B. Both sets contain the same number of high dimensional points. How do I find the nearest neighbour in Set A for every point in Set B?

I thought about using a Voronoi diagram but it seems (according to wikipedia) that it is not suitable for dimensions higher than 2.

Can someone suggest a method to me, please?


Solution

  • FLANN

    If your data do really lie in a high dimensional space, then you could use FLANN.

    It actually builds a number of rotated kd-trees and queries (a bit) every single tree, keeping the best results found. It also rotates the data-set to avoid nasty cases.

    In the publications section you can read more on how it works.

    In Getting FLANN section you can also read the manual.

    However, since you wish to perform Nearest Neighbour Searching(NNS) in a high dimensional space, you need to accept the trade-off between time and accuracy (more time comes with more accuracy). That's why FLANN performs approximate NNS (check this answer for more).


    LSH

    As an alternative, I would suggest the LSH algorithm. Here is E²LSH, which actually implements LSH algorithm. The manual can be found here.

    The idea behind the algorithm is that we want the points that lie near to each other to be placed (with a high probability) in the same bucket. However, LSH is devoted in solving the R nearest neighbour problem.

    By R-near neighbour data structure, the author probably means that given a query point q, we can answer this question: "Which points of the dataset lie inside radius R from q?".

    However, the manual explains how can LSH be used to perform NN searching.


    Note that this type of questions are not for this site. I answered to you because you are a new user. Next time make sure you don't forget that. :)