pythonspatialknn

Spatial data structures with efficient dynamic updates


I am looking for a python library which does approximate (or exact, no problem with that) nearest neighbor search which has fast dynamic updates. (it seems that the scipy.spatial ones, like KDTree or BallTree do not. It would be even better if this were GPU compatible, but we should not be too greedy.


Solution

  • Have a look at https://github.com/erikbern/ann-benchmarks I don't really know the project, but it appears to written in Python and implements all kind of (A)NN search indices. However, I assume many of them are not written in Python but other languages.

    Unfortunately, I think this project does not benchmark updateability.

    Personally, I know next to nothing about ANN, so I am going to focus on exact NN. The best index for you situation depends on several factors: do you dynamically only add or also remove entries? How many dimension do you have? How big is your dataset?

    Tree types:

    If you want to quickly add AND remove entries, I suggest looking at LSH (locality sensitive hashing), R-Trees, and quadtrees.

    If you are interested, I compared numerous indexes (Java implementations) here. Figure 32-35 show kNN performance for 1NN and 10NN queries with 1'000'000 entries up to 40 dimensions. CU-P is a dataset with uniformly distributed points, CL-P is a highly clustered dataset. Fig. 42 and 44 show update rates (removing a point and inserting another point somewhere else).

    Implementations of the indexes are available here (Java), the PH-tree is also available in C++.