stringalgorithmclosest-points

How to find two strings that are close to each other


I have n strings and I want to find the closest pair.

What's the fastest practical algorithm to find this pair?


Solution

  • The paper "The Closest Pair Problem under the Hamming Metric", Min, Kao, Zhu seems to be what you are looking for, and it applies to finding a single closest pair.

    For your case, where n0.294 < D < n, where D is the dimensionality of your data (1000) and n the size of your dataset, the algorithm will run in O(n1.843 D0.533).