pythonarraysnumpyperformancevectorization

Fastest way to find all pairs of close numbers in a Numpy array


Say I have a Numpy array of N = 10 random float numbers:

import numpy as np
np.random.seed(99)
N = 10
arr = np.random.uniform(0., 10., size=(N,))
print(arr)

out[1]: [6.72278559 4.88078399 8.25495174 0.31446388 8.08049963 
         5.6561742 2.97622499 0.46695721 9.90627399 0.06825733]

I want to find all unique pairs of numbers that are not different from each other more than a tolerance tol = 1. (i.e. absolute difference <= 1). Specifically, I want to get all unique pairs of indexes. The indexes of each close pair should be sorted, and all close pairs should be sorted by the first index. I managed to write the following working code:

def all_close_pairs(arr, tol=1.):
    res = set()
    for i, x1 in enumerate(arr):
        for j, x2 in enumerate(arr):
            if i == j:
                continue
            if np.isclose(x1, x2, rtol=0., atol=tol):
                res.add(tuple(sorted([i, j])))
    res = np.array(list(res))
    return res[res[:,0].argsort()]

print(all_close_pairs(arr, tol=1.))

out[2]: [[1 5]
         [2 4]
         [3 7]
         [3 9]
         [7 9]]

However, in reality I have an array of N = 1000 numbers, and my code becomes extremely slow due to the nested for loops. I believe there are much more efficient ways to do this with Numpy vectorization. Does anyone know the fastest way to do this in Numpy?


Solution

  • This is a solution with pure numpy operations. It seems pretty fast on my machine, but I don't know what kind of speed we're looking for.

    def all_close_pairs(arr, tol=1.):
        N = arr.shape[0]
        # get indices in the array to consider using meshgrid
        pair_coords = np.array(np.meshgrid(np.arange(N), np.arange(N))).T
        # filter out pairs so we get indices in increasing order
        pair_coords = pair_coords[pair_coords[:, :, 0] < pair_coords[:, :, 1]]
        # compare indices in your array for closeness
        is_close = np.isclose(arr[pair_coords[:, 0]], arr[pair_coords[:, 1]], rtol=0, atol=tol)
        return pair_coords[is_close, :]