Say I have a NumPy array of N = 10 random floats:
import numpy as np
np.random.seed(99)
N = 10
arr = np.random.uniform(0., 10., size=(N,))
print(arr)
out[1]: [6.72278559 4.88078399 8.25495174 0.31446388 8.08049963
5.6561742 2.97622499 0.46695721 9.90627399 0.06825733]
I want to find all unique pairs of numbers that differ from each other by no more than a tolerance tol = 1 (i.e. absolute difference <= 1). Specifically, I want all unique pairs of indices: the indices within each pair should be sorted in increasing order, and the pairs themselves should be sorted by their first index. I managed to write the following working code:
def all_close_pairs(arr, tol=1.):
    res = set()
    # compare every element with every other element (O(N^2) pairs)
    for i, x1 in enumerate(arr):
        for j, x2 in enumerate(arr):
            if i == j:
                continue
            if np.isclose(x1, x2, rtol=0., atol=tol):
                # store each pair with its indices sorted, so (i, j) and (j, i) deduplicate
                res.add(tuple(sorted([i, j])))
    res = np.array(list(res))
    # sort the pairs by their first index
    return res[res[:, 0].argsort()]
print(all_close_pairs(arr, tol=1.))
out[2]: [[1 5]
[2 4]
[3 7]
[3 9]
[7 9]]
However, in reality I have an array of N = 1000 numbers, and my code becomes extremely slow due to the nested for loops. I believe there are much more efficient ways to do this with NumPy vectorization. Does anyone know the fastest way to do this in NumPy?
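For comparing candidate answers, here is a minimal timing harness one could use (a sketch; the big_arr name and the setup are illustrative, and time.perf_counter is from the standard library):

import time

import numpy as np

np.random.seed(99)
big_arr = np.random.uniform(0., 10., size=(1000,))

start = time.perf_counter()
pairs = all_close_pairs(big_arr, tol=1.)
print(f"{len(pairs)} close pairs in {time.perf_counter() - start:.3f} s")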
Here is a solution with pure NumPy operations. It seems pretty fast on my machine, but I don't know what kind of speed we're looking for.
def all_close_pairs(arr, tol=1.):
    N = arr.shape[0]
    # build every (i, j) index combination with meshgrid; shape (N, N, 2)
    pair_coords = np.array(np.meshgrid(np.arange(N), np.arange(N))).T
    # keep only pairs with i < j, so each pair appears once with indices in increasing order
    pair_coords = pair_coords[pair_coords[:, :, 0] < pair_coords[:, :, 1]]
    # compare the array values at each index pair for closeness
    is_close = np.isclose(arr[pair_coords[:, 0]], arr[pair_coords[:, 1]], rtol=0, atol=tol)
    return pair_coords[is_close, :]
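Calling it on your 10-element example array should reproduce the same five pairs as your loop version, already in the required order:

print(all_close_pairs(arr, tol=1.))
# [[1 5]
#  [2 4]
#  [3 7]
#  [3 9]
#  [7 9]]

One caveat: meshgrid materializes full N x N index arrays before the i < j filter throws half of them away. If memory matters at larger N, a variant along these lines (a sketch using np.triu_indices, which generates only the i < j pairs directly; the function name is mine) should give the same result with less intermediate allocation:

def all_close_pairs_triu(arr, tol=1.):
    # generate only the index pairs with i < j (strict upper triangle, no diagonal)
    i, j = np.triu_indices(arr.shape[0], k=1)
    # |arr[i] - arr[j]| <= tol matches np.isclose with rtol=0, atol=tol
    is_close = np.abs(arr[i] - arr[j]) <= tol
    return np.column_stack((i[is_close], j[is_close]))

Both versions are still quadratic in time and memory, since they enumerate all N*(N-1)/2 candidate pairs.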