I want to quickly generate all pair-wise matchings of the elements in a numpy array, excluding self-matches.
For example given a numpy array like
a = np.array(range(4))
I want to get a matrix like
array([[0, 1],
       [0, 2],
       [0, 3],
       [1, 0],
       [1, 2],
       [1, 3],
       [2, 0],
       [2, 1],
       [2, 3],
       [3, 0],
       [3, 1],
       [3, 2]])
In my real case, I have arrays of varying sizes, from 1 to 1000 elements, and I need to call a function that does this pair-wise matching repeatedly. I wrote the code below:
import numpy as np
from itertools import permutations
a = np.array(range(1000))
xs, ys = zip(*[(x, y) for x, y in permutations(a, 2)])
res = np.vstack([np.array(xs), np.array(ys)]).T
print(res)
I want to know if there are faster (maybe more NumPy-native) ways to do this. I would also like to see some benchmarks of runtime vs. array size.
Making a meshgrid and then filtering out the "matching" rows (e.g. `(0, 0)` and `(12, 12)`) would likely be quite a bit faster at this scale.
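As a quick sanity check before the timed runs, here is a small sketch showing that the meshgrid-and-filter approach reproduces the expected matrix from the question:

```python
import numpy as np

a = np.arange(4)
# build all (x, y) pairs as an (n*n, 2) array in lexicographic order,
# then drop the self-pairs where x == y
arr = np.array(np.meshgrid(a, a)).T.reshape(-1, 2)
arr = arr[arr[:, 0] != arr[:, 1]]
print(arr)
# first rows: [0 1], [0 2], [0 3], [1 0], ... exactly as in the question
```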
import numpy as np
from itertools import permutations
%%time
n = 1000
a = np.array(np.arange(n))
xs, ys = zip(*[(x, y) for x, y in permutations(a, 2)])
res = np.vstack([np.array(xs), np.array(ys)]).T
res
CPU times: user 1.08 s, sys: 176 ms, total: 1.26 s
Wall time: 1.28 s
%%time
n = 1000
a = np.arange(n)
arr = np.array(np.meshgrid(a, a)).T.reshape(-1,2)
arr = arr[arr[:, 0] != arr[:, 1]]
CPU times: user 49.8 ms, sys: 33.2 ms, total: 83 ms
Wall time: 89.2 ms
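A small aside (an untimed sketch): passing `indexing='ij'` to `np.meshgrid` makes the first output vary along rows, so the pairs already come out in lexicographic order and the `.T` can be dropped:

```python
import numpy as np

n = 1000
a = np.arange(n)
# indexing='ij' yields row-major pair order directly; no transpose needed
arr = np.stack(np.meshgrid(a, a, indexing='ij'), axis=-1).reshape(-1, 2)
arr = arr[arr[:, 0] != arr[:, 1]]
```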
%%time
n = 2000
a = np.array(np.arange(n))
xs, ys = zip(*[(x, y) for x, y in permutations(a, 2)])
res = np.vstack([np.array(xs), np.array(ys)]).T
res
CPU times: user 3.94 s, sys: 924 ms, total: 4.86 s
Wall time: 5.03 s
%%time
n = 2000
a = np.arange(n)
arr = np.array(np.meshgrid(a, a)).T.reshape(-1,2)
arr = arr[arr[:, 0] != arr[:, 1]]
CPU times: user 218 ms, sys: 105 ms, total: 323 ms
Wall time: 328 ms
np.testing.assert_array_equal(res, arr) # passes
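If you want to skip the build-then-filter step, another option (a sketch, not benchmarked here) is to take the indices of the off-diagonal entries of a boolean mask. `np.nonzero` scans the mask in row-major order, so the pairs come out in the same lexicographic order as the permutations-based code, and indexing with the results also works for arbitrary element arrays, not just `arange`:

```python
import numpy as np

def all_pairs(a):
    """Return all ordered pairs (x, y) from `a` with x != y."""
    n = len(a)
    # mask is True everywhere except the diagonal (the self-pairs);
    # nonzero returns the True positions in row-major order
    x, y = np.nonzero(~np.eye(n, dtype=bool))
    return np.column_stack((a[x], a[y]))

print(all_pairs(np.arange(4)))
```

The `~np.eye(n, dtype=bool)` mask still allocates an n-by-n array, but as one byte per element rather than the eight bytes per integer of the meshgrid intermediate.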