To illustrate the problem I am facing, here is some example code:
a = np.round(np.random.rand(10, 15))
counta = np.count_nonzero(a, axis=-1)
print(counta)
A = np.einsum('im,mj->ijm', a, a.T)
countA = np.count_nonzero(A, axis=-1)
print(countA)
It creates a 2D array and counts its nonzero elements along the last axis. Then it creates a 3D array, of which the nonzero elements are again counted along the last axis.
My problem is that my array a
is so large, that I can perform the first step, but not the second step, since the A
array would take up too much memory.
Is there any way to still get countA
? That is to count the zeros in A along a given axis without actually creating the array?
I think you can simply use a matrix multiply (dot product) to get your result, no need to generate that massive 3D array A
:
a = np.round(np.random.rand(10, 15)).astype(int)
counta = np.count_nonzero(a, axis=-1)
A = np.einsum('im,mj->ijm', a, a.T)
countA = np.count_nonzero(A, axis=-1)
assert np.all(countA == (a @ a.T))
This is also much faster:
a = np.round(np.random.rand(1000, 1500)).astype(int)
%timeit np.count_nonzero(np.einsum('im,mj->ijm', a, a.T), axis=-1)
3.94 s ± 38.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit a @ a.T
558 ms ± 6.74 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Also note that your first step is redundant with the second:
assert np.all(counta == np.diag(countA))