[SOLVED] Number of common zeroes in two sparse arrays of the same size

I have two different scipy sparse arrays of the same shape (actually in my case, one is a row vector and the other is a column vector). I would like to find the number of indices where these two arrays have a common 0.

The solution that I currently have is to call the .nonzero() method on both arrays, turn the resulting indices into sets, take the union of those sets, and use the size of that union together with the dimensions of the arrays to find my answer.
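For reference, a minimal sketch of that set-based approach. The 1 x 1000 shape, the density, and the seeds are illustrative assumptions, not values from the question:

```python
from scipy import sparse

# Hypothetical inputs: two sparse row vectors of the same shape.
a = sparse.random(1, 1000, density=0.2, format="csr", random_state=0)
b = sparse.random(1, 1000, density=0.2, format="csr", random_state=1)

# nonzero() returns (row_indices, col_indices); zip them into (row, col) pairs.
nz_a = set(zip(*a.nonzero()))
nz_b = set(zip(*b.nonzero()))

# Every position outside the union of nonzero positions is a common zero.
total = a.shape[0] * a.shape[1]
n_common_zeros = total - len(nz_a | nz_b)
print(n_common_zeros)
```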

Is there a more efficient way of doing this?

Solution

I believe the simplest and most efficient way would be to add the absolute values of the two arrays. You said they are the same shape, so they can be added, and taking absolute values avoids positive and negative entries cancelling each other. Then count the nonzero entries of the sum with the count_nonzero() method (this gives the number of positions where at least one of the two arrays is nonzero) and subtract that result from the total number of elements.

from scipy import sparse

N = 500
M = 500
a = sparse.rand(N, M, density=0.2)
b = sparse.rand(N, M, density=0.2)

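# abs() keeps positive and negative entries from cancelling in the sum
c = abs(a) + abs(b)
# positions where at least one array is nonzero are not common zeros
n_common_zeros = N*M - c.count_nonzero()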
print(n_common_zeros)

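If you don't have N and M saved as variables, you can get the total number of elements with np.prod(a.shape) (after import numpy as np); np.multiply(*a.shape) also works for a 2-D shape.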
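As a sanity check, here is a small sketch that derives the element count from the shape and compares the result against a brute-force count on dense copies. The 50 x 40 shape and the seeds are arbitrary assumptions chosen so the dense check stays cheap:

```python
import numpy as np
from scipy import sparse

# Hypothetical small inputs so that converting to dense is affordable.
a = sparse.random(50, 40, density=0.2, format="csr", random_state=0)
b = sparse.random(50, 40, density=0.2, format="csr", random_state=1)

# Total number of elements taken from the shape instead of stored N and M.
total = int(np.prod(a.shape))
n_common_zeros = total - (abs(a) + abs(b)).count_nonzero()

# Brute-force reference: positions where both dense arrays are exactly zero.
expected = int(np.count_nonzero((a.toarray() == 0) & (b.toarray() == 0)))
print(n_common_zeros, expected)
```

The dense comparison is only there to validate the sparse result; for large arrays it would defeat the purpose of using sparse storage.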

One potential issue: if the values are very large for their dtype, the sum can overflow.

Edit: You can avoid the overflow issue and get slightly faster code if you test abs(a) > 0 and abs(b) > 0 first and add the resulting boolean arrays (adding two boolean arrays is faster because the data is smaller, assuming you aren't already using int8).

c = (abs(a) > 0) + (abs(b) > 0)
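To illustrate why the boolean version is safer, here is a hedged sketch using tiny hand-made uint8 arrays (the values and dtype are assumptions picked to provoke the problem). As far as I know, SciPy keeps the uint8 dtype when adding these, so 128 + 128 can wrap around to 0 and the position is then miscounted as a common zero:

```python
import numpy as np
from scipy import sparse

# Hypothetical 1 x 3 arrays; 128 + 128 does not fit in uint8.
a = sparse.csr_matrix(np.array([[128, 0, 5]], dtype=np.uint8))
b = sparse.csr_matrix(np.array([[128, 0, 0]], dtype=np.uint8))

total = a.shape[0] * a.shape[1]

# Summing the values directly: the first position may wrap around to 0.
naive = total - (abs(a) + abs(b)).count_nonzero()

# Summing boolean masks: adding booleans acts like a logical OR, so no overflow.
as_bool = (abs(a) > 0) + (abs(b) > 0)
safe = total - as_bool.count_nonzero()

print(naive, safe)
```

Only the middle position is zero in both arrays, so the boolean version should report one common zero regardless of how large the stored values are.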