I'd like to compute the matrix product of two sparse SciPy matrices. However, in my case the result is not sparse, so I'd like to store it as a dense NumPy array.
Is it possible to do this efficiently, that is, without first creating a "sparse" result and then converting it? I'm free to choose any input format (whichever is more efficient).
An example: the product of two 10000x10000 matrices that are 99% sparse, with randomly placed nonzeros, will be dense:
import numpy as np

n = 10_000
# ~1% of entries are nonzero in each factor
a = np.random.randn(n, n) * (np.random.randint(100, size=(n, n)) == 0)
b = np.random.randn(n, n) * (np.random.randint(100, size=(n, n)) == 0)
c = a.dot(b)
np.count_nonzero(c) / c.size  # should be ~0.63
import numpy as np
from scipy import sparse
n = 10_000
a = sparse.csr_matrix(np.random.randn(n, n) * (np.random.randint(100, size=(n, n)) == 0))
b = sparse.csr_matrix(np.random.randn(n, n) * (np.random.randint(100, size=(n, n)) == 0))
c = a.dot(b)
>>> c
<10000x10000 sparse matrix of type '<class 'numpy.float64'>'
with 63132806 stored elements in Compressed Sparse Row format>
Yeah, this is a really inefficient way to store this matrix. There's no way in scipy to go straight to dense, though. You can, however, use sparse BLAS routines that go straight to dense output (they exist for exactly the use case you're describing).
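For reference, the plain scipy route has to build the ~63M-element CSR intermediate and then densify it, which is exactly the overhead you want to skip (a minimal sketch reusing your a and b from above):
# Plain scipy: the product is materialized as CSR first,
# then converted, so the sparse intermediate is paid for anyway.
c_dense = a.dot(b).toarray()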
There's a Python wrapper for MKL that I use for this, which wraps mkl_sparse_spmmd:
from sparse_dot_mkl import dot_product_mkl
c_dense = dot_product_mkl(a, b, dense=True)
>>> np.sum(c_dense != 0)
63132806
It's also multithreaded, so it's a lot faster than scipy. Getting MKL installed is left to the reader (conda install -c intel mkl is probably the easiest route).
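In case it helps, here's a self-contained sketch of how I'd wire it up, with a fallback to plain scipy when sparse_dot_mkl (or MKL itself) isn't importable; the scipy.sparse.random setup and the fallback path are my additions, not part of the library:
import numpy as np
from scipy import sparse

n = 10_000
# 1%-dense CSR factors (scipy.sparse.random avoids building dense arrays first)
a = sparse.random(n, n, density=0.01, format="csr", random_state=0)
b = sparse.random(n, n, density=0.01, format="csr", random_state=1)

try:
    from sparse_dot_mkl import dot_product_mkl
    # MKL sparse*sparse -> dense, multithreaded, no CSR intermediate
    c_dense = dot_product_mkl(a, b, dense=True)
except ImportError:
    # Fallback: plain scipy builds the sparse product, then densifies it
    c_dense = (a @ b).toarray()

print(np.count_nonzero(c_dense) / c_dense.size)  # ~0.63 for these inputs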