I have a large scipy sparse matrix X.
I have a vector y whose number of elements matches the number of rows of X.
I want to calculate the sum of each column after it has been multiplied by y.
If X were dense, this would be equivalent to np.sum(X * y[:, None], axis=0), where y is 1-D and is broadcast down the rows.
How can it be done efficiently for a sparse matrix?
I tried:
z = np.zeros(X.shape[1])
for i in range(X.shape[1]):
    # densify column i, then take its sum weighted by y
    z[i] = np.sum(X[:, i].toarray().ravel() * y)
Yet it was terribly slow.
Is there a better way to achieve this?
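Use the dot product provided for sparse matrices:
X.transpose().dot(y)
Entry j of the result is the sum over rows i of X[i, j] * y[i], which is exactly the weighted column sum you want. This should be much faster, because the whole computation runs as a single sparse matrix-vector product instead of a Python loop over columns.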
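To check that the dot product really matches the dense expression from the question, here is a minimal sketch. The function name weighted_column_sums and the random test data are only illustrative and are not part of the original question.

```python
import numpy as np
from scipy import sparse


def weighted_column_sums(X, y):
    """Return sum_i X[i, j] * y[i] for every column j of a sparse X."""
    # X.T is (n_cols, n_rows); multiplying by y (n_rows,) gives (n_cols,)
    return X.T.dot(y)


# Illustrative data: a random sparse matrix and a matching dense vector.
rng = np.random.default_rng(0)
X_demo = sparse.random(1000, 50, density=0.01, format="csr", random_state=0)
y_demo = rng.standard_normal(X_demo.shape[0])

z_sparse = weighted_column_sums(X_demo, y_demo)
# Dense reference: scale each row i by y[i], then sum down the columns.
z_dense = np.sum(X_demo.toarray() * y_demo[:, None], axis=0)
print(np.allclose(z_sparse, z_dense))
```

The dense reference is only there for verification on small inputs; for a genuinely large X you would keep everything sparse and call the dot product alone.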
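Also note that, depending on the sparse format, you may not be able to index the matrix as you wrote in your example (COO matrices, for instance, have historically not supported slicing). In that case you need to use the getcol method.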
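For completeness, a column-by-column version using getcol might look like the sketch below. It assumes X is one of the scipy.sparse matrix classes (here coo_matrix) that provides getcol; the function name weighted_column_sums_loop is made up for illustration. It is still a slow Python loop and is only useful for comparing against the dot product.

```python
import numpy as np
from scipy import sparse


def weighted_column_sums_loop(X, y):
    """Slow reference: extract each column with getcol and sum it weighted by y."""
    z = np.zeros(X.shape[1])
    for i in range(X.shape[1]):
        # getcol returns an (n_rows, 1) sparse matrix; flatten it to 1-D
        col = X.getcol(i).toarray().ravel()
        z[i] = np.sum(col * y)
    return z


# Illustrative data in COO format.
X_coo = sparse.coo_matrix(np.array([[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]]))
y_small = np.array([10.0, 100.0])
print(np.allclose(weighted_column_sums_loop(X_coo, y_small), X_coo.T.dot(y_small)))
```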