python numpy performance scipy sparse-matrix

Sum each column of a sparse matrix multiplied by a vector


I have a large scipy sparse matrix X.
I also have a vector y whose number of elements matches the number of rows of X.

I want to calculate the sum of each column after it has been multiplied elementwise by y.
If X were dense, this would be equivalent to np.sum(X * y[:, None], axis=0) for a 1-D y.

How can it be done efficiently for a sparse matrix?

I tried:

z = np.zeros(X.shape[1])

for i in range(X.shape[1]):
  z[i] = np.sum(np.array(X[:, i]) * y)

But it was terribly slow.
Is there a better way to achieve this?


Solution

  • Use the dot product that sparse matrices provide:

    X.transpose().dot(y) 
    

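    This computes z[j] = sum over i of X[i, j] * y[i] as a single sparse matrix-vector product, with no Python loop and no dense columns, so it should be much faster.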

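    Also note that np.array(X[:, i]) does not do what your example expects: np.array() wraps a sparse matrix in a 0-d object array instead of converting it to a dense one, so use X[:, i].toarray() or the getcol method. Slicing with X[:, i] also works only for formats that support indexing, such as CSR, CSC, or LIL, but not COO.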
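To check that the transposed dot product really matches the column sums asked for, here is a minimal sketch on made-up data. The shapes, the random matrix, and the names z_dense and z_mult are assumptions for illustration only; X is assumed to be a CSR matrix and y a 1-D array.

```python
import numpy as np
from scipy import sparse

# Illustrative data only: a mostly-zero 1000 x 50 matrix and a matching vector.
rng = np.random.default_rng(0)
dense = rng.random((1000, 50))
dense[dense < 0.99] = 0.0          # keep roughly 1% of the entries
X = sparse.csr_matrix(dense)
y = rng.standard_normal(X.shape[0])

# Suggested approach: one sparse matrix-vector product.
z = X.T.dot(y)                     # 1-D array, one value per column of X

# Dense reference: scale row i by y[i], then sum down each column.
z_dense = np.sum(X.toarray() * y[:, None], axis=0)

# Literal sparse translation of the dense formula, for comparison.
z_mult = np.asarray(X.multiply(y[:, None]).sum(axis=0)).ravel()

print(np.allclose(z, z_dense), np.allclose(z_mult, z_dense))
```

Both sparse variants should agree with the dense reference up to floating-point rounding. The dot-product form is the simpler of the two because it returns a plain 1-D array directly, with no intermediate sparse matrix.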