I am using scipy sparse matrices to compute choice probabilities used in an estimation. Below is an example. How could it be faster?
Variables and parameters:
import itertools
import numpy as np
import scipy as sp
import timeit
T = 70000
N = 200
p = 212
X = sp.sparse.random(T, (N)*p, density=0.018, format='csr')
B = np.random.rand(p,)
I = sp.sparse.eye(N,format="csr")
Function that computes the choice probabilities:
def csr_example(X,B,I,tr):
    tem0 = sp.sparse.kron(I, B, format = "csr").transpose()
    exp_xb = X@tem0
    exp_xb.data = np.exp( exp_xb.data, out=exp_xb.data)
    exp_xb0 = sp.sparse.csr_matrix.sum(exp_xb,axis=1)
    exp_xb0 = np.reshape(exp_xb0, (tr, 1))
    pro = exp_xb/exp_xb0
    return pro
Execution time:
print(timeit.timeit(lambda: csr_example(X,B,I,T), setup="pass",number=10))
Around 15 seconds
Yes, you can make this about 3x faster.
I started my analysis of your program by running it under line profiler.
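For reference, this kind of per-line profile comes from the line_profiler package. One way to reproduce it, assuming line_profiler is installed and you are in an IPython/Jupyter session (which the /tmp/ipykernel path below suggests), is:

%load_ext line_profiler
%lprun -f csr_example csr_example(X, B, I, T)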
Timer unit: 1e-09 s
Total time: 0.688771 s
File: /tmp/ipykernel_4391/4040554224.py
Function: csr_example at line 1
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           def csr_example(X,B,I,tr):
     2         1    2630726.0    3e+06      0.4      tem0 = sp.sparse.kron(I, B, format = "csr").transpose()
     3         1  574464285.0    6e+08     83.4      exp_xb = X@tem0
     4         1   38584926.0    4e+07      5.6      exp_xb.data = np.exp( exp_xb.data, out=exp_xb.data)
     5         1    3826251.0    4e+06      0.6      exp_xb0 = sp.sparse.csr_matrix.sum(exp_xb,axis=1)
     6         1      13541.0  13541.0      0.0      exp_xb0 = np.reshape(exp_xb0, (tr, 1))
     7         1   69250761.0    7e+07     10.1      pro = exp_xb/exp_xb0
     8         1        287.0    287.0      0.0      return pro
This shows that 83% of the program's time is spent in the sparse matrix multiply between X and tem0. SciPy's general-purpose sparse matrix multiply is already well optimized, so you can't go much faster unless you exploit some special structure in the input matrices.
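You can confirm this outside the profiler by timing the multiply on its own. A quick check, using the variables defined in the question (not part of the original code):

tem0 = sp.sparse.kron(I, B, format="csr").transpose()
print(timeit.timeit(lambda: X @ tem0, number=10))  # dominates the total runtime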
Helpfully, the right-hand side of this multiply is the transposed Kronecker product of the identity matrix and B, a 1D array.
If B were set to [b0, b1, b2] and I were a 3x3 identity, this transposed Kronecker product would look like this:
b0  0  0
b1  0  0
b2  0  0
 0 b0  0
 0 b1  0
 0 b2  0
 0  0 b0
 0  0 b1
 0  0 b2
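You can see this block structure directly with a tiny example (a quick illustration with made-up values, not part of the original code):

import numpy as np
import scipy as sp

b = np.array([1.0, 2.0, 3.0])        # small stand-in for the p-vector B
I3 = sp.sparse.eye(3, format="csr")  # small stand-in for the N x N identity
print(sp.sparse.kron(I3, b, format="csr").transpose().toarray())
# Column j holds a copy of b in rows 3*j .. 3*j + 2 and zeros everywhere else.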
This structure has a nice property: each nonzero element of X at position (row, col) contributes B[col % p] * X[row, col] to the output element at (row, col // p), and it contributes nothing anywhere else. With this in mind, I wrote a matrix multiplication function which computes X @ (I ⊗ B).T without ever forming (I ⊗ B).T explicitly.
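To make that identity concrete, here is a minimal (and slow) column-block sketch of the same product; blockwise_mm is purely illustrative and is not part of the benchmarked code, which uses the numba version below:

def blockwise_mm(X, B, N):
    # Illustration only: computes X @ kron(I_N, B).T one output column at a time.
    p = B.shape[0]
    out = np.empty((X.shape[0], N))
    for j in range(N):
        # Output column j only depends on columns j*p .. (j+1)*p - 1 of X.
        out[:, j] = X[:, j*p:(j+1)*p] @ B
    return out

The full optimized code: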
import numpy as np
import scipy as sp
import timeit
import numba as nb
def csr_example(X,B,I,tr):
    tem0 = sp.sparse.kron(I, B, format = "csr").transpose()
    exp_xb = X@tem0
    exp_xb.data = np.exp( exp_xb.data, out=exp_xb.data)
    exp_xb0 = sp.sparse.csr_matrix.sum(exp_xb,axis=1)
    exp_xb0 = np.reshape(exp_xb0, (tr, 1))
    pro = exp_xb/exp_xb0
    return pro
def naive_mm(X, B, N):
    # Reference implementation: build the Kronecker product explicitly.
    I = sp.sparse.eye(N, format="csr")
    tem0 = sp.sparse.kron(I, B, format="csr").transpose()
    exp_xb = X @ tem0
    return exp_xb
@nb.njit()
def numba_mm_inner(Xindptr, Xindices, Xdata, B, N, out):
    p = B.shape[0]
    # Loop over rows
    for i in range(len(Xindptr) - 1):
        # Get all column numbers and data for this row
        ind_begin = Xindptr[i]
        ind_end = Xindptr[i + 1]
        cols = Xindices[ind_begin:ind_end]
        data = Xdata[ind_begin:ind_end]
        # Where is the beginning of B located with respect to
        # the beginning of X?
        col_offset = 0
        # The column of the output array to write to
        out_col = 0
        # The running total for this element
        sum_ = 0
        for j in range(len(cols)):
            in_col = cols[j]
            while not in_col < col_offset + p:
                # None of the remaining X values will be relevant
                # to this column
                out[i, out_col] = sum_
                sum_ = 0
                col_offset += p
                out_col += 1
            assert in_col >= col_offset
            sum_ += data[j] * (B[in_col - col_offset])
        # Write any remaining sum
        out[i, out_col] = sum_
def numba_mm(X, B, N):
    X = X.tocsr()
    out = np.zeros((X.shape[0], N))
    assert X.shape[0] == out.shape[0]
    assert X.shape[1] == N * B.shape[0]
    numba_mm_inner(X.indptr, X.indices, X.data, B, N, out)
    return out
def csr_opt(X,B,N,tr):
    exp_xb = numba_mm(X, B, N)
    np.exp(exp_xb, out=exp_xb, where=exp_xb != 0)
    exp_xb0 = np.sum(exp_xb,axis=1)
    exp_xb0 = np.reshape(exp_xb0, (tr, 1))
    pro = exp_xb/exp_xb0
    return pro
def main():
    T = 30000
    N = 200
    p = 212
    X = sp.sparse.random(T, (N)*p, density=0.018, format='csr')
    B = np.random.rand(p,)
    I = sp.sparse.eye(N,format="csr")
    old = csr_example(X,B,I,T).toarray()
    new = csr_opt(X,B,N,T)
    print("Methods equal:", np.allclose(old, new))
    time_old = timeit.timeit(lambda: csr_example(X,B,I,T),number=20)
    time_new = timeit.timeit(lambda: csr_opt(X,B,N,T),number=20)
    print(f"old {time_old:.4f}")
    print(f"new {time_new:.4f}")
    print(f"ratio {time_old / time_new:.4f}")

main()
Benchmark:
Methods equal: True
old 13.1517
new 4.3569
ratio 3.0186
Notes:
- naive_mm and numba_mm are two implementations of the same matrix multiply: both compute X @ (I ⊗ B).T and are intended to produce the same result. naive_mm is not used by the actual code; it's just there for comparison.
- numba_mm returns a dense array. I noticed that the output of this step is almost entirely dense (only about 2% of its entries are zero), so a sparse matrix isn't really helping you after this point.
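If you want to check that last point yourself, something like this (a quick check using the variables from the question) reports the fraction of stored entries in the intermediate result:

exp_xb = X @ sp.sparse.kron(I, B, format="csr").transpose()
print(exp_xb.nnz / (exp_xb.shape[0] * exp_xb.shape[1]))  # close to 1.0, i.e. nearly dense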