How do I make (big) sparse covariance matrices into sparse correlation matrices?
Following the code for statsmodels.stats.moment_helpers.cov2corr(), if the covariance matrix isn't too big I can divide it element-wise by the (dense!) outer product of the standard deviations, then convert back to sparse:
import numpy as np
from scipy import sparse
A = np.array([
[1.0, 0.2, 0.3, 0.0, 0.0],
[0.2, 2.0, 1.0, 0.0, 0.0],
[0.3, 1.0, 3.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 1.0, 0.5],
[0.0, 0.0, 0.0, 0.5, 4.0]])
cov = sparse.csr_matrix(A)
cor = sparse.csr_matrix(cov / np.outer(np.sqrt(cov.diagonal()), np.sqrt(cov.diagonal())))
cor.toarray()
array([[1.        , 0.14142136, 0.17320508, 0.        , 0.        ],
       [0.14142136, 1.        , 0.40824829, 0.        , 0.        ],
       [0.17320508, 0.40824829, 1.        , 0.        , 0.        ],
       [0.        , 0.        , 0.        , 1.        , 0.25      ],
       [0.        , 0.        , 0.        , 0.25      , 1.        ]])
However, the dense standard deviation outer product is n × n, so for n = 100K and up this is wasteful at best, and won't fit in RAM at all for large enough n.
It's a bad habit to answer my own question, but the solution is really simple and neat, with no element-wise division at all: multiply the sparse covariance matrix from the left and the right by a sparse diagonal matrix holding the inverse standard deviations on its diagonal:
S = sparse.diags(1/np.sqrt(cov.diagonal()))
cor = S @ cov @ S
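Putting it together, here's a sketch (using the same small covariance matrix as above) that checks the sparse diagonal-scaling result against the dense outer-product computation, and confirms no dense n × n intermediate is created:

```python
import numpy as np
from scipy import sparse

# Same covariance matrix as in the question
A = np.array([
    [1.0, 0.2, 0.3, 0.0, 0.0],
    [0.2, 2.0, 1.0, 0.0, 0.0],
    [0.3, 1.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.0, 0.5, 4.0]])
cov = sparse.csr_matrix(A)

# Sparse diagonal matrix of inverse standard deviations
S = sparse.diags(1 / np.sqrt(cov.diagonal()))

# cor[i, j] = cov[i, j] / (std_i * std_j), computed entirely in sparse arithmetic
cor = S @ cov @ S

# Reference: the dense element-wise division from the question
std = np.sqrt(np.diag(A))
dense_cor = A / np.outer(std, std)

assert sparse.issparse(cor)                   # result is still sparse
assert cor.nnz == cov.nnz                     # sparsity pattern is preserved
assert np.allclose(cor.toarray(), dense_cor)  # values match the dense approach
assert np.allclose(cor.diagonal(), 1.0)       # unit diagonal, as a correlation matrix requires
```

Since S is diagonal, each of the two products only rescales the stored nonzeros (rows on one side, columns on the other), so the cost is O(nnz) in time and memory rather than O(n²).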