pythonscipysparse-matrixcovariance-matrix

cov2corr() for scipy sparse matrices


How do I make (big) sparse covariance matrices into sparse correlation matrices?

Following the code for statsmodels.stats.moment_helpers.cov2corr(), if the covariance matrix isn't too big I can (element-wise) divide it by the (dense!) standard deviation outer product, and convert back to sparse:

import numpy as np
from scipy import sparse

A = np.array([
  [1.0, 0.2, 0.3, 0.0, 0.0],
  [0.2, 2.0, 1.0, 0.0, 0.0],
  [0.3, 1.0, 3.0, 0.0, 0.0],
  [0.0, 0.0, 0.0, 1.0, 0.5],
  [0.0, 0.0, 0.0, 0.5, 4.0]])
cov = sparse.csr_matrix(A)
cor = sparse.csr_matrix(cov / np.outer(np.sqrt(cov.diagonal()), np.sqrt(cov.diagonal())))

cor.toarray()
array([[1.        , 0.14142136, 0.17320508, 0.        , 0.        ],`
      [0.14142136, 1.        , 0.40824829, 0.        , 0.        ],`
      [0.17320508, 0.40824829, 1.        , 0.        , 0.        ],`
      [0.        , 0.        , 0.        , 1.        , 0.25      ],`
      [0.        , 0.        , 0.        , 0.25      , 1.        ]])`

However the dense standard deviation outer product is n X n, and for n = 100K and up, this is wasteful if at all possible with even a large RAM.


Solution

  • Bad habit to answer my own question, but the solution is really simple and cool, with no element-wise division. Need to multiply the sparse covariance matrix from left and right by a sparse diagonal matrix with inverse standard deviations on its diagonal:

    S = sparse.diags(1/np.sqrt(cov.diagonal()))
    cor = S @ cov @ S