Tags: python, numpy, simulation

numpy positive semi-definite warning


In a Python script I'm writing, I am simulating multivariate normal random vectors with the expression

np.random.multivariate_normal(np.zeros(dim_obs), y_cov)

My script runs, but generates the following warning:

RuntimeWarning: covariance is not positive-semidefinite.

Also, the little debug print statement I throw in there prints False most of the time:

print( np.all(np.linalg.eigvals(y_cov) > 0) )

Why is this throwing false positives? My y_cov is positive semi-definite because it is (sorry about the lack of TeX markup) B x x' B' + y y', where B is a matrix and x and y are random vectors with every element positive.

In this particular run B is actually just a ones vector of size 9. Can I just ignore this warning? From the documentation:

Note that the covariance matrix must be positive semidefinite (a.k.a. nonnegative-definite). Otherwise, the behavior of this method is undefined and backwards compatibility is not guaranteed.
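
As a side note on the debug check: np.all(np.linalg.eigvals(y_cov) > 0) tests for strictly positive eigenvalues, i.e. positive definiteness, so a positive semi-definite matrix with exact zero eigenvalues can fail it even before rounding error enters the picture. A tolerance-based check is closer to what the warning is about; the sketch below is added for illustration (it is not part of the original script) and the tolerance value is arbitrary.

tol = 1e-8                          # arbitrary cutoff for "numerically zero"
eigs = np.linalg.eigvalsh(y_cov)    # symmetric eigensolver: real eigenvalues in ascending order
print(np.all(eigs > -tol))          # treat tiny negative eigenvalues as zero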

Edit: here's a runnable example, all in one place. Thanks for the tip, @user2357112.

import numpy as np

num_factors = 1
dim_obs = 9
u = np.random.normal(size=num_factors)
v = np.random.normal(size=dim_obs)

# B x x' B' term, with B = ones((dim_obs, 1)) and x = exp(u/2)
y_cov = np.dot(np.ones((dim_obs, 1)), np.exp(u.reshape((num_factors, 1)) / 2))
y_cov = np.dot(y_cov, np.exp(u.reshape((1, num_factors)) / 2))  # times x'
y_cov = np.dot(y_cov, np.transpose(np.ones((dim_obs, 1))))      # times B'

# y y' term, with y = exp(v/2)
y_cov += np.dot(np.exp(v.reshape((dim_obs, 1)) / 2),
                np.exp(v.reshape((1, dim_obs)) / 2))

print(np.random.multivariate_normal(np.zeros(dim_obs), y_cov))
print(np.all(np.linalg.eigvals(y_cov) > 0))
print(np.linalg.eigvals(y_cov))
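
A couple of extra diagnostics (added here for illustration; they are not part of the original script): with num_factors = 1, the first term of y_cov is a rank-1 outer product and so is the second term, so y_cov has rank at most 2 and at least 7 of its 9 eigenvalues are exactly zero in exact arithmetic.

print(np.linalg.matrix_rank(y_cov))  # typically 2
print(np.linalg.eigvalsh(y_cov))     # the "zero" eigenvalues show up as tiny positive or negative numbers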

Solution

  • Theoretically, your matrix is positive semidefinite, with several eigenvalues being exactly zero. But floating-point computations introduce rounding errors, which leave some of those eigenvalues very small but negative; hence, numerically, the matrix is not positive semidefinite.

    For the time being, it looks like the warning may be ignored; but the NumPy documentation says that the behavior in the non-PSD case is undefined, so I would not want to rely on this. A way to correct for the floating-point errors is to add a tiny multiple of the identity matrix to y_cov, for example like this:

    min_eig = np.min(np.real(np.linalg.eigvals(y_cov)))
    if min_eig < 0:
        # min_eig is negative, so subtracting 10*min_eig adds a small
        # positive multiple of the identity and lifts the spectrum above zero
        y_cov -= 10*min_eig * np.eye(*y_cov.shape)
    

    Alternatively, adding a fixed small multiple of the identity, like 1e-12 * np.eye(*y_cov.shape), would work for all reasonably sized matrices and still wouldn't noticeably affect the results (a combined sketch appears after the reproduction below).


    For completeness, a simpler way to reproduce the issue:

    import numpy as np
    x = np.random.normal(size=(5,))
    y = np.outer(x, x)
    z = np.random.multivariate_normal(np.zeros(5), y)    
    

    This throws the same warning (with high probability).
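
    Putting the pieces together: the sketch below (an illustrative addition, not part of the original answer) applies the fixed 1e-12 jitter mentioned above to this minimal reproduction, so the eigenvalues that should be exactly zero are nudged to small positive values and the warning should no longer appear:

    import numpy as np
    x = np.random.normal(size=(5,))
    y = np.outer(x, x)                      # rank 1: four of the five eigenvalues are exactly zero in theory
    print(np.linalg.eigvalsh(y))            # the zeros typically come out as tiny positive or negative values
    y += 1e-12 * np.eye(5)                  # fixed jitter, as suggested above
    z = np.random.multivariate_normal(np.zeros(5), y)   # no warning expected now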