Tags: python, standard-deviation, propagation, uncertainty, covariance-matrix

Uncertainties package in Python: use of a given covariance matrix to get data uncertainties


I believe that my problem is fairly easy to understand, but I would like to make it very clear, hence the length of this post.

My initial situation, which I sum up below, is similar to the one described in this post (https://stats.stackexchange.com/questions/50830/can-i-convert-a-covariance-matrix-into-uncertainties-for-variables), but concerns specifically how the Python package uncertainties handles such cases.

Here is the situation:

What I would like to do is perform calculations with my data in such a way that the uncertainties propagate appropriately. Ultimately I wish to display values in the form nominal value +/- uncertainty in the console. The Python package uncertainties seems like the right tool, but I am not sure what the uncertainty number it reports for my initial data points means.

What I would have expected is that the uncertainty of each initial data point corresponds to the "naive" standard deviation, i.e. the square root of the corresponding diagonal element of the covariance matrix. This ignores the correlations in the data, but displaying correlated results in the form nominal value +/- uncertainty cannot show correlations anyway, and that should not be a problem as long as the correlations are correctly taken into account in further calculations.
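To make that expectation concrete, here is a short sketch of the "naive" standard deviations for a matrix with the same diagonal as the ones used in the example further below:

```python
import numpy as np

# Any covariance matrix with diagonal [0, 1, 4], as in the example below
cov = np.array([[0.0, 0.5, 3.0],
                [0.5, 1.0, 0.2],
                [3.0, 0.2, 4.0]])

# "Naive" standard deviations: square roots of the diagonal elements,
# ignoring the off-diagonal correlation terms
naive_std = np.sqrt(np.diag(cov))
print(naive_std)  # [0. 1. 2.]
```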

However, the package displays another number to be the uncertainty, and I do not know where it comes from. The package documentation is of very little help. I am wondering if I could possibly be misusing the package.

Can anyone help me understand the situation? Thanks a lot!

Here is a minimal reproducible example:

import uncertainties
import numpy as np

# To fix ideas, here are two different covariance matrices with the same diagonals
# -> I expect them to lead to the same std deviations below, but this is not the case:

Cov_matrix1 = np.array([[0.00, 0.0,  0.0], [0.0,  1, 0], [0.0, 0, 4]], np.float64)
Cov_matrix2 = np.array([[0.00, 0.5,  3], [0.5,  1, 0.2], [3, 0.2, 4]], np.float64)

# here are some initial nominal values:

data_nominal = np.array([1, 2, 3], np.float64)

print(" The nominal values of the data, without a covariance matrix, are ", data_nominal)

# I impose correlations in my data, using the above covariance matrices

correlated_data1 = np.asarray(uncertainties.correlated_values(data_nominal, Cov_matrix1))
correlated_data2 = np.asarray(uncertainties.correlated_values(data_nominal, Cov_matrix2))

# I print my data in the console, and see that the data points have different uncertainties
# in the two cases, even though the two covariance matrices have the same diagonals ... What is happening?

print("\n First covariance matrix is ")
print(Cov_matrix1)
print("\n Data values are ", correlated_data1)

print("\n 2nd covariance matrix is ")
print(Cov_matrix2)
print("\n Data values are now ", correlated_data2)


Solution

  • I think the issue is that the second covariance matrix is "illegal", in the sense that

    Cov_matrix2 = np.array([[0.00, 0.5,  3], [0.5,  1, 0.2], [3, 0.2, 4]], np.float64)
    

    is not positive semi-definite: it has a negative eigenvalue. A valid covariance matrix must be positive semi-definite, so it is not mathematically meaningful to work with this one. The package does not check for this: it accepts the illegal matrix without any warning or error message, and the output it produces cannot be considered meaningful, hence the unexpected behaviour.