Scipy Gaussian KDE : Matrix is not positive definite

I am trying to estimate the the density of a data set at certain points, using scipy.

from scipy.stats import gaussian_kde
import numpy as np

I have a dataset A of 3D points (this is just a minimal example. My actual data has many more dimensions and many more samples)

A = np.array([[0.078377  , 0.76737392, 0.45038174],
       [0.65990129, 0.13154658, 0.30770917],
       [0.46068406, 0.22751313, 0.28122463]])

and the points at which I want to estimate the density

B = np.array([[0.40209377, 0.21063273, 0.75885516],
       [0.91709997, 0.79303252, 0.65156937]])

But I can't seem to be able to use the gaussian_kde function, as

result = gaussian_kde(A.T)(B.T)

returns

LinAlgError: Matrix is not positive definite

How do I fix this error? How do I get the density of my sample?

Solution

TL; DR:

You have highly correlated features in your data which leads to a numerical error. There are several possible ways to address this, each with pros and cons. A drop-in replacement class for gaussian_kde is proposed below.

Diagnostic

Your dataset (i.e. the matrix that you feed when creating the gaussian_kde object, not when using it), likely contains highly dependent features. This fact (possibly combined with having low numerical resolution like float32 and "too many" datapoints) causes the covariance matrix to have near-zero eigenvalues, which due to numerical error can get below zero. This in turn will break the code that uses the Cholesky decomposition on the dataset covariance matrix (see explanation for specific details).

Assuming your dataset has shape (dims, N) you can test if this is your problem via np.linalg.eigh(np.cov(dataset))[0] <= 0. If any of the outputs comes out True, let me be the first welcoming you to the club.

Treatments

The idea is to get all eigenvalues above zero.

Increasing numerical resolution to the highest float that is practical can be an easy fix and worth trying, but may not be enough.
Given the fact that this is caused by correlated features, removing datapoints doesn't help much a priori. There is a slim hope that having less numbers to crush will propagate less error, and keep the eigenvalues above zero. It's easy to implement but it discards data points.
A more involved fix is to identify the highly correlated features and merge them or ignore the "superfluous" ones. This is tricky especially if the correlations among dimensions vary from instance to instance.
Probably the cleanest way is to leave the data untouched, and lift the problematic eigenvalues to positive values. This is usually done in 2 ways:
SVD addresses the problem directly at its core: Decompose the covariance matrix and replace the negative eigenvalues with a small positive epsilon. This will put your matrix back to PD domain introducing minimal error.
If the SVD is too expensive to compute, an alternative numerical hack is to add epsilon * np.eye(D) to the covariance matrix. This has the effect of adding epsilon to each one of the eigenvalues. Again, if epsilon is small enough, this doesn't introduce that much of an error.

If you go for the last approach you'll need to tell gaussian_kde to modify its covariance matrix. This is a relatively clean way I found to do that: simply add this class to your codebase and replace gaussian_kde with GaussianKde (tested on my end, seems to work fine).

    class GaussianKde(gaussian_kde):
        """
        Drop-in replacement for gaussian_kde that adds the class attribute EPSILON
        to the covmat eigenvalues, to prevent exceptions due to numerical error.
        """
    
        EPSILON = 1e-10  # adjust this at will
    
        def _compute_covariance(self):
            """Computes the covariance matrix for each Gaussian kernel using
            covariance_factor().
            """
            self.factor = self.covariance_factor()
            # Cache covariance and inverse covariance of the data
            if not hasattr(self, '_data_inv_cov'):
                self._data_covariance = np.atleast_2d(np.cov(self.dataset, rowvar=1,
                                                             bias=False,
                                                             aweights=self.weights))
                # we're going the easy way here
                self._data_covariance += self.EPSILON * np.eye(
                    len(self._data_covariance))
                self._data_inv_cov = np.linalg.inv(self._data_covariance)
    
            self.covariance = self._data_covariance * self.factor**2
            self.inv_cov = self._data_inv_cov / self.factor**2
            L = np.linalg.cholesky(self.covariance * 2 * np.pi)
            self._norm_factor = 2*np.log(np.diag(L)).sum()  # needed for scipy 1.5.2
            self.log_det = 2*np.log(np.diag(L)).sum()  # changed var name on 1.6.2

Explanation

In case your error is similar, but not quite that, or anyone feels curious, here is the process I followed, hopefully it helps.

The exception stack specified that the error was triggered during a Cholesky decomposition. In my case, this was this line inside the _compute_covariance method.
Indeed, after the exception, checking the Eigenvalues for the covariance and inv_cov attributes via np.eigh showed that covariance had a close-to-zero negative eigenvalue, and hence its inverse had a huge one. Since Cholesky expects PD matrices, this triggered an error.
At this point we can heavily suspect that the tiny, negative eigenvalue is a numerical error, since covariance matrices are PSD. Indeed, the error source comes when the covariance matrix is originally computed from the dataset that has been passed to the constructor, here. In my case, the covariance matrix yielded at least 2 almost identical columns, which led to the residual negative eigenvalue due to numerical error.
When will your dataset lead to a quasi-singular covariance matrix? That question is perfectly addressed in this other SE post. The bottom line is: If some variable is an exact linear combination of the other variables, with constant term allowed, the correlation and covariance matrices of the variables will be singular. The dependency observed in such matrix between its columns is actually that same dependency as the dependency between the variables in the data observed after the variables have been centered (their means brought to 0) or standardized (if we mean correlation rather than covariance matrix) (Kudos and +1 to ttnphns for the amazing work).