pythonpython-3.xpymc

Calculating Highest Density Interval (HDI) for 2D array of data points


I have two arrays, x and y, which determine the position and a third one, z, which provides the value of the radiation at such position.

I want to calculate and plot the HDI of 90% over the plot of radiation to be able to say that 90% of the radiation can be found in such region.

If I were to normalize z,the HDI for 2D radiation data means the smallest area in which one can integrate the normalized distribution function and get the desired level, in my case, 0.9 or 90%, being the integral between -inf and inf equal to 1.0.

This problem looks isomorphic to calculate the HDI of a probability distribution so I think that it can be done, but I just do not know how to proceed any further.

Edit to make it clearer: This snipped generates data (x,y, and z) which are similar to the topology of my own, so I am using it as a proxy:

import numpy as np 

## Spatial data, with randomness to show that it is not in a rectangular grid 
x = np.arange(-2,2, step=0.2)
x+= 0.1*np.random.rand(len(x)) 
y = np.arange(-2,2, step=0.2) 
y+= 0.1*np.random.rand(len(y)) 

z = np.zeros((len(x), len(y))) 
for i, ix in enumerate(x): 
    for k, iy in enumerate(y): 
        tmp = np.cos(ix)*np.sin(iy)
        if tmp>=0: 
            z[i,k] = tmp 
z=z/np.sum(z) 

enter image description here

This is a crude sketch of my expectations: The data are mostly accumulated on the left, as shown by the contour plots, and the red line represents HDI of .9, where 90% of the integral of data is enclosed (the line is purposely not following one of the isolines, because that needn't to be the case)

The bold black line labelled 94% HDI is an example of what I want, but I want it in 2D: enter image description here

I cannot find the way to use az.hdi, or az.plot_kde. I have tried to feed the data as observable to a PYMC model, so that the MCMC sampling could translate values to density of points (which is what az.hdi and az.plot_kde want), but I haven't been able to do it. A very crude approach would be to generate artificial points around each of my own proportional to the value of radiation at that point. I think there must be a more elegant and efficient way to solve this.

Getting the indices of the points inside the HDI could be an alternative solution, because calculating the convex hull I could still get the HDI.

Any idea about how to proceed is welcome.


Solution

  • I demonstrate a method using pure scipy/numpy/matplotlib.

    The tricky part about this function is that it's a "density function" but not a "probability density function", so the usual histogram binning etc. does not apply. You need to do a linear tesselation and then a double integral. The resultant density is a function of domain area.

    import matplotlib.pyplot as plt
    import numpy as np
    import scipy.interpolate
    from numpy.random import default_rng
    
    # Spatial data, with randomness to show that it is not in a rectangular grid
    rand = default_rng(seed=0)
    n = 20
    y, x = np.linspace(-2, 2, num=n) + rand.uniform(-0.05, 0.05, size=(2, n))
    radiation = np.clip(
        np.sin(y)[:, np.newaxis] * np.cos(x)[np.newaxis, :],
        a_min=0, a_max=None,
    )
    
    # Since the actual grid is known to be irregular, interpolate to a regular grid
    # This is necessary for a non-biased double integral
    # Irregular edge data need to be discarded for pure interpolation (or else extrapolation is needed)
    bound = min(x.max(), y.max(), -x.min(), -y.min())
    xy_irregular = np.stack(np.meshgrid(x, y), axis=-1).reshape((-1, 2))
    x_regular = np.linspace(-bound, bound, n)
    y_regular = np.linspace(-bound, bound, n)
    xy_regular = np.stack(
        np.meshgrid(x_regular, y_regular), axis=-1,
    ).reshape((-1, 2))
    rad_interpolated = scipy.interpolate.griddata(
        values=radiation.ravel(),
        points=xy_irregular, xi=xy_regular,
        # Linear tesselation: cannot use cubic or else negatives will be introduced
        method='linear',
    )
    
    # Effectively double integral over dxdy
    density = np.sort(rad_interpolated)
    cdf = (density / density.sum()).cumsum()
    
    # The highest-density interval is equivalent to an inverse empirical (non-normal) CDF evaluated at
    # the lower-bound quantile.
    include_quantile = 0.6
    exclude_quantile = 1 - include_quantile
    idensity = cdf.searchsorted(exclude_quantile)
    qdensity = density[idensity]
    
    ax_left: plt.Axes
    ax_right: plt.Axes
    fig, (ax_left, ax_right) = plt.subplots(ncols=2)
    
    ax_left.contour(x, y, radiation, levels=[qdensity])
    ax_left.set_title(f'Radiation bound at hdi({include_quantile})={qdensity:.3f}')
    
    area = np.linspace(0, (2*bound)**2, len(cdf))
    ax_right.plot(area, cdf)
    ax_right.set_title('Radiation, cumulative distribution')
    ax_right.set_xlabel('Function area')
    ax_right.set_ylabel('Density')
    ax_right.axhline(exclude_quantile, c='red')
    ax_right.axvline(area[idensity], c='red')
    
    plt.show()
    

    distribution