pythonpandasstatisticsseabornkdeplot

Extracting characteristic parameters from kdeplots


I would like to be able to extract the characteristic parameters from kernel density plots produced using Python's Seaborn. While there is a very nice example on obtaining the median of a distribution, I'd like to see whether this can be generalized for multimodal distributions for 1D data and particularly in the 2D case.

Below there is a minimal example from which I manually derive the value of each peak in the 1D case. I hope to find something more systematic and applicable to 2D using the available objects.

import numpy as np
import scipy
import pandas as pd
import seaborn as sns
sns.set(style="white", color_codes=True, font_scale=2)

x1 = np.random.normal(-1.5,1,1000)
y1 = np.random.normal(1.5,1,1000)
x2 = np.random.normal(1.5,1,1000)
y2 = np.random.normal(-1.5,1,1000)
x = np.concatenate((x1,x2))
y = np.concatenate((y1,y2))
d = {'x': pd.Series(x), 'y': pd.Series(y)}
data = pd.DataFrame(d)

px = sns.kdeplot(data.x, shade=True)
x,y = px.get_lines()[0].get_data()
xysel = np.array([(x,y) for x,y in zip(x,y) if x < 0])
imax = np.argmax(xysel[:,1])
x_median = xysel[imax,0]
y_median = xysel[imax,1]
plt.vlines(x_median, 0, y_median, linestyles='dashed', color='b')
px.set_xlim(-5,5)
plt.show()

py = sns.kdeplot(data.y, shade=True, color='r')
x,y = py.get_lines()[0].get_data()
xysel = np.array([(x,y) for x,y in zip(x,y) if x > 0])
imax = np.argmax(xysel[:,1])
x_median = xysel[imax,0]
y_median = xysel[imax,1]
plt.vlines(x_median, 0, y_median, linestyles='dashed', color='r')
py.set_xlim(-5,5)
plt.show()

p = sns.kdeplot(data.x, data.y, shade=True)

Solution

  • You can get the paths by following code:

    ax = sns.kdeplot(data.x, data.y, shade=True)
    
    for path in ax.collections[-1].get_paths():
        x, y = path.vertices.mean(axis=0)
        ax.plot(x, y, "ro")
    

    Here is the output:

    enter image description here

    ax.collections is a list of PathCollection objects that corresponding to every levels in the Axes object.

    Every PathCollection contains a list of Path object that you can get by get_paths() method.

    The points of the path are save in vertices array.

    If you want to get more information, you need to get the return object of Axes.contourf, patch the contourf() method first:

    from matplotlib.axes import Axes
    
    def contourf(self, *args, **kw):
        self._quadcontourset = self.old_contourf(*args, **kw)
        return self._quadcontourset
    
    Axes.old_contourf = Axes.contourf
    Axes.contourf = contourf
    

    Then you can get the QuadContourSet object by ax._quadcontourset. Please read the source code of QuadContourSet to understand how to use it.