I would like to be able to extract the characteristic parameters from kernel density plots produced using Python's Seaborn. While there is a very nice example on obtaining the median of a distribution, I'd like to see whether this can be generalized for multimodal distributions for 1D data and particularly in the 2D case.
Below there is a minimal example from which I manually derive the value of each peak in the 1D case. I hope to find something more systematic and applicable to 2D using the available objects.
import numpy as np
import scipy
import pandas as pd
import seaborn as sns
sns.set(style="white", color_codes=True, font_scale=2)
x1 = np.random.normal(-1.5,1,1000)
y1 = np.random.normal(1.5,1,1000)
x2 = np.random.normal(1.5,1,1000)
y2 = np.random.normal(-1.5,1,1000)
x = np.concatenate((x1,x2))
y = np.concatenate((y1,y2))
d = {'x': pd.Series(x), 'y': pd.Series(y)}
data = pd.DataFrame(d)
px = sns.kdeplot(data.x, shade=True)
x,y = px.get_lines()[0].get_data()
xysel = np.array([(x,y) for x,y in zip(x,y) if x < 0])
imax = np.argmax(xysel[:,1])
x_median = xysel[imax,0]
y_median = xysel[imax,1]
plt.vlines(x_median, 0, y_median, linestyles='dashed', color='b')
px.set_xlim(-5,5)
plt.show()
py = sns.kdeplot(data.y, shade=True, color='r')
x,y = py.get_lines()[0].get_data()
xysel = np.array([(x,y) for x,y in zip(x,y) if x > 0])
imax = np.argmax(xysel[:,1])
x_median = xysel[imax,0]
y_median = xysel[imax,1]
plt.vlines(x_median, 0, y_median, linestyles='dashed', color='r')
py.set_xlim(-5,5)
plt.show()
p = sns.kdeplot(data.x, data.y, shade=True)
You can get the paths by following code:
ax = sns.kdeplot(data.x, data.y, shade=True)
for path in ax.collections[-1].get_paths():
x, y = path.vertices.mean(axis=0)
ax.plot(x, y, "ro")
Here is the output:
ax.collections
is a list of PathCollection
objects that corresponding to every levels in the Axes object.
Every PathCollection
contains a list of Path
object that you can get by get_paths()
method.
The points of the path are save in vertices
array.
If you want to get more information, you need to get the return object of Axes.contourf
, patch the contourf()
method first:
from matplotlib.axes import Axes
def contourf(self, *args, **kw):
self._quadcontourset = self.old_contourf(*args, **kw)
return self._quadcontourset
Axes.old_contourf = Axes.contourf
Axes.contourf = contourf
Then you can get the QuadContourSet
object by ax._quadcontourset
. Please read the source code of QuadContourSet
to understand how to use it.