Tags: python, matplotlib, seaborn, histogram, jointplot

Returning array of values in hexbin using seaborn jointplot


I have a dataset that tracks a position over time, along with some values that depend on that position, and I would like to use a seaborn jointplot to show this data. The plot looks like this:

[Figure: histogram of orientation and distance]

Here is the code that makes it. I can't share the dataset, but this should give you an idea of what I'm doing.

import matplotlib.pyplot as plt
import seaborn as sns

# dimerDistance and Orientation are equal-length 1-D arrays (dataset not shown)
h = sns.jointplot(data=None, x=dimerDistance, y=Orientation,
                  kind='hex', cmap="gnuplot", ratio=4,
                  marginal_ticks=False, marginal_kws=dict(bins=25, fill=False))

plt.suptitle('Orientation Factor - Distance Histogram of Dimer')
plt.tight_layout()
plt.xlabel('Distance [Angstrom]')
plt.ylabel('k')

I would like to pick a bin generated by the hexbin function and extract the values that fall into it. For example, the bin with the highest count according to the colormap is at around x=25 and y=1.7. I want to go to that bin, find the x values it contains together with their indices in the x array, and then look up the k values at those same indices. In other words, I imagine something that would look like

bin[z] = [x[index1], x[index2], ..., x[indexn]]

where z is the index of the bin with the highest count, so that I can make a new bin

newbin = [y[index1], y[index2], ..., y[indexn]]

As this data is time-related, these indices would tell me the timeframes in which the system falls into the bin, so this would be very useful to know. I have searched Stack Overflow and found this post, which seemed helpful: "Getting information for bins in matplotlib histogram function".

Is there a way I can access the information I want, as in that post?


Solution

  • Seaborn doesn't return this type of data. However, the hex plot works similarly to plt.hexbin: both create a PolyCollection from which you can extract the bin values (counts) and the bin centers.

    Here is an example of how the data can be extracted (and displayed):

    import matplotlib.pyplot as plt
    import seaborn as sns
    
    penguins = sns.load_dataset('penguins')
    g = sns.jointplot(data=penguins, x="bill_length_mm", y="bill_depth_mm",
                      kind='hex', cmap="gnuplot", ratio=4,
                      marginal_ticks=False, marginal_kws=dict(bins=25, fill=False))
    
    values = g.ax_joint.collections[0].get_array()
    ind_max = values.argmax()
    xy_max = g.ax_joint.collections[0].get_offsets()[ind_max]
    g.ax_joint.text(xy_max[0], xy_max[1], f" Max: {values[ind_max]:.0f}\n x={xy_max[0]:.2f}\n y={xy_max[1]:.2f}",
                    color='lime', ha='left', va='bottom', fontsize=14, fontweight='bold')
    g.ax_joint.axvline(xy_max[0], color='red')
    g.ax_joint.axhline(xy_max[1], color='red')
    plt.tight_layout()
    plt.show()
    
    print(f"The highest bin contains {values[ind_max]:.0f} values")
    print(f"  and has as center: x={xy_max[0]:.2f}, y={xy_max[1]:.2f}")
    

    The highest bin contains 18 values
    and has as center: x=45.85, y=14.78

    [Figure: extracting bin info from sns.jointplot with hexplot]
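The example above locates the fullest bin but does not say which data points fell into it. One possible way to recover their indices is sketched below. It rests on assumptions that are not part of the original answer: both axes are linear, no `mincnt` filtering was applied (so neighbouring hexagon centers are all present in the collection), and the hexagons form the usual two interleaved rectangular lattices offset by half a step. The helper names `lattice_step` and `indices_in_hexbin` are hypothetical, not seaborn or matplotlib API.

```python
import numpy as np


def lattice_step(values, rel_tol=1e-9):
    """Estimate the full lattice step along one axis from hexagon centers.

    The two interleaved sub-lattices are shifted by half a step, so the
    smallest gap between distinct center coordinates is step / 2.
    """
    unique = np.unique(np.asarray(values, dtype=float))
    gaps = np.diff(unique)
    span = unique[-1] - unique[0]
    gaps = gaps[gaps > rel_tol * span]  # ignore floating-point near-duplicates
    return 2.0 * gaps.min()


def indices_in_hexbin(x, y, centers, target):
    """Return indices of the points (x[i], y[i]) lying inside hexagon `target`.

    centers: (n_bins, 2) array of hexagon centers, e.g. from get_offsets().
    target:  row index of the bin of interest, e.g. values.argmax().
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    centers = np.asarray(centers, dtype=float)

    sx = lattice_step(centers[:, 0])
    sy = lattice_step(centers[:, 1])
    cx, cy = centers[target]

    # Offsets from the target center, in units of the lattice step.
    u = np.abs(x - cx) / sx
    v = np.abs(y - cy) / sy

    # A point is closer to the target center than to any neighbouring center
    # (using the scaled distance u**2 + 3 * v**2) exactly when both
    # conditions hold. NaN coordinates compare False and are excluded.
    inside = (u <= 0.5) & (u + 3.0 * v < 1.0)
    return np.flatnonzero(inside)
```

With the plot from the answer, this could be called as `idx = indices_in_hexbin(penguins["bill_length_mm"], penguins["bill_depth_mm"], g.ax_joint.collections[0].get_offsets(), ind_max)`. For the data in the question, `idx` would then be the timeframes and `Orientation[idx]` the matching k values, assuming `Orientation` is a NumPy array. A useful sanity check is that `len(idx)` agrees with `values[ind_max]`; points lying exactly on a hexagon edge may be assigned differently.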