I have a data set of 36000 rows and 51 columns. Each row is an observation and the first 50 columns are 50 different features of each observation. The 51st column contains the values 0 or 1, where 0 means that the observation belongs to class A and 1 means it belongs to class B.
Now let's say I want to make a histogram of the values of the first column, call it Feature1. As far as I know, matplotlib's plt.hist() doesn't have the ability to draw 2 histograms in the same plot, one corresponding to the values of Feature1 from class A and the other to the ones from class B. Seaborn's sns.distplot doesn't do it either. So I decided to try seaborn's pairplot as follows:
sns.pairplot(df, vars=["Feature1"], hue="Class", diag_kind="hist", diag_kws=dict(alpha=0.55))
Feature1 is the name of the 1st column and Class the name of the last column, which contains the class labels for each observation. The histogram that appears is fine, but I would like to increase the number of bins used. Sadly I didn't find any way to do that using this particular function.
Is anyone aware of a solution to this problem? Thanks
To expound upon the comment by Bugbeeb: when using diag_kind = 'hist', the diag_kws are passed into plt.hist(). This is not outlined in the documentation but is clear from the source:
def pairplot(...):
    # ...
    if diag_kind == "hist":
        grid.map_diag(plt.hist, **diag_kws)
    # ...
Since plt.hist() accepts the argument bins as an integer to control the number of bins, you can simply do
sns.pairplot(df, vars = ["Feature1"], hue = "Class", diag_kind = "hist",
diag_kws = {'alpha':0.55, 'bins':n})
where n is the number of bins desired, as an int.