In Python, I have a dataframe df with columns x, class, ratio and cnt.
I obtained this by aggregating some data beforehand, so I know that there is a unique row for each (x, class) pair. The idea is that I want to see the ratio and cnt for each x, split by class.
To display ratio, I want to use a barplot, and to display cnt I want to use a lineplot. This should be done on a dual-axis plot.
Based on the many answers to similar questions that I read, I tried the following:
plt.figure(figsize=(15,8))
ax1 = sns.barplot(x="x", y="ratio", hue="class", data=df)
ax2 = ax1.twinx()
sns.pointplot(x="x", y='cnt', data=df, hue="class", color='red', ax=ax2)
ax2.grid(False)
The problem is that the output is not really what I need, as it draws many lines, one for each class. What I want is a single lineplot through all values of cnt. I do not really care about splitting by class for the lineplot. I was only doing this to ensure that the markers would appear in the correct place, on top of each bar. But this is not the output I get.
EDIT:
As my question was not clear, the image below shows better what I need. The plot was made by @JohanC using dodge. However, I am looking for a way to construct the black line. I do not really care about splitting the lineplot (or pointplot) by hue. Worst case, I would accept having many lines, one line per x value across its hue values (i.e. the same black curve but without the segments linking each set of bars).
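By default, sns.pointplot() does not dodge the points, and dodge=True only applies a small "dodge" distance. Adding dodge=d might work in your case. Here d is calculated as 0.8 (the total width over which the bars of one category are spread) multiplied by (h-1)/h, where h is the number of hue categories. This multiplication is needed because bars have a width, while points are treated as having none: the points have to be spread over the distance between the centers of the first and the last bar, not over the full 0.8.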
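To make the arithmetic concrete, here is a small sketch of the offsets that this formula implies. It assumes a total bar width of 0.8 per category and equally wide bars. The helper name dodge_offsets is made up for this illustration and is not part of seaborn.

```python
def dodge_offsets(h, width=0.8):
    """Hypothetical helper: bar centers relative to the category position.

    Assumes h equally wide bars that together span `width`.
    """
    bar_width = width / h
    # bar i is centered (i - (h - 1) / 2) bar widths away from the category position
    return [(i - (h - 1) / 2) * bar_width for i in range(h)]


for h in (2, 4):
    offsets = dodge_offsets(h)
    # the distance between the outermost centers is the dodge value to pass on
    print(h, offsets, offsets[-1] - offsets[0], 0.8 * (h - 1) / h)
```

The distance between the first and the last center equals 0.8*(h-1)/h, which is why that value, and not 0.8 itself, is the one to use as dodge.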
import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset('tips')
plt.figure(figsize=(15, 8))
ax1 = sns.barplot(x="day", y="total_bill", hue="sex", errorbar=None, data=tips, palette='spring')
ax2 = ax1.twinx()
h = len(tips['sex'].unique())
sns.pointplot(x="day", y='tip', data=tips, hue="sex", dodge=0.8*(h-1)/h, palette='winter', legend=False, ax=ax2)
plt.show()
If I understand correctly, instead of having lines connecting the same hue value for different x's, you want the lines to connect the different hue values for each x.
You can first use sns.pointplot to draw the lines connecting the same hue across the different x's, and then extract the values and positions of the points. From those, compute a new left-to-right ordering and draw the desired line. The old lines need to be removed.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
df = pd.DataFrame({'x': [*'ABC'] * 4,
                   'class': np.tile([*'WXYZ'], 3),
                   'ratio': np.random.rand(12),
                   'cnt': np.random.randint(10, 30, 12)})
plt.figure(figsize=(15, 8))
ax1 = sns.barplot(x="x", y="ratio", hue="class", alpha=0.6, data=df)
ax2 = ax1.twinx()
# draw a regular pointplot aligned with the bars, use errorbar=None because those would be extra lines
num_hues = len(df["class"].unique())
sns.pointplot(x="x", y='cnt', data=df, hue="class", dodge=0.8 * (num_hues - 1) / num_hues,
              errorbar=None, legend=False, ax=ax2)
# extract the x and y positions, converting them to 1d arrays
xs = np.array([line.get_xdata() for line in ax2.lines]).ravel()
ys = np.array([line.get_ydata() for line in ax2.lines]).ravel()
# get the left to right order of the x-values
x_order = np.argsort(xs)
# remove the lines of the point plot
for line in ax2.lines[::-1]:
    line.remove()
# plot the line connecting the points in left to right order
ax2.plot(xs[x_order], ys[x_order], ls='-', marker='o', color='crimson', label='counts')
# add the line to the legend of ax1
handles1, labels1 = ax1.get_legend_handles_labels()
handles2, labels2 = ax2.get_legend_handles_labels()
ax1.legend(handles1 + handles2, labels1 + labels2)
plt.show()
Note that this kind of plot isn't supported in standard Seaborn, as it looks quite confusing.
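For the "worst case" variant from the question (one line per x value, without the segments linking neighboring sets of bars), the sorted points could be cut into one segment per category. The following is only a sketch: it assumes that every x has exactly one row for each class, so that each category contributes num_hues points. The helper name split_per_category and the demo numbers are made up for this illustration.

```python
import numpy as np


def split_per_category(xs, ys, num_hues):
    """Hypothetical helper: sort the points left to right and cut them
    into one (x, y) segment per category.

    Assumes each category contributes exactly num_hues points.
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    order = np.argsort(xs)
    # after sorting, every run of num_hues points belongs to one category
    seg_x = xs[order].reshape(-1, num_hues)
    seg_y = ys[order].reshape(-1, num_hues)
    return list(zip(seg_x, seg_y))


# made-up positions, laid out one hue level after the other
demo_xs = [-0.3, 0.7, 1.7, -0.1, 0.9, 1.9, 0.1, 1.1, 2.1, 0.3, 1.3, 2.3]
demo_ys = [12, 25, 17, 14, 21, 19, 11, 28, 15, 18, 22, 13]

for seg_x, seg_y in split_per_category(demo_xs, demo_ys, num_hues=4):
    print(seg_x, seg_y)
    # in the plotting code above, each segment would be drawn separately, e.g.
    # ax2.plot(seg_x, seg_y, ls='-', marker='o', color='crimson')
```

In the plotting code, xs, ys and num_hues would take the place of the demo values, and the single ax2.plot call would be replaced by one call per segment.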