I have a pandas dataframe which contains information about distances moved by men and women in different provinces. Apart from an id and the distance, there is also a column for their gender in numerical form (0=men, 1=women, 2=unknown), and a gender label for the legend ('gender_legend' with 'male' and 'female').
I'm trying to plot the relative densities for men and women in each province, and I've run into some annoying behaviour: the colors and the legend order are not consistent between plots. Sometimes the curve for men is drawn in blue and the one for women in orange, and sometimes it is the other way around. Likewise, the legend sometimes lists men first and sometimes women (see images). Does anybody know why this happens, and how to force seaborn to always use the same color for the same gender?
Additionally, if anyone knows how to remove the legend title (here: 'gender_legend'), I'd appreciate that too. I have already tried these options without success.
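
    import matplotlib.pyplot as plt
    import seaborn as sns

    for province in provinces:
        fig, ax = plt.subplots()
        # keep known genders only (drop gender == 2) for the current province
        subset = df[(df['gender'] != 2) & (df['province'] == province)]
        sns.kdeplot(data=subset, x='distance', hue='gender_legend', ax=ax)
        ax.set(xlabel='Distance (km)', ylabel='density', title=province)
        plt.show()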

    # sort the dataframe by gender once, so that male (0) rows always come first
    df = df.sort_values(by=['gender'], ascending=True)

    for province in provinces:
        fig, ax = plt.subplots()
        subset = df[(df['gender'] != 2) & (df['province'] == province)]
        # legend=False removes the legend altogether
        sns.kdeplot(data=subset, x='distance', hue='gender_legend', ax=ax, legend=False)
        ax.set(xlabel='Distance (km)', ylabel='density', title=province)
        plt.show()

Answer explanation: