I just encountered a rather serious bug in seaborn 0.12.2 facet plotting: a graph is produced, without any warning or error, that shows the data in the wrong category. This could lead a scientist to draw the wrong conclusion!
#!/usr/bin/env python3
# Modules #
import numpy as np
import seaborn, pandas
# Create a list for each category #
patients = ['Patient1', 'Patient2', 'Patient3']
cohorts = ['Cohort1', 'Cohort2', 'Cohort3']
treatments = ['Treatment1', 'Treatment2', 'Treatment3']
# We will use these lists to create a DataFrame with unique combinations #
data = {
    'Patient': [],
    'Cohort': [],
    'Treatment': [],
    'Value': []
}
# Create all unique combinations and add random values for each #
for patient in patients:
    for cohort in cohorts:
        for treatment in treatments:
            for i in range(10):
                data['Patient'].append(patient)
                data['Cohort'].append(cohort)
                data['Treatment'].append(treatment)
                data['Value'].append(np.random.rand())
# Make dataframe #
df = pandas.DataFrame(data)
# Find the indexes of the rows to drop #
index_to_drop = df[(df['Patient'] == 'Patient2') &
                   (df['Cohort'] == 'Cohort2') &
                   (df['Treatment'] == 'Treatment2')].index
# Drop these rows from the DataFrame #
df = df.drop(index_to_drop)
###############################################################################
facet_params = dict(data = df,
                    col = 'Patient',
                    row = 'Cohort',
                    col_order = patients,
                    row_order = cohorts)
seaborn_params = dict(x = 'Treatment',
                      y = 'Value',)
# Call seaborn #
grid = seaborn.FacetGrid(**facet_params)
# Box plot #
grid.map_dataframe(seaborn.boxplot, **seaborn_params, showfliers=False)
# Strip plot #
grid.map_dataframe(seaborn.stripplot, **seaborn_params, jitter=True)
# Save #
grid.savefig('facet_bug.png')
In this short example, the data has four variables: Patient, Cohort, Treatment, and Value.
We introduce missing data: there are no values for Patient2 of Cohort2 getting Treatment2.
We make a FacetGrid with the patients and cohorts levels, and plot the treatments on the x axis with the value on the y axis (of each subplot).
We superimpose both a boxplot and a stripplot.
In the case of the stripplot, the data is correctly plotted. In the case of the boxplot, the data pertaining to Treatment3 ends up under the label of Treatment2!
The ideal behavior would actually be to produce a graph where the subplot for Patient2 of Cohort2 has only two categories on the x axis, so that it displays only two boxplots and contains no empty space.
Is there any way of producing a facet grid where the number of categories of each X-axis is variable based on the data available?
Here is a mockup of the desired plot, which I edited manually using GIMP:
seaborn.FacetGrid:
Warning: When using seaborn functions that infer semantic mappings from a dataset, care must be taken to synchronize those mappings across facets (e.g., by defining the hue mapping with a palette dict or setting the data type of the variables to category). In most cases, it will be better to use a figure-level function (e.g. relplot() or catplot()) than to use FacetGrid directly.
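The warning's suggestion of setting the data type to category can be sketched as follows. This is only a minimal illustration on a small made-up frame (the name `toy` and its values are invented for the example, not taken from the question). It shows that a pandas categorical column keeps all of its declared levels after rows are filtered out, which is what lets a plotting function see the same category order in every facet.

```python
import pandas as pd

# Full set of x-axis categories, as in the question.
treatments = ['Treatment1', 'Treatment2', 'Treatment3']

# Hypothetical toy data standing in for the question's DataFrame.
toy = pd.DataFrame({
    'Treatment': ['Treatment1', 'Treatment3', 'Treatment1', 'Treatment2'],
    'Value': [0.1, 0.2, 0.3, 0.4],
})

# Declare the levels explicitly instead of leaving plain strings.
toy['Treatment'] = pd.Categorical(toy['Treatment'],
                                  categories=treatments,
                                  ordered=True)

# Simulate a facet in which Treatment2 has no rows.
subset = toy[toy['Treatment'] != 'Treatment2']

# The declared levels survive the filtering...
subset_levels = list(subset['Treatment'].cat.categories)
# ...even though only two of them are actually observed.
observed = list(subset['Treatment'].unique())

print(subset_levels)
print(observed)
```

Applied to the question's data, the same conversion would be done on `df['Treatment']` before building the grid.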
order=['Treatment1', 'Treatment2', 'Treatment3'] should be in both .map_dataframe calls.
Tested in python 3.11.3, pandas 2.0.2, matplotlib 3.7.1, seaborn 0.12.2
import pandas as pd
import seaborn as sns
facet_params = dict(data = df,
                    col = 'Patient',
                    row = 'Cohort',
                    col_order = patients,
                    row_order = cohorts)
seaborn_params = dict(x = 'Treatment',
                      y = 'Value')
# plot with catplot
g = sns.catplot(kind='box', **facet_params, **seaborn_params, showfliers=False, color='tab:blue')
# map stripplot
g.map_dataframe(sns.stripplot, **seaborn_params, color='k')
# Alternatively, keep FacetGrid and pass the same explicit order to both calls
xaxis_order = ['Treatment1', 'Treatment2', 'Treatment3']
# Call seaborn
grid = sns.FacetGrid(**facet_params)
# Box plot
grid.map_dataframe(sns.boxplot, **seaborn_params, showfliers=False, order=xaxis_order)
# Strip plot
grid.map_dataframe(sns.stripplot, **seaborn_params, jitter=True, order=xaxis_order, color='k')
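Why the explicit order matters can be sketched with a toy model in plain Python. This is not seaborn's actual implementation. It assumes, for illustration only, that a plotting function without an order assigns x positions from the categories it finds in the facet's own subset, while the shared tick labels follow the full category list. The helper names `positions_from_data` and `positions_from_order` are hypothetical.

```python
def positions_from_data(values):
    """Toy model: assign x positions in order of first appearance."""
    levels = list(dict.fromkeys(values))  # unique values, order preserved
    return {level: i for i, level in enumerate(levels)}


def positions_from_order(order):
    """Toy model: assign x positions from an explicit, fixed order."""
    return {level: i for i, level in enumerate(order)}


xaxis_order = ['Treatment1', 'Treatment2', 'Treatment3']

# The Patient2 / Cohort2 facet has no Treatment2 rows.
facet_values = ['Treatment1', 'Treatment3', 'Treatment1', 'Treatment3']

inferred = positions_from_data(facet_values)
fixed = positions_from_order(xaxis_order)

# Shared tick labels follow the full order, so look up which label
# sits at the position where Treatment3 would be drawn.
label_under_inferred = xaxis_order[inferred['Treatment3']]
label_under_fixed = xaxis_order[fixed['Treatment3']]

print(label_under_inferred)  # Treatment2
print(label_under_fixed)     # Treatment3
```

In this model, the inferred positions put the Treatment3 data where the shared axis says Treatment2, which matches the symptom in the question. With a fixed order, the position and the label agree.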