I am trying to create a boxplot with different categories and overlay scatter points on top of it. The problem I am encountering is that when the results across categories are very similar, the boxplots appear to overlap (see the figure, specifically panel B, TSS metric).
I have tried adjusting the width of the boxes, but this ends up misaligning the scatter points relative to the boxes, which is not ideal.
Could anyone suggest a better way to prevent the boxplots from overlapping while keeping the scatter points properly aligned?
Thank you so much!
Here is the code I have been using:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

map_levels = {
    0.5: "High",
    0.2: "Medium",
    0.1: "Low",
    0.01: "Extremely low",
}
simulated_RF["Prevalence_level"] = simulated_RF["Prevalence"].map(map_levels)
order_levels = ["High", "Medium", "Low", "Extremely low"]
simulated_RF["Prevalence_level"] = pd.Categorical(
    simulated_RF["Prevalence_level"], categories=order_levels, ordered=True
)
metrics_to_plot = ["AUC", "TSS", "BrierScore", "LogLoss"]
palette_custom = ["cornflowerblue", "orange"]
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
axes = axes.flatten()
panel_labels = ["A)", "B)", "C)", "D)"]
for i, metric in enumerate(metrics_to_plot):
    ax = axes[i]
    df_m = simulated_RF[simulated_RF["Metric"] == metric]
    sns.boxplot(
        data=df_m,
        x="Prevalence_level",
        y="Value",
        hue="Type",
        dodge=True,
        ax=ax,
        palette=palette_custom
    )
    sns.stripplot(
        data=df_m,
        x="Prevalence_level",
        y="Value",
        hue="Type",
        dodge=True,
        palette=["black", "black"],
        size=4,
        jitter=True,
        alpha=0.5,
        ax=ax
    )
    ax.set_title(metric, fontsize=16)
    ax.set_xlabel("Prevalence level", fontsize=13)
    ax.set_ylabel(metric, fontsize=13)
    ax.set_xticks(range(len(order_levels)))
    ax.set_xticklabels(order_levels, fontsize=11)
    ax.tick_params(axis="y", labelsize=11)
    ax.text(-0.1, 1.05, panel_labels[i],
            transform=ax.transAxes, fontsize=16, fontweight="bold")
    handles, labels = ax.get_legend_handles_labels()
    if i == 0:
        legend_handles = handles[:2]
        legend_labels = labels[:2]
    ax.get_legend().remove()
fig.legend(
    legend_handles, legend_labels,
    loc="lower center", ncol=2,
    fontsize=13, title_fontsize=13
)
fig.tight_layout(rect=[0, 0.05, 1, 0.95])
plt.show()
sns.boxplot() has a gap= parameter (added in seaborn 0.13) that adds space between dodged boxes. It shrinks each box along the x-axis by the given fraction (gap=0.1 makes each box 10% narrower) and defaults to 0. The box centers stay where they are, so the boxes still line up with the dodged stripplot.
You can also set showfliers=False to hide the boxplot's outlier markers, since the stripplot already draws every point.
import matplotlib.pyplot as plt
import seaborn as sns
# load a test dataframe
df = sns.load_dataset('iris')
fig, ax = plt.subplots(figsize=(14, 5))
# convert to long format
df_long = df.melt(id_vars='species', var_name='measurement', value_name='value',
                  value_vars=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'])
sns.boxplot(data=df_long, hue='species', x='measurement', y='value', palette='turbo',
            showfliers=False, gap=0.1, ax=ax)
sns.stripplot(data=df_long, hue='species', x='measurement', y='value', dodge=True, legend=False,
              palette=['pink']*3, edgecolor='blue', linewidth=1, ax=ax)
ax.set(xlabel='', ylabel='')
sns.despine()
plt.tight_layout()
plt.show()
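To see why gap= keeps the points aligned while changing width= on the boxplot does not, here is a rough arithmetic sketch of how dodged elements are positioned around one category. It is a simplified model based on my understanding of seaborn's dodge behaviour, not seaborn code: the function names box_layout and strip_centers are hypothetical, and the assumption is that the stripplot always dodges over a total width of 0.8 while the boxplot dodges over its own width=.

```python
# Simplified model of dodging around a single category position (x = 0).
# Assumption: the stripplot dodges over a fixed total width of 0.8, while the
# boxplot dodges over whatever `width` it is given. Function names are
# hypothetical and not part of seaborn's API.

def box_layout(n_hue, width=0.8, gap=0.0):
    """Return (center_offset, box_width) for each hue level."""
    slot = width / n_hue  # horizontal space reserved for each hue level
    layout = []
    for k in range(n_hue):
        center = slot * k + slot / 2 - width / 2
        # gap only shrinks the box inside its slot; the center is unchanged
        layout.append((round(center, 4), round(slot * (1 - gap), 4)))
    return layout


def strip_centers(n_hue, width=0.8):
    """Return the center offset of the dodged points for each hue level."""
    slot = width / n_hue
    return [round(slot * k + slot / 2 - width / 2, 4) for k in range(n_hue)]


points = strip_centers(2)            # where the dodged points sit
narrow = box_layout(2, width=0.5)    # boxes narrowed via width=
gapped = box_layout(2, gap=0.2)      # boxes narrowed via gap=

print("points:", points)
print("width=0.5:", narrow)
print("gap=0.2:", gapped)
```

Under this model, with two hue levels the points sit at offsets of ±0.2. Passing width=0.5 moves the box centers to ±0.125, so the points no longer sit over the boxes, whereas gap=0.2 leaves the centers at ±0.2 and only narrows each box from 0.4 to 0.32.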