I want to draw the median segments in geom_boxplot() with custom colors. For R/ggplot2 I found a solution that uses ggplot_build(), which provides the x, xend, y and yend inputs that geom_segment() needs to overlay the median segments on the boxplot.
I couldn't find functionality equivalent to ggplot_build() in plotnine, so my approach is to construct a new dataframe that calculates these four values for each group, as needed by geom_segment.
I know how to get y and yend for each group: both are simply the group median. What I don't know is how to calculate x and xend, since that requires an x position for each group (in my use case the group names are of type str). I also need the box width used by geom_boxplot().
Any suggestions on how to extract/calculate those?
Thanks!
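Instead of extracting the median values from the dataset that geom_boxplot creates under the hood, you can create an aggregated dataframe with the medians and use it to draw the median lines with geom_segment. (As an R user, I would guess this is how most R users would approach the problem.) The tricky part is getting the x and xend coordinates for the groups. To this end I use pd.factorize to convert the group column to a sequence of integers, add 1 (a discrete axis places its groups at x = 1, 2, 3, ...), and then add or subtract half of the default box plot width of .75. This relies on the rows of the aggregated dataframe being in the same order as the groups on the x axis: groupby sorts the group keys by default, which should match the default order of a discrete scale for numeric or string labels.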
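To see the arithmetic in isolation, here is a small pandas-only sketch with hypothetical string group names (the question's use case). The data and the BOX_WIDTH constant are illustrative assumptions; 0.75 is the default width mentioned above.

```python
import pandas as pd

# Hypothetical data with string group names
df = pd.DataFrame({
    "group": ["b", "a", "c", "a", "b", "c"],
    "value": [3.0, 1.0, 5.0, 2.0, 4.0, 6.0],
})

BOX_WIDTH = 0.75  # assumed default box width

# groupby sorts the group labels, so the rows come out as a, b, c
med = df.groupby("group")["value"].median().reset_index()

# factorize numbers the groups 0, 1, 2, ... in order of appearance;
# +1 gives the positions 1, 2, 3, ... used by a discrete x axis
center = pd.factorize(med["group"])[0] + 1
med["x"] = center - BOX_WIDTH / 2
med["xend"] = center + BOX_WIDTH / 2
print(med)
```

Each group's segment then runs from x to xend at height value, i.e. it spans the full width of that group's box.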
Using a minimal reproducible example based on the mtcars dataset:
```python
from plotnine import ggplot, geom_boxplot, aes, geom_segment
from plotnine.data import mtcars
import pandas as pd

# One row per cylinder group holding that group's median mpg
df_median = mtcars.groupby("cyl")["mpg"].median().reset_index()

# Discrete x positions start at 1; each box spans +/- half the box width (.75)
center = pd.factorize(df_median["cyl"])[0] + 1
df_median["x"] = center - .75 / 2
df_median["xend"] = center + .75 / 2

(ggplot(mtcars, aes("factor(cyl)", "mpg"))
 + geom_boxplot()
 # Overlay a colored segment at each group's median
 + geom_segment(
     mapping=aes(x="x", xend="xend", y="mpg", yend="mpg", color="factor(cyl)"),
     data=df_median, size=1))
```
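The same steps can be wrapped in a small reusable function, which also covers the question's need for a non-default box width. This is a sketch: median_segments is a hypothetical helper name, and it assumes the groups appear on the x axis in sorted order at positions 1, 2, 3, ... and that width equals the width of the boxes.

```python
import pandas as pd


def median_segments(data, group, value, width=0.75):
    # Hypothetical helper: one row per group with its median (use it for
    # y and yend) and the horizontal extent of its box (x and xend).
    # Assumes sorted group order on a discrete axis and that `width`
    # matches the box width used by geom_boxplot.
    med = data.groupby(group, sort=True)[value].median().reset_index()
    center = pd.factorize(med[group])[0] + 1
    med["x"] = center - width / 2
    med["xend"] = center + width / 2
    return med


# Illustrative data with string group names and a narrower box width
df = pd.DataFrame({
    "team": ["red", "blue", "red", "green", "blue", "green"],
    "score": [10, 4, 14, 7, 6, 9],
})
seg = median_segments(df, "team", "score", width=0.5)
print(seg)
```

To plot it, pass seg as data to geom_segment with aes(x="x", xend="xend", y="score", yend="score", color="team"), give geom_boxplot the same width=0.5 so the segments line up with the boxes, and add scale_color_manual(values=[...]) if you want to pick the colors yourself.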