How do I create a bar chart showing the gmean, mean, max and min of `Y` for each category in `X`, using the data below?
X | B | Y |
---|---|---|
A1 | b1 | 4 |
A1 | b2 | 2 |
A1 | b3 | 3 |
A1 | b4 | 8 |
A2 | b1 | 7 |
A2 | c1 | 10 |
A2 | c2 | 8 |
A2 | b3 | 7 |
A3 | b4 | 10 |
A3 | b5 | 9 |
A3 | b1 | 4 |
A3 | b3 | 1 |
You need to prepare the data you want to visualise first: calculate the aggregates for each category, then plot them.
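```python
import pandas as pd
from plotnine import ggplot, aes, geom_col
from scipy.stats import gmean
from pandas.api.types import CategoricalDtype

# Original data (the B column is not needed for the plot)
df = pd.DataFrame({
    "X": sorted(("A1", "A2", "A3") * 4),
    "B": ["b1", "b2", "b3", "b4", "b1", "c1", "c2", "b3", "b4", "b5", "b1", "b3"],
    "Y": [4, 2, 3, 8, 7, 10, 8, 7, 10, 9, 4, 1],
})

# Calculate the aggregates and reshape them into long format:
# one row per (category X, statistic agg) pair with its value
df2 = (df.groupby("X")
       .agg({"Y": [gmean, "mean", "max", "min"]})
       .unstack()      # Series indexed by ("Y", statistic, X)
       .reset_index()  # columns: level_0, level_1, X, 0
       .rename(columns={0: "value", "level_1": "agg"})
       )

# Order the aggregates so the bars appear as gmean, mean, max, min
df2["agg"] = df2["agg"].astype(CategoricalDtype(["gmean", "mean", "max", "min"]))

# Dodged bars: one group per category, one bar per statistic
(ggplot(df2, aes("X", "value", fill="agg"))
 + geom_col(position="dodge")
 )
```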
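A possibly simpler way to build the same long-format table is to select the `Y` column before aggregating, which gives flat column names (`gmean`, `mean`, `max`, `min`), and then reshape with `DataFrame.melt`. This is a sketch of an alternative rather than part of the original answer; the names `wide`, `long` and `stats` are only illustrative, and the plotting step is left out because it does not change.

```python
import pandas as pd
from scipy.stats import gmean

# Same data as above (only X and Y are needed for the aggregates)
df = pd.DataFrame({
    "X": ["A1"] * 4 + ["A2"] * 4 + ["A3"] * 4,
    "Y": [4, 2, 3, 8, 7, 10, 8, 7, 10, 9, 4, 1],
})

# Desired bar order within each category (illustrative name)
stats = ["gmean", "mean", "max", "min"]

# Wide format: one row per category, one column per statistic.
# scipy's gmean function is named "gmean", so that becomes its column label.
wide = df.groupby("X")["Y"].agg([gmean, "mean", "max", "min"]).reset_index()

# Long format: one row per (category, statistic) pair
long = wide.melt(id_vars="X", var_name="agg", value_name="value")

# A categorical dtype fixes the order of the dodged bars
long["agg"] = pd.Categorical(long["agg"], categories=stats)
print(long)
```

`long` holds the same `X`, `agg` and `value` columns as `df2` (without the extra `level_0` column), so it can be passed to the same `ggplot(..., aes("X", "value", fill="agg")) + geom_col(position="dodge")` call.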