I have a stacked bar plot, which was plotted from:
import pandas as pd
from plotnine import ggplot, aes, after_stat, geom_bar, geom_label

def combine(counts: pd.Series, percentages: pd.Series):
    fmt = "{} ({}%)".format
    return [
        fmt(c, round(p))
        for c, p
        in zip(counts, percentages, strict=True)
    ]

d = {
    'cat': [*(2200 * ['cat1']), *(180 * ['cat2']), *(490 * ['cat3'])],
    'subcat': [
        *(2200 * ['subcat1']),
        *(150 * ['subcat2']),
        *(30 * ['subcat3']),
        *(40 * ['subcat4']),
        *(450 * ['subcat5'])
    ]
}
df = pd.DataFrame(d)

cats = (
    ggplot(df, aes('cat', fill='subcat'))
    + geom_bar()
    + geom_label(
        aes(label=after_stat('combine(count, count / sum(count) * 100)')),
        stat='count',
        position='stack'
    )
)
cats.save('cats.png')
The combine function was modified from the original in "Show counts and percentages for bar plots".
The label for subcat4 is partially covered by the one for subcat5, making its count and percentage unreadable.
How can a label be hidden or, better yet, simply not plotted if its count is too small?
I tried
...
fmt(c, round(p)) if p > 5 else (None, None)
...
but that just makes the labels with percentages lower than or equal to 5% say “(None, None).”
Using position='fill' for both geom_bar and geom_label is not really a solution either, because the problem persists for sufficiently small counts (e.g., if the count for subcat4 is 10). I also want to preserve the proportionality of subcategories across all categories, which is lost with position='fill'.
The end goal, really, is just to keep labels from overlapping, so approaches other than hiding them are acceptable too. (I thought of "dodging" labels vertically on the y-axis, but I don't think that's possible.)
You can modify the combine function to return an empty string '' instead of (None, None), like this:
def combine(counts: pd.Series, percentages: pd.Series):
    fmt = "{} ({}%)".format
    return [
        fmt(c, round(p)) if p > 5 else ''
        for c, p
        in zip(counts, percentages, strict=True)
    ]