I want to use ggstats::stat_prop() to add proportions to a boxplot. However, the boxplot is faceted and I would like to get proportions across facets, i.e. proportions of the same x-value across facets should add up to 1.
The denominator for the proportions can be set using the 'by' aesthetic, but it is still evaluated within each facet separately.
library(ggplot2)
set.seed(123456)
n <- 50
df <- data.frame(
  group = sample(c('A', 'B', 'C'), size = n, replace = TRUE),
  value = rnorm(n),
  strata = sample(c('X', 'Y'), n, replace = TRUE)
)
ggplot(df, aes(x = group, y = value, by = group)) +
  geom_boxplot(aes(color = group)) +
  geom_text(
    aes(label = sprintf("%s (n=%d)", scales::label_percent()(after_stat(prop)), after_stat(count))),
    stat = ggstats::StatProp,
    y = 0
  ) +
  facet_wrap(facets = vars(strata))
This results in a proportion of 100% for each group (box) within each facet. What I would like to see is proportions that add up to 100% for each group across facets.
I know this can be done by pre-computing the proportions, collecting them in a second data frame and using that for plotting.
I'd prefer a solution that can get the correct proportions on the fly. Ideally using ggstats functionality, but I'm not bound to that package.
For your use case, one option would be to stick with stat = "count" and compute the proportions on the fly using e.g. ave():
library(ggplot2)
ggplot(df, aes(x = group, y = value)) +
geom_boxplot(aes(color = group)) +
geom_text(
aes(
label = after_stat(
sprintf(
"%s (n=%d)",
scales::label_percent()(ave(count, x, FUN = \(x) x / sum(x))),
count
)
)
),
stat = "count",
y = 0
) +
facet_wrap(facets = vars(strata))
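This relies on after_stat() being evaluated on the computed data of the whole layer, i.e. for all panels at once, so ave() can group the counts by x across facets. A minimal base R sketch of that step, where the `computed` data frame and its values are made up for illustration and only stand in for what stat = "count" returns:

```r
# Hypothetical stand-in for the data computed by stat = "count":
# one row per x position and panel (values are invented).
computed <- data.frame(
  PANEL = factor(c(1, 1, 1, 2, 2, 2)),
  x     = c(1, 2, 3, 1, 2, 3),
  count = c(4, 6, 5, 12, 6, 15)
)

# ave() applies FUN within each level of x and ignores PANEL,
# so the proportions for one x value sum to 1 across panels.
computed$prop <- ave(computed$count, computed$x, FUN = \(n) n / sum(n))
computed
```

Because the grouping is by x only, each group's shares add up to 100% across the facets, which is the denominator asked for in the question.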
UPDATE: With facet_grid(), where you want to compute the proportions for each row of the grid separately, the same approach works but requires passing a second grouping factor to ave() to indicate the row of the grid. Unfortunately, there is only a PANEL column in the computed data, so this requires some additional math to derive the row from the panel number.
Note: I modified the example data and added another variable for the facet rows.
library(ggplot2)
set.seed(123456)
n <- 100
df <- data.frame(
group = sample(c("A", "B", "C"), size = n, replace = TRUE),
value = rnorm(n),
strata = sample(c("X", "Y"), n, replace = TRUE),
row = sample(c("X", "Y", "Z"), n, replace = TRUE)
)
n_strata <- length(unique(df$strata))
ggplot(df, aes(x = group, y = value)) +
geom_boxplot(aes(color = group)) +
geom_text(
aes(
label = after_stat(
sprintf(
"%s\n(n=%d)",
scales::label_percent()(
ave(count, x, (as.integer(PANEL) - 1) %/% n_strata, FUN = \(x) x / sum(x))
),
count
)
)
),
stat = "count",
y = Inf,
vjust = 1.1
) +
facet_grid(rows = vars(row), cols = vars(strata))
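The integer division maps each panel number to a row of the grid. This assumes that facet_grid() numbers its panels row by row (left to right, then top to bottom), which is what the code above relies on. A small sketch for the 3 x 2 grid of this example:

```r
n_strata <- 2   # number of facet columns (levels of strata)
panel <- 1:6    # panel numbers in a 3 x 2 grid, assumed to run row by row

# Zero-based row index of each panel: panels 1-2 fall in row 0,
# panels 3-4 in row 1, panels 5-6 in row 2.
grid_row <- (panel - 1) %/% n_strata
grid_row
```

Passing this row index to ave() alongside x restricts each denominator to the panels in the same grid row.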