I would like to draw a plot with percentage labels per x-axis group. This works fine without empty groups:
# library
library(ggplot2)
library(reshape2)
# example data from reshape2
str(tips)
#> 'data.frame': 244 obs. of 7 variables:
#> $ total_bill: num 17 10.3 21 23.7 24.6 ...
#> $ tip : num 1.01 1.66 3.5 3.31 3.61 4.71 2 3.12 1.96 3.23 ...
#> $ sex : Factor w/ 2 levels "Female","Male": 1 2 2 2 1 2 2 2 2 2 ...
#> $ smoker : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
#> $ day : Factor w/ 4 levels "Fri","Sat","Sun",..: 3 3 3 3 3 3 3 3 3 3 ...
#> $ time : Factor w/ 2 levels "Dinner","Lunch": 1 1 1 1 1 1 1 1 1 1 ...
#> $ size : int 2 3 3 2 4 4 2 4 2 2 ...
# function to count percentage per day
comp_pct <- function(count, day) {
count / tapply(count, day, sum)[day]
}
# correct plot
ggplot(tips, aes(x = day, group = sex)) +
geom_bar(aes(y = ..prop.., fill = factor(..group..)), stat = "count") +
geom_text(aes(
label = after_stat(scales::percent(comp_pct(count, x))),
y = ..prop..
), stat = "count", position = position_stack(vjust = 0.5)) +
labs(y = "Percent", fill = "sex") +
scale_x_discrete(drop=FALSE) +
scale_y_continuous(labels = scales::percent)
However, after adding an empty level, the labelling with after_stat
does not work anymore as expected. I am not sure if this is caused by the ordering of the output from the tapply()
in comp_pct
. However, I am unable to solve it.
# additional empty level
tips -> tips1
tips1$day <- factor(tips$day, levels=c("NewDay",levels(tips$day)))
levels(tips1$day)
#> [1] "NewDay" "Fri" "Sat" "Sun" "Thur"
# bars OK, labels not OK
ggplot(tips1, aes(x = day, group = sex)) +
geom_bar(aes(y = ..prop.., fill = factor(..group..)), stat = "count") +
geom_text(aes(
label = after_stat(scales::percent(comp_pct(count, x))),
y = ..prop..
), stat = "count", position = position_stack(vjust = 0.5)) +
labs(y = "Percent", fill = "sex") +
scale_x_discrete(drop=FALSE) +
scale_y_continuous(labels = scales::percent)
#> Warning: Removed 2 rows containing missing values (geom_text).
Created on 2022-04-02 by the reprex package (v2.0.1)
The issue is that using count / tapply(count, day, sum)[day]
extracts the computed percentages by position (Sorry. My fault. (;) While this works fine for the original dataset, it does not work in the more general case, i.e. an NA is returned for the fourth bar label:
print(day)
#> [1] 2 3 4 5 2 3 4 5
#> 3 4 5 <NA> 3 4 5 <NA>
print(count / tapply(count, day, sum)[day])
#> 0.1034483 0.3684211 0.2903226 NA 0.1149425 0.7763158 0.9354839 NA
To fix that we have to convert day
to a character to extract the percentage values by name:
library(ggplot2)
library(reshape2)
# additional empty level
tips -> tips1
# function to count percentage per day
comp_pct <- function(count, day) {
count / tapply(count, day, sum)[as.character(day)]
}
tips1$day <- factor(tips$day, levels = c("NewDay", levels(tips$day)))
ggplot(tips1, aes(x = day, group = sex)) +
geom_bar(aes(y = ..prop.., fill = factor(..group..)), stat = "count") +
geom_text(aes(
label = after_stat(scales::percent(comp_pct(count, x))),
y = ..prop..
), stat = "count", position = position_stack(vjust = 0.5)) +
labs(y = "Percent", fill = "sex") +
scale_x_discrete(drop = FALSE) +
scale_y_continuous(labels = scales::percent)