I've looked through dozens of posts with similar issues but none have provided a solution to my problem so I'm posting here. I have categorical data for which I'm producing descriptive statistics (counts, percentages, and confidence bars) using a combination of tabyl
from the janitor
package and dplyr
.
I want to create a horizontal bar plot of the percentage data with margins of error (error bars using geom_errorbar
), where each percentage bar is a different color (assigned by category), ordered visually from largest to smallest percentages, with the y-axis tick labels muted, and I want the color legend ordered the same as the bars.
In my original data (not the reproducible example below), I have achieved everything in the plot aside from getting the legend to order appropriately. The legend consistently orders itself alphabetically even though I attempt to manually override it. In the example here I've got the added problem of the bars ordering visually from smallest to largest rather than smallest to largest.
So there are two questions here:
(1) How do I set the legend order to agree with the desired bar order when working from a tabyl
?
(2) How do I consistently get bars to go the direction I want in a horizontal geom_col()
?
Example below. (Forgive the verbose ggplot
customizations; I wanted to include everything from my real plot to help figure out if there's some sort of code in there that's screwing things up.)
CODE UPDATED (6/20/2022, 8:51 PM EST):
library(tidyverse)
library(ggplot2)
library(forcats)
library(janitor)
temp <- tribble(
~ Category,
"AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
"AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
"AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
"AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
"AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
"AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
"AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
"AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
"AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
"AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
"BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
"BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
"BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
"BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
"BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
"BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
"BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
"BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
"BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
"BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
"BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
"BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
"CCCCC CCC CCC CC CCCCC CCC CCCCCCCCCC CCCC CCCCC CCCCCCCCC CCCCCCCCCCC CCCC CCC CCC C CCC",
"CCCCC CCC CCC CC CCCCC CCC CCCCCCCCCC CCCC CCCCC CCCCCCCCC CCCCCCCCCCC CCCC CCC CCC C CCC",
"CCCCC CCC CCC CC CCCCC CCC CCCCCCCCCC CCCC CCCCC CCCCCCCCC CCCCCCCCCCC CCCC CCC CCC C CCC",
"CCCCC CCC CCC CC CCCCC CCC CCCCCCCCCC CCCC CCCCC CCCCCCCCC CCCCCCCCCCC CCCC CCC CCC C CCC",
"DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
"DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
"DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
"DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
"DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
"DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
"DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
"DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
"DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
"DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
"DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
"DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
"DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
"DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
"DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
"DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
"EEEE",
"EEEE",
"EEEE",
"EEEE",
)
temp_n <- temp %>%
nrow()
temp_tabyl <-
temp %>%
tabyl(Category) %>%
mutate(Category = factor(Category,levels = c("CCCCC CCC CCC CC CCCCC CCC CCCCCCCCCC CCCC CCCCC CCCCCCCCC CCCCCCCCCCC CCCC CCC CCC C CCC",
"AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
"BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
"DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
"EEEE"))) %>%
rename(Percent = percent) %>%
arrange(desc(Percent)) %>%
mutate(CI = sqrt(Percent*(1-Percent)/temp_n),
MOE = CI * 1.96,
ub = Percent + MOE,
lb = Percent - MOE)
temp_tabyl %>%
ggplot() +
geom_col(aes(y = Category,
x = Percent,
fill = str_wrap(Category,40)),
colour = "black"
) +
geom_errorbar(
aes(
y = Category,
xmin = lb,
xmax = ub
),
width = 0.4,
colour = "orange",
alpha = 0.9,
size = 1.3
) +
labs(colour="Category") +
geom_label(aes(y = Category,
x = Percent,
label = scales::percent(Percent)),nudge_x = .11) +
scale_x_continuous(labels = scales::percent,limits = c(0,1)) +
labs(title = "Plot Title",
caption = "Plot Caption.") +
theme_bw() +
theme(
text = element_text(family = 'Roboto'),
strip.text.x = element_text(size = 14,
face = 'bold'),
panel.grid.minor = element_blank(),
axis.title.y = element_text(size = 14),
plot.title = element_text(hjust = 0.5, size = 16),
plot.subtitle = element_text(hjust = 1),
plot.caption = element_text(hjust = 0),
axis.text.y=element_blank()
) +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank()) +
theme(strip.text = element_text(colour = 'white'),
legend.spacing.y = unit(.5, 'cm')) +
guides(fill = guide_legend(as.factor('Category'),
byrow = TRUE)) +
scale_fill_discrete(limits = c("CCCCC CCC CCC CC CCCCC CCC CCCCCCCCCC CCCC CCCCC CCCCCCCCC CCCCCCCCCCC CCCC CCC CCC C CCC",
"AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
"BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
"DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
"EEEE"))
So it turns out the problem was this bit in the geom_col
portion of the ggplot
code: fill = str_wrap(Category,40)
. Somehow that fill argument didn't play well with scale_fill_discrete
, which is why Jared's initial solution didn't work, but his updated answer gets us most of the way there.
So the solution steps were:
str_wrap
command from the geom_col
fill
argument.scale_fill_discrete(labels = ~ stringr::str_wrap(.x, width = 40))
to the end of the ggplot
code.y = "Category"
to the labs element in the ggplot (to override the yucky y axis title that would otherwise result from the reordering command).Huge thanks to @jared_mamrot for helping me troubleshoot!
Also appropriate citation from another post that offered the solution: How to wrap legend text in ggplot?
library(tidyverse)
library(ggplot2)
library(forcats)
library(janitor)
#>
#> Attaching package: 'janitor'
#> The following objects are masked from 'package:stats':
#>
#> chisq.test, fisher.test
temp <- tribble(
~ Category,
"AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
"AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
"AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
"AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
"AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
"AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
"AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
"AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
"AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
"AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
"BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
"BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
"BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
"BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
"BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
"BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
"BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
"BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
"BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
"BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
"BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
"BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
"CCCCC CCC CCC CC CCCCC CCC CCCCCCCCCC CCCC CCCCC CCCCCCCCC CCCCCCCCCCC CCCC CCC CCC C CCC",
"CCCCC CCC CCC CC CCCCC CCC CCCCCCCCCC CCCC CCCCC CCCCCCCCC CCCCCCCCCCC CCCC CCC CCC C CCC",
"CCCCC CCC CCC CC CCCCC CCC CCCCCCCCCC CCCC CCCCC CCCCCCCCC CCCCCCCCCCC CCCC CCC CCC C CCC",
"CCCCC CCC CCC CC CCCCC CCC CCCCCCCCCC CCCC CCCCC CCCCCCCCC CCCCCCCCCCC CCCC CCC CCC C CCC",
"DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
"DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
"DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
"DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
"DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
"DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
"DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
"DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
"DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
"DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
"DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
"DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
"DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
"DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
"DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
"DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
"EEEE",
"EEEE",
"EEEE",
"EEEE",
)
temp_n <- temp %>%
nrow()
temp_tabyl <-
temp %>%
tabyl(Category) %>%
mutate(Category = factor(Category,levels = c("DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
"BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
"AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
"CCCCC CCC CCC CC CCCCC CCC CCCCCCCCCC CCCC CCCCC CCCCCCCCC CCCCCCCCCCC CCCC CCC CCC C CCC",
"EEEE"))) %>%
rename(Percent = percent) %>%
arrange(desc(Percent)) %>%
mutate(CI = sqrt(Percent*(1-Percent)/temp_n),
MOE = CI * 1.96,
ub = Percent + MOE,
lb = Percent - MOE)
temp_tabyl %>%
ggplot() +
geom_col(aes(y = reorder(Category,Percent),
x = Percent,
fill = Category),
colour = "black"
) +
geom_errorbar(
aes(
y = reorder(Category,Percent),
xmin = lb,
xmax = ub
),
width = 0.4,
colour = "orange",
alpha = 0.9,
size = 1.3
) +
labs(colour="Category",
y = "Category") +
geom_label(aes(y = Category,
x = Percent,
label = scales::percent(Percent)),nudge_x = .11) +
scale_x_continuous(labels = scales::percent,limits = c(0,1)) +
labs(title = "Plot Title",
caption = "Plot Caption.") +
theme_bw() +
theme(
text = element_text(family = 'Roboto'),
strip.text.x = element_text(size = 14,
face = 'bold'),
panel.grid.minor = element_blank(),
axis.title.y = element_text(size = 14),
plot.title = element_text(hjust = 0.5, size = 16),
plot.subtitle = element_text(hjust = 1),
plot.caption = element_text(hjust = 0),
axis.text.y=element_blank()
) +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank()) +
theme(strip.text = element_text(colour = 'white'),
legend.spacing.y = unit(.5, 'cm')) +
guides(fill = guide_legend(as.factor('Category'),
byrow = TRUE)) +
scale_fill_discrete(labels = ~ stringr::str_wrap(.x, width = 40))
Created on 2022-06-20 by the reprex package (v2.0.1)