r ggplot2 dplyr janitor

Ordering ggplot2 legend to agree with factor order of bars in geom_col when plotting data from tabyl


I've looked through dozens of posts with similar issues but none have provided a solution to my problem so I'm posting here. I have categorical data for which I'm producing descriptive statistics (counts, percentages, and confidence bars) using a combination of tabyl from the janitor package and dplyr.

I want to create a horizontal bar plot of the percentage data with margins of error (error bars using geom_errorbar), where each percentage bar is a different color (assigned by category), ordered visually from largest to smallest percentages, with the y-axis tick labels muted, and I want the color legend ordered the same as the bars.

In my original data (not the reproducible example below), I have achieved everything in the plot aside from getting the legend to order appropriately. The legend consistently orders itself alphabetically even though I attempt to manually override it. In the example here I've got the added problem of the bars ordering visually from smallest to largest rather than largest to smallest.

So there are two questions here: (1) How do I set the legend order to agree with the desired bar order when working from a tabyl? (2) How do I consistently get bars to go the direction I want in a horizontal geom_col()?

Example below. (Forgive the verbose ggplot customizations; I wanted to include everything from my real plot to help figure out if there's some sort of code in there that's screwing things up.)

CODE UPDATED (6/20/2022, 8:51 PM EST):

library(tidyverse)
library(ggplot2)
library(forcats)
library(janitor)

temp <- tribble(
  ~ Category,
  "AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
  "AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
  "AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
  "AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
  "AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
  "AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
  "AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
  "AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
  "AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
  "AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
  "BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
  "BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
  "BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
  "BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
  "BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
  "BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
  "BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
  "BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
  "BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
  "BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
  "BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
  "BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
  "CCCCC CCC CCC CC CCCCC CCC CCCCCCCCCC CCCC CCCCC CCCCCCCCC CCCCCCCCCCC CCCC CCC CCC C CCC",
  "CCCCC CCC CCC CC CCCCC CCC CCCCCCCCCC CCCC CCCCC CCCCCCCCC CCCCCCCCCCC CCCC CCC CCC C CCC",
  "CCCCC CCC CCC CC CCCCC CCC CCCCCCCCCC CCCC CCCCC CCCCCCCCC CCCCCCCCCCC CCCC CCC CCC C CCC",
  "CCCCC CCC CCC CC CCCCC CCC CCCCCCCCCC CCCC CCCCC CCCCCCCCC CCCCCCCCCCC CCCC CCC CCC C CCC",
  "DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
  "DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
  "DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
  "DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
  "DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
  "DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
  "DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
  "DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
  "DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
  "DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
  "DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
  "DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
  "DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
  "DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
  "DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
  "DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
  "EEEE",
  "EEEE",
  "EEEE",
  "EEEE",
)

temp_n <- temp %>%
  nrow()

temp_tabyl <- 
  temp %>% 
  tabyl(Category) %>% 
  mutate(Category = factor(Category,levels = c("CCCCC CCC CCC CC CCCCC CCC CCCCCCCCCC CCCC CCCCC CCCCCCCCC CCCCCCCCCCC CCCC CCC CCC C CCC",
                                               "AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA", 
                                               "BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB", 
                                               "DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
                                               "EEEE"))) %>% 
  rename(Percent = percent) %>% 
  arrange(desc(Percent)) %>% 
  mutate(CI = sqrt(Percent*(1-Percent)/temp_n),
         MOE = CI * 1.96,
         ub = Percent + MOE,
         lb = Percent - MOE)

temp_tabyl %>% 
  ggplot() + 
  geom_col(aes(y = Category, 
               x = Percent,
               fill = str_wrap(Category,40)),
           colour = "black"
  ) + 
  geom_errorbar(
    aes(
      y = Category,
      xmin = lb,
      xmax = ub
    ),
    width = 0.4,
    colour = "orange",
    alpha = 0.9,
    size = 1.3
  ) + 
  labs(colour="Category") +
  geom_label(aes(y = Category, 
                 x = Percent,
                 label = scales::percent(Percent)),nudge_x = .11) + 
  scale_x_continuous(labels = scales::percent,limits = c(0,1)) + 
  labs(title = "Plot Title",
       caption = "Plot Caption.") +
  theme_bw() +
  theme(
    text = element_text(family = 'Roboto'),
    strip.text.x = element_text(size = 14,
                                face = 'bold'),
    panel.grid.minor = element_blank(),
    axis.title.y = element_text(size = 14),
    plot.title = element_text(hjust = 0.5, size = 16),
    plot.subtitle = element_text(hjust = 1),
    plot.caption = element_text(hjust = 0),
    axis.text.y=element_blank()
  ) +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank()) +
  theme(strip.text = element_text(colour = 'white'),
        legend.spacing.y = unit(.5, 'cm')) + 
  guides(fill = guide_legend(as.factor('Category'),
                             byrow = TRUE)) +
  scale_fill_discrete(limits = c("CCCCC CCC CCC CC CCCCC CCC CCCCCCCCCC CCCC CCCCC CCCCCCCCC CCCCCCCCCCC CCCC CCC CCC C CCC",
                                 "AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA", 
                                 "BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB", 
                                 "DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
                                 "EEEE"))
  

  

Solution

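  • So it turns out the problem was this bit in the geom_col portion of the ggplot code: fill = str_wrap(Category,40). str_wrap returns a plain character vector, so the factor level order is lost and the fill scale falls back to alphabetical order. The wrapped strings (which now contain line breaks) also no longer match the unwrapped limits passed to scale_fill_discrete, which is why Jared's initial solution didn't work, but his updated answer gets us most of the way there.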
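
To see why wrapping inside aes() loses the ordering, here is a minimal sketch. The toy factor and its level names are made up for illustration and are not part of the original data; the point is only that str_wrap hands back character data, and that the unique values of character data sort alphabetically, which matches the legend order described in the question.

```r
library(stringr)

# Hypothetical toy factor whose level order is deliberately non-alphabetical.
category <- factor(c("delta", "bravo", "alpha"),
                   levels = c("delta", "bravo", "alpha"))

# Wrapping the factor returns a character vector, not a factor.
wrapped <- str_wrap(category, width = 40)

is.factor(category)  # TRUE
is.factor(wrapped)   # FALSE

# The factor remembers its custom order ...
levels(category)
# ... but the character vector has no levels, so sorting its unique
# values gives plain alphabetical order.
sort(unique(wrapped))
```

Wrapping the labels on the scale instead, as in the steps below, leaves the factor and its levels untouched.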

    So the solution steps were:

    1. Remove the str_wrap command from the geom_col fill argument.
    2. Add scale_fill_discrete(labels = ~ stringr::str_wrap(.x, width = 40)) to the end of the ggplot code.
    3. Add y = "Category" to the labs element in the ggplot (to override the yucky y axis title that would otherwise result from the reordering command).
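
The three steps above can be sketched on a stripped-down plot. This is only an illustration under assumptions: the toy data frame, its labels, and the wrap width of 20 are hypothetical stand-ins for the tabyl output and the real settings, and the error bars and theme code are left out.

```r
library(ggplot2)
library(stringr)

# Hypothetical toy data standing in for the tabyl output.
toy <- data.frame(
  Category = c("a long label for the first group",
               "a long label for the second group",
               "a long label for the third group"),
  Percent  = c(0.2, 0.5, 0.3)
)

# Set the factor levels from largest to smallest Percent;
# the fill legend lists its keys in this level order.
toy$Category <- factor(toy$Category,
                       levels = toy$Category[order(-toy$Percent)])

p <- ggplot(toy) +
  # Step 1: fill is the factor itself, not a wrapped string.
  # reorder() sorts the bars by Percent, so the largest bar is drawn at the top.
  geom_col(aes(y = reorder(Category, Percent), x = Percent, fill = Category)) +
  # Step 2: wrap only the legend labels; the underlying levels are untouched.
  scale_fill_discrete(labels = ~ str_wrap(.x, width = 20)) +
  # Step 3: replace the default "reorder(Category, Percent)" axis title.
  labs(y = "Category")

p
```

Because the levels run from largest to smallest and reorder() places the largest bar at the top, the legend and the bars read in the same order from top to bottom.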

    Huge thanks to @jared_mamrot for helping me troubleshoot!

    Credit also goes to another post that offered the label-wrapping solution: How to wrap legend text in ggplot?

    library(tidyverse)
    library(ggplot2)
    library(forcats)
    library(janitor)
    #> 
    #> Attaching package: 'janitor'
    #> The following objects are masked from 'package:stats':
    #> 
    #>     chisq.test, fisher.test
    
    temp <- tribble(
      ~ Category,
      "AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
      "AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
      "AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
      "AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
      "AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
      "AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
      "AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
      "AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
      "AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
      "AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA",
      "BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
      "BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
      "BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
      "BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
      "BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
      "BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
      "BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
      "BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
      "BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
      "BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
      "BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
      "BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB",
      "CCCCC CCC CCC CC CCCCC CCC CCCCCCCCCC CCCC CCCCC CCCCCCCCC CCCCCCCCCCC CCCC CCC CCC C CCC",
      "CCCCC CCC CCC CC CCCCC CCC CCCCCCCCCC CCCC CCCCC CCCCCCCCC CCCCCCCCCCC CCCC CCC CCC C CCC",
      "CCCCC CCC CCC CC CCCCC CCC CCCCCCCCCC CCCC CCCCC CCCCCCCCC CCCCCCCCCCC CCCC CCC CCC C CCC",
      "CCCCC CCC CCC CC CCCCC CCC CCCCCCCCCC CCCC CCCCC CCCCCCCCC CCCCCCCCCCC CCCC CCC CCC C CCC",
      "DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
      "DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
      "DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
      "DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
      "DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
      "DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
      "DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
      "DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
      "DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
      "DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
      "DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
      "DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
      "DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
      "DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
      "DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
      "DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
      "EEEE",
      "EEEE",
      "EEEE",
      "EEEE",
    )
    
    temp_n <- temp %>%
      nrow()
    
    temp_tabyl <- 
      temp %>% 
      tabyl(Category) %>% 
      mutate(Category = factor(Category,levels = c("DDDDD DD D DDD DDDD DDD DDDDDDD DDD DDDD DDDDDDD DDD DDD DDDD DDDDDDDDD DDDD DDDDD DDDDDDD",
                                                   "BBBB BBBB BBBBB BBBBBBBBB BBBBBB BBBBB BBBBBB BBBBB BBBBB BBBBBBBBB BBBB BBBBB BBBBBBB", 
                                                   "AAAAAAA AAAAAAAA AAAAAAAAA AAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAA AAAAAAAAAA", 
                                                   "CCCCC CCC CCC CC CCCCC CCC CCCCCCCCCC CCCC CCCCC CCCCCCCCC CCCCCCCCCCC CCCC CCC CCC C CCC",
                                                   "EEEE"))) %>% 
      rename(Percent = percent) %>% 
      arrange(desc(Percent)) %>% 
      mutate(CI = sqrt(Percent*(1-Percent)/temp_n),
             MOE = CI * 1.96,
             ub = Percent + MOE,
             lb = Percent - MOE)
    
    temp_tabyl %>% 
      ggplot() + 
      geom_col(aes(y = reorder(Category,Percent),
                   x = Percent,
                   fill = Category),
               colour = "black"
      ) + 
      geom_errorbar(
        aes(
          y = reorder(Category,Percent),
          xmin = lb,
          xmax = ub
        ),
        width = 0.4,
        colour = "orange",
        alpha = 0.9,
        size = 1.3
      ) + 
      labs(colour="Category",
           y = "Category") +
      geom_label(aes(y = Category, 
                     x = Percent,
                     label = scales::percent(Percent)),nudge_x = .11) + 
      scale_x_continuous(labels = scales::percent,limits = c(0,1)) + 
      labs(title = "Plot Title",
           caption = "Plot Caption.") +
      theme_bw() +
      theme(
        text = element_text(family = 'Roboto'),
        strip.text.x = element_text(size = 14,
                                    face = 'bold'),
        panel.grid.minor = element_blank(),
        axis.title.y = element_text(size = 14),
        plot.title = element_text(hjust = 0.5, size = 16),
        plot.subtitle = element_text(hjust = 1),
        plot.caption = element_text(hjust = 0),
        axis.text.y=element_blank()
      ) +
      theme(panel.grid.major = element_blank(),
            panel.grid.minor = element_blank()) +
      theme(strip.text = element_text(colour = 'white'),
            legend.spacing.y = unit(.5, 'cm')) + 
      guides(fill = guide_legend(title = "Category",
                                 byrow = TRUE)) +
      scale_fill_discrete(labels = ~ stringr::str_wrap(.x, width = 40))
    

    Created on 2022-06-20 by the reprex package (v2.0.1)