r ggplot2 geom-col

Calculating percentages within category using geom_col


This has been asked in various forms before, but I'm attempting it in a slightly different way and can't seem to get it exactly right. When I use this code:

d %>% 
  drop_na(attend) %>% 
  count(race, attend) %>% 
  group_by(race) %>%
  mutate(percent = n/sum(n)*100) %>% 
  ggplot(aes(race, percent, fill = race)) +
  geom_col(position = "dodge")

I get this figure:

[Figure: bar chart of percent by race, with horizontal lines inside the bars]

The 'attend' variable is just 0s and 1s, and I want to display the percent of 1s within each race. I think that those lines that are showing up inside the charts are actually correct, but what's going on with the rest of those columns? I can't quite figure out that last step.


Solution

  • To get the result you want, filter the data to attend == 1 after computing the percentages.

    Note: The black lines appear because of overplotting. After count(race, attend) there are two rows per race (attend = 0 and attend = 1). Both rows share the same x value and fill, so position = "dodge" does not separate them and the two bars are drawn on top of each other.
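
    To see where the overlapping bars come from, it can help to look at the summarised data before it reaches ggplot. The following is a minimal sketch on a small hand-made data frame (`toy` and its values are made up for illustration, not the asker's data):

    ```r
    library(dplyr)

    # Hypothetical toy data: race "A" has both attend values, race "B" only has 1s
    toy <- data.frame(
      race   = c("A", "A", "A", "B", "B"),
      attend = c(1, 0, 0, 1, 1)
    )

    pct <- toy %>%
      count(race, attend) %>%               # one row per race/attend combination
      group_by(race) %>%
      mutate(percent = n / sum(n) * 100) %>% # percentages sum to 100 within each race
      ungroup()

    # Race "A" contributes two rows (attend = 0 and attend = 1),
    # so geom_col() would draw two bars at the same x position.
    pct
    ```

    Any race with both 0s and 1s ends up with two rows, which is why two bars are stacked behind each other in the original plot.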

    Using some random example data:

    library(tidyr)
    library(dplyr)
    library(ggplot2)
    
    set.seed(123)
    
    d <- data.frame(
      race = sample(c("Asian", "White", "Hispanic", "Black", "Other"), 100, replace = TRUE),
      attend = sample(0:1, 100, replace = TRUE)
    )
    
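    d %>% 
      drop_na(attend) %>% 
      # one row per race/attend combination
      count(race, attend) %>% 
      group_by(race) %>%
      # share of each attend value within its race
      mutate(percent = n/sum(n)*100) %>% 
      # keep only the attend == 1 rows, so there is one bar per race
      filter(attend == 1) %>%
      # order the bars by percent
      ggplot(aes(reorder(race, percent), percent, fill = race)) +
      geom_col()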
    

    [Figure: bar chart of the percent of attend == 1 by race, ordered by percent]
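
    As an alternative sketch, and assuming attend really contains only 0s and 1s as described in the question, the percent of 1s within each race is simply the group mean times 100, so the count/filter steps can be replaced by a single summarise(). The name `pct_attend` is made up for this example; the data is the same random example data as above.

    ```r
    library(tidyr)
    library(dplyr)
    library(ggplot2)

    set.seed(123)

    # Same random example data as above
    d <- data.frame(
      race = sample(c("Asian", "White", "Hispanic", "Black", "Other"), 100, replace = TRUE),
      attend = sample(0:1, 100, replace = TRUE)
    )

    # For a 0/1 variable, mean(attend) is the share of 1s in each group
    pct_attend <- d %>%
      drop_na(attend) %>%
      group_by(race) %>%
      summarise(percent = mean(attend) * 100, .groups = "drop")

    p <- ggplot(pct_attend, aes(reorder(race, percent), percent, fill = race)) +
      geom_col() +
      labs(x = "race")

    print(p)
    ```

    One difference in behaviour: a race with no attend == 1 rows is dropped by the filter() approach, whereas this version keeps it as a bar of height 0.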