This has been asked in various forms before, but I'm attempting it in a slightly different way and can't seem to get it exactly right. When I use this code:
d %>%
  drop_na(attend) %>%
  count(race, attend) %>%
  group_by(race) %>%
  mutate(percent = n/sum(n)*100) %>%
  ggplot(aes(race, percent, fill = race)) +
  geom_col(position = "dodge")
I get this figure:
The 'attend' variable is just 0s and 1s, and I want to display the percent of 1s within each race. I think that those lines that are showing up inside the charts are actually correct, but what's going on with the rest of those columns? I can't quite figure out that last step.
To get the result you want, filter the data for attend == 1 after computing the percentages, so that only the bars for the 1s are drawn.
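Note: The black lines appear because of overplotting. position = "dodge" only separates bars that belong to different groups, and here the rows for attend = 0 and attend = 1 share the same race on the x axis and the same fill, so both bars for a race are plotted on top of each other.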
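To see the overplotting in the data itself, it can help to look at the table that reaches ggplot before any filtering. A minimal sketch, assuming a small made-up data frame (toy and its values are hypothetical, not your data):

```r
library(dplyr)

# Hypothetical toy data: two races, attend coded as 0/1
toy <- data.frame(
  race   = c("A", "A", "A", "A", "B", "B", "B", "B"),
  attend = c(1, 0, 0, 0, 1, 1, 1, 0)
)

# Same steps as in the question, stopping before the plot
pct <- toy %>%
  count(race, attend) %>%
  group_by(race) %>%
  mutate(percent = n / sum(n) * 100) %>%
  ungroup()

# Two rows per race (attend = 0 and attend = 1), so two bars per x position
pct
```

Each race contributes one row for the 0s and one for the 1s, and the two percentages add up to 100 within a race. Both rows end up as bars at the same x position, which is why a shorter bar shows through inside the taller one.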
Using some random example data:
library(tidyr)
library(dplyr)
library(ggplot2)
set.seed(123)
d <- data.frame(
  race = sample(c("Asian", "White", "Hispanic", "Black", "Other"), 100, replace = TRUE),
  attend = sample(0:1, 100, replace = TRUE)
)
d %>%
  drop_na(attend) %>%
  count(race, attend) %>%
  group_by(race) %>%
  # percent of each attend value within its race
  mutate(percent = n/sum(n)*100) %>%
  # keep only the rows for attend == 1, i.e. one bar per race
  filter(attend == 1) %>%
  ggplot(aes(reorder(race, percent), percent, fill = race)) +
  geom_col()
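As a possible alternative, since attend is just 0s and 1s, the percent of 1s within each race is the group mean of attend == 1 times 100, so the count and filter steps can be skipped. A rough sketch under that assumption, again on a made-up data frame (toy is hypothetical, and filter(!is.na(attend)) stands in for drop_na(attend) so that only dplyr is needed):

```r
library(dplyr)

# Hypothetical toy data: two races, attend coded as 0/1
toy <- data.frame(
  race   = c("A", "A", "A", "A", "B", "B", "B", "B"),
  attend = c(1, 0, 0, 0, 1, 1, 1, 0)
)

# One row per race: share of rows with attend == 1, as a percent
pct_direct <- toy %>%
  filter(!is.na(attend)) %>%
  group_by(race) %>%
  summarise(percent = mean(attend == 1) * 100)

pct_direct
```

The resulting table has one row per race, so it can be passed to the same ggplot(aes(reorder(race, percent), percent, fill = race)) + geom_col() call. One difference from the filter approach is that a race with no 1s at all stays in the table with a percent of 0 instead of being dropped.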