r ggplot2 geom-bar

Why do the bars in a grouped barplot have different heights although they show the same percentage (ggplot2 in R)?


I want to show the percentages for the categories 1 to 7 of "more_know" within each group of "study". In the barplot, category 1 is labelled 3% for both groups, yet the two bars have different heights. Is this due to rounding?

Here is the data:

    labels.feed2 <- c(1:7)

    df.sci.cin <- data.frame(
      study = factor(c(rep(1,62), rep(2,33)), levels=1:2),
      more_know = factor(c(6, 2, 2, 2, 4, 5, 5, 3, 4, 5, 5, 5, 4, 2, 4, 7, 7, 2, 7, 5, 5, 5, 6, 2, 4, 7, 2, 5, 3, 2, 5, 7, 3, 5, 4, 4, 5, 4, 6, 5, 5, 7, 5, 1, 5, 5, 2, 4, 2, 7, 5, 5, 2, 5, 4, 6, 5, 7, 1, 5, 4, 3, 5, 4, 5, 2, 5, 6, 5, 3, 2, 2, 6, 2, 4, 5, 2, 5, 3, 5, 7, 7, 4, 5, 6, 3, 3, 1, 5, 4, 4, 6, 6, 4, 4), levels=1:7, labels=labels.feed2)
      )
      
    # as.tibble() is deprecated; as_tibble() is the current name
    tibble.sci.cin <- tibble::as_tibble(df.sci.cin)

This is my code for the barplot:

    vec.labels.more_know <- c("1 \nfamiliar with \nall of this", 2:6, "7 \nvery new \nto me")

    ggplot(data = tibble.sci.cin, aes(
      x = factor(more_know, levels = 1:7, labels = vec.labels.more_know),
      fill = factor(study)
    )) +
      geom_bar(
        aes(
          y = after_stat(count / ave(count, fill, FUN = sum))
        ),
        position = "dodge"
      ) +
      scale_fill_manual(
        values = c("grey40", "grey60"),
        name = "event location",
        labels = c("university (n=62)", "cinema (n=33)")
      ) +
      geom_text(
        aes(
          y = after_stat(count / ave(count, fill, FUN = sum)),
          label = after_stat(scales::percent(count / ave(count, fill, FUN = sum), accuracy = 1))
        ),
        stat = "count", position = position_dodge(0.9), vjust = -0.5
      ) +
      ylab("percent of audience relative to location") +
      xlab("feeling of more knowledge of climate change after the event") +
      theme(axis.text.x = element_text(hjust = .9)) + #angle = 45, 
      theme(axis.ticks.x = element_blank()) +
      scale_y_continuous(labels = scales::percent, limits = c(0, 0.38)) +
      scale_x_discrete(drop = FALSE) +
      theme(
        panel.border = element_rect(linetype = "solid", colour = "black", linewidth = .5, fill = NA),
        panel.grid.minor = element_line(colour = "grey93", linewidth = .3),
        panel.grid.major.y = element_line(colour = "grey93", linewidth = .3),
        panel.background = element_rect(fill = "grey97")
      ) +
      theme(axis.title.x.bottom = element_text(margin = margin(t = .15, unit = "in")))

This is the barplot I've got: [image]

I assume the different bar heights for category 1 come from plotting the exact values while the labels show the rounded 3%. I've already tried accuracy=1 and round( ..., digits=0), but I can't get bars of the same height. Could you please help me get a plausible barplot here?


Solution

  • If you compute the values first and plot them with geom_col() instead of geom_bar(), you can check the values before they are drawn.

    library(dplyr)
    library(ggplot2)

    tibble.sci.cin %>% 
      mutate(more_know = factor(more_know, levels = 1:7, labels = vec.labels.more_know)) %>% 
      # number of answers per study and category
      count(study, more_know) %>% 
      group_by(study) %>% 
      mutate(
        sum_study = sum(n),            # group size (62 or 33)
        y = round(n / sum_study, 2)    # share rounded to two decimals, i.e. whole percent
      ) %>% 
      ungroup() %>% 
      ggplot() +
      aes(more_know, y = y, fill = study) +
      geom_col(position = position_dodge())
    

    Created on 2024-10-09 with reprex v2.1.1
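
To see why the original bars differ, it may help to look at the exact shares for category 1. The sketch below is not part of the answer above; it uses only counts read off the data in the question (2 of 62 answers in the university group, 1 of 33 in the cinema group), and the variable names are made up for illustration.

```r
# Counts taken from the data in the question:
# category 1 was chosen by 2 of 62 (university) and 1 of 33 (cinema).
n_cat1  <- c(university = 2, cinema = 1)
n_total <- c(university = 62, cinema = 33)

# Exact shares: these are the heights geom_bar() draws in the question.
share <- n_cat1 / n_total

exact_pct   <- round(100 * share, 2)  # 3.23 and 3.03: the bars differ
rounded_pct <- round(100 * share)     # 3 and 3: what a whole-percent label shows

print(exact_pct)
print(rounded_pct)

# Rounding the share itself before plotting (as in the geom_col() code)
# gives both groups the same y value.
y_rounded <- round(share, 2)
print(y_rounded)
```

So the labels are rounded while the bars are not. Rounding y before plotting makes the bars match the labels. If the exact heights should be kept instead, printing the labels with one decimal, for example with accuracy = 0.1 in scales::percent(), should make the small difference visible in the text as well.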