r ggplot2

How can I display the group mean value and total mean value together in a bar chart using ggplot?


Assume I have a dataframe:

example <- data.frame (
code = c("850304", "850302", "850305", "404013", "404001", "404016"), 
class = c("8503", "8503", "8503", "4040", "4040", "4040"), 
partizipation_t1 = c(3.5, 2.0, 1.6, 3.0, 1.2, 3.9), 
partizipation_t2 = c(3.7, 2.3, 2.0, 3.2, 1.5, 3.7)
)

In this context, partizipation_t1 stands for the first and partizipation_t2 for the second measurement time point.

The aim is to create a bar chart with the measurement times on the x-axis and the mean participation value on the y-axis. In addition, there should be two bars per measurement point: one for the class mean and one for the overall mean.

To see how a single class changed between the two measurement times, I first created a subset of the data for that class, converted it from wide to long format, computed the means and created the plot:

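library(tidyverse)  # provides %>%, select(), rename(), pivot_longer() and ggplot()

# Keep only the rows of one class (the column in `example` is called `class`)
partizipation_class <- subset(example, class == "8503") %>%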
  select(class, partizipation_t1, partizipation_t2)

partizipation_long <- partizipation_class %>%
  rename ("T1" = partizipation_t1,
          "T2" = partizipation_t2) %>%
  pivot_longer(cols = c("T1", "T2"), names_to = "Time", values_to = "Partizipation")

means <- partizipation_long %>%
  group_by(Time) %>%
  summarise(Mean = mean(Partizipation, na.rm = TRUE))

ggplot(means, aes(x = Time, y = Mean, fill = Time)) +
  geom_bar(stat = "identity", width = 0.3, show.legend = FALSE) +
  scale_fill_manual(values = c("steelblue", "orange")) +
  labs (y = "Mean", 
        x = "Partizipation") +
  ggtitle ("Title") +
  theme_classic() +
  coord_cartesian(
    ylim = c(1, 4))

But this only gives me a bar chart that compares the values of one class between the two points in time. What is missing is a second bar next to each class mean that shows the overall mean. So I think I need a grouped bar plot somehow...


Solution

  • You need to summarize the data to include a mean for each class at each time. You also need an overall summary of the mean at each time point, which you can then bind to the summary data frame. Finally, plot them all as a dodged barplot:

    library(tidyverse)
    
    example <- data.frame (code = c("850304", "850302", "850305", "404013",
                                    "404001", "404016"), 
                           class = c("8503", "8503", "8503", "4040", 
                                     "4040", "4040"), 
                           partizipation_t1 = c(3.5, 2.0, 1.6, 3.0, 1.2, 3.9), 
                           partizipation_t2 = c(3.7, 2.3, 2.0, 3.2, 1.5, 3.7))
    
    example_long <- example %>%
      rename_with(~sub("partizipation_", "", .x)) %>%
      select(-code) %>%
      pivot_longer(-class, names_to = "Partizipation")
    
    example_long %>%
      bind_rows(example_long %>% mutate(class = 'all')) %>%
      group_by(class, Partizipation) %>%
      summarise(Mean = mean(value)) %>%
      ggplot(aes(x = Partizipation, y = Mean, fill = class)) +
      geom_col(width = 0.3, position = position_dodge(0.5)) +
      scale_fill_manual(values = c("steelblue", "orange", 'gray70')) +
      theme_classic()
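
    If only one class should be shown next to the overall mean (two bars per time point, as described in the question), the same idea can be sketched as follows. This is a minimal sketch, not the only way to do it: `target_class` is a hypothetical variable name introduced here for illustration, and the colours are arbitrary. The overall mean is still computed from all rows; only the class rows are filtered.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

example <- data.frame(
  code = c("850304", "850302", "850305", "404013", "404001", "404016"),
  class = c("8503", "8503", "8503", "4040", "4040", "4040"),
  partizipation_t1 = c(3.5, 2.0, 1.6, 3.0, 1.2, 3.9),
  partizipation_t2 = c(3.7, 2.3, 2.0, 3.2, 1.5, 3.7)
)

# Hypothetical name: the class to compare against the overall mean
target_class <- "8503"

# Wide to long: one row per pupil and time point, with Time as "T1" / "T2"
example_long <- example %>%
  select(-code) %>%
  pivot_longer(-class,
               names_to = "Time",
               names_prefix = "partizipation_",
               values_to = "Partizipation") %>%
  mutate(Time = toupper(Time))

# Stack the rows of the target class on top of a copy of ALL rows
# relabelled as "all", then average per group and time point
means <- bind_rows(
  example_long %>% filter(class == target_class),
  example_long %>% mutate(class = "all")
) %>%
  group_by(class, Time) %>%
  summarise(Mean = mean(Partizipation, na.rm = TRUE), .groups = "drop")

# Dodged bars: class mean and overall mean side by side at each time point
p <- ggplot(means, aes(x = Time, y = Mean, fill = class)) +
  geom_col(width = 0.3, position = position_dodge(0.5)) +
  scale_fill_manual(
    values = setNames(c("steelblue", "gray70"), c(target_class, "all"))
  ) +
  theme_classic()
p
```

    Changing `target_class` to "4040" gives the same comparison for the other class; the "all" bars stay the same because they do not depend on the filter.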
    
