r ggplot2 group-by dplyr

Adding error bars to ggplot2 bar plot after group by in dplyr


I have the following data in R.

oligo  condition  score
REF    Sample     27.827
REF    Sample     24.622
REF    Sample     31.042
REF    Competitor 21.066
REF    Competitor 18.413
REF    Competitor 36.164
ALT    Sample     75.465
ALT    Sample     57.058
ALT    Sample     66.408
ALT    Competitor 35.420
ALT    Competitor 17.652
ALT    Competitor 21.466

I have summarised this by taking the mean score for each oligo and condition, using the group_by and summarise_all functions in dplyr.

emsa_test <- emsa_1 %>% 
  group_by(oligo,condition) %>%
  summarise_all(mean)

This creates the following table.

oligo  condition  score
ALT    Competitor 24.84600
ALT    Sample     66.31033
REF    Competitor 25.21433
REF    Sample     27.83033

I then plotted this using ggplot2.

ggplot(emsa_test, aes(oligo, score)) +
  geom_bar(aes(fill = condition),
           width = 0.4, position = position_dodge(width = 0.5),
           color = "black", stat = "identity", size = .3) +
  theme_bw() +
  ggtitle("CEBP\u03b1") +
  theme(plot.title = element_text(size = 40, face = "bold", hjust = 0.5)) +
  scale_fill_manual(values = c("#d8b365", "#f5f5f5"))

My issue is that I need to add error bars to the plot. The implementation would be similar to this.

geom_errorbar(aes(ymin=len-se, ymax=len+se), width=.1, position=pd)

However, after the data is summarised, the max and min information contained in the first table is lost. I could add the error bars manually, but I have several plots to make, so I wonder if there is a way to retain this information through the pipeline.

Many Thanks.


Solution

    library(tidyverse)
    
    df <- read_table(
      "oligo  condition  score
    REF    Sample     27.827
    REF    Sample     24.622
    REF    Sample     31.042
    REF    Competitor 21.066
    REF    Competitor 18.413
    REF    Competitor 36.164
    ALT    Sample     75.465
    ALT    Sample     57.058
    ALT    Sample     66.408
    ALT    Competitor 35.420
    ALT    Competitor 17.652
    ALT    Competitor 21.466"
    )
    
    df %>%
      group_by(oligo, condition) %>%
      summarise(
        mean = mean(score),
        sd = sd(score),
        n = n(),
        se = sd / sqrt(n)
      ) %>%
      ggplot(aes(x = oligo, y = mean, fill = condition)) +
      geom_col(position = position_dodge()) +
      geom_errorbar(
        aes(ymin = mean - se, ymax = mean + se), 
        position = position_dodge2(padding = 0.5)
      ) +
      labs(
        title = "Mean Score ± 1 SE"
      )
    #> `summarise()` has grouped output by 'oligo'. You can override using the
    #> `.groups` argument.
    

    Created on 2024-07-08 with reprex v2.1.0
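
Since the question mentions several plots, one way to reuse the same idea is to wrap the summary step in a small function and feed its result to ggplot2. The following is only a sketch: `summarise_se()` and `dodge` are hypothetical names chosen for illustration (they are not part of dplyr or ggplot2), the data frame is typed in from the question, and the bar styling is copied from the question's plot. It assumes every data set has `oligo`, `condition` and `score` columns.

```r
library(dplyr)
library(ggplot2)

# Data from the question, entered directly so the sketch is self-contained.
emsa_1 <- data.frame(
  oligo = rep(c("REF", "ALT"), each = 6),
  condition = rep(rep(c("Sample", "Competitor"), each = 3), times = 2),
  score = c(27.827, 24.622, 31.042, 21.066, 18.413, 36.164,
            75.465, 57.058, 66.408, 35.420, 17.652, 21.466)
)

# Hypothetical helper (not a dplyr function): mean, SD, n and standard
# error of `score` for each oligo/condition pair.
summarise_se <- function(data) {
  data %>%
    group_by(oligo, condition) %>%
    summarise(
      mean = mean(score),
      sd = sd(score),
      n = n(),
      se = sd / sqrt(n),
      .groups = "drop"  # return an ungrouped table and silence the message
    )
}

emsa_summary <- summarise_se(emsa_1)

# Using one dodge object for both layers keeps the error bars centred
# on their bars.
dodge <- position_dodge(width = 0.5)

p <- ggplot(emsa_summary, aes(x = oligo, y = mean, fill = condition)) +
  geom_col(width = 0.4, position = dodge, colour = "black") +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se),
                width = 0.1, position = dodge) +
  theme_bw() +
  scale_fill_manual(values = c("#d8b365", "#f5f5f5"))

print(p)
```

Calling `summarise_se()` on each raw table keeps the spread information alongside the means, so the same `geom_errorbar()` layer can be reused for every plot. If the error bars should show the range rather than the standard error, `min(score)` and `max(score)` could be added to the `summarise()` call in the same way.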