r ggplot2 bar-chart errorbar tukey

Is it possible to go from a boxplot to a barplot in ggplot?


I have to make a ggplot barplot with error bars and Tukey significance letters for plants grown with different fertilizer concentrations. The data should be grouped by concentration, and the significance letters should be added automatically.

I already have code for the same problem with a boxplot, which works nicely. I tried several barplot tutorials, but I always get the error: stat_count() can only have an x or y aesthetic.

So I wondered whether my boxplot code can be converted into barplot code. I tried but couldn't do it :) And if not, how do I automatically add the significance letters from a TukeyHSD test to a ggplot barplot?

This is my code for the boxplot with the Tukey letters:

    library(ggplot2)
    library(dplyr)
    library(agricolae)

    # Highest plant per fertilizer group, used to position the letters
    value_max <- Dünger %>%
      group_by(Duenger.g) %>%
      summarize(max_value = max(Höhe.cm))

    # Tukey HSD test; the group letters are sorted by fertilizer level
    hsd <- HSD.test(aov(Höhe.cm ~ Duenger.g, data = Dünger),
                    trt = "Duenger.g", group = TRUE)
    sig.letters <- hsd$groups[order(row.names(hsd$groups)), ]

    # Title: "Effect of fertilizer on plant height"
    J <- ggplot(Dünger, aes(x = Duenger.g, y = Höhe.cm)) +
      geom_boxplot(aes(fill = Duenger.g)) +
      scale_fill_discrete(labels = c("0.5g", "1g", "2g", "3g", "4g")) +
      geom_text(data = value_max,
                aes(x = Duenger.g, y = 0.1 + max_value, label = sig.letters$groups),
                vjust = 0) +
      stat_boxplot(geom = "errorbar", width = 0.1) +
      ggtitle("Auswirkung von Dünger auf die Höhe von Pflanzen") +
      xlab("Dünger in g") +
      ylab("Höhe in cm")
    J

This is how it looks:

(Image: boxplot with Tukey letters)

Data from dput:

structure(list(Duenger.g = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 
0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 
0.5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 
4, 4), plant = c(1, 2, 3, 4, 5, 7, 10, 11, 12, 13, 14, 18, 19, 
21, 23, 24, 25, 26, 27, 29, 30, 31, 33, 34, 35, 37, 38, 39, 40, 
41, 42, 43, 44, 48, 49, 50, 53, 54, 55, 56, 57, 58, 61, 62, 64, 
65, 66, 67, 68, 69, 70, 71, 72, 73, 75, 79, 80, 81, 83, 85, 86, 
88, 89, 91, 93, 99, 100, 102, 103, 104, 105, 106, 107, 108, 110, 
111, 112, 113, 114, 115, 116, 117, 118, 120, 122, 123, 125, 126, 
127, 128, 130, 131, 132, 134, 136, 138, 139, 140, 141, 143, 144, 
145, 146, 147, 149), height.cm = c(5.7, 2.8, 5.5, 8, 3.5, 2.5, 
4, 6, 10, 4.5, 7, 8.3, 11, 7, 8, 2.5, 7.4, 3, 14.5, 7, 12, 7.5, 
30.5, 27, 6.5, 19, 10.4, 12.7, 27.3, 11, 11, 10.5, 10.5, 13, 
53, 12.5, 12, 6, 12, 35, 8, 16, 56, 63, 69, 62, 98, 65, 77, 32, 
85, 75, 33.7, 75, 55, 38.8, 39, 46, 35, 59, 44, 31.5, 49, 34, 
52, 37, 43, 38, 28, 14, 28, 19, 20, 23, 17.5, 32, 16, 17, 24.7, 
34, 50, 12, 14, 21, 33, 39.3, 41, 29, 35, 48, 40, 65, 35, 10, 
26, 34, 41, 32, 38, 23.5, 22.2, 20.5, 29, 34, 45)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -105L))

Thank you mirai


Solution

  • A bar chart and a boxplot are two different things. By default, geom_boxplot computes the boxplot statistics (stat = "boxplot"). In contrast, geom_bar by default counts the number of observations (stat = "count") and maps those counts to y. Because stat_count computes y itself, it raises an error when you also map a variable to y. Hence, simply replacing geom_boxplot with geom_bar will not give you the desired result. Instead you could use e.g. stat_summary to create your bar chart with error bars. Additionally, I created a summary dataset to add the labels on top of the error bars.

    library(ggplot2)
    library(dplyr)
    library(agricolae)
    
    Dünger <- Dünger |>
      rename("Höhe.cm" = height.cm) |>
      mutate(Duenger.g = factor(Duenger.g))
    
    hsd <- HSD.test(aov(Höhe.cm ~ Duenger.g, data = Dünger), trt = "Duenger.g", group = T)
    sig.letters <- hsd$groups %>% mutate(Duenger.g = row.names(.))
    
    duenger_sum <- Dünger |>
      group_by(Duenger.g) |>
      summarize(mean_se(Höhe.cm)) |>
      left_join(sig.letters, by = "Duenger.g")
    
    ggplot(Dünger, aes(x = Duenger.g, y = Höhe.cm, fill = Duenger.g)) +
      stat_summary(geom = "bar", fun = "mean") +
      stat_summary(geom = "errorbar", width = .1) +
      scale_fill_discrete(labels = c("0g", "0.5g", "1g", "2g", "4g")) +
      geom_text(data = duenger_sum, aes(y = ymax, label = groups), vjust = 0, nudge_y = 1) +
      labs(
        title = "Auswirkung von Dünger auf die Höhe von Pflanzen",
        x = "Dünger in g", y = "Höhe in cm"
      )
    #> No summary function supplied, defaulting to `mean_se()`
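
    The message above means that the error bars show the mean plus or minus one standard error, which is what mean_se() computes. As a minimal sketch in base R, using only the first five control-group heights from the question's data as a toy vector (the names heights and by_hand are made up for illustration):

    ```r
    # Toy vector: first five heights of the 0 g group from the dput output
    heights <- c(5.7, 2.8, 5.5, 8, 3.5)

    # Mean and standard error of the mean, computed by hand
    m  <- mean(heights)
    se <- sd(heights) / sqrt(length(heights))

    # Same three columns that mean_se() returns: y, ymin, ymax
    by_hand <- data.frame(y = m, ymin = m - se, ymax = m + se)
    print(by_hand)
    ```

    With ggplot2 loaded, mean_se(heights) should give the same three values. If wider bars are wanted, one option is to pass the function explicitly, e.g. stat_summary(geom = "errorbar", fun.data = mean_se, fun.args = list(mult = 1.96)), which also silences the message.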
    

    Since the summary dataset already contains the means and the limits of the error bars, a second option is to plot it directly with geom_col and geom_errorbar:

    ggplot(duenger_sum, aes(x = Duenger.g, y = y, fill = Duenger.g)) +
      geom_col() +
      geom_errorbar(aes(ymin = ymin, ymax = ymax), width = .1) +
      scale_fill_discrete(labels = c("0g", "0.5g", "1g", "2g", "4g")) +
      geom_text(aes(y = ymax, label = groups), vjust = 0, nudge_y = 1) +
      labs(
        title = "Auswirkung von Dünger auf die Höhe von Pflanzen",
        x = "Dünger in g", y = "Höhe in cm"
      )
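
    To see the difference between the default stats described at the top of this answer, here is a small sketch on hypothetical toy data (the data frame toy and the plot names are not from the question). It inspects the values each layer computes with layer_data() instead of drawing the plots:

    ```r
    library(ggplot2)

    # Hypothetical toy data: two groups with a numeric value each
    toy <- data.frame(group = c("a", "a", "b"), value = c(1, 3, 5))

    # Default stat = "count": bar height is the number of rows per group
    p_count <- ggplot(toy, aes(x = group)) + geom_bar()
    layer_data(p_count)$count

    # Mapping y as well makes stat_count() fail when the plot is built
    p_bad <- ggplot(toy, aes(x = group, y = value)) + geom_bar()
    failed <- inherits(try(ggplot_build(p_bad), silent = TRUE), "try-error")
    failed

    # stat = "summary" aggregates y per group instead (here: the mean)
    p_mean <- ggplot(toy, aes(x = group, y = value)) +
      geom_bar(stat = "summary", fun = "mean")
    layer_data(p_mean)$y
    ```

    geom_bar(stat = "summary", fun = "mean") draws the same bars as stat_summary(geom = "bar", fun = "mean"); which spelling to use is mostly a matter of taste.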