r ggplot2 statistics ggpubr aesthetics

How to get stat significance marks on a multiple boxplot in one graph?


I have generated multiple boxplots on one graph comparing three locations regarding three markers. I want to add statistical significance marks for each marker over all the locations. However, I don't wish to draw significance between markers.

library(ggplot2)
library(dplyr)
library(tidyr)
library(ggpubr)

data <- data.frame(
  fat = rep(c("Lake A", "Lake B", "Lake C"), each = 50),
  zon = c(rnorm(50, mean = 5, sd = 1), rnorm(50, mean = 5.5, sd = 1), rnorm(50, mean = 4.5, sd = 1)),
  cal = c(rnorm(50, mean = 6, sd = 1.5), rnorm(50, mean = 6.5, sd = 1.5), rnorm(50, mean = 5.5, sd = 1.5)),
  si = c(rnorm(50, mean = 7, sd = 1), rnorm(50, mean = 7.5, sd = 1), rnorm(50, mean = 6.5, sd = 1)),
  other1 = rnorm(150),
  other2 = rnorm(150)
)

data_selected <- data %>% select(place = fat, zon, cal, si)

data_long <- pivot_longer(data_selected, cols = c("zon", "cal", "si"), names_to = "category", values_to = "value")

comparisons <- list(unique(data_long$place))

ggplot(data_long, aes(x = place, y = value, fill = category)) +
  geom_boxplot() +
  labs(title = NULL,
       x = "Location",
       y = "Marker level (Log-values)",
       fill = "Marker") +
  theme_bw() +
  theme(legend.position = "top") +                 
  ggpubr::stat_compare_means(comparisons = comparisons, aes(label = ..p.signif..), method = "t.test", size = 5, vjust = .5) 

This example draws only one comparison. How can I make multiple comparisons?


Solution

  • If you read the documentation, it says that the comparisons argument should be:

    A list of length-2 vectors. The entries in the vector are either the names of 2 values on the x-axis or the 2 integers that correspond to the index of the groups of interest, to be compared.

    Whereas you have a single vector with all three lakes:

    dput(comparisons)
    #> list(c("Lake A", "Lake B", "Lake C"))
    

    So your comparisons need to be something like:

    comparisons <- list(c("Lake A", "Lake B"), 
                        c("Lake A", "Lake C"),
                        c("Lake B", "Lake C"))
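
    With more sites, typing every pair by hand gets tedious. As a minimal sketch, base R's combn() can build the same list. The places vector here is only a stand-in for unique(data_long$place), and this assumes place is a character column rather than a factor:

    ```r
    # Stand-in for unique(data_long$place)
    places <- c("Lake A", "Lake B", "Lake C")

    # simplify = FALSE returns a list with one length-2 vector per pair
    comparisons <- combn(places, 2, simplify = FALSE)
    str(comparisons)
    ```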
    

    Note that stat_compare_means will only compare values between x axis locations; it will not compare different groups at each x axis location. See for example this discussion on GitHub. As I understand it, you want all three site comparisons made for each of the three markers, for a total of 9 different comparisons.

    The author recommends that in this situation you use facets. This is actually going to give you a much tidier plot than having a stack of 9 brackets to compare:

    ggboxplot(data_long, x = "place", y = "value", facet.by = "category",
              fill = "category") +
      labs(title = NULL,
           x = "Location",
           y = "Marker level (Log-values)",
           fill = "Marker") +
      stat_compare_means(comparisons = comparisons, 
                         aes(label = ..p.signif.., group = category), 
                         method = "t.test", size = 5) +
      theme_bw() +
      theme(legend.position = "top")
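
    If you also want the numbers behind the brackets, ggpubr's compare_means() can run the same tests per marker and return them as a table. This is only a sketch: it rebuilds a simplified version of the example data (the seed and the means are arbitrary, not the values from the question), and it should give one row per lake pair and marker, nine rows in total:

    ```r
    library(ggpubr)

    set.seed(1)  # arbitrary seed so the sketch is reproducible

    # Simplified stand-in for data_long: 3 markers x 3 lakes x 50 values
    data_long <- data.frame(
      place    = rep(rep(c("Lake A", "Lake B", "Lake C"), each = 50), times = 3),
      category = rep(c("zon", "cal", "si"), each = 150),
      value    = c(rnorm(150, mean = 5), rnorm(150, mean = 6, sd = 1.5), rnorm(150, mean = 7))
    )

    # One t-test per pair of lakes, run separately within each marker
    res <- compare_means(value ~ place, data = data_long,
                         group.by = "category", method = "t.test")
    res
    ```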
    


    There are more convoluted ways to get the plot you are describing, but I'm not sure that they are worth the amount of work you would need to put in.


    Addendum

    To get this to work properly for all groups you would need to place the boxplots at integer positions on a continuous x axis, so that each lake/marker combination has its own x value to compare:

    cmp <- do.call("c", lapply(0:2, function(i) asplit(combn(c(1, 5, 9), 2) + i, 2)))
    
    data_long %>%
      mutate(xmain = 4 * (as.numeric(factor(place)) - 1),
             xpos = xmain + as.numeric(factor(category))) %>%
      ggplot(aes(xpos, value, fill = category, group = xpos)) +
      geom_boxplot() +
      stat_compare_means(comparisons = cmp, aes(label = ..p.signif..),
                         method = "t.test", size = 5) +
      labs(title = NULL,
           x = "Location",
           y = "Marker level (Log-values)",
           fill = "Marker") +
      scale_x_continuous(breaks = c(2, 6, 10),
                         labels = c("Lake A", "Lake B", "Lake C")) +
      theme_bw() +
      theme(legend.position = "top")
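
    To see what cmp contains, it can help to lay out the x positions that the mutate() call produces. The sketch below assumes factor() orders the markers alphabetically (cal, si, zon), which is its default for character data:

    ```r
    # x position = 4 * (lake index - 1) + marker index
    xpos <- outer(1:3, 4 * (0:2), "+")
    dimnames(xpos) <- list(marker = c("cal", "si", "zon"),
                           lake   = c("Lake A", "Lake B", "Lake C"))
    xpos

    # Same construction as above: every pair of lakes, shifted once per marker
    cmp <- do.call("c", lapply(0:2, function(i) asplit(combn(c(1, 5, 9), 2) + i, 2)))
    length(cmp)

    # One row per comparison: the two x positions being compared
    t(sapply(cmp, as.vector))
    ```

    Each pair in cmp differs by a multiple of 4, so it links the same marker at two different lakes and never two markers at the same lake.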
    
