Tags: r, ggplot2, boxplot, p-value, bar-chart

Put stars on ggplot barplots and boxplots - to indicate the level of significance (p-value)


It's common to put stars on barplots or boxplots to show the level of significance (p-value) for one group or for a comparison between two groups. Several examples:

[Images: three example plots]

The number of stars is determined by the p-value. For example, one can put three stars for p-value < 0.001, two stars for p-value < 0.01, and so on (although the cutoffs vary from one article to another).

My question: how can I generate similar charts? Methods that automatically place the stars based on the significance level are more than welcome.


Solution

  • Please find my attempt below.

    Example plot

    First, I created some dummy data and a barplot which can be modified as we wish.

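    library(ggplot2)
    
    ## windows() is only available on Windows; dev.new(width = 4, height = 4) works elsewhere
    windows(4,4)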
    
    dat <- data.frame(Group = c("S1", "S1", "S2", "S2"),
                      Sub   = c("A", "B", "A", "B"),
                      Value = c(3,5,7,8))  
    
    ## Define base plot
    p <-
    ggplot(dat, aes(Group, Value)) +
        theme_bw() + theme(panel.grid = element_blank()) +
        coord_cartesian(ylim = c(0, 15)) +
        scale_fill_manual(values = c("grey80", "grey20")) +
        geom_bar(aes(fill = Sub), stat="identity", position="dodge", width=.5)
    

    Adding asterisks above a column is easy, as baptiste already mentioned. Just create a data.frame with the coordinates.

    label.df <- data.frame(Group = c("S1", "S2"),
                           Value = c(6, 9))
    
    p + geom_text(data = label.df, label = "***")
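
The "***" label above is hard-coded. As a minimal sketch of how the stars could instead be derived from p-values, the helper below uses base R's cut(). The name p_to_stars, the "ns" label, and the p-values are made up for illustration, and the cutoffs (0.001, 0.01, 0.05) are an assumption based on the convention mentioned in the question; they vary between journals.

```r
# Hypothetical helper: convert p-values to significance stars.
# The cutoffs are an assumed convention, not a fixed standard.
p_to_stars <- function(p) {
  as.character(cut(p,
                   breaks = c(-Inf, 0.001, 0.01, 0.05, Inf),
                   labels = c("***", "**", "*", "ns"),
                   right  = FALSE))  # intervals are [lower, upper)
}

# Made-up p-values for the two groups, for illustration only
label.df <- data.frame(Group = c("S1", "S2"),
                       Value = c(6, 9),
                       p     = c(0.0004, 0.03))
label.df$stars <- p_to_stars(label.df$p)
```

With this data frame, the label can be mapped rather than fixed, e.g. p + geom_text(data = label.df, aes(label = stars)).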
    

    To add the arcs that indicate a subgroup comparison, I computed the parametric coordinates of a half circle (stretched vertically to suit the y scale) and drew them with geom_line. The asterisks need new coordinates, too.

    label.df <- data.frame(Group = c(1,1,1, 2,2,2),
                           Value = c(6.5,6.8,7.1, 9.5,9.8,10.1))
    
    # Define arc coordinates
    r <- 0.15
    t <- seq(0, 180, by = 1) * pi / 180
    x <- r * cos(t)
    y <- r*5 * sin(t)
    
    arc.df <- data.frame(Group = x, Value = y)
    
    p2 <-
    p + geom_text(data = label.df, label = "*") +
        geom_line(data = arc.df, aes(Group+1, Value+5.5), lty = 2) +
        geom_line(data = arc.df, aes(Group+2, Value+8.5), lty = 2)
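
Since the same arc is shifted to two positions, the coordinate computation could be wrapped in a small function. This is only a sketch: make_arc and its arguments are hypothetical names, and the defaults reproduce the values used above (half-width 0.15, height 0.15 * 5 = 0.75).

```r
# Hypothetical helper: half-ellipse centred at x0 whose base sits at y0.
# r is the half-width in x units, h the height in y units,
# n the number of points along the arc.
make_arc <- function(x0, y0, r = 0.15, h = 0.75, n = 181) {
  t <- seq(0, pi, length.out = n)
  data.frame(Group = x0 + r * cos(t),
             Value = y0 + h * sin(t))
}

arc1 <- make_arc(1, 5.5)  # arc over the S1 bars
arc2 <- make_arc(2, 8.5)  # arc over the S2 bars
```

Because the columns are named Group and Value, the plot's existing aesthetics apply directly, e.g. p + geom_line(data = arc1, lty = 2) + geom_line(data = arc2, lty = 2).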
    

    Lastly, to indicate comparison between groups, I built a larger circle and flattened it at the top.

    r <- .5
    x <- r * cos(t)
    y <- r*4 * sin(t)
    y[20:162] <- y[20] # Flattens the arc
    
    arc.df <- data.frame(Group = x, Value = y)
    
    p2 + geom_line(data = arc.df, aes(Group+1.5, Value+11), lty = 2) +
         annotate("text", x = 1.5, y = 12, label = "***")  # draws one label instead of one per row of dat