r, ggplot2, visualization, ggridges

Can I use different cutoff points for different groups with stat_density_ridges?


I have a dataframe with different groups ('label' column). For each label, I want to plot a null distribution obtained from bootstrapping (values are in the 'null' column) and the true statistic on top (value in the 'sc' column). Ideally, I would like the area after the statistic to have a different color, to mark that this is my p-value. Is this possible to do with stat_density_ridges?

Here is some example R code:

library(ggplot2)
library(tidyverse)
library(ggridges)

df <- data.frame()

for (label in LETTERS) {
  mean=rnorm(1,0.5,0.2)
  null = rnorm(1000,mean,0.1);
  sc = rnorm(1,0.5,0.2)
  df <- rbind(df, data.frame(label=label, null=null, sc=sc))
}

df <- df %>% 
  mutate(label=as.factor(label))

ggplot(df, aes(x = null, y = label))  +
  stat_density_ridges(scale=1.2,alpha = 1, size=1)+
  scale_x_continuous(limits=c(0,1),breaks=seq(0,1,0.2)) +
  geom_segment(aes(x=sc, xend=sc, y=as.numeric(label)-0.1, yend=as.numeric(label)+0.5), size=1) +
  coord_flip()

The resulting figure is this:

[Image: ridge plot]

But ideally, I would like each ridge to be more like this:

[Image: sketch of a single ridge with the area beyond the sc value in a different color]

That is, with the color changing beyond the sc value. Is that possible? Thanks :)


Solution

  • You can map fill to an expression on ..x.. to color each ridge differently on either side of a cutoff. On its own, that cutoff is a single fixed x value, so the shaded area is the same across all ridges. To get a cutoff per group, build the plot with ggplot_build, join a separate dataframe that holds the threshold for each label (the sc value, stored as p_value in the code) onto the layer data, and set the fill conditionally on that threshold. Here is some reproducible code:

    library(ggplot2)
    library(tidyverse)
    library(ggridges)
    
    df <- data.frame()
    
    set.seed(7) # for reproducibility
    for (label in LETTERS) {
      mean=rnorm(1,0.5,0.2)
      null = rnorm(1000,mean,0.1);
      sc = rnorm(1,0.5,0.2)
      df <- rbind(df, data.frame(label=label, null=null, sc=sc))
    }
    
    df <- df %>% 
      mutate(label=as.factor(label))
    # Create dataframe with one threshold per label (the sc value, stored as p_value)
    p_values <- df %>% 
      group_by(label) %>% 
      summarise(p_value = unique(sc)) %>%
      mutate(label = as.integer(label)) # make sure label is the same as in ggplot_build
    
    # plot
    p <- ggplot(df, aes(x = null, y = label, fill = ifelse(..x.. < sc, "no sign", "sign"), group = factor(label)))  +
      stat_density_ridges(geom = "density_ridges_gradient",
                          scale=1.2,alpha = 1, size=1,
                          calc_ecdf = TRUE) +
      scale_fill_manual(values = c("red", "blue"), name = "") +
      coord_flip()
    p
    #> Picking joint bandwidth of 0.0224
    

    # Modify layer
    q <- ggplot_build(p)
    #> Picking joint bandwidth of 0.0224
    q$data[[1]] = q$data[[1]] %>%
      left_join(., p_values,
                by = c("group" = "label")) %>%
      # recompute the fill from the per-label threshold rather than reusing the old fill
      mutate(fill = if_else(x < p_value, "red", "blue")) %>%
      select(-p_value)
    q <- ggplot_gtable(q)
    plot(q)
    

    Created on 2023-03-28 with reprex v2.0.2

    As you can see in the last plot, the shaded area of each ridge now follows that group's sc value from your dataframe.
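
    Since the shaded area is meant to represent a p-value, it may help to compute that number per label too. Below is a minimal base-R sketch, assuming a one-sided (upper-tail) test and a small toy dataframe in the same shape as the question's df. The name p_emp is made up for illustration and is not part of the code above:

```r
set.seed(7) # for reproducibility

# Toy data shaped like the question's df: many `null` draws and one `sc` per label
df <- do.call(rbind, lapply(LETTERS[1:3], function(lab) {
  data.frame(label = lab,
             null  = rnorm(1000, 0.5, 0.1),
             sc    = rnorm(1, 0.5, 0.2))
}))

# Assumption: one-sided test, so the empirical p-value is the share of
# null draws that are at least as large as the observed statistic
p_emp <- sapply(split(df, df$label), function(d) mean(d$null >= d$sc[1]))
p_emp
```

    Each value should roughly match the fraction of the corresponding ridge that lies beyond its sc line, which gives a quick sanity check on the shading.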