r ggplot2 geom

Show quantiles with stacked vertical lines over discrete axis in ggplot2


I am writing a paper that shows a lot of distributions, which of course means a lot of box and violin plots. However, these can be boring and don't always show the full story of my data. I once saw a plot that used sets of short horizontal lines stacked over discrete x values, where the y position of each line is the value of a given quantile. The clustering of these lines along the y-axis then indicates the shape of the distribution. As an example:

part of my goal

My problem with this plot is its continuous x-axis. A discrete x-axis would make labeling easier and enable other features I want to add to the plot. I know I could do all of the labeling in post-processing, but I think it would be cool to find a way within ggplot2. I'm still somewhat new to Stack Overflow, so please let me know if I need to clarify anything.

The code to construct the above graph is below. Note that I don't care about the data format or which geom_* is used; I would just like one of these plots with a discrete x-axis.

library(ggplot2)

cutoffs <- seq(0, 1, by = 0.05)

a <- sqrt(seq(1, 10000, length.out = 100))
b <- (seq(1, 10, length.out = 100))^2
c <- seq(1, 100, length.out = 100) 

quant_data <- rbind(data.frame('class' = 'a',
                         'quantile' = quantile(a, probs = cutoffs)),
              data.frame('class' = 'b',
                         'quantile' = quantile(b, probs = cutoffs)),
              data.frame('class' = 'c',
                         'quantile' = quantile(c, probs = cutoffs)))

num_data <- data.frame('class' = c(rep('a', 100), rep('b', 100), rep('c', 100)),
                  'val' = c(a, b, c))

x_bases <- c(1, 2, 3)
names(x_bases) <- c('a', 'b', 'c')

quant_data$xmin <- x_bases[quant_data$class] - 0.2
quant_data$xmax <- x_bases[quant_data$class] + 0.2

num_data$xnum <- x_bases[num_data$class]

ggplot()+
  geom_linerange(data = quant_data, mapping = aes(xmin = xmin, xmax = xmax, y = quantile), linewidth = 2)

Solution

  • It's easier than you might think:

    quant_data$xnum <- x_bases[quant_data$class]
    
    ggplot(quant_data, aes(class, quantile)) +
      geom_linerange(aes(xmin = xnum - 0.2, xmax = xnum + 0.2))
    

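    You do have to be careful that xnum is correct, in the sense that it matches the integer positions of the factor levels of class (a discrete scale places the first level at x = 1, the second at x = 2, and so on). So perhaps use after_stat() instead, to get the correct x-coordinate on the fly: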

    ggplot(quant_data, aes(class, quantile)) +
      geom_linerange(aes(xmin = after_stat(x) - 0.2, xmax = after_stat(x) + 0.2))
    

    This would definitely be my preferred option.
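
If you would rather keep an explicit xnum column, as in the first approach, one way to avoid a mismatch is to derive it from the factor itself instead of a hand-written lookup. This is only a minimal sketch, assuming the level order a, b, c is what you want on the axis. The vals list is a hypothetical repackaging of the question's a, b and c vectors, not something from the original code:

```r
library(ggplot2)

cutoffs <- seq(0, 1, by = 0.05)

# Hypothetical container for the three example vectors from the question
vals <- list(
  a = sqrt(seq(1, 10000, length.out = 100)),
  b = seq(1, 10, length.out = 100)^2,
  c = seq(1, 100, length.out = 100)
)

# One row per (class, quantile) pair
quant_data <- do.call(rbind, lapply(names(vals), function(cl) {
  data.frame(class = cl,
             quantile = unname(quantile(vals[[cl]], probs = cutoffs)))
}))

# Fix the level order explicitly, then read the x positions off the factor,
# so xnum cannot drift out of sync with the discrete scale
quant_data$class <- factor(quant_data$class, levels = c("a", "b", "c"))
quant_data$xnum <- as.integer(quant_data$class)

p <- ggplot(quant_data, aes(class, quantile)) +
  geom_linerange(aes(xmin = xnum - 0.2, xmax = xnum + 0.2))
p
```

Reordering the levels in the factor() call then moves both the axis labels and the lines together. The after_stat(x) version needs none of this bookkeeping, which is why it remains the simpler choice.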
