I am writing a paper and showing a lot of distributions which, of course, means a lot of box and violin plots. However, these can be boring and don't always show the full story of my data. I once saw a plot that used sets of horizontal lines stacked over discrete x values, where the y position of each line is the value of a given quantile of the distribution for that x value. The clustering of these quantile lines along the y-axis then gives an indication of the shape of the distribution. As an example:
The problem I have with this is that I don't like the continuous x-axis. A discrete x-axis would make labeling easier and enable other features I want to add to the plot. I know I could just do all of the labeling in post, but I think it would be cool to find a way within ggplot2. I'm still somewhat new to Stack Overflow, so please let me know if I need to clarify anything.
The code to construct the above graph is below. Note that I don't care about the data format or which geom_* is used; I would just like one of these plots with a discrete x-axis.
library(ggplot2)
cutoffs <- seq(0, 1, by = 0.05)
a <- sqrt(seq(1, 10000, length.out = 100))
b <- (seq(1, 10, length.out = 100))^2
c <- seq(1, 100, length.out = 100)
quant_data <- rbind(data.frame('class' = 'a',
                               'quantile' = quantile(a, probs = cutoffs)),
                    data.frame('class' = 'b',
                               'quantile' = quantile(b, probs = cutoffs)),
                    data.frame('class' = 'c',
                               'quantile' = quantile(c, probs = cutoffs)))
num_data <- data.frame('class' = c(rep('a', 100), rep('b', 100), rep('c', 100)),
                       'val' = c(a, b, c))
x_bases <- c(1, 2, 3)
names(x_bases) <- c('a', 'b', 'c')
quant_data$xmin <- x_bases[quant_data$class] - 0.2
quant_data$xmax <- x_bases[quant_data$class] + 0.2
num_data$xnum <- x_bases[num_data$class]
ggplot() +
  geom_linerange(data = quant_data, mapping = aes(xmin = xmin, xmax = xmax, y = quantile), linewidth = 2)
It's easier than you might think:
quant_data$xnum <- x_bases[quant_data$class]
ggplot(quant_data, aes(class, quantile)) +
  geom_linerange(aes(xmin = xnum - 0.2, xmax = xnum + 0.2))
You do have to be careful that xnum is correct, in the sense that it matches the positions of the factor levels of class. So perhaps use the after_stat() method instead, to get the correct x-coordinate on the fly:
ggplot(quant_data, aes(class, quantile)) +
  geom_linerange(aes(xmin = after_stat(x) - 0.2, xmax = after_stat(x) + 0.2))
This would definitely be my preferred option.
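For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the quantile table from the question, checks the caveat about xnum, and then draws the after_stat() version. The names vals, level_pos and positions_match are introduced here purely for illustration, and the check assumes the default discrete x scale (no custom limits, no dropped or reordered levels), where each value sits at the integer code of its factor level.

```r
library(ggplot2)

# Rebuild the quantile table from the question (same cutoffs and classes).
cutoffs <- seq(0, 1, by = 0.05)
vals <- list(
  a = sqrt(seq(1, 10000, length.out = 100)),
  b = seq(1, 10, length.out = 100)^2,
  c = seq(1, 100, length.out = 100)
)
quant_data <- do.call(rbind, lapply(names(vals), function(cl) {
  data.frame(class = cl,
             quantile = unname(quantile(vals[[cl]], probs = cutoffs)))
}))

# Manual positions, as in the first approach.
x_bases <- c(a = 1, b = 2, c = 3)
quant_data$xnum <- unname(x_bases[quant_data$class])

# Assumption: with the default discrete scale, a value is placed at the
# integer code of its factor level, so xnum should equal that code.
level_pos <- as.integer(factor(quant_data$class))
positions_match <- all(quant_data$xnum == level_pos)
stopifnot(positions_match)

# after_stat(x) reads the position that ggplot2 assigned, so no manual
# lookup table is needed for the plot itself.
p <- ggplot(quant_data, aes(class, quantile)) +
  geom_linerange(aes(xmin = after_stat(x) - 0.2, xmax = after_stat(x) + 0.2))
print(p)
```

If the stopifnot() call fails, the hand-written x_bases has drifted from the level order and the first approach would draw lines over the wrong labels, which is the situation the after_stat() version sidesteps.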