r ggplot2 boxplot

Creating (ggplot2-)boxplots based on integer x values with correct spacing on the x axis


I am trying to visualize my benchmarking results where for three values of the discrete hyperparameter p and two values of categorical hyperparameter q, I have 50 runtimes. I am trying to create a boxplot for each value of p on the x-axis and separate them by color based on q:

library(ggplot2)

min_ex <- data.frame(p = factor(rep(c(3,10,20), each = 2 * 50)),
                     q = factor(rep(c("A", "B"), each = 50)),
                     time = rnorm(3 * 2 * 50))

ggplot(min_ex, aes(x = p, y = time, color = q)) +
  geom_boxplot()

[Boxplot generated from minimal reproducible example](https://i.sstatic.net/KPw6aWVG.png)

AFAIK, I need to code p as a factor in order to group the boxplots by p. However, the x-axis is now discrete, and I want it to be continuous, i.e. I want the spacing between the x-values (3, 10, 20) to reflect their actual distances on a continuous scale.

Specifying a continuous scale using scale_x_continuous yields the error "Discrete values supplied to continuous scale". This question had a similar issue that was solved by specifying further factor levels. However, this would only work in my case if the x-values were evenly spaced. I could mitigate that by giving all values between 3 and 20 as factor levels, but then each level gets its own axis tick.

How can I specify a discrete x value on a continuous x axis scale?


Solution

  • A pure {ggplot2} option would be to convert your p column to integers and to explicitly set the group aes to group by both p and q:

    library(ggplot2)
    
    set.seed(123)
    
    min_ex <- data.frame(
      p = factor(rep(c(3, 10, 20), each = 2 * 50)),
      q = factor(rep(c("A", "B"), each = 50)),
      time = rnorm(3 * 2 * 50)
    )
    
    ggplot(
      min_ex,
      aes(
        x = as.integer(as.character(p)),
        y = time,
        color = q,
        group = interaction(p, q)
      )
    ) +
      geom_boxplot() +
      scale_x_continuous(breaks = c(3, 10, 20))
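
A note on the conversion used above, as a minimal sketch rather than part of the original answer: calling as.integer() directly on a factor returns the internal level codes (1, 2, 3), not the labels (3, 10, 20), which is why the snippet goes through as.character() first. If you control how the data is built, you could presumably skip the factor altogether and keep p numeric; the grouping then comes entirely from the group aesthetic. The names min_ex_num, p_fac and plt below are made up for this illustration.

```r
library(ggplot2)

set.seed(123)

# Pitfall: as.integer() on a factor gives level codes, not the labels.
p_fac <- factor(c(3, 10, 20))
as.integer(p_fac)                # level codes: 1 2 3
as.integer(as.character(p_fac))  # original values: 3 10 20

# Assumption: p can be stored as a plain numeric column from the start
# (the question stores it as a factor).
min_ex_num <- data.frame(
  p = rep(c(3, 10, 20), each = 2 * 50),
  q = factor(rep(c("A", "B"), each = 50)),
  time = rnorm(3 * 2 * 50)
)

# With a numeric x, ggplot2 no longer splits the data by p on its own,
# so group by the p/q combination explicitly.
plt <- ggplot(
  min_ex_num,
  aes(x = p, y = time, color = q, group = interaction(p, q))
) +
  geom_boxplot() +
  scale_x_continuous(breaks = unique(min_ex_num$p))

plt
```

Without the group aesthetic, only color = q would define the groups, so you would get one box per value of q instead of one per p/q combination.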