r ggplot2 boxplot

Adjusting the scale on ggplot


I'm creating some box plots with geom_boxplot and geom_jitter in ggplot2. Most of my data points cluster around the boxes, but a few don't, and I'm not removing them as outliers. When the plot is rendered, the y axis is scaled evenly to include those points at the top, which squashes the boxes. What I'd like to do is still show the points, but have the y-axis distance between 1 and 3 be (approximately) the same as the distance between 0 and 1. If the results were larger, I would apply a log or square-root transformation, but they're small numbers. Is there a way I can make this plot?

Here's some code

    library(ggplot2)

    # 87 values clustered around 0.26
    dat <- data.frame(cat = "A", result = rnorm(87, 0.26, 0.19))

    ggplot(dat, aes(x = cat, y = result)) +
      geom_boxplot() +
      geom_jitter()

Which produces

[Image: example box plot]

Now add in some data points further away

    new_values <- data.frame(cat = "A", result = c(3.4, 3.2))
    dat <- rbind(dat, new_values)

    ggplot(dat, aes(x = cat, y = result)) +
      geom_boxplot() +
      geom_jitter()

which produces

[Image: the 'problem']

What I'd like to do is adjust the scale of the y axis so that the box plot isn't compressed but it still shows the other two data points. Something like this.

[Image: approximation of desired result]

Any suggestions welcome. Thanks in advance


Solution

  • In general, you can apply any transformation to a scale via the trans= argument (called transform= as of ggplot2 3.5.0). If you have specific needs and it's worth the effort, you can create a custom transformation. As a first step, however, consider one of the built-in transformations: scales::transform_modulus (a generalization of the Box-Cox transformation) seems to come close to what you have in mind:

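    library(ggplot2)
    library(scales)

    # Make the random draws reproducible
    set.seed(123)

    dat <- data.frame(cat = "A", result = rnorm(87, 0.26, 0.19))
    new_values <- data.frame(cat = "A", result = c(3.4, 3.2))
    dat <- rbind(dat, new_values)

    ggplot(dat, aes(x = cat, y = result)) +
      # Don't draw the boxplot's own outlier points; geom_jitter() already shows every observation
      geom_boxplot(outliers = FALSE) +
      geom_jitter() +
      scale_y_continuous(
        # Modulus transformation with p = -1 compresses the upper part of the axis
        trans = scales::transform_modulus(-1),
        breaks = c(0, .5, 1.75, 3.5)
      )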
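
If none of the built-in transformations gives quite the spacing you want, a custom transformation can be built with scales::new_transform(). The following is only a minimal sketch, assuming ggplot2 >= 3.5.0 and scales >= 1.3.0. transform_squish_above(), knot and factor are hypothetical names made up for this illustration; they are not part of either package. The transformation leaves values up to knot untouched and compresses everything above it by factor, so with knot = 1 and factor = 0.5 the interval from 1 to 3 takes up the same axis length as the interval from 0 to 1. For comparison, the sketch first prints where transform_modulus(-1) places 0, 1 and 3.

```r
library(ggplot2)
library(scales)

set.seed(123)

dat <- data.frame(cat = "A", result = rnorm(87, 0.26, 0.19))
dat <- rbind(dat, data.frame(cat = "A", result = c(3.4, 3.2)))

# Where the built-in modulus transformation puts 0, 1 and 3.
# For x >= 0 and the default offset of 1, p = -1 gives 1 - 1 / (x + 1).
transform_modulus(-1)$transform(c(0, 1, 3))

# Hypothetical helper (not part of scales): identity up to `knot`,
# then linear with slope `factor` above it.
transform_squish_above <- function(knot = 1, factor = 0.5) {
  new_transform(
    name = "squish_above",
    transform = function(x) ifelse(x <= knot, x, knot + (x - knot) * factor),
    inverse = function(x) ifelse(x <= knot, x, knot + (x - knot) / factor)
  )
}

p <- ggplot(dat, aes(x = cat, y = result)) +
  geom_boxplot(outliers = FALSE) +
  geom_jitter() +
  scale_y_continuous(
    transform = transform_squish_above(knot = 1, factor = 0.5),
    breaks = c(0, 0.5, 1, 2, 3)
  )

p
```

Note that ggplot2 applies scale transformations before computing statistics, so the boxplot summary is calculated on the transformed values. With this piecewise-linear sketch, values below the knot are unchanged. The change of spacing at the knot isn't marked on the axis, so choose breaks that make it obvious to the reader.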