Tags: r, ggplot2, scatter-plot, ggdist

{ggdist}: How to prevent stat_dots() from overlapping stat_halfeye() when using `position = "dodge"`


I am trying to visualise the distribution of a response variable using raincloud plots, where one of two factors is mapped to the x-axis (am here) and the other (vs here) is distinguished by colour. I tried position = "dodge", position = "dodgejust" and position = position_dodge(width = <number>) to dodge the levels of vs, but the 'rain' drawn by ggdist::stat_dots() overlaps the 'clouds' drawn by ggdist::stat_halfeye(). In the figure below, the green dots overlap the green 'clouds'. How can I prevent this overlap?

raincloud plot in which the green dots overlap the green 'clouds'

library(tidyverse)
library(ggdist)  # provides median_qi and the "dodgejust" position used below

mtcars |>
  mutate(
    am = am |>
      as.factor(),
    vs = vs |>
      as.factor()
  ) |>
  ggplot(
    aes(
      x = am,
      y = mpg,
      colour = vs,
      fill = vs
    )
  ) +
  ggdist::stat_halfeye(
    # position = "dodge",
    position = position_dodge(width = 0.75),
    point_interval = median_qi,
    width = 0.5,
    .width = c(0.66, 0.95),
    interval_size_range = c(1.25, 2.5),
    interval_colour = "black",
    point_colour = "black",
    fatten_point = 3
  ) +
  ggdist::stat_dots(
    position = "dodge",
    #position = "dodgejust",
    #position = position_dodge(width = 0.5),
    binwidth = 1,
    side = "left",
    dotsize = 1
  ) +
  scale_fill_viridis_d(
    begin = 0.3,
    end = 0.6,
    aesthetics = c("colour", "fill")
  )

Solution

  • Three parameters are relevant here: position, width (or height, when the plot is horizontal), and scale. width/height and scale are illustrated in this diagram from the slabinterval vignette:

    diagram of slabinterval properties

    In your case, position and width control how the geometries are dodged and how far apart they are placed, but I don't recommend using them to prevent overlaps. As a general rule, if you want two ggdist geoms to dodge correctly together, give them exactly the same values of position and width.
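
    One way to keep position and width in sync is to build both layers from a single set of arguments. The sketch below is only an illustration: raincloud_layers is a made-up helper name, not part of ggdist, and it relies on ggplot2 accepting a list of layers on the right-hand side of +.

    ```r
    library(ggplot2)
    library(ggdist)

    # Hypothetical helper (not part of ggdist): both layers receive the
    # same position and width, so they cannot drift out of sync.
    raincloud_layers = function(position = "dodge", width = 0.5) {
      list(
        stat_halfeye(position = position, width = width),
        stat_dots(position = position, width = width, side = "left")
      )
    }

    df = transform(mtcars, am = factor(am), vs = factor(vs))

    p = ggplot(df, aes(x = am, y = mpg, colour = vs, fill = vs)) +
      raincloud_layers(width = 0.5)

    # print(p) draws the plot
    ```

    Changing width or position in the call to raincloud_layers() then updates both geoms at once.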

    (As an aside, you are also setting binwidth manually, which is likely to make this process painful. If you use the parameters below appropriately, particularly scale, ggdist will automatically pick a binwidth that fits the dotplot into the available space, so I omit the binwidth parameter in what follows.)

    If you start with this plot:

    library(tidyverse)
    library(ggdist)
    
    df = mtcars |>
      mutate(
        am = am |>
          as.factor(),
        vs = vs |>
          as.factor()
      )
    
    df |>
      ggplot(
        aes(
          x = am,
          y = mpg,
          colour = vs,
          fill = vs
        )
      ) +
      ggdist::stat_halfeye(
        position = "dodge",
        point_interval = median_qi,
        .width = c(0.66, 0.95),
        interval_size_range = c(1.25, 2.5),
        interval_colour = "black",
        point_colour = "black",
        fatten_point = 3
      ) +
      ggdist::stat_dots(
        position = "dodge",
        side = "left",
        dotsize = 1
      ) +
      scale_fill_viridis_d(
        begin = 0.3,
        end = 0.6,
        aesthetics = c("colour", "fill")
      )
    

    raincloud plots with overlaps

    You can see the dots overlapping the slabs. You could adjust width so that the two vs subgroups within each am group sit closer together, but this does not guarantee that dots and slabs never overlap, even though by chance they don't in this example. For instance, if the group with vs == 0 and am == 0 had more values around 19, its density would overlap the dots from the vs == 1 and am == 0 group:

    df |>
      ggplot(
        aes(
          x = am,
          y = mpg,
          colour = vs,
          fill = vs
        )
      ) +
      ggdist::stat_halfeye(
        # make sure position and width are the same for both geoms
        position = "dodge",
        width = 0.5,
        
        point_interval = median_qi,
        .width = c(0.66, 0.95),
        interval_size_range = c(1.25, 2.5),
        interval_colour = "black",
        point_colour = "black",
        fatten_point = 3
      ) +
      ggdist::stat_dots(
        # position and width same as the halfeye to keep them in sync
        position = "dodge",
        width = 0.5,
        
        side = "left",
        dotsize = 1
      ) +
      scale_fill_viridis_d(
        begin = 0.3,
        end = 0.6,
        aesthetics = c("colour", "fill")
      )
    

    rainclouds closer together

    If you want to guarantee that the slabs and dots don't overlap, adjust the scale parameter instead. scale does not change the basic position of the geometries; it determines how much of the region allocated to each geometry is used to draw the slab (for stat_halfeye) or the dots (for stat_dots). When scale == 1, two adjacent slabs just touch at their maximum point. Thus, if two geometries (like a halfeye and dots) share the same space, setting scale to 0.5 or less on both guarantees that they will, at most, just touch:

    df |>
      ggplot(
        aes(
          x = am,
          y = mpg,
          colour = vs,
          fill = vs
        )
      ) +
      ggdist::stat_halfeye(
        position = "dodge",
        scale = 0.5,
        point_interval = median_qi,
        .width = c(0.66, 0.95),
        interval_size_range = c(1.25, 2.5),
        interval_colour = "black",
        point_colour = "black",
        fatten_point = 3
      ) +
      ggdist::stat_dots(
        position = "dodge",
        scale = 0.5,
        side = "left",
        dotsize = 1
      ) +
      scale_fill_viridis_d(
        begin = 0.3,
        end = 0.6,
        aesthetics = c("colour", "fill")
      )
    

    more rainclouds without overlaps
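
    The reasoning behind the 0.5 threshold can be checked with a little arithmetic. This is a simplified model, not ggdist's actual layout code: it assumes each dodged group gets an equal slot of width / n_groups, that a right-side slab extends at most scale_slab times that slot to the right, and that the left-side dots of the neighbouring group extend at most scale_dots times that slot to the left. The function name dodge_gap and its arguments are made up for illustration.

    ```r
    # Simplified model (an assumption, not ggdist internals):
    # gap left between a slab and the neighbouring group's dots.
    # A negative value means the two can overlap.
    dodge_gap = function(width = 1, n_groups = 2,
                         scale_slab = 0.5, scale_dots = 0.5) {
      slot = width / n_groups            # space allocated to one dodged group
      slot * (1 - scale_slab - scale_dots)
    }

    dodge_gap()                                    # zero: they just touch
    dodge_gap(scale_slab = 0.9, scale_dots = 0.9)  # negative: overlap possible
    dodge_gap(scale_slab = 0.6, scale_dots = 0.3)  # positive: guaranteed gap
    ```

    Under these assumptions, what matters is that the two scale values sum to at most 1; the width only changes the size of the gap, not its sign.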

    Note that while width should be the same for the two geometries, scale does not have to be. Depending on the data, you can often prevent overlaps even with a value greater than 0.5.
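
    As a sketch of using different scale values, the variant below gives the slab more room than the dots. The values 0.6 and 0.4 are arbitrary choices for illustration, and the styling arguments from the earlier examples are left out for brevity.

    ```r
    library(ggplot2)
    library(ggdist)

    df = transform(mtcars, am = factor(am), vs = factor(vs))

    p = ggplot(df, aes(x = am, y = mpg, colour = vs, fill = vs)) +
      stat_halfeye(
        position = "dodge",  # same position for both layers
        scale = 0.6          # slab uses a larger share of its slot
      ) +
      stat_dots(
        position = "dodge",
        scale = 0.4,         # dots use a smaller share
        side = "left"
      )

    # print(p) draws the plot
    ```

    Whether a particular split looks right depends on the data, so it is worth checking the rendered plot rather than relying on the numbers alone.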

    You can see further discussion and an example of rainclouds in the dotsinterval vignette.

    Finally, if you don't want the dots to overlap the interval, you can either take @teunbrand's suggestion of using after_scale or supply a justification value greater than 1 to the dots geometry (justification controls the position of the dots relative to the interval):

    df |>
      ggplot(
        aes(
          x = am,
          y = mpg,
          colour = vs,
          fill = vs
        )
      ) +
      ggdist::stat_halfeye(
        position = "dodge",
        scale = 0.5,
        point_interval = median_qi,
        .width = c(0.66, 0.95),
        interval_size_range = c(1.25, 2.5),
        interval_colour = "black",
        point_colour = "black",
        fatten_point = 3
      ) +
      ggdist::stat_dots(
        position = "dodge",
        scale = 0.5,
        side = "left",
        dotsize = 1,
        justification = 1.2
      ) +
      scale_fill_viridis_d(
        begin = 0.3,
        end = 0.6,
        aesthetics = c("colour", "fill")
      )
    

    rainclouds with dots not overlapping the interval