
How to align geom_boxplot and geom_point with uneven observations per group?


I am trying to generate a grouped boxplot in ggplot2 with overlaid points for each sample in the dataset. I am using position_dodge2(preserve = "single") to preserve the width of the boxplots when a group has no observations at a given x position; however, I cannot get the points to line up with the preserved positions of the boxplots. Can anyone shed any light on how to get the points lined up correctly?
In the example below, the points are correctly aligned for the grouped boxplots at p and q but are misaligned at r, where there are no group 3 observations.

library(ggplot2)
set.seed(1)  # make the runif() values below reproducible

smp <- seq(1,21,1)

stat <- c(rep("p",7),
          rep("q",7),
          rep("r",7))

cls <- c(rep("1",3),
         rep("2",2),
         rep("3",2),
         rep("1",3),
         rep("2",2),
         rep("3",2),
         rep("1",3),
         rep("2",4))

div <- runif(21,
             min = 0.1,
             max = 1.5)

df <- data.frame(smp,stat,cls,div)

ggplot(data = df,
       aes(x = stat,
           y = div)) +
  geom_boxplot(aes(fill = cls),
               position = position_dodge2(preserve = "single"),
               alpha = 0.1) +
  geom_point(aes(color = cls,
                 group = cls),
             position = position_dodge(width = 0.75)) +
  theme(legend.position = c(0.85,0.85))

[Image: simple boxplot with overlaid points]
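The mismatch at r can be reproduced with a little arithmetic. This is a sketch of how I understand the two position adjustments to behave, not ggplot2's own code. It assumes the default total width of 0.75 (the width stat_boxplot() gives each box group, and the dodge width used for the points above). With preserve = "single", every box gets a slot of 0.75 / 3 = 0.25, because p and q have three classes, and the occupied slots are centred on each x. position_dodge(width = 0.75) instead splits the 0.75 among the groups actually present at that x, so at r the two point groups land further apart than the two boxes. box_offsets() and point_offsets() are hypothetical helpers for illustration only.

```r
# Arithmetic sketch, assuming a total dodge width of 0.75 and that
# position_dodge2(preserve = "single") centres the occupied slots on x.
dodge_width <- 0.75
n_max <- 3  # most classes at any x (p and q have classes 1, 2 and 3)

# Box centres relative to x: each slot is dodge_width / n_max wide,
# and the n occupied slots are centred on x.
box_offsets <- function(n) (seq_len(n) - (n + 1) / 2) * dodge_width / n_max

# Point centres from position_dodge(width = 0.75): the width is split
# among the n groups actually present at that x.
point_offsets <- function(n) dodge_width * ((seq_len(n) - 0.5) / n - 0.5)

box_offsets(3)    # p and q: -0.25 0.00 0.25
point_offsets(3)  # p and q: -0.25 0.00 0.25 (aligned)
box_offsets(2)    # r: -0.125 0.125
point_offsets(2)  # r: -0.1875 0.1875 (misaligned)
```

So at p and q both layers agree, while at r the points sit 0.0625 further out than the box centres.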

Altering the geom_point position argument to position_dodge2 does not fix this issue and actually seems to make it worse (unless I'm just using the argument incorrectly).

ggplot(data = df,
       aes(x = stat,
           y = div)) +
  geom_boxplot(aes(fill = cls),
               position = position_dodge2(preserve = "single"),
               alpha = 0.1) +
  geom_point(aes(color = cls,
                 group = cls),
             position = position_dodge2(width = 0.75))

[Image: simple boxplot with overlaid points]


Solution

  • Good question. I don't know of any way of doing this through position, since points have zero width and therefore can't use preserve = "single".

    One method that works (but seems overly complex) is to calculate the desired position of the points conditionally:

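    ggplot(data = df, aes(x = stat, y = div)) +
      geom_boxplot(aes(fill = cls), alpha = 0.1,
                   position = position_dodge2(preserve = "single")) +
      # Place each point by hand: start from the numeric position of its x
      # category, then shift it to the centre of its box. Each box slot is
      # 0.75 / 3 = 0.25 wide, so with 2 classes at an x the offsets are
      # -0.125 and 0.125; with 3 classes they are -0.25, 0 and 0.25.
      # ave() on the character cls returns character, hence the == "2".
      # This relies on the missing class at r being the last level ("3").
      geom_point(aes(x = as.numeric(as.factor(stat)) + 
                       ifelse(ave(cls, stat, FUN = \(x) length(unique(x))) == "2", 
                              (as.numeric(factor(cls)) - 1.5)/4,
                              (as.numeric(factor(cls)) - 2)/4),
                     color = cls, group = cls)) +
      theme_bw(base_size = 20)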
    

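The ifelse() above hard-codes the two cases in this data set (2 or 3 classes per x, with the missing class always being the last level). A slightly more general sketch computes the same offsets from the data: each slot is 0.75 divided by the largest number of classes at any x, and the classes present at an x are centred on it in factor-level order. dodge_single_x() is a hypothetical helper, not part of ggplot2. The default width = 0.75 assumes geom_boxplot() is left at its default width (pass the same value if you change it), and I have only reasoned this through for the single-panel case shown here.

```r
# Hypothetical helper, not part of ggplot2: reproduce the x positions that
# position_dodge2(preserve = "single") appears to give each box, so that
# points can be drawn at the box centres.
#   x     : the discrete x variable (e.g. stat)
#   group : the dodging variable (e.g. cls)
#   width : total dodge width; 0.75 assumes geom_boxplot()'s default width
dodge_single_x <- function(x, group, width = 0.75) {
  x <- factor(x)          # for character input, same order as the x axis
  g <- as.integer(factor(group))

  # Number of distinct groups present at each x value
  n_here <- ave(g, x, FUN = function(v) length(unique(v)))

  # Every slot is sized for the busiest x position
  slot <- width / max(n_here)

  # Rank of each group among the groups present at its own x value
  rank_here <- ave(g, x, FUN = function(v) match(v, sort(unique(v))))

  # Centre the occupied slots on the numeric position of x
  as.numeric(x) + (rank_here - (n_here + 1) / 2) * slot
}

# Class "1" is missing at "b": ranks within each x are used, so the two
# points at "b" still sit either side of its centre.
dodge_single_x(x = c("a", "a", "a", "b", "b"),
               group = c("1", "2", "3", "2", "3"))
# 0.750 1.000 1.250 1.875 2.125
```

With the question's df, this could stand in for the ifelse() expression: compute df$x_pt <- dodge_single_x(df$stat, df$cls) and use geom_point(aes(x = x_pt, color = cls)), leaving the boxplot layer unchanged.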