Tags: r, ggplot2, geom-point, ggboxplot

ggplot scatterplot for 2 categorical variables, with the 2nd categorical variable shown by color


I like how easily ggboxplot (from the ggpubr package) separates data into different series. The x-axis labels stay easy to read while a 2nd categorical variable is shown as adjacent colored series.

library(ggpubr)

p <- ggboxplot(df_dummy, x="Trt_Amend", y="Carbon_percent", color="Trt_CC",
               palette=c("red", "blue"),
               main="Great Plot Title",
               xlab="1st Categorical Variable",
               ylab="Continuous Variable") +
  theme(plot.title = element_text(hjust = 0.5)) + # Center plot title.
  grids(linetype="dashed") +
  border("black")
ggpar(p, x.text.angle=45,
      legend.title="2nd Categorical Variable",
      font.main=14,
      ylim=c(0.6, 1.6))


Boxplots aren't always appropriate, though, for example when each group has only a few observations (< 20). Can someone help me figure out how to get the same side-by-side colored series in a ggplot using geom_point?

# How to separate colored series using geom_point?
ggplot(df_dummy, aes(Trt_Amend, Carbon_percent, color=Trt_CC)) +
  geom_point()


Thanks for reading!


Solution

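  • The first step is to dodge your points using position = position_dodge(.75), or to dodge them and add some jitter using position_jitterdodge(), as I do below. The rest of the code is just styling, similar to what ggpubr::ggboxplot does for you. The example also draws a geom_boxplot() layer underneath the points; drop it if you only want the points.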

    Using some fake random example data:

    set.seed(123)
    
    df_dummy <- data.frame(
      Trt_Amend = paste0("Group", 1:5),
      Trt_CC = rep(factor(0:1), each = 5),
      Carbon_percent = rnorm(80, mean = 1, sd = .1)
    )
    
    library(ggplot2)
    
    ggplot(df_dummy, aes(Trt_Amend, Carbon_percent, color = Trt_CC)) +
      geom_boxplot(width = .6, outlier.shape = NA) +
      geom_point(
        position = position_jitterdodge(jitter.width = .3)
      ) +
      scale_color_manual(values = c("red", "blue")) +
      labs(
        title = "Great Plot Title",
        x = "1st Categorical Variable",
        y = "Continuous Variable",
        color = "2nd Categorical Variable"
      ) +
      ylim(0.6, 1.6) + 
      theme_bw(base_size = 14) +
      theme(
        plot.title = element_text(hjust = 0.5),
        panel.grid = element_line(linetype = "dashed"),
        axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "top"
      )
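
If you would rather not jitter at all, the plain position_dodge(.75) variant mentioned at the top could look roughly like the sketch below. It rebuilds the same fake df_dummy so that it runs on its own; the object name p_dodge is just an illustrative choice. The width of .75 matches the default dodge.width of position_jitterdodge(), so the points line up with where the jittered series would sit.

```r
library(ggplot2)

# Same fake data as above, repeated so this snippet is self-contained.
set.seed(123)
df_dummy <- data.frame(
  Trt_Amend = paste0("Group", 1:5),
  Trt_CC = rep(factor(0:1), each = 5),
  Carbon_percent = rnorm(80, mean = 1, sd = .1)
)

# Points only: each Trt_CC level is shifted sideways within its Trt_Amend group,
# with no random jitter, so points of one series share a single x position.
p_dodge <- ggplot(df_dummy, aes(Trt_Amend, Carbon_percent, color = Trt_CC)) +
  geom_point(position = position_dodge(width = .75)) +
  scale_color_manual(values = c("red", "blue"))

p_dodge
```

With only a handful of observations per group this keeps every point at an exact, reproducible x position; the trade-off is that points with similar y values overlap, which is what the jitter in position_jitterdodge() is meant to reduce.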
    

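    EDIT: You could use two stat_summary layers to add error bars (mean ± 1 SD, via mult = 1) and the mean. Note that mean_sdl is a wrapper around a function from Hmisc, so that package needs to be installed: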

    ggplot(df_dummy, aes(Trt_Amend, Carbon_percent, color = Trt_CC)) +
      geom_point(
        position = position_jitterdodge(jitter.width = .3)
      ) +
      stat_summary(
        fun.data = "mean_sdl", fun.args = list(mult = 1),
        position = position_dodge(width = .75),
        geom = "errorbar",
        width = .3
      ) +
      stat_summary(
        fun = "mean", position = position_dodge(width = .75),
        geom = "point", size = 4
      ) + 
      scale_color_manual(values = c("red", "blue")) +
      labs(
        title = "Great Plot Title",
        x = "1st Categorical Variable",
        y = "Continuous Variable",
        color = "2nd Categorical Variable"
      ) +
      ylim(0.6, 1.6) + 
      theme_bw(base_size = 14) +
      theme(
        plot.title = element_text(hjust = 0.5),
        panel.grid = element_line(linetype = "dashed"),
        axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "top"
      )
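
Since mean_sdl depends on the Hmisc package, here is a hedged sketch of the same idea without that dependency. The helper mean_sd is a hypothetical name of my own, not part of ggplot2; it returns the y, ymin and ymax columns that stat_summary expects from fun.data. The fake df_dummy is rebuilt so the snippet runs on its own, and the styling layers are left out for brevity.

```r
library(ggplot2)

# Same fake data as above, repeated so this snippet is self-contained.
set.seed(123)
df_dummy <- data.frame(
  Trt_Amend = paste0("Group", 1:5),
  Trt_CC = rep(factor(0:1), each = 5),
  Carbon_percent = rnorm(80, mean = 1, sd = .1)
)

# Hypothetical helper (not a ggplot2 function): mean plus/minus one
# sample standard deviation, in the shape stat_summary(fun.data = ) needs.
mean_sd <- function(x) {
  m <- mean(x)
  s <- sd(x)
  data.frame(y = m, ymin = m - s, ymax = m + s)
}

p_summary <- ggplot(df_dummy, aes(Trt_Amend, Carbon_percent, color = Trt_CC)) +
  # Raw observations, dodged by Trt_CC and slightly jittered.
  geom_point(position = position_jitterdodge(jitter.width = .3), alpha = .5) +
  # Error bars from the helper, dodged by the same width as the points.
  stat_summary(
    fun.data = mean_sd, geom = "errorbar", width = .3,
    position = position_dodge(width = .75)
  ) +
  # Group means drawn as larger points on top.
  stat_summary(
    fun = mean, geom = "point", size = 4,
    position = position_dodge(width = .75)
  )

p_summary
```

Writing the summary function yourself also makes it easy to swap in something else, such as a standard error or a confidence interval, by changing only the body of the helper.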
    
