
Align annotation_custom values with ggplot gridlines


I'm not sure whether this can be done other than by trial and error. I'm trying to align the y-axis labels, the horizontal gridlines and the corresponding custom annotation values on the right of the plot (sample sizes) so that one can read straight across the plot. This is easier with fewer groups, but getting the spacing between the sample sizes right, choosing the font size, and making everything line up is challenging and takes hours. Is there a faster or easier way to do this?

My current method:

Set the "by" value to 1 divided by the number of groups on the y-axis. Example: N = 25, so 1/25 = 0.04, which is the distance between consecutive sample-size values.

format(round(seq(-0.996, 0, by = 0.04), 3), scientific = FALSE)

Copy the result:

[1] "-0.996" "-0.956" "-0.916" "-0.876" "-0.836" "-0.796" "-0.756" "-0.716" "-0.676" "-0.636"
[11] "-0.596" "-0.556" "-0.516" "-0.476" "-0.436" "-0.396" "-0.356" "-0.316" "-0.276" "-0.236"
[21] "-0.196" "-0.156" "-0.116" "-0.076" "-0.036"

and paste it, without the minus signs, here:

# ...annotation_custom(grid::textGrob(pivot_df$n, x = 1.035, y = c(0.996, 0.956, 0.916, 0.876, 
# 0.836, 0.796, 0.756, 0.716, 0.676, 0.636, 0.596, 0.556, 0.516, 0.476, 0.436, 0.396, 0.356, 
# 0.316, 0.276, 0.236, 0.196, 0.156, 0.116, 0.076, 0.036),...

This snippet is part of the plot call below. I then eyeball the result and redo everything if it doesn't align well.
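The copy-and-paste step can be skipped by building the vector in R directly. A minimal sketch that reproduces the values typed into `annotation_custom()` above (`n_groups` and `y_manual` are illustrative names, not from the original code):

```r
# Number of groups on the y-axis (25 in the example data)
n_groups <- 25

# Same spacing rule as the manual method: start at 0.996 and step down by 1/N,
# giving one y position (in npc) per group, from the top of the panel downwards
y_manual <- seq(0.996, by = -1 / n_groups, length.out = n_groups)

round(y_manual, 3)
```

`y_manual` can then be passed as `y = y_manual` to `grid::textGrob()`. This only removes the copy-and-paste; it keeps the same spacing rule, so the result still has to be checked by eye.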

ggplot(data = subset(df, !is.na(sal)),
       aes(y = reorder(species, -sal, FUN = median), x = sal)) +
  geom_boxplot(outlier.shape = 1, outlier.size = 1, orientation = "y") +
  coord_cartesian(clip = "off") +  # allow annotations outside the panel
  # Sample sizes, one per group; y positions are in npc (0 = bottom, 1 = top of the panel)
  annotation_custom(grid::textGrob(pivot_df$n,
                                   x = 1.035,
                                   y = c(0.996, 0.956, 0.916, 0.876, 0.836, 0.796, 0.756, 0.716, 0.676,
                                         0.636, 0.596, 0.556, 0.516, 0.476, 0.436, 0.396, 0.356, 0.316,
                                         0.276, 0.236, 0.196, 0.156, 0.116, 0.076, 0.036),
                                   gp = grid::gpar(cex = 0.3))) +
  # Column header above the sample sizes
  annotation_custom(grid::textGrob(expression(bold(underline("N"))),
                                   x = 1.035,
                                   y = 1.02,
                                   gp = grid::gpar(cex = 0.5))) +
  ylab("") +
  xlab("") +
  theme(axis.text.y      = element_text(size = 7, face = "italic"),
        axis.text.x      = element_text(size = 7),
        axis.title.x     = element_text(size = 9, face = "bold"),
        axis.line        = element_line(colour = "black"),
        panel.background = element_blank(),
        panel.grid.minor = element_blank(),
        panel.border     = element_rect(colour = "black", fill = NA, size = 1),
        panel.grid.major = element_line(colour = "#E0E0E0"),
        plot.title       = element_text(hjust = 0.5)) +
  theme(plot.margin = margin(21, 40, 20, 20))


This is what it's supposed to look like when it works well, but getting there is really tedious and takes forever when there are 90+ groups on the y-axis. Is there a better way?

[Image: the plot with the sample sizes aligned to the gridlines]

Example data:

library(dplyr)
library(ggplot2)

df <- data.frame(species = LETTERS[seq(from = 1, to = 25)],
                 sal = rnorm(n=5000, mean = 27, sd = 8),
                 num = sample(x = 1:10, size  = 5000, replace = TRUE))

pivot_df <- df %>% 
  group_by(species) %>% 
  summarize(n = n(),median_sal = median(sal, na.rm = T)) %>%
  arrange(median_sal)
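The positions do not have to be eyeballed. Assuming ggplot2's default expansion for a discrete axis (0.6 units added on each side, with the default `expand = TRUE` in `coord_cartesian()`), the centre of the k-th of N groups sits at (k - 0.4) / (N + 0.2) in panel npc. A minimal sketch; `discrete_npc()` is a hypothetical helper, not a ggplot2 function:

```r
# Hypothetical helper (not part of ggplot2): npc position of each group centre on a
# discrete axis, assuming the default additive expansion of 0.6 on both sides
discrete_npc <- function(n_groups, add = 0.6) {
  k <- seq_len(n_groups)          # groups are drawn at 1, 2, ..., n_groups
  lower <- 1 - add                # expanded lower limit of the axis
  upper <- n_groups + add         # expanded upper limit of the axis
  (k - lower) / (upper - lower)   # rescale data positions to 0-1 (npc)
}

# pivot_df is sorted by ascending median, i.e. from the top of the plot downwards,
# so the positions are reversed to run top to bottom
y_npc <- rev(discrete_npc(25))
round(head(y_npc, 3), 3)  # 0.976 0.937 0.897
```

With other data, `nrow(pivot_df)` would replace 25, and `y = y_npc` goes into `grid::textGrob()`. If the scale's `expand` setting is changed, `add` has to change with it.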

Solution

  • One option that needs less fiddling to get the positions of your annotation_customs right is to use a continuous y scale, which allows for a duplicated (secondary) axis that can carry your labels.

    To this end, first reorder your species, then convert them to numeric and map the numbers to the y aesthetic. Then use the `breaks` and `labels` arguments of `scale_y_continuous()` to add the labels for the primary and the secondary axis; `expand = c(0, .6)` reproduces the 0.6-unit padding that a discrete axis gets by default. Also note the use of `axis.ticks.length.y.right` to shift the secondary axis labels to the right.

    library(tidyverse)
    
    df$species <- reorder(df$species, -df$sal, FUN = median)
    df$species_num <- as.numeric(df$species)
    breaks_x_left <- sort(unique(df$species_num))
    labels_x_left <- levels(df$species)
    
    # Order by species_num so the n values line up with the sorted breaks 1, 2, ..., N
    pivot_df <- df %>%
      group_by(species_num) %>%
      summarize(n = n(), median_sal = median(sal, na.rm = TRUE)) %>%
      arrange(species_num)
    
    labels_x_right <- pivot_df |> select(species_num, n) |> tibble::deframe()
    
    ggplot(
      data = subset(df, !is.na(sal)),
      aes(y = species_num, x = sal, group = species)
    ) +
      geom_boxplot(outlier.shape = 1, outlier.size = 1, orientation = "y") +
      scale_y_continuous(
        breaks = breaks_x_left,
        labels = labels_x_left,
        expand = c(0, .6),
        sec.axis = dup_axis(labels = labels_x_right)
      ) +
      coord_cartesian(clip = "off") +
      annotation_custom(
        grid::textGrob(
          expression(bold(underline("N"))),
          x = unit(1, "npc") + unit(20, "pt"),
          y = unit(1, "npc") + unit(4, "pt"),
          hjust = 0,
          vjust = 0,
          gp = grid::gpar(cex = 0.5)
        )
      ) +
      ylab("") +
      xlab("") +
      theme(
        axis.text.y = element_text(size = 7, face = "italic", hjust = 0),
        axis.ticks.y.right = element_blank(),
        axis.ticks.length.y.right = unit(15, "pt"),
        axis.text.y.right = element_text(hjust = 0),
        axis.text.x = element_text(size = 7),
        axis.title.x = element_text(size = 9, face = "bold"),
        axis.line = element_line(colour = "black"),
        panel.background = element_blank(),
        panel.grid.minor = element_blank(),
        panel.border = element_rect(colour = "black", fill = NA, size = 1),
        panel.grid.major = element_line(colour = "#E0E0E0"),
        plot.title = element_text(hjust = 0.5)
      ) +
      theme(plot.margin = margin(21, 20, 20, 20))
    

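A different technique, sketched here as an alternative rather than as part of the answer above: keep the discrete y-axis and draw the counts with `geom_text()` at `x = Inf`. Because the labels use the same reordered factor as the boxplots, they land on the same positions as the gridlines, whatever the number of groups. The names `plot_df`, `counts` and `p` are illustrative, and `set.seed()` is added only to make the example data reproducible:

```r
library(dplyr)
library(ggplot2)

set.seed(1)  # reproducible example data
df <- data.frame(species = LETTERS[1:25],
                 sal = rnorm(n = 5000, mean = 27, sd = 8))

# Reorder once, so the boxplots and the count labels share the same factor levels
plot_df <- df %>%
  filter(!is.na(sal)) %>%
  mutate(species = reorder(species, -sal, FUN = median))

counts <- plot_df %>% count(species)  # one row per species, count in column n

p <- ggplot(plot_df, aes(x = sal, y = species)) +
  geom_boxplot(outlier.shape = 1, outlier.size = 1, orientation = "y") +
  # x = Inf pins each label to the right edge of the panel; y uses the same
  # discrete positions as the boxplots, so the labels follow the gridlines
  geom_text(data = counts, aes(x = Inf, y = species, label = n),
            hjust = -0.5, size = 2) +
  # Underlined bold "N" header just above the top-right corner of the panel
  annotate("text", x = Inf, y = Inf, label = "underline(bold(N))",
           parse = TRUE, hjust = -0.5, vjust = 0, size = 2.5) +
  coord_cartesian(clip = "off") +  # allow drawing outside the panel
  labs(x = NULL, y = NULL) +
  theme(plot.margin = margin(21, 40, 20, 20))

print(p)
```

The right plot margin still has to be wide enough for the longest count, but no positions need to be computed by hand.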