Tags: r, ggplot2, geom-text, position-dodge

Why do some geom_text labels get flipped when added to a dodged geom_col?


I have the following dataframe df:

df <- structure(list(ID = c(1, 1, 2, 2, 3, 3, 4, 4), Group = c("A", 
"B", "A", "B", "A", "B", "B", "A"), sumGp = c(1L, 0L, 162L, 32L, 
9L, 2L, 0L, 0L), n = c(2L, 30L, 181L, 60L, 27L, 17L, 33L, 3L), 
    pct = c(0.5, 0, 0.895027624309392, 0.533333333333333, 0.333333333333333, 
    0.117647058823529, 0, 0)), row.names = c(NA, -8L), class = c("tbl_df", 
"tbl", "data.frame"))   

I want to visualize this with dodged geom_col, adding labels to it only when sumGp is not zero. My trick is to color the text white when sumGp == 0 (I understand this might not be the best way to label the numbers, but let's just do it this way to reproduce my main problem in this question).

Let's wrap the plot in a function to make sure my problem is not caused by accidentally changed code:

library(ggplot2)
library(dplyr)
library(scales)

plot_geom_text_with_dodge_geom_col <- function(df){
  ggplot(df |> mutate(label_pct = paste0(sumGp, "/", n)), 
         aes(as.character(ID), pct, fill = Group, label = label_pct)) + 
    geom_col(position = "dodge", col = "black") +
    geom_text(aes(color = grepl("^0/", label_pct)), position = position_dodge(0.9), vjust = -0.5,
              show.legend = FALSE) +
    scale_fill_manual(name = "Identity", values = c("A" = "#4A765e", "B" = "orange3")) +
    scale_color_manual(name = "Identity", values = c("black", "white")) +
    scale_y_continuous(label = scales::percent) +
    labs(x = "ID") +
    theme_bw() +
    theme(panel.grid = element_blank(),
          panel.border = element_rect(color = "black", linewidth = 1),
          plot.tag = element_text(face = "bold"),
          legend.title = element_text(face = "bold", size = 15),
          legend.text = element_text(size = 12),
          axis.title = element_text(face = "bold", size = 15),
          axis.text = element_text(size = 12, face = "bold"),
          legend.position = "top")
  }

Everything goes well with this code:

plot_geom_text_with_dodge_geom_col(df)

But when the values "A" and "B" are flipped, geom_text behaves strangely: position_dodge no longer seems to move the labels onto the right dodged bars when one bar of the pair has sumGp == 0:

df2 <- df |> mutate(Group = case_match(Group, "A" ~ "B", "B" ~ "A"))

plot_geom_text_with_dodge_geom_col(df2)

Does anyone know what is happening? Someone suggested adding aes(group = Group) to fix the problem, but that does not explain why plotting df works fine without grouping. It also does not explain why, without grouping, only the pair with sumGp == 0 is affected in df2.


Solution

  • The issue is the grouping, and it can be fixed by mapping on the group= aes. I haven't dug deeper into your code, but especially when several variables and aesthetics are involved I would recommend mapping aesthetics locally and/or explicitly mapping on the group= aes, so that the bars and the text (or any other layers) are dodged by the same variable.

    library(ggplot2)
    library(dplyr)
    library(scales)
    
    plot_geom_text_with_dodge_geom_col <- function(df) {
      ggplot(
        df |> mutate(label_pct = paste0(sumGp, "/", n)),
        aes(as.character(ID), pct, fill = Group, label = label_pct, group = Group)
      ) +
        geom_col(position = "dodge", col = "black") +
        # "^0/" anchors the match so that only labels starting with "0/" are flagged
        geom_text(aes(color = grepl("^0/", label_pct)),
          position = position_dodge(0.9), vjust = -0.5,
          show.legend = FALSE
        ) +
        scale_fill_manual(name = "Identity", values = c("A" = "#4A765e", "B" = "orange3")) +
        scale_color_manual(name = "Identity", values = c("black", "white")) +
        scale_y_continuous(label = scales::percent) +
        labs(x = "ID") +
        theme_bw() +
        theme(
          panel.grid = element_blank(),
          panel.border = element_rect(color = "black", linewidth = 1),
          plot.tag = element_text(face = "bold"),
          legend.title = element_text(face = "bold", size = 15),
          legend.text = element_text(size = 12),
          axis.title = element_text(face = "bold", size = 15),
          axis.text = element_text(size = 12, face = "bold"),
          legend.position = "top"
        )
    }
    
    df2 <- df |> mutate(Group = case_match(Group, "A" ~ "B", "B" ~ "A"))
    
    plot_geom_text_with_dodge_geom_col(df2)
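
    One way to check that an explicit group aligns the two layers is to compare their dodged x positions via layer_data(). The following is only a minimal sketch: it rebuilds the flipped data by hand with just the columns the plot needs (pct is recomputed as sumGp / n, which matches the values in the question), colors by sumGp == 0 instead of the grepl() call, and the names bars, labs_layer and aligned_x are purely illustrative.

```r
library(ggplot2)

# Flipped data from the question (df2), reduced to the columns the plot needs
df2 <- data.frame(
  ID    = c(1, 1, 2, 2, 3, 3, 4, 4),
  Group = c("B", "A", "B", "A", "B", "A", "A", "B"),
  sumGp = c(1, 0, 162, 32, 9, 2, 0, 0),
  n     = c(2, 30, 181, 60, 27, 17, 33, 3)
)
df2$pct       <- df2$sumGp / df2$n
df2$label_pct <- paste0(df2$sumGp, "/", df2$n)

# group = Group is mapped globally, so both layers dodge by the same variable
p <- ggplot(df2, aes(as.character(ID), pct, fill = Group, group = Group)) +
  geom_col(position = position_dodge(0.9)) +
  geom_text(
    aes(label = label_pct, color = sumGp == 0),
    position = position_dodge(0.9), vjust = -0.5, show.legend = FALSE
  )

bars       <- layer_data(p, i = 1)
labs_layer <- layer_data(p, i = 2)

# Sort both layers the same way so the comparison does not rely on row order
bars       <- bars[order(bars$group, as.numeric(bars$x)), ]
labs_layer <- labs_layer[order(labs_layer$group, as.numeric(labs_layer$x)), ]

# TRUE if every label sits at the same dodged x position as its bar
aligned_x <- isTRUE(all.equal(as.numeric(bars$x), as.numeric(labs_layer$x)))
aligned_x
```

    Dropping group = Group from aes() and rerunning the comparison is a quick way to see whether the layers drift apart again.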
    

    Why does this happen?

    The underlying issue is the grouping, or more precisely how ggplot2 sets the group variable internally. As documented in several places, e.g. in the docs, the group variable is set using plyr::id() based on all discrete variables mapped to aesthetics, with the variables mapped to the label aes and the facetting variables being the only exceptions. It is also important to note that the value assigned to group depends on the order of the aesthetics inside aes().

    Before I go on: to show more clearly what is happening, I slightly changed the setup of your example, i.e. I added label_pct and a color column to the original dataset outside of your plotting function, and I draw the zero labels in red instead of white so that they stay visible.

    df <- df |>
      dplyr::mutate(
        label_pct = paste0(sumGp, "/", n),
        color = grepl("^0/", label_pct)
      )
    
    plot_geom_text_with_dodge_geom_col2 <- function(df) {
      ggplot(
        df,
        aes(as.character(ID), pct,
          fill = Group, label = label_pct,
          # group = Group
        )
      ) +
        geom_col(position = "dodge", col = "black") +
        geom_text(aes(color = color),
          position = position_dodge(0.9),
          vjust = -0.5,
          show.legend = FALSE
        ) +
        scale_fill_manual(name = "Identity", values = c("A" = "#4A765e", "B" = "orange3")) +
        scale_color_manual(name = "Identity", values = c("black", "red")) +
        scale_y_continuous(label = scales::percent) +
        labs(x = "ID") +
        theme_bw() +
        theme(
          panel.grid = element_blank(),
          panel.border = element_rect(color = "black", linewidth = 1),
          plot.tag = element_text(face = "bold"),
          legend.title = element_text(face = "bold", size = 15),
          legend.text = element_text(size = 12),
          axis.title = element_text(face = "bold", size = 15),
          axis.text = element_text(size = 12, face = "bold"),
          legend.position = "top"
        )
    }
    
    df2 <- df |> 
      mutate(Group = case_match(Group, "A" ~ "B", "B" ~ "A"))
    
    p1 <- plot_geom_text_with_dodge_geom_col2(df)
    p2 <- plot_geom_text_with_dodge_geom_col2(df2)
    

    For the geom_col the group is set according to ID and Group, i.e. the variables mapped on x and fill. In your example this also means that each observation is assigned to its own group. In contrast, for the geom_text the group also accounts for the variable mapped on color. As a consequence, the grouping already differs between the geom_col and the geom_text for df. This can be seen by calling plyr::id() and checked using e.g. layer_data():

    plyr::id(df[c("ID", "Group")], drop = TRUE)
    #> [1] 1 2 3 4 5 6 8 7
    #> attr(,"n")
    #> [1] 8
    
    layer_data(p1, i = 1)[["group"]]
    #> [1] 1 2 3 4 5 6 8 7
    
    plyr::id(df[c("color", "ID", "Group")], drop = TRUE)
    #> [1] 1 6 2 3 4 5 8 7
    #> attr(,"n")
    #> [1] 8
    
    layer_data(p1, i = 2)[["group"]]
    #> [1] 1 6 2 3 4 5 8 7
    

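    That said, even for df the labels are assigned to the right columns only by coincidence: within each ID, the group ids of the labels happen to sort in the same order as the group ids of the bars.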

    Now, looking at df2, we see that for the geom_col the values assigned to group are simply swapped within each ID when A and B are swapped. Hence, the bars are swapped too.

    plyr::id(df2[c("ID", "Group")], drop = TRUE)
    #> [1] 2 1 4 3 6 5 7 8
    #> attr(,"n")
    #> [1] 8
    

    However, this is not the case for the geom_text layer. Here, the values assigned to group are swapped for all rows except the first two. As a result, the first two labels keep the same grouping (and hence the same dodge order) as for df while their bars have swapped, so these labels are no longer aligned with the correct bars.

    plyr::id(df2[c("color", "ID", "Group")], drop = TRUE)
    #> [1] 1 6 3 2 5 4 7 8
    #> attr(,"n")
    #> [1] 8
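
    To make the mismatch explicit, the group ids printed above for df2 can be compared per ID in base R. This is a simplified sketch which assumes that, within each x position, position_dodge places elements in ascending order of their group id. The vectors are copied from the plyr::id() output above, and the names g_col, g_text, slot_col and slot_text are only illustrative.

```r
# ID per row and group ids for df2, copied from the plyr::id() output above
id     <- c(1, 1, 2, 2, 3, 3, 4, 4)
g_col  <- c(2, 1, 4, 3, 6, 5, 7, 8)  # geom_col:  ID + Group
g_text <- c(1, 6, 3, 2, 5, 4, 7, 8)  # geom_text: color + ID + Group

# Dodge slot within each ID (1 = left, 2 = right), assuming slots follow
# the ascending order of the group ids at that x position
slot_col  <- ave(g_col,  id, FUN = rank)
slot_text <- ave(g_text, id, FUN = rank)

# A row is aligned when its bar and its label land in the same slot
aligned <- slot_col == slot_text
data.frame(id, slot_col, slot_text, aligned)
```

    Under that assumption, only the two rows for ID 1 come out misaligned, which matches the flipped labels in the plot of df2.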