r ggplot2

Why does position_dodge() change the total value for ggplot?


I am confused: when I leave position=position_dodge() in my code, the total values shown in my figure all change. If I remove it, the totals are correct. Why would dodging affect the plotted values?

ggplot(data=check, aes(y=Abundance, x=Urban.Intensity)) +
  geom_bar(stat="identity", position=position_dodge()) + 
  theme_bw() +
  facet_wrap(~Genus, ncol=2) +
  scale_y_continuous(limits=c(0,1500)) +
  theme(text = element_text(size=14),
        axis.title.x = element_text(size=14, color="black"),
        axis.title.y = element_text(size=14, color="black"),
        axis.text.x = element_text(size=14, color="black"),
        axis.text.y = element_text(size=14, color="black"),
        strip.text = element_text(face="italic", size=12, color="black")) +
  ylab("Abundance") + xlab("Urbanization Intensity")

Figure WITH position_dodge() (image not included). Total Abundance values are not correct.

Figure WITHOUT position_dodge() (image not included). Total Abundance values are correct.


Solution

  • Since you don't include any data, I'll show an example using mtcars. Suppose we want to see the total weight of all the cars with a certain number of gears. This is equivalent to your goal of showing total Abundance.

    The default position for geom_col()/geom_bar() is position_stack(), which stacks the observations on top of each other, so the bar height is the total weight. This is easier to see if we put a white outline on each plotted segment. (a)

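    We could apply position_dodge(), but since ggplot2 treats each observation as being in the same "group" as the others, the dodging does not separate the observations. They are instead overplotted, each starting from zero, which is visible if we make them partly transparent. The visible bar height is then the largest single observation rather than the total, which is likely why the values in your dodged figure look wrong. (b)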

    If we put the observations into separate groups, we get the more typical dodging behavior, with bars side by side. The maximum heights are the same as in the previous plot, because each bar still shows a single observation rather than a total. We can assign group explicitly, or ggplot2 will do it for us when another aesthetic (such as color or fill) is mapped to a discrete variable that distinguishes the observations or groups of observations. (c)


    library(tidyverse)
    library(patchwork)
    a <- ggplot(mtcars, aes(gear, wt)) +
      geom_col(color = "white") +  # equivalent to  geom_bar(stat = "identity") 
      labs(title = "default position stacks\nobservations")
    
    b <- ggplot(mtcars, aes(gear, wt)) +
      geom_col(position = position_dodge(), alpha = 0.2) +
      labs(title = "dodged position puts them\noverlapped same spot")
    
    # One group per row; factor(wt) would merge cars that share the same weight.
    c <- ggplot(mtcars, aes(gear, wt, group = seq_along(wt))) +
      geom_col(position = position_dodge()) +
      labs(title = "dodged with different\ngroups per obs")
    
    a | b | c
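
    As a rough numeric check of the three plots above, the following sketch compares the total weight per gear (the height of the stacked bar) with the heaviest single car per gear (the tallest bar once observations are dodged or overplotted). It uses only base R; the variable names totals and tallest are just illustrative.

    ```r
    # Total weight per gear: the height of each stacked bar in plot (a).
    totals <- tapply(mtcars$wt, mtcars$gear, sum)

    # Heaviest single car per gear: the tallest visible bar in plots (b) and (c).
    tallest <- tapply(mtcars$wt, mtcars$gear, max)

    # Side-by-side comparison, one column per gear value.
    print(rbind(total = totals, tallest = tallest))
    ```

    The dodged plots top out at the tallest row, not the total row, which matches the change in values described in the question.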
    

    While it's possible to do many stat calculations within ggplot2 geoms, I often find it simpler to "roll my own" stats using dplyr. For instance, I might use the following to show a single bar and total label for each gear:

    library(dplyr)
    ggplot(mtcars |> count(gear, wt = wt), 
         # mtcars |> summarize(n = sum(wt), .by = gear),  # equivalent
           aes(gear, n, label = n)) +
      geom_col() +
      geom_text(vjust = -0.5)
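
    Applied to the column names in your question, the same idea might look like the sketch below. The check data frame here is a made-up stand-in (your real data isn't shown), and check_totals is just an illustrative name. It assumes dplyr 1.1.0 or later for the .by argument.

    ```r
    library(dplyr)
    library(ggplot2)

    # Hypothetical stand-in for `check`; the real data is not in the question.
    check <- data.frame(
      Genus           = rep(c("Genus A", "Genus B"), each = 4),
      Urban.Intensity = rep(c("Low", "Low", "High", "High"), times = 2),
      Abundance       = c(10, 20, 5, 15, 100, 200, 50, 150)
    )

    # One row per Genus x Urban.Intensity, holding the total Abundance.
    check_totals <- check |>
      summarize(Abundance = sum(Abundance), .by = c(Genus, Urban.Intensity))

    # With a single row per bar, the bar height is the total.
    p <- ggplot(check_totals, aes(x = Urban.Intensity, y = Abundance)) +
      geom_col() +
      facet_wrap(~Genus, ncol = 2)
    p
    ```

    Because each bar is now backed by exactly one row, the heights no longer depend on whether the position is stacked or dodged.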
    
