I want to create a bar chart with three variables: year, value and category. The x-axis should show the years, and within each year I want the bars sorted in ascending order of value.
I managed to do that, but as soon as I add scale_y_continuous() to adjust the y-axis, everything breaks and no graph is displayed.
Below I demonstrate the problem with sample data. The first graph is exactly what I want, except for the y-axis, which I want to be able to adjust (ticks, labels, etc.). But, as mentioned, when I try to adjust it, the code stops working.
Sample code
library(ggplot2)

# Example data
df <- data.frame(type = c("cat1", "cat1", "cat2", "cat2"),
                 year = c(1, 2, 1, 2),
                 val  = c(100, 70, 60, 100))
# basic plot works
ggplot(df, aes(x = as.factor(year), y = reorder(val, as.factor(year)), fill = type)) +
  geom_bar(stat = "identity", position = position_dodge(0.7), width = 0.6)
# doesn't work... why?
ggplot(df, aes(x = as.factor(year), y = reorder(val, as.factor(year)), fill = type)) +
  geom_bar(stat = "identity", position = position_dodge(0.7), width = 0.6) +
  scale_y_continuous(
    expand = expansion(mult = c(0.03, 0.11)),
    breaks = seq(0, 100, by = 10),
    limits = c(0, max(df$val, na.rm = FALSE) + 10))
Your original plot treats val as a factor, which is unusual. Internally the values are converted to the integer codes {1, 2, 3}, and those codes are what actually gets plotted, labeled with the factor levels {60, 70, 100}. This means the distance between 60 and 70 on the y-axis is the same as the distance between 70 and 100, which is a strange graphical design decision at best and misleading at worst. It is also why your second bit of code fails: a factor is a discrete variable, so ggplot2 cannot map it onto the continuous scale that scale_y_continuous() sets up.
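A minimal base R sketch makes this visible using only the val values from the sample data. Converting them to a factor yields evenly spaced integer codes, and the result is not numeric, which is what a continuous y scale needs:

```r
# The y values from the sample data
val <- c(100, 70, 60, 100)

# Mapping y to a factor stores each value as an integer code
f <- factor(val)
print(levels(f))      # "60" "70" "100": the labels shown on the axis
print(as.integer(f))  # 3 2 1 3: the positions actually plotted

# Gaps between the real values versus gaps between the plotted positions
print(diff(sort(unique(val))))            # 10 30
print(diff(sort(unique(as.integer(f)))))  # 1 1

# A factor is not numeric, so a continuous y scale cannot use it
print(is.numeric(f))  # FALSE
```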
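If, as @stefan suggests, you use your second bit of code with y = val instead of turning y into a factor, you get something more sensible. Keep val numeric, and use tidyverse tools (dplyr and forcats) to build a new grouping variable that defines the bar order within each year. position_dodge() places the bars at each x position according to group, so the levels of that variable control the order: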
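library(ggplot2)
library(dplyr)

# Sort by year, then by value, and build a grouping factor whose levels
# follow that row order (fct_inorder() keeps order of first appearance).
# The native pipe |> requires R >= 4.1.
df2 <- df |>
  arrange(year, val) |>
  mutate(group = forcats::fct_inorder(paste0(year, type)))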
# y = val stays numeric, so scale_y_continuous() works;
# group sets the left-to-right order of the dodged bars within each year
ggplot(df2, aes(x = as.factor(year), y = val, fill = type, group = group)) +
  geom_bar(stat = "identity", position = position_dodge(0.7), width = 0.6) +
  scale_y_continuous(
    expand = expansion(mult = c(0.03, 0.11)),
    breaks = seq(0, 100, by = 10),
    limits = c(0, max(df$val, na.rm = FALSE) + 10))
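If you'd rather not depend on dplyr and forcats, the same ordering variable can be built in base R. Here is a minimal sketch, assuming the df from the question. order() sorts by year and then by val, and factor(..., levels = unique(...)) mimics what fct_inorder() does, giving levels in order of first appearance. The names df2_base and key are illustrative choices, not part of the original code:

```r
# Sample data from the question
df <- data.frame(type = c("cat1", "cat1", "cat2", "cat2"),
                 year = c(1, 2, 1, 2),
                 val  = c(100, 70, 60, 100))

# Sort rows by year, then by value (ascending) within each year
df2_base <- df[order(df$year, df$val), ]

# Grouping factor whose levels follow the sorted row order,
# mirroring forcats::fct_inorder()
key <- paste0(df2_base$year, df2_base$type)
df2_base$group <- factor(key, levels = unique(key))

print(levels(df2_base$group))  # "1cat2" "1cat1" "2cat1" "2cat2"
```

Passing df2_base instead of df2 to the ggplot() call should produce the same plot, because the dodge order depends only on the levels of group.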