r ggplot2 zigzag

Zig Zag when using geom_line with ggplot in R


I would really appreciate some insight into the zigzag pattern I get when using the following code in R:

library(dplyr)    # provides %>%
library(ggplot2)

tbi_military %>%
  ggplot(aes(x = year, y = diagnosed, color = service)) +
  geom_line() +
  facet_wrap(vars(severity))

The dataset consists of 5 variables (3 character, 2 numeric). Any insight would be much appreciated.



Solution

  • This is just an illustration with a standard dataset. Let's say we're interested in plotting the weight of chicks over time depending on their diet. We would attempt to plot this like so:

    library(ggplot2)
    
    ggplot(ChickWeight, aes(Time, weight, colour = factor(Diet))) +
      geom_line()
    

    The zigzag pattern appears because there are multiple observations per diet at each time point. geom_line() connects observations in order of their x value, so several observations that share one x value within a group are joined by a vertical segment spanning the range of data points at that time for that diet.
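
    A quick way to confirm this is the cause is to count the observations per diet and time point. This is only a rough sketch using the built-in ChickWeight data; the name `counts` is just something I made up for illustration.

    ```r
    # Cross-tabulate Diet against Time: each cell is the number of rows
    # (observations) for that diet at that time point.
    counts <- table(Diet = ChickWeight$Diet, Time = ChickWeight$Time)
    counts

    # TRUE means at least one diet/time combination has more than one
    # observation, which is the situation that produces vertical segments.
    any(counts > 1)
    ```

    If every cell were 0 or 1, a plain geom_line() per diet would not zigzag.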

    The data has an additional variable called 'Chick' that identifies individual chicks. Including it in the grouping resolves the zigzag pattern, and every line then shows the weight over time of one individual chick.

    ggplot(ChickWeight, aes(Time, weight, colour = factor(Diet))) +
      geom_line(aes(group = interaction(Chick, Diet)))
    

    If you don't have an extra variable that separates out individual trends, you could instead choose to summarise the data per timepoint by, for example, taking the mean at every timepoint.

    ggplot(ChickWeight, aes(Time, weight, colour = factor(Diet))) +
      geom_line(stat = "summary", fun = mean)
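
    If you'd rather see the summarised numbers before plotting them, you could probably compute the means up front instead. A minimal sketch, assuming base R's aggregate() is fine for your case; `chick_means` is a name I picked for illustration.

    ```r
    library(ggplot2)

    # Mean weight for every Diet/Time combination, computed before plotting.
    chick_means <- aggregate(weight ~ Diet + Time, data = ChickWeight, FUN = mean)

    # One row per diet and time point now, so a plain geom_line() draws
    # one clean line per diet.
    ggplot(chick_means, aes(Time, weight, colour = Diet)) +
      geom_line()
    ```

    This should draw the same lines as the stat = "summary" version, but the summarised data frame is available to inspect or reuse.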
    

    Created on 2021-08-30 by the reprex package (v1.0.0)
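
    Applied to data shaped like the one in the question, both fixes might look roughly like the sketch below. I don't have the actual data, so `tbi_like` is a made-up stand-in with placeholder numbers, and `component` is only my assumption about what the third character variable might be.

    ```r
    library(ggplot2)

    # Hypothetical stand-in for tbi_military. `component` is an assumed
    # column name, and the counts are placeholders, not real data.
    tbi_like <- expand.grid(
      year = 2006:2008,
      service = c("Army", "Navy"),
      component = c("Active", "Reserve"),
      severity = c("Mild", "Severe"),
      stringsAsFactors = FALSE
    )
    tbi_like$diagnosed <- seq_len(nrow(tbi_like)) * 10

    # Option 1: one line per service/component combination.
    ggplot(tbi_like, aes(x = year, y = diagnosed, color = service)) +
      geom_line(aes(group = interaction(service, component))) +
      facet_wrap(vars(severity))

    # Option 2: collapse the extra variable by summing per year and service.
    ggplot(tbi_like, aes(x = year, y = diagnosed, color = service)) +
      geom_line(stat = "summary", fun = sum) +
      facet_wrap(vars(severity))
    ```

    Whether summing, averaging, or separate lines makes more sense depends on what the third variable actually represents in your data.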