Tags: r, ggplot2, time-series, facet-wrap, timeserieschart

ggplot unwanted line when combining facet_wrap with geom_segment for timeseries data


I want simple wrapped time-series plots that colour-code weekday/weekend line segments. Here's a reproducible example:

library(tidyverse)
set.seed(42)

# toy hourly dataset with daily and weekly cyclicality
dat = function(set){
  times = seq.POSIXt(from = as.POSIXct("2025-07-01 00:00:00"), by = "hour", length.out = 24 * 14)
  vals = rep(c(10,11,12,11,10,2,1), each = 24, length.out = length(times)) + rlnorm(length(times))
  tibble(time = times, val = vals, set = set)
}

x = rbind(dat('A'), dat('B')) |>
  mutate(weekend = lubridate::wday(time, week_start=2) > 5) |>
  arrange(set, time) # ensure lag is correct

x |> ggplot(aes(time, val, xend = lag(time), yend = lag(val), group = set, col = weekend)) + 
  geom_segment() + 
  facet_wrap(~ set)


Set 'B' shows an unwanted extra line, and I cannot figure out how to remove it. The line is not in the data, because plotting set B on its own does not show it:

x |> filter(set == 'B') |>
  ggplot(aes(time, val, xend = lag(time), yend = lag(val), group = set, col = weekend)) + 
  geom_segment() + 
  facet_wrap(~ set)


It looks like an artefact of combining facet_wrap, geom_segment and lag. I've tried changing the group argument to group = paste(set, weekend), but it had no effect. Any suggestions?

Edit: the weekend colouring is a distraction. The problem also happens without the col = weekend argument, so please ignore it.

Edit 2: I can work around the problem by adding an extra row with NA values at the start of each group, but in my view there should be a ggplot solution.


Solution

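  • You are using lag inside aes on an ungrouped data frame sorted by set and time, so the first values calculated for xend and yend in set B are the last values of time and val from set A. Mappings inside aes are evaluated on the whole column, across all panels and groups, which is also why changing the group aesthetic (group = set or group = paste(set, weekend)) has no effect on lag. ggplot is therefore drawing exactly what you have told it to.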

    This is easy to see with a toy example:

    dat <- data.frame(set = c("A", "A", "B", "B"),
                      time = c(1, 2, 1, 2),
                      val = c(6, 7, 8, 9))
    
    dat
    #>   set time val
    #> 1   A    1   6
    #> 2   A    2   7
    #> 3   B    1   8
    #> 4   B    2   9
    

    If we construct new variables by applying lag to the whole data frame, the first row of the new variables in set B contains the last row of the unlagged variables from set A.


    This means that if we take our original data frame and call lag inside aes as you have done, then we get an extra line starting from the first value of set B and joining to the last value of set A:

    ggplot(dat, aes(time, val, xend = lag(time), yend = lag(val))) +
      geom_segment() +
      facet_grid(~set)
    


    The obvious solution in your example is to group by "set" when you lag the variables. This makes the xend and yend values for the first row in set B NA, just as they already are for the first row in set A.

    x |> 
      # lag within each set, so the first row of every set gets NA (.by needs dplyr >= 1.1.0)
      mutate(time2 = lag(time), val2 = lag(val), .by = set) |>
      ggplot(aes(time, val, xend = time2, yend = val2, col = weekend)) + 
      geom_segment() + 
      facet_wrap(~ set)
    


    The lag-induced NA values now produce missing-data warnings for two rows (the first row of each set), whereas your current code drops only one row, from set A. You can silence these warnings by adding na.rm = TRUE inside geom_segment().
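The following minimal sketch assumes ggplot2 and dplyr (>= 1.1.0, for `.by`) are installed. It uses `ggplot2::layer_data()` to compare what ggplot2 computes for the toy data in two cases: `lag()` inside `aes()`, and `lag()` applied per set beforehand. It also shows that adding `group = set` to the mapping does not change what `lag()` sees.

```r
library(ggplot2)
library(dplyr)

# Toy data from the answer: two sets, two time points each
dat <- data.frame(set = c("A", "A", "B", "B"),
                  time = c(1, 2, 1, 2),
                  val = c(6, 7, 8, 9))

# lag() inside aes(): evaluated on the whole column, so group = set changes nothing
p_bad <- ggplot(dat, aes(time, val,
                         xend = dplyr::lag(time), yend = dplyr::lag(val),
                         group = set)) +
  geom_segment() +
  facet_grid(~set)

# lag() computed per set before plotting (.by needs dplyr >= 1.1.0)
dat_lagged <- mutate(dat, time2 = lag(time), val2 = lag(val), .by = set)
p_good <- ggplot(dat_lagged, aes(time, val, xend = time2, yend = val2)) +
  geom_segment() +
  facet_grid(~set)

# layer_data() returns the data ggplot2 has computed for the first layer
bad <- layer_data(p_bad)
good <- layer_data(p_good)

# Row for set B (panel 2) at time 1
bad_b1 <- bad[as.character(bad$PANEL) == "2" & bad$x == 1, c("x", "y", "xend", "yend")]
good_b1 <- good[as.character(good$PANEL) == "2" & good$x == 1, c("x", "y", "xend", "yend")]

bad_b1   # xend = 2, yend = 7: the segment reaches back to set A's last point
good_b1  # xend and yend are NA: no segment is drawn from this point
```

In the ungrouped case the stray segment in panel B ends at (2, 7), the coordinates of set A's last point. In the grouped case that row has no end point.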
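If you would rather avoid `lag()` altogether (the "ggplot solution" asked for in Edit 2), one option is `geom_line()` with an explicit `group = set`. This is a different technique from the grouped-lag approach. When colour varies within a group, `geom_path()`/`geom_line()` draws the path as separate segments, each coloured by the aesthetic of its starting point. The sketch below assumes only ggplot2, and the data frame is a hypothetical stand-in for `x` with just the columns the plot uses.

```r
library(ggplot2)

# Hypothetical stand-in for x: one week per set, only the columns the plot uses
x <- data.frame(
  set     = rep(c("A", "B"), each = 7),
  time    = rep(0:6, times = 2),  # day index, standing in for the POSIXct times
  val     = c(10, 11, 12, 11, 10, 2, 1, 9, 12, 11, 12, 9, 3, 2),
  weekend = rep(c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE), times = 2)
)

# group = set gives one path per set; without it, the discrete colour would also
# define the groups, and weekday and weekend points would be drawn as separate lines
p <- ggplot(x, aes(time, val, colour = weekend, group = set)) +
  geom_line() +
  facet_wrap(~ set)

p
```

The colour attribution differs slightly from the `geom_segment()` version. Here each segment takes the colour of its earlier point, whereas with `xend = lag(time)` each segment takes the colour of its later point. As a result, the segment that crosses a weekday/weekend boundary gets the other colour.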