I want simple wrapped timeseries plots that colour code weekday/weekend line segments. Here's a reproducible example:
library(tidyverse)
set.seed(42)
# toy hourly dataset with daily and weekly cyclicality
dat = function(set){
  times = seq.POSIXt(from = as.POSIXct("2025-07-01 00:00:00"), by = "hour", length.out = 24 * 14)
  vals = rep(c(10,11,12,11,10,2,1), each = 24, length.out = length(times)) + rlnorm(length(times))
  tibble(time = times, val = vals, set = set)
}
x = rbind(dat('A'), dat('B')) |>
  mutate(weekend = lubridate::wday(time, week_start=2) > 5) |>
  arrange(set, time) # ensure lag is correct
x |> ggplot(aes(time, val, xend = lag(time), yend = lag(val), group = set, col = weekend)) +
  geom_segment() +
  facet_wrap(~ set)
The panel for set 'B' shows an unwanted line and I cannot figure out how to remove it. It is not in the data, as plotting set 'B' on its own shows:
x |> filter(set == 'B') |>
  ggplot(aes(time, val, xend = lag(time), yend = lag(val), group = set, col = weekend)) +
  geom_segment() +
  facet_wrap(~ set)
It looks like an artefact of combining the facet_wrap, geom_segment and lag functions. I've tried changing the group argument to group = paste(set, weekend), but it had no effect. Any suggestions?
Edit: the weekend colouring is a distraction, as this happens without the col = weekend argument. Please just ignore that.
Edit2: I can work around the problem by adding an extra row with an NA value at the start of each group. But in my view there should be a ggplot solution.
You are using lag inside aes on an ungrouped data frame sorted by set and time, so the first values calculated for xend and yend in set B will be the last values of time and val for set A. ggplot is therefore just drawing exactly what you have told it to.
This is easy to see with a toy example:
dat <- data.frame(set = c("A", "A", "B", "B"),
                  time = c(1, 2, 1, 2),
                  val = c(6, 7, 8, 9))
dat
#>   set time val
#> 1   A    1   6
#> 2   A    2   7
#> 3   B    1   8
#> 4   B    2   9
If we construct new variables using lag, then the first row of the new variables in set B will be the last row of the unlagged variables from set A.
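A minimal sketch of that ungrouped lag on the toy data (time2 and val2 are illustrative column names, and only dplyr is assumed to be installed):

```r
library(dplyr)

dat <- data.frame(set = c("A", "A", "B", "B"),
                  time = c(1, 2, 1, 2),
                  val = c(6, 7, 8, 9))

# lag() without any grouping simply shifts each column down by one row,
# so row 3 (first row of set B) receives the values from row 2 (last row of set A)
lagged_ungrouped <- dat |>
  mutate(time2 = lag(time), val2 = lag(val))
lagged_ungrouped
#>   set time val time2 val2
#> 1   A    1   6    NA   NA
#> 2   A    2   7     1    6
#> 3   B    1   8     2    7
#> 4   B    2   9     1    8
```

Row 3 is the problem: set B's first point is paired with time 2 and val 7, both of which belong to set A.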
This means that if we take our original data frame and call lag inside aes as you have done, then we get an extra line starting from the first value of set B and joining to the last value of set A:
ggplot(dat, aes(time, val, xend = lag(time), yend = lag(val))) +
  geom_segment() +
  facet_grid(~set)
The obvious solution in your example is to group by "set" when you lag the variables. This will make the xend and yend values for the first row in set B NA, as they already are in set A.
x |>
  mutate(time2 = lag(time), val2 = lag(val), .by = set) |>
  ggplot(aes(time, val, xend = time2, yend = val2, col = weekend)) +
  geom_segment() +
  facet_wrap(~ set)
You will get two warnings about missing data due to the lag-induced NA values (previously you were getting only one, from set A). You can silence these by adding na.rm = TRUE inside geom_segment().
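As a quick check, the grouped lag and the na.rm = TRUE option can be tried together on a small four-row data frame. This is only a sketch: the data frame is rebuilt here so the block runs on its own, the names lagged_by_set and p are illustrative, and .by assumes dplyr 1.1.0 or later (group_by(set) before mutate() is the older equivalent).

```r
library(dplyr)
library(ggplot2)

dat <- data.frame(set = c("A", "A", "B", "B"),
                  time = c(1, 2, 1, 2),
                  val = c(6, 7, 8, 9))

# Lag within each set, so the first row of every set gets NA
# instead of borrowing the last row of the previous set
lagged_by_set <- dat |>
  mutate(time2 = lag(time), val2 = lag(val), .by = set)
lagged_by_set
#>   set time val time2 val2
#> 1   A    1   6    NA   NA
#> 2   A    2   7     1    6
#> 3   B    1   8    NA   NA
#> 4   B    2   9     1    8

# na.rm = TRUE drops the rows with NA endpoints silently rather than warning
p <- ggplot(lagged_by_set, aes(time, val, xend = time2, yend = val2)) +
  geom_segment(na.rm = TRUE) +
  facet_wrap(~ set)
p
```

Each panel should now contain a single segment joining that set's own two points, with no line reaching across from set A into set B.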