I have a time series of half-hourly observations. I want a running average of the range of each day's measurements. I group by date and get the correct daily ranges, but then everything I can think of to get the running average only works within the daily groups instead of across them -- row-by-row instead of day-by-day -- and since each day only has one range the value I get is always that range again. Here's one example:
library(tidyverse)
library(zoo)
set.seed(3)
dts <- sort(sample(seq(as_datetime("2024-08-15 09:00:00 EDT"), as_datetime("2024-08-22 09:00:00 EDT"), by="hour"), 24))
df <- tibble(dts = dts, temp = sample(10:20, 24, replace=TRUE))
df <- df %>%
mutate(date = as.Date(dts)) %>%
group_by(date) %>%
mutate(tMax = max(temp, na.rm = TRUE), tMin = min(temp, na.rm = TRUE)) %>%
mutate(range = tMax - tMin) %>%
mutate(rollRange = rollapply(range, 3, mean, fill=NA))
In the real data there are always many more observations each day than days in the rolling window so there are only NAs for the top of each day, and thereafter identical to range
. An added complication is that for other reasons there are a random number of rows per day so I can't just make my window obs/day * days
. Do I have to summarize()
this out to a separate data frame, only to merge it back in?
With apologies, here's the result of the reprex plus the column desired
showing something like what I'm looking for:
dts temp date tMax tMin range rollRange desired
<dttm> <int> <date> <int> <int> <int> <dbl> <dbl>
1 2024-08-15 13:00:00 17 2024-08-15 19 17 2 NA NA
2 2024-08-15 20:00:00 19 2024-08-15 19 17 2 NA NA
3 2024-08-16 02:00:00 20 2024-08-16 20 12 8 NA 4
4 2024-08-16 04:00:00 16 2024-08-16 20 12 8 8 4
5 2024-08-16 06:00:00 12 2024-08-16 20 12 8 8 4
6 2024-08-16 20:00:00 14 2024-08-16 20 12 8 8 4
7 2024-08-16 21:00:00 16 2024-08-16 20 12 8 NA 4
8 2024-08-17 00:00:00 15 2024-08-17 17 15 2 NA 5.33
9 2024-08-17 08:00:00 17 2024-08-17 17 15 2 NA 5.33
10 2024-08-18 06:00:00 19 2024-08-18 19 13 6 NA 4.33
11 2024-08-18 10:00:00 13 2024-08-18 19 13 6 NA 4.33
12 2024-08-19 16:00:00 10 2024-08-19 15 10 5 NA 6.66
13 2024-08-19 19:00:00 12 2024-08-19 15 10 5 5 6.66
14 2024-08-19 20:00:00 15 2024-08-19 15 10 5 NA 6.66
15 2024-08-20 06:00:00 13 2024-08-20 18 13 5 NA 6.66
16 2024-08-20 08:00:00 18 2024-08-20 18 13 5 NA 6.66
17 2024-08-21 00:00:00 19 2024-08-21 19 10 9 NA 6
18 2024-08-21 01:00:00 16 2024-08-21 19 10 9 9 6
19 2024-08-21 04:00:00 19 2024-08-21 19 10 9 9 6
20 2024-08-21 08:00:00 10 2024-08-21 19 10 9 9 6
21 2024-08-21 19:00:00 18 2024-08-21 19 10 9 9 6
22 2024-08-21 21:00:00 10 2024-08-21 19 10 9 NA 6
23 2024-08-22 05:00:00 18 2024-08-22 18 14 4 NA NA
24 2024-08-22 09:00:00 14 2024-08-22 18 14 4 NA NA
Yes, using rollmean
on the summary and joining back seems easiest.
roll <- df %>% # df has already been grouped
summarize(rollRange2 = first(range)) %>%
mutate(rollRange2 = rollmean(rollRange2, 3, fill = NA))
df %>%
left_join(roll, join_by(date))