Assuming I worked with meteorological observations and wanted to know not only the daily maximum value but also the relevant timestamp, describing when this value was observed, is it possible to accomplish this without significant overhead, e.g. by setting some sort of parameter?
library(xts)
set.seed(42)
# init data
datetimes <- seq(from = as.POSIXct("2022-01-01"),
to = as.POSIXct("2022-01-08"),
by = "10 mins")
values <- length(datetimes) |> runif() |> sin()
data <- xts(x = values,
order.by = datetimes)
#
ep <- endpoints(data, "days")
data_max <- period.apply(data, INDEX = ep, FUN = max)
head(data_max)
#> [,1]
#> 2022-01-01 23:50:00 0.8354174
#> 2022-01-02 23:50:00 0.8396034
#> 2022-01-03 23:50:00 0.8364624
#> 2022-01-04 23:50:00 0.8376930
#> 2022-01-05 23:50:00 0.8392988
#> 2022-01-06 23:50:00 0.8372780
Obviously, this would work well with summarizing functions like mean
and median
where you would want to specify the interval width considered, but when working with e.g. min
and max
, how would I proceed when I wanted to know the exact index of the value in question observed.
At the moment, I'm just looping over my xts subsets to determine the relevant index, but maybe there is a more elegant approach, maybe even an argument when using period.apply()
I haven't noticed to get this information.
sub <- "2022-01-04"
ind <- which(data[sub] == max(data[sub]))
data[sub][ind]
#> [,1]
#> 2022-01-04 23:20:00 0.837693
My desired output would look like this:
#> [,1]
#> 2022-01-01 03:40:00 0.8354174
#> 2022-01-02 15:00:00 0.8396034
#> 2022-01-03 05:10:00 0.8364624
#> 2022-01-04 23:20:00 0.8376930
#> 2022-01-05 02:50:00 0.8392988
#> 2022-01-06 06:40:00 0.8372780
1) Use ave
and subset that down to the rows with the maxima. The last line takes the first maximum in each day and could be omitted if there are no duplicate maxima within a day such as for a strictly increasing input or if all maxima were desired. The format
is to avoid time zone problems.
library(magrittr)
library(xts)
data %>%
subset(ave(., as.Date(format(time)), FUN = max) == .) %>%
subset(ave(., as.Date(format(time)), FUN = seq_along) == 1)
## [,1]
## 2022-01-01 03:40:00 0.8354174
## 2022-01-02 15:00:00 0.8396034
## 2022-01-03 05:10:00 0.8364624
## 2022-01-04 23:20:00 0.8376930
## 2022-01-05 02:50:00 0.8392988
## 2022-01-06 06:40:00 0.8372780
## 2022-01-07 08:40:00 0.8406546
## 2022-01-08 00:00:00 0.2335385
2) Another possibility is to aggregate using tapply with which.max to get the time of the first maximum in each date and then subset the data to those times.
data %>%
subset(time %in% tapply(time, as.Date(format(time)), \(x) x[which.max(.[x])] ))
## [,1]
## 2022-01-01 03:40:00 0.8354174
## 2022-01-02 15:00:00 0.8396034
## 2022-01-03 05:10:00 0.8364624
## 2022-01-04 23:20:00 0.8376930
## 2022-01-05 02:50:00 0.8392988
## 2022-01-06 06:40:00 0.8372780
## 2022-01-07 08:40:00 0.8406546
## 2022-01-08 00:00:00 0.2335385