rxtszoo

Is it possible to get the specific index of occurence when working with period.apply?


Assuming I worked with meteorological observations and wanted to know not only the daily maximum value but also the relevant timestamp, describing when this value was observed, is it possible to accomplish this without significant overhead, e.g. by setting some sort of parameter?

library(xts)

set.seed(42)

# init data
datetimes <- seq(from = as.POSIXct("2022-01-01"), 
                 to = as.POSIXct("2022-01-08"),
                 by = "10 mins")

values <- length(datetimes) |> runif() |> sin()

data <- xts(x = values,
            order.by = datetimes)

# 
ep <- endpoints(data, "days")

data_max <- period.apply(data, INDEX = ep, FUN = max)

head(data_max)
#>                          [,1]
#> 2022-01-01 23:50:00 0.8354174
#> 2022-01-02 23:50:00 0.8396034
#> 2022-01-03 23:50:00 0.8364624
#> 2022-01-04 23:50:00 0.8376930
#> 2022-01-05 23:50:00 0.8392988
#> 2022-01-06 23:50:00 0.8372780

Obviously, this would work well with summarizing functions like mean and median where you would want to specify the interval width considered, but when working with e.g. min and max, how would I proceed when I wanted to know the exact index of the value in question observed.

At the moment, I'm just looping over my xts subsets to determine the relevant index, but maybe there is a more elegant approach, maybe even an argument when using period.apply() I haven't noticed to get this information.

sub <- "2022-01-04"

ind <- which(data[sub] == max(data[sub]))

data[sub][ind]
#>                         [,1]
#> 2022-01-04 23:20:00 0.837693

My desired output would look like this:

#>                          [,1]
#> 2022-01-01 03:40:00 0.8354174
#> 2022-01-02 15:00:00 0.8396034
#> 2022-01-03 05:10:00 0.8364624
#> 2022-01-04 23:20:00 0.8376930
#> 2022-01-05 02:50:00 0.8392988
#> 2022-01-06 06:40:00 0.8372780

Solution

  • 1) Use ave and subset that down to the rows with the maxima. The last line takes the first maximum in each day and could be omitted if there are no duplicate maxima within a day such as for a strictly increasing input or if all maxima were desired. The format is to avoid time zone problems.

    library(magrittr)
    library(xts)
    
    data %>%
      subset(ave(., as.Date(format(time)), FUN = max) == .) %>%
      subset(ave(., as.Date(format(time)), FUN = seq_along) == 1)
    ##                          [,1]
    ## 2022-01-01 03:40:00 0.8354174
    ## 2022-01-02 15:00:00 0.8396034
    ## 2022-01-03 05:10:00 0.8364624
    ## 2022-01-04 23:20:00 0.8376930
    ## 2022-01-05 02:50:00 0.8392988
    ## 2022-01-06 06:40:00 0.8372780
    ## 2022-01-07 08:40:00 0.8406546
    ## 2022-01-08 00:00:00 0.2335385
    

    2) Another possibility is to aggregate using tapply with which.max to get the time of the first maximum in each date and then subset the data to those times.

    data %>%
      subset(time %in% tapply(time, as.Date(format(time)), \(x) x[which.max(.[x])] ))
    
    ##                          [,1]
    ## 2022-01-01 03:40:00 0.8354174
    ## 2022-01-02 15:00:00 0.8396034
    ## 2022-01-03 05:10:00 0.8364624
    ## 2022-01-04 23:20:00 0.8376930
    ## 2022-01-05 02:50:00 0.8392988
    ## 2022-01-06 06:40:00 0.8372780
    ## 2022-01-07 08:40:00 0.8406546
    ## 2022-01-08 00:00:00 0.2335385