r xts zoo

Aggregate xts coredata in case of duplicate indices


Assuming I have the following xts object with duplicated time information:

library(xts)

x <- xts(1:5, 
         c("2024-04-19", "2024-04-19", "2024-04-20", "2024-04-21", "2024-04-21") |> as.Date())
x
#>            [,1]
#> 2024-04-19    1
#> 2024-04-19    2
#> 2024-04-20    3
#> 2024-04-21    4
#> 2024-04-21    5

Currently I'm simply discarding duplicate entries to "clean" the object in a rather naive way for further use/analysis:

ind <- zoo::index(x) |> duplicated()
x[!ind, ]
#>            [,1]
#> 2024-04-19    1
#> 2024-04-20    3
#> 2024-04-21    4

I would like to replace this with a more sophisticated approach (at least from my point of view): choose a common aggregation function, apply it to all rows that share an index, and get back an object of class xts, e.g.

xts_aggr_duplicates(x, "mean")
#>            [,1]
#> 2024-04-19  1.5
#> 2024-04-20  3.0
#> 2024-04-21  4.5

xts_aggr_duplicates(x, "sum")
#>            [,1]
#> 2024-04-19    3
#> 2024-04-20    3
#> 2024-04-21    9

My idea was to disassemble the complete object, aggregate where necessary, and rbind the pieces again, but I guess this would be pretty inefficient for large objects. Any ideas?


Solution

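  • Use aggregate.zoo. The second argument, c, is applied to the index of x and returns it unchanged, so rows that share a time stamp fall into the same group. In the code below, replace mean with whatever function you prefer.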

    library(xts)
    aggregate(x, c, mean) |> as.xts()
    ##            [,1]
    ## 2024-04-19  1.5
    ## 2024-04-20  3.0
    ## 2024-04-21  4.5
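
    The call above could be wrapped into the xts_aggr_duplicates helper the question asks for. This is only a minimal sketch: the function name comes from the question, while the body, the use of identity in place of c, and the early return for input without duplicates are assumptions for illustration and not part of xts or zoo.

    ```r
    library(xts)

    # Hypothetical helper, named after the function sketched in the question.
    # FUN may be a function or the name of one (e.g. "mean"); further
    # arguments in ... are handed on to FUN by aggregate().
    xts_aggr_duplicates <- function(x, FUN = "mean", ...) {
      stopifnot(is.xts(x))
      FUN <- match.fun(FUN)
      # Nothing to aggregate if every time stamp is already unique
      if (!anyDuplicated(index(x))) return(x)
      # identity() returns the index unchanged, so each distinct time stamp
      # forms its own group (the same role c plays in the call above)
      as.xts(aggregate(x, identity, FUN, ...))
    }

    x <- xts(1:5,
      as.Date(c("2024-04-19", "2024-04-19", "2024-04-20", "2024-04-21", "2024-04-21")))

    xts_aggr_duplicates(x, "mean")
    xts_aggr_duplicates(x, "sum")
    ```

    For the sample x this should give the values requested in the question: 1.5, 3, 4.5 for "mean" and 3, 3, 9 for "sum".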
    

    If x was originally read in from a file, use read.zoo with its aggregate argument so that duplicates are aggregated while reading.

    write.zoo(x, "myfile.dat") # create test file
    
    read.zoo("myfile.dat", aggregate = mean) |> as.xts()
    ##            [,1]
    ## 2024-04-19  1.5
    ## 2024-04-20  3.0
    ## 2024-04-21  4.5
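
    The same idea can be tried without a file on disk, and with an aggregation function other than mean. This is a sketch under assumptions: the input layout (date, then value, no header) and the names Lines and x_sum are made up for illustration, and format is given explicitly rather than relying on read.zoo to guess the index class.

    ```r
    library(xts)

    # Same data as x, as whitespace-separated text lines (assumed layout:
    # date in the first column, value in the second, no header)
    Lines <- c("2024-04-19 1",
               "2024-04-19 2",
               "2024-04-20 3",
               "2024-04-21 4",
               "2024-04-21 5")

    # aggregate = sum collapses rows that share a date by summing them;
    # format makes read.zoo parse the first column as Date
    z <- read.zoo(text = Lines, format = "%Y-%m-%d", aggregate = sum)
    x_sum <- as.xts(z)
    x_sum
    ```

    Here the two rows for 2024-04-19 and the two rows for 2024-04-21 are summed, so the result has one row per date.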
    

    Note

    Input in reproducible form.

    library(xts)
    x <- xts(1:5, 
      as.Date(c("2024-04-19", "2024-04-19", "2024-04-20", "2024-04-21", "2024-04-21")))