rdatetime-seriesaggregatetibbletime

Using the last day in each month of my time series in R


I need to use only the last day available in my dataset to aggregate later on but I didn´t have success...

library(tibbletime)
      
dataset <- data.frame(
  timestamp = c("2010-01-01", "2010-01-03", "2010-01-23")
  var =       c( 1,             4,            11)
)

monthly_dataset <- as_tbl_time(dataset, index = timestamp) %>%
                   as_period("1 month") 

How can I use some function or R package to aggregate my dataset only for using the last day avaiable ?


Solution

  • The answer from Julian is a nice start, but it won't work across multiple years because the grouping variable doesn't include information about the year.

    The typical way to do this is to group on year-month, and then filter to the max date per year-month group.

    Also, as the creator of tibbletime I would highly suggest that you no longer use it. It is deprecated and is no longer being supported. You should just use clock/lubridate for date handling alongside the tidyverse packages like dplyr, or you should use tsibble if you really need to go all in on time series.

    library(lubridate)
    library(dplyr)
    
    dataset <- tibble(
      timestamp = c(
        "2010-01-01", "2010-01-03", "2010-01-23", 
        "2010-02-01", "2010-02-03", "2011-02-23"
      ),
      var = c(1, 4, 11, 1, 4, 11)
    )
    dataset <- mutate(dataset, timestamp = ymd(timestamp))
    
    dataset <- dataset %>%
      mutate(
        year_month = floor_date(timestamp, "month"),
        day = day(timestamp)
      )
    
    dataset %>%
      group_by(year_month) %>%
      filter(day == max(day)) %>%
      ungroup()
    #> # A tibble: 3 × 4
    #>   timestamp    var year_month   day
    #>   <date>     <dbl> <date>     <int>
    #> 1 2010-01-23    11 2010-01-01    23
    #> 2 2010-02-03     4 2010-02-01     3
    #> 3 2011-02-23    11 2011-02-01    23
    

    Created on 2022-05-18 by the reprex package (v2.0.1)