r time-series tsibble

Struggling to regularize a time series using tsibble


I'm struggling to regularize a time series using the tsibble package. The documentation indicates that this can be done using index_by() + summarise(), but I'm clearly missing some details. Here's what I've tried:

library(tidyverse)
library(lubridate)
library(tsibble)

# example data set
date <- ymd(c("1976-05-18", "1976-05-19", "1976-05-24", "1976-06-01"))
fish <- c(203, 282, 301, 89)
volume <- c(210749, 287555, 378965, 308935)
n <- c(5, 7, 10, 8)
tbl <- tibble(date, fish, volume, n)
tsbl <- tsibble(tbl, index = date, regular = FALSE)
  
# regularize the tsibble (i.e., the time series)
tsbl %>% 
  index_by(date, unit = "day") %>% # unit value "day" is intuitive but incorrect?
  mutate(week = isoweek(date)) %>% # add (numeric) week column
  summarise(date = date,
            fish = sum(fish),
            volume = sum(volume),
            n = sum(n), 
            cpue = fish/volume) # calculate catch per unit effort

TIA!


Solution

  • The question doesn't say exactly what result you are after, so I will have to guess.

    Perhaps you want daily data with each day explicitly included. In that case, do this:

    library(tidyverse)
    library(lubridate)
    library(tsibble)
    
    # example data set
    date <- ymd(c("1976-05-18", "1976-05-19", "1976-05-24", "1976-06-01"))
    fish <- c(203, 282, 301, 89)
    volume <- c(210749, 287555, 378965, 308935)
    n <- c(5, 7, 10, 8)
    tbl <- tibble(date, fish, volume, n)
    tsbl <- tsibble(tbl, index = date, regular = TRUE) %>%
      fill_gaps()
    tsbl
    #> # A tsibble: 15 x 4 [1D]
    #>    date        fish volume     n
    #>    <date>     <dbl>  <dbl> <dbl>
    #>  1 1976-05-18   203 210749     5
    #>  2 1976-05-19   282 287555     7
    #>  3 1976-05-20    NA     NA    NA
    #>  4 1976-05-21    NA     NA    NA
    #>  5 1976-05-22    NA     NA    NA
    #>  6 1976-05-23    NA     NA    NA
    #>  7 1976-05-24   301 378965    10
    #>  8 1976-05-25    NA     NA    NA
    #>  9 1976-05-26    NA     NA    NA
    #> 10 1976-05-27    NA     NA    NA
    #> 11 1976-05-28    NA     NA    NA
    #> 12 1976-05-29    NA     NA    NA
    #> 13 1976-05-30    NA     NA    NA
    #> 14 1976-05-31    NA     NA    NA
    #> 15 1976-06-01    89 308935     8
    

    Created on 2022-05-20 by the reprex package (v2.0.1)
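
    If it helps to see where the missing days are before filling them, tsibble's count_gaps() summarises each run of implicit gaps. The following is only a minimal sketch that rebuilds the example data under the hypothetical name tsbl_raw (not an object from the question), so it can be run on its own:

    ```r
    library(tsibble)

    # Same example data as above, rebuilt so the snippet is self-contained.
    # `tsbl_raw` is a hypothetical name used only for this illustration.
    tsbl_raw <- tsibble(
      date   = as.Date(c("1976-05-18", "1976-05-19", "1976-05-24", "1976-06-01")),
      fish   = c(203, 282, 301, 89),
      volume = c(210749, 287555, 378965, 308935),
      n      = c(5, 7, 10, 8),
      index  = date,
      regular = TRUE
    )

    # One row per run of missing days: start (.from), end (.to), length (.n)
    gaps <- count_gaps(tsbl_raw)
    gaps

    # Total number of rows that fill_gaps() would add
    sum(gaps$.n)
    ```

    The total in the last line should match the number of NA rows in the filled tsibble shown above.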

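    I'm not sure what you are trying to achieve with summarise(), but perhaps you want to create weekly data from these daily data. Note that index_by() has no unit argument: it groups by a variable derived from the index, so create the week variable first and pass that to index_by(). Because fill_gaps() introduced NA rows, the sums also need na.rm = TRUE. In that case, do this: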

    tsbl %>% 
      mutate(week = isoweek(date)) %>% # add (numeric) week column
      index_by(week) %>%
      summarise(fish = sum(fish, na.rm = TRUE),
                volume = sum(volume, na.rm = TRUE),
                n = sum(n, na.rm = TRUE),
                cpue = fish / volume) # calculate catch per unit effort
    #> # A tsibble: 3 x 5 [1]
    #>    week  fish volume     n     cpue
    #>   <dbl> <dbl>  <dbl> <dbl>    <dbl>
    #> 1    21   485 498304    12 0.000973
    #> 2    22   301 378965    10 0.000794
    #> 3    23    89 308935     8 0.000288
    

    Created on 2022-05-20 by the reprex package (v2.0.1)
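
    One caveat with isoweek(): it returns only the week number, so the same week from different years would be pooled if the data spanned more than one year. A possible alternative, sketched here under the assumption that weekly totals are the goal, is to let index_by() build a year-aware week index with tsibble's yearweek(). This version starts from the original irregular data, so no gap filling or na.rm is needed. The names tbl_raw and weekly are hypothetical and used only for illustration:

    ```r
    library(dplyr)
    library(tsibble)

    # Example data from the question, rebuilt so the snippet is self-contained.
    tbl_raw <- tibble(
      date   = as.Date(c("1976-05-18", "1976-05-19", "1976-05-24", "1976-06-01")),
      fish   = c(203, 282, 301, 89),
      volume = c(210749, 287555, 378965, 308935),
      n      = c(5, 7, 10, 8)
    )

    weekly <- tbl_raw %>%
      as_tsibble(index = date, regular = FALSE) %>%
      # Group by a year-week derived from the index; `.` stands for the index
      index_by(week = ~ yearweek(.)) %>%
      summarise(
        fish   = sum(fish),
        volume = sum(volume),
        n      = sum(n)
      ) %>%
      # Catch per unit effort, computed from the weekly totals
      mutate(cpue = fish / volume)

    weekly
    ```

    The weekly sums should agree with the isoweek() result above. If some weeks had no observations at all, fill_gaps() could then be applied to the weekly tsibble to make them explicit.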