I have two data frames that fill missing in different intervals. I would like to fill the two to the same interval. Consider two data frames with the same month-day but two years apart:
library(tidyverse)
library(fpp3)
df_2020 <- tibble(month_day = as_date(c('2020-1-1','2020-2-1','2020-3-1')),
amount = c(5, 2, 1))
df_2022 <- tibble(month_day = as_date(c('2022-1-1','2022-2-1','2022-3-1')),
amount = c(5, 2, 1))
These data frames both have three rows, with the same dates, 2 years apart.
Create tsibbles with a yearweek index:
ts_2020 <- df_2020 |> mutate(year_week = yearweek(month_day)) |>
as_tsibble(index = year_week)
ts_2022 <- df_2022 |> mutate(year_week = yearweek(month_day)) |>
as_tsibble(index = year_week)
ts_2020
#> # A tsibble: 3 x 3 [4W]
#> month_day amount year_week
#> <date> <dbl> <week>
#> 1 2020-01-01 5 2020 W01
#> 2 2020-02-01 2 2020 W05
#> 3 2020-03-01 1 2020 W09
ts_2022
#> # A tsibble: 3 x 3 [1W]
#> month_day amount year_week
#> <date> <dbl> <week>
#> 1 2022-01-01 5 2021 W52
#> 2 2022-02-01 2 2022 W05
#> 3 2022-03-01 1 2022 W09
Still three rows in each tsibble
Now fill gaps:
ts_2020_filled <- ts_2020 |> fill_gaps()
ts_2022_filled <- ts_2022 |> fill_gaps()
ts_2020_filled
#> # A tsibble: 3 x 3 [4W]
#> month_day amount year_week
#> <date> <dbl> <week>
#> 1 2020-01-01 5 2020 W01
#> 2 2020-02-01 2 2020 W05
#> 3 2020-03-01 1 2020 W09
ts_2022_filled
#> # A tsibble: 10 x 3 [1W]
#> month_day amount year_week
#> <date> <dbl> <week>
#> 1 2022-01-01 5 2021 W52
#> 2 NA NA 2022 W01
#> 3 NA NA 2022 W02
#> 4 NA NA 2022 W03
#> 5 NA NA 2022 W04
#> 6 2022-02-01 2 2022 W05
#> 7 NA NA 2022 W06
#> 8 NA NA 2022 W07
#> 9 NA NA 2022 W08
#> 10 2022-03-01 1 2022 W09
Here is the issue: ts_2020_filled has 4-weekly steps, and ts_2022_filled has 1-weekly steps. This is because the two tsibbles have different intervals:
tsibble::interval(ts_2020)
#> <interval[1]>
#> [1] 4W
tsibble::interval(ts_2022)
#> <interval[1]>
#> [1] 1W
This is because the tsibbles have different steps:
ts_2020 |>
pluck("year_week") |>
diff()
#> Time differences in weeks
#> [1] 4 4
ts_2022 |>
pluck("year_week") |>
diff()
#> Time differences in weeks
#> [1] 5 4
Therefore, the greatest common divisors are different (4 and 1). From the manual for as_tibble:
regular Regular time interval (TRUE) or irregular (FALSE). The interval is determined by the greatest common divisor of index column, if TRUE.
Both tsibbles are regular:
is_regular(ts_2020)
#> [1] TRUE
is_regular(ts_2020)
#> [1] TRUE
So, I would like to set the gap fill interval, so the periods are consistent. I tried setting .full in fill_gaps and .regular in as_tsibble. I could not find a way to set the interval of a tsibble. Is there a way of manually setting the interval used by fill_gaps? Granted an interval of four weeks won't work for df_2022, but the LCM of one would work for both.
I think you're looking for the new_interval()
function. A tsibble
has an interval
attribute that can be changed with new_interval()
.
# set the interval of ts_2020 to 1 week to match ts_2022
attr(ts_2020, 'interval') <- tsibble::new_interval(week = 1)
ts_2020 |>
tsibble::fill_gaps()
#> # A tsibble: 9 x 3 [1W]
#> month_day amount year_week
#> <date> <dbl> <week>
#> 1 2020-01-01 5 2020 W01
#> 2 NA NA 2020 W02
#> 3 NA NA 2020 W03
#> 4 NA NA 2020 W04
#> 5 2020-02-01 2 2020 W05
#> 6 NA NA 2020 W06
#> 7 NA NA 2020 W07
#> 8 NA NA 2020 W08
#> 9 2020-03-01 1 2020 W09
ts_2022 |>
tsibble::fill_gaps()
#> # A tsibble: 10 x 3 [1W]
#> month_day amount year_week
#> <date> <dbl> <week>
#> 1 2022-01-01 5 2021 W52
#> 2 NA NA 2022 W01
#> 3 NA NA 2022 W02
#> 4 NA NA 2022 W03
#> 5 NA NA 2022 W04
#> 6 2022-02-01 2 2022 W05
#> 7 NA NA 2022 W06
#> 8 NA NA 2022 W07
#> 9 NA NA 2022 W08
#> 10 2022-03-01 1 2022 W09
Created on 2023-04-10 with reprex v2.0.2