rtime-seriesfable-r

Incorporating external regressor in a hierarchical/ grouped time series


Fellow contributors,

I have been working with a hierarchical time series, relating to a set of identical products in a number of stores. For this purpose when we aggregate the data set based on 2 attributes like "store" and "product_type" in my case, we should then aggregate the target variable which is "demand" for every individual product for every group or hierarchy. What I would like to do is add another categorical variable to my model let's say "Dynamic Harmonic Regression" as I am using weekly time series. However, I don't know how I should include it when my external variable is a categorical variable with 4 levels. I would like to know how I could aggregate this or whether there is anything I could do about it. Here you can find a small reproducible example:

library(tidyverse)
library(tsibble)
library(tsibbledata)
library(fable)
library(fabletools)
library(fpp3)
library(readxl)
library(fable.prophet)
library(feasts)

store <- c(rep('st1', 8), rep('st2', 8))
product_type <- c(rep('type1', 4), rep('type2', 4), rep('type1', 4), rep('type2', 4))
products <- c(rep('A', 2), rep('B', 2), rep('C', 2), rep('D', 2), 
              rep('A', 2), rep('B', 2), rep('C', 2), rep('D', 2))

demands <- c(round(sample(c(1:100), 16, replace = TRUE)))
external_reg <- c(sample(c('red', 'green', 'blue'), 16, replace = TRUE))
date_week <- rep(1:4, 4)
date_year <- rep(2019:2022, 4)

my_data <- tibble(date_year, date_week, store, product_type, products, demands, external_reg)

my_data %>%
  mutate(Date = ymd(paste0(date_year, "-01-01")) + weeks(date_week - 1)) %>%
  mutate(Week = yearweek(Date)) %>%
  as_tsibble(key = c(store, product_type), index = Week) %>%
  aggregate_key(store * product_type, Demand_Agg = sum(demands))

Clearly the external regeressor should be a column in my tsibble:

# A tsibble: 36 x 4 [53W]
# Key:       store, product_type [9]
       Week store        product_type Demand_Agg
     <week> <chr*>       <chr*>            <dbl>
 1 2019 W01 <aggregated> <aggregated>        188
 2 2020 W02 <aggregated> <aggregated>        142
 3 2021 W02 <aggregated> <aggregated>        259
 4 2022 W03 <aggregated> <aggregated>        186
 5 2019 W01 st1          <aggregated>         89
 6 2019 W01 st2          <aggregated>         99
 7 2020 W02 st1          <aggregated>         52
 8 2020 W02 st2          <aggregated>         90
 9 2021 W02 st1          <aggregated>         95
10 2021 W02 st2          <aggregated>        164
# … with 26 more rows

Thank you very much in advance.


Solution

  • The external regressor column (external_reg) has been dropped from your output because you have not specified how it should be aggregated. Given that it is a discrete variable, it might be tricky to aggregate the data in a way that keeps this information. How you choose to aggregate this is up to you, and may depend on the model you want to use. If you had a continuous variable like temperature, you may want to compute the average temperature.

    For example, if you wanted to keep the first value of external_reg you would aggregate it with aggregate_key(<tsibble>, store * product_type, Demand_Agg = sum(demands), external_reg = first(external_reg))

    library(dplyr)
    library(fable)
    library(tsibble)
    library(lubridate)
    
    my_data <- structure(list(date_year = c(2019L, 2020L, 2021L, 2022L, 2019L, 
    2020L, 2021L, 2022L, 2019L, 2020L, 2021L, 2022L, 2019L, 2020L, 
    2021L, 2022L), date_week = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 
    1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), store = c("st1", "st1", "st1", 
    "st1", "st1", "st1", "st1", "st1", "st2", "st2", "st2", "st2", 
    "st2", "st2", "st2", "st2"), product_type = c("type1", "type1", 
    "type1", "type1", "type2", "type2", "type2", "type2", "type1", 
    "type1", "type1", "type1", "type2", "type2", "type2", "type2"
    ), products = c("A", "A", "B", "B", "C", "C", "D", "D", "A", 
    "A", "B", "B", "C", "C", "D", "D"), demands = c(45, 12, 70, 66, 
    77, 6, 27, 52, 8, 73, 70, 27, 84, 100, 79, 51), external_reg = c("blue", 
    "green", "red", "blue", "green", "blue", "red", "green", "blue", 
    "blue", "green", "green", "red", "green", "blue", "green")), class = c("tbl_df", 
    "tbl", "data.frame"), row.names = c(NA, -16L))
    
    my_data %>%
      mutate(Date = ymd(paste0(date_year, "-01-01")) + weeks(date_week - 1)) %>%
      mutate(Week = yearweek(Date)) %>%
      as_tsibble(key = c(store, product_type), index = Week) %>%
      aggregate_key(store * product_type, Demand_Agg = sum(demands), external_reg = first(external_reg))
    #> # A tsibble: 36 x 5 [53W]
    #> # Key:       store, product_type [9]
    #>        Week store        product_type Demand_Agg external_reg
    #>      <week> <chr*>       <chr*>            <dbl> <chr>       
    #>  1 2019 W01 <aggregated> <aggregated>        214 blue        
    #>  2 2020 W02 <aggregated> <aggregated>        191 green       
    #>  3 2021 W02 <aggregated> <aggregated>        246 red         
    #>  4 2022 W03 <aggregated> <aggregated>        196 blue        
    #>  5 2019 W01 st1          <aggregated>        122 blue        
    #>  6 2019 W01 st2          <aggregated>         92 blue        
    #>  7 2020 W02 st1          <aggregated>         18 green       
    #>  8 2020 W02 st2          <aggregated>        173 blue        
    #>  9 2021 W02 st1          <aggregated>         97 red         
    #> 10 2021 W02 st2          <aggregated>        149 green       
    #> # … with 26 more rows
    

    Created on 2022-05-08 by the reprex package (v2.0.1)