My data is shown below and has hourly intervals. I need to forecast Demand.
# A tsibble: 23,400 x 6 [1h] <UTC>
  Date                Demand WeekDay DaysAfterHoliday Influenza MAX_Temperature
  <dttm>               <int>   <int>            <int>     <dbl>           <dbl>
1 2017-05-01 00:00:00    122       1                0      1               19.2
2 2017-05-02 01:00:00    124       2                1      3.04            25.3
...
I know that on the day after a holiday the number of patients in the ED is higher than usual, but I can't be sure that the model is taking that into account. The data has daily, weekly and annual seasonality (especially for fixed holidays).
For multiple seasonality I can use FASSTER to handle holiday effects. I read the R documentation page on this and some presentations, but in those cases the seasonality and the forecast formula are given to the function like this:
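# NOT RUN {
library(dplyr)       # provides %>%
library(tsibble)
library(fabletools)  # provides model()
library(fasster)
cbind(mdeaths, fdeaths) %>%
  as_tsibble(pivot_longer = FALSE) %>%  # keep mdeaths and fdeaths as separate columns
  model(FASSTER(mdeaths ~ fdeaths + poly(1) + trig(12)))
# }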
Is there a way to make FASSTER search for the most adequate formula? If not, how can I know which is the best approach?
Thank you in advance!
The fasster package currently doesn't provide any facilities for automatic model selection (https://github.com/tidyverts/fasster/issues/50).
To identify an appropriate fasster model specification, you can start by graphically exploring your data to identify its structure. Some questions you may consider include:
- Is the data seasonal? Include seasonal terms with fourier(period, K) or season(period). Using fourier() terms is generally better, because specifying the number of harmonics (K) lets you control the smoothness of the seasonality and reduce the number of model parameters.
- Does the data have a level or a trend? Include a level with poly(1) or a trend with poly(2).
- Is there a relationship with exogenous regressors? These can be added to the formula in the same way as in lm().
- Does the data switch between different patterns? Use %S% to switch between these patterns. For example, to have a different seasonal pattern for weekdays and weekends you may consider day_type %S% (fourier("day", K = 7)), where day_type is a variable in your model that specifies if the day is a weekday or weekend.

A simple approach to capturing the increase in patients after a holiday would be to include DaysAfterHoliday as an exogenous regressor. As this relationship is likely non-linear, you may need to also include some non-linear transformations of this variable as exogenous regressors.
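
Since there is no automatic selection, one option is to write down a few candidate specifications by hand and compare them on a hold-out period. The following is only a rough sketch under several assumptions: `ed` is the hourly tsibble from the question, WeekDay is coded 1 = Monday to 7 = Sunday, DaysAfterHoliday takes more than two distinct values (otherwise its square adds nothing), and the K values and the 28-day hold-out are arbitrary placeholders rather than recommendations. The names `day_type`, `days_after_sq`, `train`, `test` and `fits` are made up for illustration.

```r
library(dplyr)
library(tsibble)
library(fabletools)
library(fasster)

# `ed` is assumed to be the hourly tsibble shown in the question.
ed <- ed %>%
  mutate(
    # Assumption: WeekDay is coded 1 = Monday ... 7 = Sunday.
    day_type = factor(if_else(WeekDay >= 6, "weekend", "weekday")),
    # One possible non-linear transformation of the holiday regressor.
    days_after_sq = DaysAfterHoliday^2
  )

# Hold out the last 28 days for evaluation (arbitrary choice).
cutoff <- max(ed$Date) - as.difftime(28, units = "days")
train <- ed %>% filter(Date <= cutoff)
test  <- ed %>% filter(Date > cutoff)

# Candidate specifications written by hand; K values are placeholders.
fits <- train %>%
  model(
    seasonal = FASSTER(
      Demand ~ poly(1) + fourier("day", K = 6) + fourier("week", K = 4)
    ),
    holiday = FASSTER(
      Demand ~ poly(1) + fourier("day", K = 6) + fourier("week", K = 4) +
        DaysAfterHoliday + days_after_sq
    ),
    switching = FASSTER(
      Demand ~ day_type %S% (fourier("day", K = 6)) + poly(1) +
        DaysAfterHoliday + days_after_sq
    )
  )

# Forecast the hold-out period and compare out-of-sample accuracy.
fits %>%
  forecast(new_data = test) %>%
  accuracy(ed)
```

If the candidates that include DaysAfterHoliday forecast the hold-out period more accurately than the purely seasonal one, that suggests the holiday effect is being picked up. Fitting on the full hourly history may be slow, so a shorter training window could be used while experimenting.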