I simulated a pure AR(2) process in R. When I run STL (Seasonal and Trend decomposition using Loess) from the feasts package, the function appears to detect strong seasonality, even though the data-generating process has none.
How can I adjust the STL parameters to avoid detecting spurious seasonality?
library(fable)
library(fabletools)
library(tsibble)
library(tsibbledata)
library(lubridate)
library(dplyr)
library(tidyr)
library(feasts)
# simulate an AR(2) process
set.seed(0)
sd_epsilon = 0.5
data = rnorm(500)
for (i in 3:500){
  data[i] = 10 + 0.5*data[i-1] + 0.3*data[i-2] + rnorm(1, 0, sd_epsilon)
}
# drop the first 100 observations as burn-in
data = tail(data, 400)
ts_data = ts(data, start=c(1970,1), frequency=12) %>%
  as_tsibble()
train = ts_data[1:360,]
test = ts_data[361:384,]
# model with STL()
train %>%
  model(
    stl = STL(value)
  ) %>%
  components() %>%
  autoplot()
My session info:
R version 4.2.3 (2023-03-15 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22635)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] purrr_1.0.2 furrr_0.3.1 future_1.33.1 feasts_0.3.1 tidyr_1.3.0 dplyr_1.1.4
[7] lubridate_1.9.3 tsibbledata_0.4.1 tsibble_1.1.3 fable_0.3.3 fabletools_0.3.4
loaded via a namespace (and not attached):
[1] Rcpp_1.0.11 progressr_0.14.0 pillar_1.9.0 compiler_4.2.3 tools_4.2.3
[6] digest_0.6.33 lifecycle_1.0.4 tibble_3.2.1 gtable_0.3.4 anytime_0.3.9
[11] timechange_0.2.0 pkgconfig_2.0.3 rlang_1.1.2 cli_3.6.2 rstudioapi_0.15.0
[16] parallel_4.2.3 xfun_0.41 withr_2.5.2 knitr_1.45 globals_0.16.2
[21] generics_0.1.3 vctrs_0.6.5 rappdirs_0.3.3 grid_4.2.3 tidyselect_1.2.0
[26] glue_1.6.2 listenv_0.9.0 R6_2.5.1 future.apply_1.11.1 parallelly_1.36.0
[31] fansi_1.0.6 distributional_0.3.2 ggplot2_3.4.4 farver_2.1.1 magrittr_2.0.3
[36] codetools_0.2-19 scales_1.3.0 ellipsis_0.3.2 colorspace_2.1-0 labeling_0.4.3
[41] utf8_1.2.4 munsell_0.5.0
I misinterpreted the STL plots. The gray bars next to the panels all represent the same magnitude, so the large bar next to the seasonal component means that its variation is the smallest relative to the variation in the other components. The strength of the seasonality is less than 0.20:
train %>%
features(value, feat_stl)
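For intuition, here is a minimal base-R sketch of the kind of strength measure that feat_stl reports. It assumes the usual definitions, max(0, 1 - Var(R)/Var(S + R)) for seasonality and max(0, 1 - Var(R)/Var(T + R)) for trend, where T, S and R are the trend, seasonal and remainder components. It uses stats::stl() with an illustrative s.window = 11 rather than the feasts internals, so the numbers need not match feat_stl exactly.

```r
# Re-simulate the AR(2) series from the question (same seed and coefficients).
set.seed(0)
x <- rnorm(500)
for (i in 3:500) {
  x[i] <- 10 + 0.5 * x[i - 1] + 0.3 * x[i - 2] + rnorm(1, 0, 0.5)
}
# Drop the burn-in, keep the 360 training observations, declare monthly data.
y <- ts(tail(x, 400)[1:360], start = c(1970, 1), frequency = 12)

# s.window = 11 is an assumption made for illustration only.
fit <- stl(y, s.window = 11)
seasonal  <- as.numeric(fit$time.series[, "seasonal"])
trend     <- as.numeric(fit$time.series[, "trend"])
remainder <- as.numeric(fit$time.series[, "remainder"])

# Strength measures under the assumed definitions; both lie in [0, 1].
seasonal_strength <- max(0, 1 - var(remainder) / var(seasonal + remainder))
trend_strength    <- max(0, 1 - var(remainder) / var(trend + remainder))

print(c(seasonal = seasonal_strength, trend = trend_strength))
```

A seasonal strength near 0 means the seasonal component explains little beyond the remainder, which is what you would expect for a series with no seasonality.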
STL is not detecting seasonality. You are asking it to model annual seasonality, so it does. If you just want to estimate a trend, then don't use STL.
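If only a smooth trend is wanted, one option is to smooth the series against time directly. The sketch below uses stats::loess() in base R; the span = 0.3 and degree = 1 settings are illustrative assumptions, not recommended values, and this is just one of several ways to estimate a trend without a seasonal term.

```r
# Re-simulate the AR(2) series from the question (same seed and coefficients).
set.seed(0)
x <- rnorm(500)
for (i in 3:500) {
  x[i] <- 10 + 0.5 * x[i - 1] + 0.3 * x[i - 2] + rnorm(1, 0, 0.5)
}
y <- tail(x, 400)[1:360]   # drop burn-in, keep the training window
t_idx <- seq_along(y)      # plain time index 1, 2, ..., 360

# Local linear smoother of the series against time; no seasonal component.
# span and degree are illustrative choices.
fit <- loess(y ~ t_idx, span = 0.3, degree = 1)
trend_hat <- fitted(fit)
remainder_hat <- y - trend_hat

plot(t_idx, y, type = "l", col = "grey50", xlab = "Month index", ylab = "value")
lines(t_idx, trend_hat, lwd = 2)
```

A larger span gives a smoother trend; a smaller one lets the trend follow more of the AR(2) dynamics.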