r date time-series forecasting tsibble

Working with dates in tsibbles in R, "seasonal" and "fpp3" packages


I have been reading this book and trying to reuse some of its code, which requires the data to be stored as a tsibble. However, when I try to run one of the examples from chapter 3 on my own data, I can't get it to work.

First I load my own data, then I store it in a tibble and try to convert it to a tsibble (I don't know if this is the optimal way), but it seems that somewhere along the way I don't specify the quarterly dates correctly.

# The packages I have loaded
library(seasonal)
library(tsibble)
library(feasts)
library(ggplot2)
library(fpp3)
library(dplyr)
library(lubridate)

# My data: x = values, y = dates
x <- c(-4.2,3.1,-1.3,6.3,-6.5,2.6,-0.7,5.1,-6.5,4.2,-2.1,4.6,
       -5.2,1.8,-1.3,6.8,-5.3,3.6,-1.6,5.9,-6.7,6.9,-2.6,5,
       -4.4,6.2,-2.4,4.1,-5.2,2.6,-0.5,5.1,-6.1,3.5,-2.5,1.5,
       -6.6,0.7,-0.7,3.8,-4.6,3.9,0.2,3.5,-4.8,3.6,-2.2,4.4,
       -4.9,2.6,-1,3.4,-5,4.3,-1.2,3.5,-4.6,3,0.3,3.7,-4.2,
       3.9,-1.3,3.5,-4.1,5.9,-1.6,4.1,-4.1,4.8,-2.5,4.4,-4.8,
       5.1,-2.1,4.4,-4.4,4.4,-2,3.9,-5.7,-2.8,3.4,4.9,-5.5,7.3)

y <- c("2000 Q1",   "2000 Q2",  "2000 Q3",  "2000 Q4",  "2001 Q1",  "2001 Q2",  
       "2001 Q3",   "2001 Q4",  "2002 Q1",  "2002 Q2",  "2002 Q3",  "2002 Q4",  "2003 Q1",  
       "2003 Q2",   "2003 Q3",  "2003 Q4",  "2004 Q1",  "2004 Q2",  "2004 Q3",  "2004 Q4",  
       "2005 Q1",   "2005 Q2",  "2005 Q3",  "2005 Q4",  "2006 Q1",  "2006 Q2",  "2006 Q3",  
       "2006 Q4",   "2007 Q1",  "2007 Q2",  "2007 Q3",  "2007 Q4",  "2008 Q1",  "2008 Q2",  
       "2008 Q3",   "2008 Q4",  "2009 Q1",  "2009 Q2",  "2009 Q3",  "2009 Q4",  "2010 Q1",  
       "2010 Q2",   "2010 Q3",  "2010 Q4",  "2011 Q1",  "2011 Q2",  "2011 Q3",  "2011 Q4",  
       "2012 Q1",   "2012 Q2",  "2012 Q3",  "2012 Q4",  "2013 Q1",  "2013 Q2",  "2013 Q3",  
       "2013 Q4",   "2014 Q1",  "2014 Q2",  "2014 Q3",  "2014 Q4",  "2015 Q1",  "2015 Q2",  
       "2015 Q3",   "2015 Q4",  "2016 Q1",  "2016 Q2",  "2016 Q3",  "2016 Q4",  "2017 Q1",
       "2017 Q2",   "2017 Q3",  "2017 Q4",  "2018 Q1",  "2018 Q2",  "2018 Q3",  "2018 Q4",  
       "2019 Q1",   "2019 Q2",  "2019 Q3",  "2019 Q4",  "2020 Q1",  "2020 Q2",  "2020 Q3",  
       "2020 Q4",   "2021 Q1",  "2021 Q2")

# Convert to a tibble (I couldn't get the dates in y to work, so I created a sequence)
GDP <- tibble(
  GDPNumbers = x,
  Quarter = seq(as.Date("2000-01-01"), as.Date("2021-05-05"), by = "1 quarter")
)

# Convert to a tsibble
GDP_tsbl <- as_tsibble(GDP,
                       key = GDPNumbers,
                       index = Quarter) 

# Then I want to run these two pieces of code, which don't work for me
x11_dcmp <- GDP_tsbl %>%
  model(x11 = X_13ARIMA_SEATS(x ~ x11())) %>%
  components()

autoplot(x11_dcmp) + 
  labs(title =
         "Decomposition using X-11.")

When I run the last two pieces of code, I get error messages. Obviously the second one fails because the first one didn't work.

Error: Problem with `mutate()` input `cmp`.
x no applicable method for 'components' applied to an object of class "null_mdl"
i Input `cmp` is `map(.fit, components)`.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
57 errors (4 unique) encountered for x11
[6] .data contains implicit gaps in time. You should check your data and convert implicit gaps into explicit missing values using `tsibble::fill_gaps()` if required.
[15] Internal error in `df_slice()`: Columns must match the data frame size.
[11] X-13 run failed

Errors:
- 1. Check input file and format.
- Time series could not be read due to previously found errors
- Specify series before user-defined adjustments
- Need to specify a series to identify outliers

Notes:
- Correct input errors in the order they are detected since the first one or two may be
  responsible for the others (especially if there are errors in the SERIES or COMPOSITE
  spec).
[25] X-13 run failed

Errors:
- Seasonal period must be 4 or 12 if a seasonal adjustment is done.

Notes:
- Correct input errors in the order they are detected since the first one or two may be
  responsible for the others (especially if there are errors in the SERIES or COMPOSITE
  spec).

As far as I can see, R thinks that there are gaps in the data. I think this has something to do with the dates. I have looked around the internet, including Stack Overflow, but I haven't found a solution that works. One site suggested using fill_gaps(.full = TRUE), but this ended up inserting around 440000 rows into my data, which seems odd given that I only have 86 observations.

Any help would be appreciated.


Solution

  • You need to change two things in your code.

    1. Remove the key when you create the tsibble. GDPNumbers is the variable you want to model, not a key. A key identifies separate series, so with key = GDPNumbers every distinct value is treated as its own series. For a single series you can leave the key out; if you do specify one, it must, together with the index, uniquely identify each row.

    So for step 1, use:

    GDP_tsbl <- as_tsibble(GDP,
                           index = Quarter)
    
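    2. Set the quarter correctly. Convert the index with the function yearquarter() so that the data is recognised as quarterly. The model formula also has to refer to the column GDPNumbers rather than the vector x.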

      x11_dcmp <- GDP_tsbl %>% 
        mutate(Quarter = yearquarter(Quarter)) %>% 
        model(x11 = X_13ARIMA_SEATS(GDPNumbers ~ x11())) %>%
        components()
      

    With these changes, autoplot() will work as expected.
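As a side note, the "2000 Q1"-style strings in y can be parsed directly by yearquarter(), so the Date sequence is not strictly needed. The following is a minimal sketch, assuming only the tsibble package and using just the first eight observations from the question to keep it short; the object name GDP_date is made up for the comparison. It also shows where the "implicit gaps" message likely comes from: quarter starts stored as Date values are a varying number of days apart, so the index is treated as daily data with missing days.

```r
library(tsibble)

# First eight observations from the question (shortened for illustration)
x <- c(-4.2, 3.1, -1.3, 6.3, -6.5, 2.6, -0.7, 5.1)
y <- c("2000 Q1", "2000 Q2", "2000 Q3", "2000 Q4",
       "2001 Q1", "2001 Q2", "2001 Q3", "2001 Q4")

# Quarterly index parsed straight from the strings; no key for a single series
GDP_tsbl <- tsibble(
  Quarter    = yearquarter(y),
  GDPNumbers = x,
  index      = Quarter
)

# For comparison: the same values with a Date index (GDP_date is a made-up name)
GDP_date <- tsibble(
  Quarter    = seq(as.Date("2000-01-01"), by = "1 quarter", length.out = 8),
  GDPNumbers = x,
  index      = Quarter
)

has_gaps(GDP_tsbl)  # no gaps with a yearquarter index
has_gaps(GDP_date)  # gaps reported with a Date index
```

GDP_tsbl built this way can be passed to model() as in step 2, without the extra mutate() call.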