I'm trying to view the out-of-sample accuracy scores after fitting a model with fable.prophet. The series are keyed by type, and the forecast looks 5 observations ahead.
Here is the code:
library(tibble)
library(tsibble)
library(dplyr)
library(lubridate)
library(fable.prophet)
lax_passengers <- read.csv("https://raw.githubusercontent.com/mitchelloharawild/fable.prophet/master/data-raw/lax_passengers.csv")
lax_passengers <- lax_passengers %>%
  mutate(datetime = mdy_hms(ReportPeriod)) %>%
  group_by(month = yearmonth(datetime), type = Domestic_International) %>%
  summarise(passengers = sum(Passenger_Count)) %>%
  ungroup()
lax_passengers <- as_tsibble(lax_passengers, index = month, key = type)
fit <- lax_passengers %>%
  model(
    mdl = prophet(passengers ~ growth("linear") + season("year", type = "multiplicative"))
  )
fit
test_tr <- lax_passengers %>%
  slice(1:(n() - 5)) %>%
  stretch_tsibble(.init = 12, .step = 1)
fc <- test_tr %>%
  model(
    mdl = prophet(passengers ~ growth("linear") + season("year", type = "multiplicative"))
  ) %>%
  forecast(h = 5)
fc %>% accuracy(lax_passengers)
When I run fc %>% accuracy(lax_passengers), I get the following warning:
Warning message:
The future dataset is incomplete, incomplete out-of-sample data will be treated as missing.
5 observations are missing between 2019 Apr and 2019 Aug
How do I make the future dataset complete? I believe the accuracy scores are not reliable while those 5 observations are missing.
It seems that the slice() step before stretch_tsibble() doesn't remove the last 5 observations from each type.
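The behaviour described above can be reproduced on a small toy tsibble. The data below are hypothetical (two keys named after the LAX types, 10 monthly values each), not the LAX dataset; the sketch only compares how many rows each key keeps after an ungrouped versus a key-grouped slice().

```r
library(tibble)
library(dplyr)
library(tsibble)

# Hypothetical toy data: two keys with 10 monthly observations each
toy <- tibble(
  month = rep(yearmonth("2020 Jan") + 0:9, times = 2),
  type  = rep(c("Domestic", "International"), each = 10),
  value = 1:20
) %>%
  as_tsibble(index = month, key = type)

# Ungrouped: n() is the row count of the whole table (20), so only the
# last 5 rows of the table are dropped, and they all belong to "International"
ungrouped <- toy %>% slice(1:(n() - 5))

# Grouped by key: n() is evaluated per key (10), so every key loses 5 rows
grouped <- toy %>%
  group_by_key() %>%
  slice(1:(n() - 5)) %>%
  ungroup()

# Rows kept per key in each case
counts_ungrouped <- as_tibble(ungrouped) %>% count(type)
counts_grouped   <- as_tibble(grouped) %>% count(type)
counts_ungrouped
counts_grouped
```

The first count keeps all 10 Domestic rows and only 5 International rows; the second keeps 5 rows for each key.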
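Unless the tsibble is grouped, slice() operates on the whole table, so slice(1:(n()-5)) removes only the last 5 rows, which all belong to the last key (type == "International"). The Domestic series keeps its final 5 months, so its last training windows reach the end of the data and their 5-step forecasts fall outside it, which is why accuracy() reports 5 missing observations. To remove the last 5 rows from every key, group by the keys before slicing: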
test_tr <- lax_passengers %>%
  group_by_key() %>%                   # slice within each type
  slice(1:(n() - 5)) %>%               # drop the last 5 months of every key
  ungroup() %>%
  stretch_tsibble(.init = 12, .step = 1)
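One way to confirm that the grouped slice leaves a complete forecast horizon for every key is to compare the last training month of each fold with the last observed month of its key. This is a sketch on hypothetical toy data (10 months per key, h = 2, .init = 3) rather than the LAX data; a fold is considered covered when its last training month plus h is still inside the observed data.

```r
library(tibble)
library(dplyr)
library(tsibble)

h <- 2L  # toy forecast horizon (the question uses 5)

# Hypothetical toy data: two keys with 10 monthly observations each
toy <- tibble(
  month = rep(yearmonth("2020 Jan") + 0:9, times = 2),
  type  = rep(c("Domestic", "International"), each = 10),
  value = 1:20
) %>%
  as_tsibble(index = month, key = type)

# Training folds built as in the answer: trim h rows per key, then stretch
folds <- toy %>%
  group_by_key() %>%
  slice(1:(n() - h)) %>%
  ungroup() %>%
  stretch_tsibble(.init = 3, .step = 1)

# Last training month of every fold, per key
origins <- as_tibble(folds) %>%
  group_by(type, .id) %>%
  summarise(last_train = max(month), .groups = "drop")

# Last month actually observed for each key in the full data
last_obs <- as_tibble(toy) %>%
  group_by(type) %>%
  summarise(last_month = max(month), .groups = "drop")

# A fold is covered if its whole h-step horizon falls inside the data
coverage <- origins %>%
  left_join(last_obs, by = "type") %>%
  mutate(covered = last_train + h <= last_month)

all(coverage$covered)
```

To run the same check on the question's data, replace toy with lax_passengers, set h <- 5 and use .init = 12; the month and type column names already match.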