rtime-seriespredictionhidden-markov-models# How to Make a One-Step-Ahead Prediction for Observed State Using hmmTMB in R?

## Forecast mean against time

## Forecast distribution

## Code

## Edit

I've been working with the hmmTMB package in R to model and predict oil price movements using a Hidden Markov Model (HMM). After fitting the model with training data, I'm struggling with making one-step-ahead predictions for the observed state (e.g., oil prices) using the fitted model on a test dataset.

```
# Load necessary libraries
library(hmmTMB)
library(tidyverse)
# Set seed for reproducibility
set.seed(123)
# Example dataset preparation (extended for 1 year)
df <- data.frame(
day = seq.Date(from = as.Date("2020-01-01"), to = as.Date("2020-12-31"), by = "day"),
OILPRICE = rnorm(366, 100, 10)
) %>%
as_tibble()
# Splitting into training and test sets
split_index <- round(nrow(df) * 0.8)
training_set <- df[1:split_index, ] %>% as.data.frame()
testing_set <- df[(split_index + 1):nrow(df), ] %>% as.data.frame()
# Define a hidden Markov model with 2 states for the training set
hid1 <- MarkovChain$new(data = training_set, n_states = 2)
dists <- list(OILPRICE = "norm")
par0 <- list(OILPRICE = list(mean = c(110, 90), sd = c(1, 1)))
obs1 <- Observation$new(data = training_set, n_states = 2, dists = dists, par = par0)
par0 <- obs1$suggest_initial()
obs1 <- Observation$new(data = training_set, n_states = 2, dists = dists, par = par0)
hmm1 <- HMM$new(obs = obs1, hid = hid1)
hmm1$fit(silent = TRUE)
# Rename
data_plot <- training_set
# Color by most likely state sequence
data_plot$state <- factor(paste0("State ", hmm1$viterbi()))
ggplot(data_plot, aes(day, OILPRICE, col = state)) +
geom_point() +
scale_color_manual(values = pal, name = NULL)
# Attempting to make a one-step-ahead prediction
# ??? This is where I need guidance
```

How can I use hmmTMB to make a one-step-ahead prediction for the observed state (e.g., OILPRICE on the next day) using the fitted model (hmm1) on my test dataset (testing_set)? I am looking for a way to predict future OILPRICE values based on the model fitted to the training set.

I appreciate any insights or examples on how to achieve this with the hmmTMB package.

Solution

There are currently no built-in functions for forecasting in hmmTMB (as of February 2024). A forecasting distribution can however be computed from a fitted hmmTMB model.

The forecast distribution is a mixture of the state-dependent distributions, where the weights correspond to the probability of being in the different states. So, you can use the following workflow:

- get state distribution for last observed data row (i.e., at time n); let's call this
`u`

- get state distribution at time n + h as
`u %*% (tpm %^% h)`

, where`tpm`

is the (one-step) transition probability matrix of the state process; that is, we multiply`u`

by the h-step transition probability matrix - get forecast distribution as a mixture of the estimated state-dependent distributions, weighted by the state probabilities found in the previous step

In step 3, you could for example compute the forecast mean (as a weighted sum of the state-dependent means), or the probability density function of the forecast distribution as shown in the code below.

The black dots show the mean of the forecast distribution for 60 days after the last observation of the training set.

The lines show the probability density functions of the forecast distribution for the 60 days following the last observation of the training set. As you can see, it's very likely to be in state 2 at first, so the corresponding state-dependent distribution has a higher weight. After a while, as the state distribution approaches the stationary (long-term) distribution, the forecast distribution stops changing.

```
# Load packages
library(scico)
library(expm)
# Grid of oil prices to plot forecast distributions
grid <- seq(min(training_set$OILPRICE),
max(training_set$OILPRICE),
length = 100)
# Get state distribution at last (training) observation
sp <- hmm1$state_probs()
u <- sp[nrow(sp),]
# Get model parameters
tpm <- hid1$tpm()[,,1]
obspar <- obs1$par()[,,1]
# Loop over forecast times (over 60 days here)
h <- 1:60
mix <- list()
mix_mean <- rep(NA, length = length(h))
for(i in seq_along(h)) {
# State distribution at time n + h
dist_h <- u %*% (tpm %^% h[i])
# Forecast distribution at time n + h
mix[[i]] <-
dist_h[1] * dnorm(x = grid,
mean = obspar["OILPRICE.mean", "state 1"],
sd = obspar["OILPRICE.sd", "state 1"]) +
dist_h[2] * dnorm(x = grid,
mean = obspar["OILPRICE.mean", "state 2"],
sd = obspar["OILPRICE.sd", "state 2"])
# Mean of forecast distribution
mix_mean[i] <- sum(dist_h * obspar["OILPRICE.mean",])
}
# Plot the forecast mean against time
df_mean <- data.frame(day = seq.Date(from = as.Date("2020-10-20"),
by = "day", length = length(h)),
mean = mix_mean)
ggplot(data_plot, aes(day, OILPRICE, col = state)) +
geom_point() +
geom_point(aes(y = mean, col = NULL), data = df_mean) +
scale_color_manual(values = pal)
# Plot the forecast distributions for h = 1, 2, ..., 60
df_mix <- data.frame(h = rep(h, each = 100),
price = grid,
mix = unlist(mix))
ggplot(df_mix, aes(price, mix, col = h, group = h)) +
geom_line() +
scale_color_scico(name = "time") +
labs(y = "forecast distribution")
```

Note that I used a slightly different simulated data set, generated from a 2-state hidden Markov model, to create the kind of patterns we would expect to find in real data. The data can be simulated as follows:

```
# Set seed for reproducibility
set.seed(123)
# Generate state process for simulated data
markov <- rep(1, 366)
for(i in 2:366) {
switch <- sample(0:1, size = 1, prob = c(0.95, 0.05))
if(switch) markov[i] <- 3 - markov[i-1]
else markov[i] <- markov[i-1]
}
# Example dataset preparation (extended for 1 year)
df <- data.frame(
day = seq.Date(from = as.Date("2020-01-01"),
to = as.Date("2020-12-31"),
by = "day"),
# Generate observations conditionally on state process
OILPRICE = rnorm(366, c(100, 140)[markov], 10)
)
```

- Retrieving expected data.frame for testthat expectation
- How to use an anonymous function with pipe operator?
- R passing RSelenium driver environment as function argument
- Keep file names merging list with lapply?
- replace strings that contains one another
- Stop parsing out zeros after decimals in ggplot2's annotate
- Plot data.tree coloring and labelling by level
- Extracting the date and time out of a date and time format
- How to reshape a dataframe into wide format with specified column pairs
- Show partial segments removed after limit scales for `geom_pointrange`
- Control ggplot2 legend look without affecting the plot
- Concatenate strings over all rows in a single column of a data frame
- Is there some way to keep variable names from.SD+.SDcols together with non .SD variable names in data.table?
- How to create mutually dependent checkbox columns in a DT table?
- A problem with an stacked, grouped and faceted graph y gg plot
- Installing R on Linux: configure: error: libcurl >= 7.28.0 library and headers are required with support for https
- Create a column conditional on multiple other character and numerical columns
- data.frame to array (2 columns)
- Changing options() in a function environment without changing options() in global environment in R?
- pivot(), group_by() and summarise() with nested data
- Why legend is not showing up in plot layout using patchwork in R?
- in dplyr::mutate, dplyr::starts_with works for .before but not .after?
- Finding optimal cut-off points for dividing a variable into equal groups considering ties in which quantiles do not work
- How to add a legend to hline?
- R Shiny Suppress slider handle for sliderInput until click
- brmultinom brglm2 vifs and dredge dont work
- R, how to get current time miliseconds, without extra information ? (Sys.time(), format("%X"))
- Installing multiple versions of R
- Command to see 'R' path that RStudio is using
- Fast pairwise longest common substring from start