How does the HMM model in hmmlearn identify the hidden states?


I am new to Hidden Markov Models, and to experiment with them I am studying the scenario of sunny/rainy/foggy weather, based on observing whether or not a person carries an umbrella, using the hmmlearn package in Python. The data used in my tests was obtained from this page (the test and output files of "test 1").

I wrote the simple code presented below to fit an unsupervised HMM to the training data, and then compared its predictions with the expected output. The results seem pretty good (7 out of 10 correct predictions).

My question is: how am I supposed to know the mapping of the hidden states handled by the model to the real states in the problem domain? (in other words, how do I relate the responses to the desired states of my problem domain?)

This might be a very naïve question, but if the model were supervised I would understand that the mapping is given by me when providing the Y values to the fit method... yet I simply can't figure out how it works in this case.

Code:

import numpy as np
from hmmlearn import hmm

# Load the data from a CSV file
data = np.genfromtxt('training-data.csv', skip_header=1, delimiter=',',
                     dtype=str)

# One-hot encode the 'yes' and 'no' categories of the observation
# (i.e. whether or not an umbrella is seen)
x = np.array([[1, 0] if i == 'yes' else [0, 1] for i in data[:, 1]])

# Fit the HMM from the data expecting 3 hidden states (the weather on the day:
# sunny, rainy or foggy)
model = hmm.GaussianHMM(n_components=3, n_iter=100, verbose=True)
model.fit(x, [len(x)])

# Test the model
test = ['no', 'no', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'yes']
x_test = np.array([[1, 0] if i == 'yes' else [0, 1] for i in test])
y_test = ['foggy', 'foggy', 'foggy', 'rainy', 'sunny', 'foggy', 'rainy', 'rainy', 'foggy', 'rainy']

y_pred = model.predict(x_test)

mp = {0: 'sunny', 1: 'rainy', 2: 'foggy'} # THIS IS MY ASSUMPTION

print('\n\n\n')

print('Expected:')
print(y_test)
print('Predicted:')
print([mp[i] for i in y_pred])

Result:

Expected:
['foggy', 'foggy', 'foggy', 'rainy', 'sunny', 'foggy', 'rainy', 'rainy', 'foggy', 'rainy']
Predicted:
['foggy', 'foggy', 'sunny', 'rainy', 'foggy', 'sunny', 'rainy', 'rainy', 'foggy', 'rainy']
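Since the model's state indices carry no meaning of their own, the assumed mapping `mp` can at least be checked against the labelled test days. A minimal sketch, assuming the raw `y_pred` indices can be read back from the printed prediction through `mp` (0 = sunny, 1 = rainy, 2 = foggy): it tries every one-to-one assignment of labels to states and keeps those with the most matches. The helpers `score` and `best_mapping` are hypothetical names, not part of hmmlearn.

```python
from itertools import permutations

# Raw state indices recovered from the printed prediction via mp
# (0 -> 'sunny', 1 -> 'rainy', 2 -> 'foggy').
y_pred = [2, 2, 0, 1, 2, 0, 1, 1, 2, 1]
y_test = ['foggy', 'foggy', 'foggy', 'rainy', 'sunny',
          'foggy', 'rainy', 'rainy', 'foggy', 'rainy']
labels = ['sunny', 'rainy', 'foggy']


def score(mapping, pred, truth):
    """Count positions where the mapped state equals the true label."""
    return sum(mapping[s] == t for s, t in zip(pred, truth))


def best_mapping(pred, truth, labels):
    """Try every one-to-one state->label assignment; return all the best ones."""
    n_states = len(labels)
    candidates = [dict(zip(range(n_states), perm))
                  for perm in permutations(labels)]
    scores = [score(m, pred, truth) for m in candidates]
    best = max(scores)
    return best, [m for m, s in zip(candidates, scores) if s == best]


best, mappings = best_mapping(y_pred, y_test, labels)
print(best)  # 7
for m in mappings:
    print(m)
```

On these ten days two different assignments tie at 7 out of 10 ({0: 'sunny', 1: 'rainy', 2: 'foggy'} and {0: 'foggy', 1: 'rainy', 2: 'sunny'}), so even with labels, a sample this small does not pin the mapping down.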

Solution

  • My question is: how am I supposed to know the mapping of the hidden states handled by the model to the real states in the problem domain? (in other words, how do I relate the responses to the desired states of my problem domain?)

    Basically you cannot. The fact that you were able to hand-craft this mapping (or even that such a mapping exists in the first place) is just a coincidence arising from the extreme simplicity of the problem.

    An HMM (in this learning scenario) tries to find the most probable sequence of (a predefined number of) hidden states, but, like any other unsupervised learning method, it has no guarantee of matching whatever the task at hand is. It simply models reality as well as it can under its constraints (the Markov assumption, the number of hidden states, the observations provided). It does not magically detect the actual question one is asking (here, the sequence of weathers); it simply solves its own internal optimization problem: finding the most probable sequence of arbitrarily defined hidden states such that, under the Markov assumption (each state depends only on the previous one, not on older history), the observations provided are very likely to appear.

    In general you will not be able to interpret these states so easily. Here the problem is so simple that, given just the assumptions listed above, the weather state is pretty much the most probable thing to be modeled. In other problems, the states can capture any structure that helps explain the observations.

    As said before, this is not a property of HMMs but of any unsupervised learning technique: when you cluster data you just find some partitioning of it, which may or may not relate to what you are looking for. Similarly here, an HMM will find some model of the dynamics, but it can be completely different from what you are after. If you know what you are looking for, you are supposed to use supervised learning; that is literally its definition. Unsupervised learning finds some structure (here, dynamics), not a specific one.
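The claim that the numbering of the hidden states means nothing to the model can be checked directly: relabelling the states (permuting the start probabilities, the rows and columns of the transition matrix, and the rows of the emission matrix) leaves the likelihood of the observations unchanged, so a procedure that maximizes that likelihood has no reason to prefer one naming over another. Below is a minimal NumPy sketch of the forward algorithm for a discrete-observation HMM; the parameter values are made up for illustration and are not taken from the fitted model in the question.

```python
import numpy as np


def log_likelihood(obs, start, trans, emit):
    """Forward algorithm: log P(obs) for a discrete-observation HMM.

    start: (K,) initial state probabilities
    trans: (K, K) transition matrix, trans[i, j] = P(next=j | current=i)
    emit:  (K, M) emission matrix, emit[i, o] = P(obs=o | state=i)
    """
    alpha = start * emit[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]
    return float(np.log(alpha.sum()))


# Hypothetical parameters: 3 hidden states, 2 symbols (0 = 'no', 1 = 'yes').
start = np.array([0.5, 0.25, 0.25])
trans = np.array([[0.8, 0.05, 0.15],
                  [0.2, 0.6, 0.2],
                  [0.2, 0.3, 0.5]])
emit = np.array([[0.9, 0.1],
                 [0.2, 0.8],
                 [0.7, 0.3]])

# The umbrella sequence from the question's test set.
test = ['no', 'no', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'yes']
obs = [1 if t == 'yes' else 0 for t in test]

# Relabel the states: new state i is old state perm[i].
perm = np.array([2, 0, 1])
start_p = start[perm]
trans_p = trans[np.ix_(perm, perm)]
emit_p = emit[perm]

ll = log_likelihood(obs, start, trans, emit)
ll_p = log_likelihood(obs, start_p, trans_p, emit_p)
print(ll, ll_p)  # equal up to floating-point rounding
```

Any of the 3! = 6 relabellings gives the same value, which is exactly why the mapping in the question had to be guessed.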
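For contrast, the supervised route mentioned in both the question and the answer: when each day's weather is known, the HMM parameters can be estimated by simple counting, and the state names are fixed by construction because you choose them. This is a minimal NumPy sketch with add-one smoothing and log-space Viterbi decoding, not hmmlearn's API; `fit_supervised` and `viterbi` are hypothetical helpers. Since no separate labelled sequence is shown here, the ten labelled test days from the question are reused as toy training data purely to show the mechanics, so the output says nothing about accuracy.

```python
import numpy as np

states = ['sunny', 'rainy', 'foggy']
symbols = ['no', 'yes']


def fit_supervised(weather, umbrella, states, symbols, alpha=1.0):
    """Estimate HMM parameters by counting, with add-alpha smoothing."""
    s_idx = {s: i for i, s in enumerate(states)}
    o_idx = {o: i for i, o in enumerate(symbols)}
    K, M = len(states), len(symbols)
    start = np.full(K, alpha)
    trans = np.full((K, K), alpha)
    emit = np.full((K, M), alpha)
    start[s_idx[weather[0]]] += 1
    for prev, cur in zip(weather, weather[1:]):
        trans[s_idx[prev], s_idx[cur]] += 1
    for w, u in zip(weather, umbrella):
        emit[s_idx[w], o_idx[u]] += 1
    # Normalize the counts into probability distributions.
    return (start / start.sum(),
            trans / trans.sum(axis=1, keepdims=True),
            emit / emit.sum(axis=1, keepdims=True))


def viterbi(obs, start, trans, emit):
    """Most probable state sequence, computed in log space."""
    log_t, log_e = np.log(trans), np.log(emit)
    delta = np.log(start) + log_e[:, obs[0]]
    back = []
    for o in obs[1:]:
        scores = delta[:, None] + log_t  # scores[i, j]: best path ending i -> j
        back.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) + log_e[:, o]
    path = [int(delta.argmax())]
    for b in reversed(back):
        path.append(int(b[path[-1]]))
    return path[::-1]


# Toy labelled data: the question's ten test days, reused only for illustration.
weather = ['foggy', 'foggy', 'foggy', 'rainy', 'sunny',
           'foggy', 'rainy', 'rainy', 'foggy', 'rainy']
umbrella = ['no', 'no', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'yes']

start, trans, emit = fit_supervised(weather, umbrella, states, symbols)
obs = [symbols.index(u) for u in umbrella]
decoded = [states[i] for i in viterbi(obs, start, trans, emit)]
print(decoded)  # names come from `states`, so no mapping has to be guessed
```

Here state 0 is "sunny" because the counting code was told so; that is the information an unsupervised fit never receives.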