I have implemented a simple Hidden Markov Model with hmmlearn and it is working well. I used the fit() method to learn the HMM parameters from my data. If I get more data and want to update the previously fitted model without training from scratch, what can I do? In other words, how can I initialize a new model from what I have learned so far and keep fitting it on the new observations to get a better model of my data?
In hmmlearn, you may have noticed that once you train with hmmlearn.fit, the model parameters update:
import numpy as np
import pickle
from hmmlearn import hmm
np.random.seed(42)
# initialize model
model = hmm.GaussianHMM(n_components=3, covariance_type="full")
model.startprob_ = np.array([0.33, 0.33, 0.34])
model.transmat_ = np.array([[0.1, 0.2, 0.7],
                            [0.3, 0.5, 0.2],
                            [0.5, 0.1, 0.4]])
model.means_ = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
model.covars_ = np.tile(np.identity(2), (3, 1, 1))
# generate "fake" training data
emissions1, states1 = model.sample(100)
print("Transition matrix before training: \n", model.transmat_)
# train
model.fit(emissions1)
print("Transition matrix after training: \n", model.transmat_)
# save model
with open("modelname.pkl", "wb") as f:
    pickle.dump(model, f)
#################################
>>> Transition matrix before training:
[[0.1 0.2 0.7]
[0.3 0.5 0.2]
[0.5 0.1 0.4]]
>>> Transition matrix after training:
[[0.19065325 0.50905216 0.30029459]
[0.41888047 0.39276483 0.18835471]
[0.44558543 0.13767827 0.4167363 ]]
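This means that if you have new training data (i.e. emissions2), you can train the same, already updated model object on the new emission sequence. One caveat: by default, fit() re-initializes every parameter listed in init_params before running EM, so to continue from the learned values you should set model.init_params = "" before calling fit() again. To persist the model between sessions, you can either pickle the entire model (as shown above) or save the individual NumPy arrays (startprob_, transmat_, means_, covars_) and assign them to a new model later.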