hidden-markov-models hmmlearn markov-models

How to update the hmmlearn learned object when we have new samples?


I have implemented a simple Hidden Markov Model with hmmlearn, and it works well. I used the fit() method to learn the HMM parameters from my data. If I get more data and want to update the previously fitted model without training from scratch, what can I do? In other words, how can I initialize a new model from what I have learned so far and continue fitting on the new observations to get a better model of my data?


Solution

  • In hmmlearn, calling fit() on a model updates that model's parameters in place:

    import numpy as np
    import pickle
    from hmmlearn import hmm
    
    np.random.seed(42)
    
    # initialize model
    model = hmm.GaussianHMM(n_components=3, covariance_type="full")
    model.startprob_ = np.array([0.33, 0.33, 0.34])
    model.transmat_ = np.array([[0.1, 0.2, 0.7],
                                [0.3, 0.5, 0.2],
                                [0.5, 0.1, 0.4]])
    model.means_ = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
    model.covars_ = np.tile(np.identity(2), (3, 1, 1))
    
    # generate "fake" training data
    emissions1, states1 = model.sample(100)
    print("Transition matrix before training: \n", model.transmat_)
    
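    # train (with the default init_params="stmc", fit() re-initializes
    # startprob_, transmat_, means_ and covars_ before running EM)
    model.fit(emissions1)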
    print("Transition matrix after training: \n", model.transmat_)
    
    # save model
    with open("modelname.pkl", "wb") as f: pickle.dump(model, f)
    
    #################################
    
    >>> Transition matrix before training: 
     [[0.1 0.2 0.7]
     [0.3 0.5 0.2]
     [0.5 0.1 0.4]]
    >>> Transition matrix after training: 
     [[0.19065325 0.50905216 0.30029459]
     [0.41888047 0.39276483 0.18835471]
     [0.44558543 0.13767827 0.4167363 ]]
    

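    This means that when new training data (e.g. emissions2) arrives, you can call fit() again on the same model object. Note that by default (init_params="stmc") fit() re-initializes every parameter before running EM, so to continue from the learned values you need to set model.init_params = "" first. Even then, the second fit runs EM on the new sequence only, starting from the old parameters; the old data influences the result only through that starting point. To persist the model between sessions, you can either pickle the whole object (as shown above) or save the NumPy arrays for startprob_, transmat_, means_, and covars_.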