pickle gensim lda

Gensim Pickle Error: Unable to Load the Saved Topic Model


I am working on topic inference, which requires loading a previously saved model.

However, loading fails with the following pickle error:

Traceback (most recent call last):
  File "topic_inference.py", line 35, in <module>
    model_for_inference = gensim.models.LdaModel.load(model_name, mmap = 'r')
  File "topic_modeling/env/lib/python3.8/site-packages/gensim/models/ldamodel.py", line 1663, in load
    result = super(LdaModel, cls).load(fname, *args, **kwargs)
  File "topic_modeling/env/lib/python3.8/site-packages/gensim/utils.py", line 486, in load
    obj = unpickle(fname)
  File "topic_modeling/env/lib/python3.8/site-packages/gensim/utils.py", line 1461, in unpickle
    return _pickle.load(f, encoding='latin1')  # needed because loading from S3 doesn't support readline()
TypeError: __randomstate_ctor() takes from 0 to 1 positional arguments but 2 were given

The code I use to load the model is simply

gensim.models.LdaModel.load(model_name, mmap = 'r')

Here is the code that I use to create and save the model

model = gensim.models.ldamulticore.LdaMulticore(
    corpus=comment_corpus,
    id2word=key_word_dict,  # now a gensim.corpora.Dictionary object; previously its .id2token attribute
    chunksize=chunksize,
    alpha='symmetric',
    eta='auto',
    iterations=iterations,
    num_topics=num_topics,
    passes=epochs,
    eval_every=eval_every,
    workers=15,
    minimum_probability=0.0)

model.save(output_model)

where output_model is a path with no extension such as .model or .pkl.

In the past, I used a similar approach, except that I passed the .id2token attribute of the gensim.corpora.Dictionary object, rather than the full gensim.corpora.Dictionary, to the id2word parameter when creating the model, and the model loaded fine. I wonder whether passing in a corpora.Dictionary makes a difference when loading. Back then I was using regular Python, but now I am using Anaconda. However, all the package versions are the same.


Solution

  • Another report of an error about __randomstate_ctor (at https://github.com/numpy/numpy/issues/14210) suggests the problem may be related to numpy object pickling.

    Is there a chance that the configuration where your load is failing is using a later version of numpy than when the save occurred? Could you try, at least temporarily, rolling back to some older numpy (that's still sufficient for whatever Gensim you're using) to see if it helps?

    If you find any configuration in which the load works, even a suboptimal one, you might be able to null out whatever random-related objects are causing the problem and re-save, giving you a saved version that loads cleanly in the configuration you actually want. Then, if the random-related objects are truly needed after reload, it may be possible to reconstitute them manually. (I haven't looked into this yet, but if you find a workaround that allows a load and then aren't sure what to null or rebuild manually, I could take a closer look.)
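
As a rough sketch of the two suggestions above: the first helper prints the installed versions so they can be compared between the environment that saved the model and the one that fails to load it, and the other two null out and later rebuild the model's random state. The attribute name random_state is where gensim's LdaModel keeps its numpy RandomState, but whether that is the only object that fails to unpickle is an assumption. The helper names, the seed, and the paths in the comments are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("numpy", "scipy", "gensim")):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def null_random_state(model, attr="random_state"):
    """Replace model.<attr> with None and return the previous value.

    Assumption: the RandomState stored under `attr` is the object that
    fails to unpickle. Call this in an environment where the load works,
    then re-save the model.
    """
    previous = getattr(model, attr, None)
    setattr(model, attr, None)
    return previous


def rebuild_random_state(model, seed=None, attr="random_state"):
    """Attach a fresh numpy RandomState after loading the cleaned model."""
    import numpy as np  # imported lazily so the other helpers work without numpy

    setattr(model, attr, np.random.RandomState(seed))
    return model


# Run this in both environments and compare the output.
print(installed_versions())

# Hypothetical workflow (paths are placeholders):
#   model = gensim.models.LdaModel.load("old_model")    # where the load works
#   null_random_state(model)
#   model.save("clean_model")
#   model = gensim.models.LdaModel.load("clean_model")  # target environment
#   rebuild_random_state(model, seed=42)
```

A rebuilt RandomState does not carry over the internal state of the original, so inference results after the rebuild may not match the original model's results exactly.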