I am working on topic inference that will require to load a previously saved model.
However, I got a pickle error that says
Traceback (most recent call last):
File "topic_inference.py", line 35, in <module>
model_for_inference = gensim.models.LdaModel.load(model_name, mmap = 'r')
File "topic_modeling/env/lib/python3.8/site-packages/gensim/models/ldamodel.py", line 1663, in load
result = super(LdaModel, cls).load(fname, *args, **kwargs)
File "topic_modeling/env/lib/python3.8/site-packages/gensim/utils.py", line 486, in load
obj = unpickle(fname)
File "topic_modeling/env/lib/python3.8/site-packages/gensim/utils.py", line 1461, in unpickle
return _pickle.load(f, encoding='latin1') # needed because loading from S3 doesn't support readline()
TypeError: __randomstate_ctor() takes from 0 to 1 positional arguments but 2 were given
The code I use to load the model is simply
gensim.models.LdaModel.load(model_name, mmap = 'r')
Here is the code that I use to create and save the model
model = gensim.models.ldamulticore.LdaMulticore(
corpus=comment_corpus,
id2word=key_word_dict, ## This is now a gensim.corpora.Dictionary Object, previously it was the .id2token attribute
chunksize=chunksize,
alpha='symmetric',
eta='auto',
iterations=iterations,
num_topics=num_topics,
passes=epochs,
eval_every=eval_every,
workers = 15,
minimum_probability= 0.0)
model.save(output_model)
where output_model
doesn't have an extension like .model
or .pkl
In the past, I tried the similar approach with the exception that I passed in a .id2token
attribute under the gensim.corpora.Dictionary
object instead of the full gensim.corpora.Dictionary
to the id2word
parameter when I created the model, and the method loads the model fine back then. I wonder if passing in a corpora.Dictionary
will make a difference in the loading output...? Back that time, I was using regular python, but now I am using anaconda. However, all the versions of the packages are the same.
Another report of an error about __randomstate_ctor
(at https://github.com/numpy/numpy/issues/14210) suggests the problem may be related to numpy object pickling.
Is there a chance that the configuration where your load is failing is using a later version of numpy
than when the save occurred? Could you try, at least temporarily, rolling back to some older numpy
(that's still sufficient for whatever Gensim you're using) to see if it helps?
If you find any load that works, even in a suboptimal config, you might be able to null-out whatever random
-related objects are causing the problem and re-save, then having a saved version that loads better in your truly-desired configuration. Then, if the random
-related objects truly needed after reload, it may be possible to manually re-constitute them. (I haven't looked into this yet, but if you find any workaround allowing a load, but then aren't sure what to manually null/rebuild, I could take a closer look.)