
Gensim: Not able to load the id2word file


I am running topic inference on a new corpus using a previously trained LDA model. The model loads without problems, but I cannot load the id2word file to create the corpora.Dictionary object needed to map the new corpus to token ids: the load method raises an AttributeError on a dict object, and I don't understand why. Below is minimal code that reproduces the problem, along with the packages used.

Thank you in advance for your response...

import numpy as np
import os
import pandas as pd
import gensim
from gensim import corpora
import datetime
import nltk

model_name = "lda_sub_full_35"

dictionary_name = "lda_sub_full_35.id2word"

model_for_inference = gensim.models.LdaModel.load(model_name, mmap='r')
print('Successfully load the model')
lda_dictionary = corpora.Dictionary.load(dictionary_name, mmap='r')

I expect both the dictionary and the model to load, but loading the dictionary raises the following error:

File "topic_inference.py", line 31, in <module>
    lda_dictionary = corpora.Dictionary.load(dictionary_name, mmap='r')
File "/topic_modeling/env/lib/python3.8/site-packages/gensim/utils.py", line 487, in load
    obj._load_specials(fname, mmap, compress, subname)
AttributeError: 'dict' object has no attribute '_load_specials'

Solution

  • How were the contents of the lda_sub_full_35.id2word file originally saved?

    Only if it was saved by a Gensim corpora.Dictionary object's .save() method should it be loaded as you've tried, with corpora.Dictionary.load().

    If it was instead a plain Python dict written as a pickle by some other method, then you need to load it in a matching way. The traceback points in that direction: the object that load() unpickled is a dict, which has no _load_specials method. Loading it might be as simple as:

    import pickle
    
    # path is the file's location, e.g. dictionary_name from the question
    with open(path, 'rb') as f:
        lda_dictionary = pickle.load(f)
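
If the file does turn out to hold a plain dict, it will not have the doc2bow() method that a corpora.Dictionary offers, so the new corpus has to be mapped to ids some other way. Below is a minimal sketch of one option, assuming the dict maps integer ids to token strings (check this on your own file first). The helper names load_id2word and to_bow are hypothetical, not part of Gensim, and unknown tokens are simply skipped. Only unpickle files you trust.

```python
import pickle
from collections import Counter


def load_id2word(path):
    """Unpickle the file and return whatever object it contains."""
    with open(path, "rb") as f:
        return pickle.load(f)


def to_bow(tokens, id2word):
    """Hypothetical helper: turn a token list into sorted (token_id, count) pairs.

    Assumes id2word is a plain dict of {int id: str token}.
    Tokens missing from the mapping are ignored.
    """
    token2id = {token: idx for idx, token in id2word.items()}
    counts = Counter(token2id[t] for t in tokens if t in token2id)
    return sorted(counts.items())


# Example usage (file name taken from the question):
# id2word = load_id2word("lda_sub_full_35.id2word")
# print(type(id2word))  # inspect what was actually saved
# bow = to_bow(["some", "tokenized", "document"], id2word)
```

A list of (token_id, count) pairs is the bag-of-words format that an LdaModel accepts for inference. It may also be worth checking whether the loaded model already carries the mapping as model_for_inference.id2word, since that attribute exists on LdaModel; whether it is populated in your case is something to verify rather than assume.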