I am using gensim to create a bag-of-words model and I want to normalize it. I found the documentation (https://radimrehurek.com/gensim/models/normmodel.html), but I am confused about how to apply it to the code I have. `conversations` is a list of tokenized documents, so essentially a list of lists where each element is a document.
from gensim import corpora, models
from gensim.models import NormModel

id2word = corpora.Dictionary(conversations)
id2word.filter_extremes(keep_n=5000, keep_tokens=None)
corpus = [id2word.doc2bow(text) for text in conversations]
norm_corpus = NormModel(corpus)
`corpus` is a sparse representation, I believe. For each document, it holds the ids of the terms with non-zero frequency and their counts: [[(0, 2), (1, 5), (2, 4)...(92, 2), (93, 3)],...].
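For reference, here is a minimal pure-Python sketch of what L2 normalization does to data in this shape. It does not use gensim: `l2_normalize` is a hypothetical helper and the small `corpus` is made-up sample data, on the assumption that the goal is for each document vector to have unit length.

```python
import math


def l2_normalize(bow):
    """Scale the counts of one bag-of-words document to unit L2 length."""
    length = math.sqrt(sum(count * count for _, count in bow))
    if length == 0:
        # An empty document has nothing to scale.
        return list(bow)
    return [(term_id, count / length) for term_id, count in bow]


# Made-up sample data in the same (term_id, count) format as above.
corpus = [[(0, 2), (1, 5), (2, 4)], [(1, 1), (3, 3)]]
normalized = [l2_normalize(doc) for doc in corpus]
```

The term ids are left alone; only the counts are rescaled, so each document is still a list of (id, weight) pairs.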
The last line, the one that creates norm_corpus, does not work when I try to pass the result into the following:
models.LsiModel(norm_corpus, id2word=id2word, num_topics=12)
I get a TypeError with the message 'int' object is not iterable. However, the documentation says to pass in a corpus, so I'm confused. I would appreciate any help. Thanks!
I don't have a way to check at the moment but try this:
norm_corpus = NormModel()
norm_corpus.normalize(text)
or, in case `normalize` needs a bag-of-words vector rather than raw tokens:
norm_corpus.normalize(id2word.doc2bow(text))
In your original code you have
`NormModel(iterable)`
but the documentation says you need to pass:
`NormModel(iterable of iterable of (int, number))`
I hope this makes sense.