I've built a topic model, with:
To find the optimal number of topics, I want to calculate the coherence of a model. However, I am only aware of Gensim's CoherenceModel, which seems to require a Gensim model as input.
Are there any other packages or implementations that I could use to calculate the coherence of a topic model I have already computed? Or, if it is indeed possible to use CoherenceModel without passing in an LdaModel, could someone show me how to do that?
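Actually, you can do this with the Gensim package. CoherenceModel accepts a topics argument (a list of word lists), so you don't need to pass it a Gensim model at all: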
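import gensim.corpora as corpora
from gensim.models.coherencemodel import CoherenceModel

# input_data: list of lists of tokenized texts, one list per document
# topics: list of lists holding the top N words of each topic

# Map every token to an integer id and build the bag-of-words corpus
id2word = corpora.Dictionary(input_data)
corpus = [id2word.doc2bow(text) for text in input_data]

# No model argument: the topics are passed in directly as word lists
cm = CoherenceModel(
    topics=topics,
    texts=input_data,  # c_v needs the tokenized texts for its sliding window
    corpus=corpus,
    dictionary=id2word,
    coherence='c_v')
coherence = cm.get_coherence()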
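If the topics come from a non-Gensim model, or you would rather not depend on Gensim, the following is a minimal pure-Python sketch (not part of Gensim) of two steps: turning a topic-word weight matrix (for example the components_ attribute of scikit-learn's LatentDirichletAllocation) into top-N word lists of the shape the topics argument expects, and scoring them with the UMass coherence measure (Mimno et al., 2011) as a swapped-in alternative to c_v. The helper names top_n_words, umass_coherence and mean_umass, and the toy corpus, are hypothetical and for illustration only.

```python
import math


def top_n_words(topic_word_weights, vocabulary, n):
    """Return the n highest-weighted words of each topic.

    topic_word_weights: one row of weights per topic (hypothetical input,
    e.g. a topic-word matrix from any topic model); vocabulary[i] is the
    word for column i.
    """
    topics = []
    for row in topic_word_weights:
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        topics.append([vocabulary[i] for i in ranked[:n]])
    return topics


def umass_coherence(topic, documents):
    """UMass coherence (Mimno et al., 2011) of a single topic.

    topic: words ordered from most to least probable.
    documents: list of token lists; only document co-occurrence is used.
    Sums log((D(w_m, w_l) + 1) / D(w_l)) over all pairs with l < m.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents containing every word in `words`
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(topic)):
        for l in range(m):
            d_l = doc_freq(topic[l])
            if d_l == 0:
                raise ValueError(f"topic word {topic[l]!r} never occurs in the documents")
            score += math.log((doc_freq(topic[m], topic[l]) + 1) / d_l)
    return score


def mean_umass(topics, documents):
    # Average the per-topic scores to get one number per model
    return sum(umass_coherence(t, documents) for t in topics) / len(topics)


# Hypothetical toy data
documents = [
    ["human", "computer", "interface"],
    ["computer", "system", "user"],
    ["user", "interface", "system"],
    ["graph", "trees"],
    ["graph", "minors", "trees"],
]
vocabulary = ["human", "computer", "interface", "system",
              "user", "graph", "trees", "minors"]
weights = [
    [0.1, 0.3, 0.2, 0.25, 0.15, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.4, 0.35, 0.25],
]

topics = top_n_words(weights, vocabulary, 3)
print(topics)  # [['computer', 'system', 'interface'], ['graph', 'trees', 'minors']]
print(mean_umass(topics, documents))
```

Because UMass only needs document co-occurrence counts, it is cheap to recompute for each candidate number of topics. Gensim's own u_mass option aggregates and smooths the pairwise scores differently, so the values from this sketch are likely not directly comparable to Gensim's output; compare models using one implementation consistently.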