
Negative Values: Evaluate Gensim LDA with Topic Coherence


I'm currently trying to evaluate my topic models with gensim's CoherenceModel:

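from gensim.models.coherencemodel import CoherenceModel

# model1: a trained gensim LdaModel; corpus1: the bag-of-words corpus it was trained on
cm_u_mass = CoherenceModel(model=model1, corpus=corpus1, coherence='u_mass')
coherence_u_mass = cm_u_mass.get_coherence()  # aggregated UMass score over all topics

print('\nCoherence Score: ', coherence_u_mass)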

The output is always negative. Is this correct? Can anybody provide the formula, or explain how u_mass works?


Solution

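  • Having a quick look at the original article, you can see that UMass coherence is calculated from the log of conditional word probabilities estimated from document co-occurrence counts. A probability is at most 1, so its log is at most 0, which is why the scores come out negative.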

    The formula you asked for is equation 4 in the same article.

    As I understand it, the closer the UMass coherence is to 0, the better the topic coherence.

    Hope this helps.