python gensim lda topic-modeling mallet

pyLDAvis visualisation does not align with generated topics


I am using pyLDAvis to visualise the results of an LDA model trained with Mallet.

Before I can do that, I need to convert the Mallet model with the gensim wrapper:

model = gensim.models.wrappers.ldamallet.malletmodel2ldamodel(model_list[8])

When I print the topics that were found, they are numbered from 0 to 10.

However, when I use pyLDAvis to visualise the topics, its topic order (0-10) does not match the printed topics.

Example:

(5,
  '0.042*"euro" + 0.030*"smartpho" + 0.022*"camera" + 0.020*"display" + '
  '0.018*"model" + 0.016*"picture" + 0.012*"price" + 0.010*"android"')

As you can see, this topic is about smartphones.

However, when I visualise the model with pyLDAvis, topic 5 is not about smartphones but about a different subject (cars, for example). The smartphone topic is no longer topic 5 but topic 1.


Is this a known bug, or is it normal behaviour? Can somebody help?


Solution

  • By default, pyLDAvis sorts the topics by topic proportion, so its numbering differs from gensim's. To keep the original order, pass sort_topics=False to pyLDAvis.prepare(). Note that the pyLDAvis topics will still be off by one, because pyLDAvis numbers topics from 1 and gensim from 0 (i.e., Topic 1 in pyLDAvis will be Topic 0 from gensim).

    There is a similar question here: Is there any way to match Gensim LDA output with topics in pyLDAvis graph?

    And an associated issue on the pyLDAvis repo: https://github.com/bmabey/pyLDAvis/issues/127
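
As a rough illustration of the fix described above, here is a minimal sketch. The helper names `prepare_unsorted` and `gensim_topic_id` are made up for this example, and `lda_model`, `corpus` and `dictionary` stand for the converted model and the corpus and dictionary it was trained on. It assumes the gensim helper module is called `pyLDAvis.gensim` in older pyLDAvis releases and `pyLDAvis.gensim_models` in newer ones, and that extra keyword arguments are passed through to pyLDAvis.prepare().

```python
def prepare_unsorted(lda_model, corpus, dictionary):
    """Build the pyLDAvis data while keeping gensim's topic order.

    Hypothetical helper: the arguments are the converted LDA model and
    the corpus/dictionary used to train it.
    """
    # Imported lazily so gensim_topic_id() works without pyLDAvis installed.
    # Assumption: the module is `pyLDAvis.gensim_models` in newer releases
    # and `pyLDAvis.gensim` in older ones.
    try:
        import pyLDAvis.gensim_models as gensim_vis
    except ImportError:
        import pyLDAvis.gensim as gensim_vis
    # sort_topics=False keeps the topics in the order gensim reports them.
    return gensim_vis.prepare(lda_model, corpus, dictionary, sort_topics=False)


def gensim_topic_id(pyldavis_label):
    """Map a 1-based pyLDAvis label to the 0-based gensim topic id.

    Only meaningful when the data was prepared with sort_topics=False.
    """
    if pyldavis_label < 1:
        raise ValueError("pyLDAvis labels start at 1")
    return pyldavis_label - 1
```

With this, the smartphone topic printed as topic 5 by gensim should show up as Topic 6 in the visualisation, and `gensim_topic_id(6)` maps it back to 5.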