nlpgensimldatopic-modelingpyldavis

How to get list of words for each topic for a specific relevance metric value (lambda) in pyLDAvis?


I am using pyLDAvis along with gensim.models.LdaMulticore for topic modeling. I have totally 10 topics. When I visualize the results using pyLDAvis, there is a bar called lambda with this explanation: "Slide to adjust relevance metric". I am interested to extract the list of words for each topic separately for lambda = 0.1. I cannot find a way to adjust lambda in the document for extracting keywords.

I am using these lines:

if 1 == 1:
    LDAvis_prepared = pyLDAvis.gensim_models.prepare(lda_model, corpus, id2word, lambda_step=0.1)
LDAvis_prepared.topic_info

And these are the results:

   Term     Freq        Total       Category logprob loglift
321 ra      2336.000000 2336.000000 Default 30.0000 30.0000
146 may     1741.000000 1741.000000 Default 29.0000 29.0000
66  doctor  1310.000000 1310.000000 Default 28.0000 28.0000

First of all these results are not related to what I observe with lambda of 0.1 in visualization. Secondly I cannot see the results separated by the topics.


Solution

  • You may want to read this github page: https://nicharuc.github.io/topic_modeling/

    According to this example, your code could go like this:

    lambd = 0.6 # a specific relevance metric value
    
    all_topics = {}
    num_topics = lda_model.num_topics 
    num_terms = 10 
    
    for i in range(1,num_topics+1): ## Indecies are 1-based, not 0-based
        topic = LDAvis_prepared.topic_info[LDAvis_prepared.topic_info.Category == 'Topic'+str(i)].copy()
        topic['relevance'] = topic['loglift']*(1-lambd)+topic['logprob']*lambd
        all_topics['Topic '+str(i)] = topic.sort_values(by='relevance', ascending=False).Term[:num_terms].values
    pd.DataFrame(all_topics).T