
How to generate word clouds from LDA models in Python?


I am doing some topic modeling on newspaper articles and have implemented LDA using gensim in Python 3. Now I want to create a word cloud for each topic, using the top 20 words of each topic. I know I can print the words and save the LDA model, but is there a way to save just the top words of each topic so I can use them later to generate word clouds?

I tried to google it, but could not find anything relevant. Any help is appreciated.


Solution

  • You can get the top-n words of each topic from a gensim LDA model with the built-in method show_topic, which returns a list of (word, probability) pairs.

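    from gensim import models

    lda = models.LdaModel.load('lda.model')

    # Open the file once, outside the loop; opening it with 'w' inside the
    # loop would overwrite it on every iteration and keep only the last topic.
    with open('output_file.txt', 'w', encoding='utf-8') as outfile:
        for i in range(lda.num_topics):
            outfile.write('Topic #{}:\n'.format(i + 1))
            # show_topic yields (word, probability) pairs, most probable first
            for word, prob in lda.show_topic(i, topn=20):
                outfile.write('{}\n'.format(word))  # str already; no .encode() in Python 3
            outfile.write('\n')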
    

    This will write a file with a format similar to this:

    Topic #69: 
    pet
    dental
    tooth
    adopt
    animal
    puppy
    rescue
    dentist
    adoption
    animal
    shelter
    pet
    dentistry
    vet
    paw
    pup
    patient
    mix
    foster
    owner
    
    Topic #70: 
    periscope
    disneyland
    disney
    snapchat
    brandon
    britney
    periscope
    periscope
    replay
    britneyspear
    buffaloexchange
    britneyspear
    https
    meerkat
    blab
    periscope
    kxci
    toni
    disneyland
    location
    

    You may need to adapt this to your needs, e.g. return a list of the top 20 words for each topic instead of writing them to a text file.
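
    If you would rather keep the top words in memory, or in a file you can reload later, a minimal sketch along these lines could work. It assumes lda is a trained gensim LdaModel (anything exposing num_topics and show_topic works); the function names top_words_per_topic, save_top_words and load_top_words and the JSON file format are hypothetical choices for illustration, not part of gensim. The probabilities are passed through float() because gensim typically returns NumPy floats, which the json module cannot serialize.

```python
import json


def top_words_per_topic(lda, topn=20):
    """Collect the top-n (word, probability) pairs of every topic.

    `lda` is assumed to be a trained gensim LdaModel; any object with a
    `num_topics` attribute and a `show_topic(topicid, topn=...)` method works.
    """
    topics = {}
    for topic_id in range(lda.num_topics):
        # float() turns NumPy scalars into plain floats so they can go to JSON
        topics[topic_id] = [(word, float(prob))
                            for word, prob in lda.show_topic(topic_id, topn=topn)]
    return topics


def save_top_words(topics, path):
    """Write the {topic_id: [(word, prob), ...]} mapping to a JSON file."""
    # JSON object keys must be strings, so topic ids are converted explicitly
    serializable = {str(tid): pairs for tid, pairs in topics.items()}
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(serializable, f, ensure_ascii=False, indent=2)


def load_top_words(path):
    """Read the JSON file back into {topic_id: [(word, prob), ...]}."""
    with open(path, encoding='utf-8') as f:
        raw = json.load(f)
    # json stores tuples as lists; convert them back to tuples
    return {int(tid): [tuple(pair) for pair in pairs] for tid, pairs in raw.items()}
```

    Usage would be something like topics = top_words_per_topic(lda, topn=20) followed by save_top_words(topics, 'top_words.json'). JSON is used here only because it keeps the probabilities, which are useful as word-cloud weights; a plain list of words would work too if you only need the words.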

    The answer to the question "How do I print lda topic model and the word cloud of each of the topics" gives a good explanation of how to build the word clouds from raw text.
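
    As an alternative to starting from raw text, the topic probabilities themselves can serve as word weights. One option (swapped in here, not the approach of the linked answer) is the third-party wordcloud package, whose WordCloud.generate_from_frequencies method accepts a {word: weight} dict. The helper names topic_frequencies and render_topic_cloud are hypothetical. Since the sample output above contains repeated words, this sketch sums the weights of duplicates so each word appears once in the cloud.

```python
from collections import defaultdict


def topic_frequencies(word_probs):
    """Turn a topic's [(word, prob), ...] list into a {word: weight} dict.

    Repeated words are merged by summing their probabilities, because a
    word cloud needs exactly one weight per word.
    """
    freqs = defaultdict(float)
    for word, prob in word_probs:
        freqs[word] += float(prob)
    return dict(freqs)


def render_topic_cloud(word_probs, path):
    """Draw one topic as a word cloud and save it as an image file.

    Requires the third-party `wordcloud` package (pip install wordcloud).
    """
    # Imported lazily so topic_frequencies() works without wordcloud installed
    from wordcloud import WordCloud

    cloud = WordCloud(background_color='white')
    cloud.generate_from_frequencies(topic_frequencies(word_probs))
    cloud.to_file(path)
```

    For example, loop over range(lda.num_topics) and call render_topic_cloud(lda.show_topic(i, topn=20), 'topic_{}.png'.format(i + 1)) for each topic; the file names are only an example.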