
Python: word count from WordCloud


I am using WordCloud on a body of text, and I would like to see the actual count for each word in the cloud. I can see the weighted frequencies using .words_, but is there an easy way to see the actual counts?

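# Generate a word cloud image (text holds the document as one string)
from wordcloud import WordCloud

wordcloud = WordCloud(background_color="white").generate(text)
wordfreq = wordcloud.words_  # weighted frequencies, not raw counts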

Edit: the reason I want the counts from WordCloud (rather than counting the words in the text myself) is that WordCloud includes phrases (collocations) as well as single words in its analysis. For example, it reports a count for "water resources" as well as a count for "water" where it does not appear as part of "water resources". WordCloud also appears to fold plural forms into the count of the singular (e.g. counting "water resources" toward "water resource").


Solution

  • Just use WordCloud().process_text(text):

    >>> from wordcloud import WordCloud
    >>> WordCloud().process_text('penn penn penn penn penn state state state state uni uni uni college college university states vice president vice president vice president vice president vice president vice president vice president')
    {'penn': 5, 'state': 5, 'uni': 3, 'college': 2, 'university': 1, 'vice president': 7}
    

    Notice that it combines "states" into the "state" count and also counts "vice president" as a bigram.
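
    Once you have that dict, ranking it or rescaling it is plain Python. The sketch below is only illustrative: the helper names count_words, top_counts and normalize are made up for this example and are not part of the wordcloud package, and the counts dict is copied from the output above rather than recomputed. The normalize helper assumes that .words_ is scaled by dividing each count by the largest count, which is worth verifying against your installed version.

```python
def count_words(text):
    """Return the raw word and bigram counts that WordCloud tallies for text."""
    # Imported inside the function so the helpers below work without wordcloud.
    from wordcloud import WordCloud
    return WordCloud().process_text(text)


def top_counts(counts, n=10):
    """Sort a {word: count} dict by descending count and keep the first n."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]


def normalize(counts):
    """Divide every count by the largest one (assumed to mirror .words_ scaling)."""
    largest = max(counts.values())
    return {word: count / largest for word, count in counts.items()}


# Counts copied from the process_text output shown above.
counts = {'penn': 5, 'state': 5, 'uni': 3, 'college': 2,
          'university': 1, 'vice president': 7}

for word, count in top_counts(counts, n=3):
    print(word, count)
```

    With your own text you would call count_words(text) first and pass the result to top_counts. Keeping the raw counts and the rescaled values in separate dicts makes it easy to compare them side by side.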