pandas matplotlib nlp nltk word-cloud

Generate a word cloud to show frequencies of numbers in Python


I have a pandas DataFrame that contains the grade points of students. I want to generate a word cloud (or "number cloud") for the grades, that is, a word cloud whose entries are the numbers in the CGPA column. Is there any way to achieve this? I have tried several approaches, but none of them worked.

Here is what I tried:

import pandas as pd
from wordcloud import WordCloud
import matplotlib.pyplot as plt
df = pd.read_csv("VTU_marks.csv")
# rounding off
df = df[df['CGPA'].isnull() == False]
df['CGPA'] = df['CGPA'].round(decimals=2)

wordcloud = WordCloud(max_font_size=50,max_words=100,background_color="white").generate(string)
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()

But I am getting this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-47-29ec36ebbb1e> in <module>()
----> 1 wordcloud = WordCloud(max_font_size=50, max_words=100, background_color="white").generate(string)
      2 plt.figure()
      3 plt.imshow(wordcloud, interpolation="bilinear")
      4 plt.axis("off")
      5 plt.show()

/usr/local/lib/python3.6/dist-packages/wordcloud/wordcloud.py in generate(self, text)
    603         self
    604         """
--> 605         return self.generate_from_text(text)
    606 
    607     def _check_generated(self):

/usr/local/lib/python3.6/dist-packages/wordcloud/wordcloud.py in generate_from_text(self, text)
    585         """
    586         words = self.process_text(text)
--> 587         self.generate_from_frequencies(words)
    588         return self
    589 

/usr/local/lib/python3.6/dist-packages/wordcloud/wordcloud.py in generate_from_frequencies(self, frequencies, max_font_size)
    381         if len(frequencies) <= 0:
    382             raise ValueError("We need at least 1 word to plot a word cloud, "
--> 383                              "got %d." % len(frequencies))
    384         frequencies = frequencies[:self.max_words]
    385 

ValueError: We need at least 1 word to plot a word cloud, got 0.

You can find the data here.


Solution

  • After setting up your data and rounding as desired, we can count the frequency of each score:

    counts = df['CGPA'].value_counts()
    

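    We need to make sure that the index labels here are strings, because floats will raise an error. (Your original attempt failed for a related reason: generate expects text, and by default it appears to discard purely numeric tokens, which would explain the "got 0" in the traceback.) We can convert the labels to strings as follows: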

    counts.index = counts.index.map(str)
    
    # The alternative below works for pandas versions >= 0.19.0:
    # counts.index = counts.index.astype(str)
    

    We can then pass these counts to the generate_from_frequencies method to build the cloud:

    wordcloud = WordCloud().generate_from_frequencies(counts)
    plt.figure()
    plt.imshow(wordcloud, interpolation="bilinear")
    plt.axis("off")
    plt.show()
    

    This gave me the following:

    [Image: the resulting word cloud of CGPA values]

    Full MWE:

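    import pandas as pd
    from wordcloud import WordCloud
    import matplotlib.pyplot as plt
    
    df = pd.read_csv("VTU_marks.csv")
    # Drop rows with a missing CGPA; .copy() avoids a SettingWithCopyWarning below.
    df = df[df['CGPA'].notnull()].copy()
    # Round to two decimals so that near-identical grades share one label.
    df['CGPA'] = df['CGPA'].round(decimals=2)
    
    # Count how often each rounded CGPA occurs.
    counts = df['CGPA'].value_counts()
    
    # The labels must be strings, not floats.
    counts.index = counts.index.map(str)
    #counts.index = counts.index.astype(str)
    
    # Size each label by its count and display the cloud.
    wordcloud = WordCloud().generate_from_frequencies(counts)
    plt.figure()
    plt.imshow(wordcloud, interpolation="bilinear")
    plt.axis("off")
    plt.show()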
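
To see why passing the numbers as text finds no words while the frequency route works, here is a small sketch that needs only the standard library. The sample grades are made up. The regular expression and the digit filter are my approximation of WordCloud's default text processing, not a copy of the library's code. The render helper is hypothetical too, and is only defined, not called, so the snippet runs without wordcloud or matplotlib installed.

```python
import re
from collections import Counter

# Hypothetical sample standing in for the CGPA column of VTU_marks.csv.
grades = [8.52, 7.9, 9.13, 8.52, 7.9, 8.52]

# Route 1: join the numbers into one text, as .generate() would receive them.
text = " ".join(str(g) for g in grades)

# Assumed approximation of the default tokenizer: keep runs of two or more
# word characters, then discard tokens that consist only of digits.
tokens = re.findall(r"\w[\w']+", text)
kept = [t for t in tokens if not t.isdigit()]
print("tokens found:", tokens)
print("tokens kept: ", kept)

# Route 2: count each grade under a string label, so nothing is tokenized.
frequencies = Counter(f"{g:.2f}" for g in grades)
print("frequencies: ", dict(frequencies))


def render(freqs):
    """Hypothetical helper: draw a cloud from a {label: count} mapping."""
    # Imported lazily so the rest of the sketch runs without these packages.
    from wordcloud import WordCloud
    import matplotlib.pyplot as plt

    cloud = WordCloud(background_color="white").generate_from_frequencies(freqs)
    plt.figure()
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.show()


# render(frequencies)  # uncomment once wordcloud and matplotlib are installed
```

With the real data, the Counter would be replaced by the value_counts result from the answer above. The point is only that a label such as "8.52" survives when it is passed whole together with its count, whereas the text route splits it at the decimal point and then drops the digit-only pieces.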