python fonts path word-cloud chinese-locale

Why can't the wordcloud library use stopwords to block Chinese characters in Python?


Today I wanted to use WordCloud to create a word cloud, but the biggest word is the meaningless "的", much like "is" in English. I want to remove it, so I created a "stopwords" set to deal with it, but the word is still there. I have tried many variants, such as stopwords = ["的"], stopwords = {"的"}, and stopwords = set() followed by stopwords.update(["的"]), but none of them work. I suspect that either wordcloud doesn't support Chinese or I set the wrong font_path. Please help me, thanks so much.

This is the main code:

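import matplotlib.pyplot as plt
from wordcloud import WordCloud

def draw_word(words_dict):
    stopwords = {'的','是','了','说','地','得','在','与','和'}
    wc = WordCloud(
        # Set the font; otherwise Chinese characters cannot be rendered
        font_path="msyh.ttc",
        background_color = "white",
        # Maximum number of words to display (default 200)
        max_words=150,
        width=1500,
        height=960,
        margin = 10,
        # Filter out high-frequency, meaningless particles
        stopwords = stopwords
    )
    # Generate the word cloud from the word-frequency dictionary
    wc.generate_from_frequencies(words_dict)
    # Draw the image
    plt.imshow(wc)
    # Show the image
    plt.show()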

In the output image, the biggest word is "的", which is the one I want to remove.


Solution

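  • The documentation says:

    stopwords: set of strings or None. The words that will be eliminated. If None, the build-in STOPWORDS list will be used. Ignored if using generate_from_frequencies.

    The code in the question calls generate_from_frequencies, so its stopwords argument is never applied.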
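One possible workaround, given that generate_from_frequencies ignores stopwords, is to drop the unwanted keys from the frequency dictionary before handing it to WordCloud. The sketch below is only an illustration: remove_stopwords is a hypothetical helper name, and the sample frequencies are made up.

```python
def remove_stopwords(words_dict, stopwords):
    """Return a copy of words_dict without any key listed in stopwords.

    Hypothetical helper, not part of the wordcloud library.
    """
    return {word: freq for word, freq in words_dict.items() if word not in stopwords}


stopwords = {'的', '是', '了', '说', '地', '得', '在', '与', '和'}

# Made-up sample frequencies, standing in for the real words_dict.
words_dict = {'的': 120, '词云': 30, '是': 45, '中文': 25}

filtered = remove_stopwords(words_dict, stopwords)
print(filtered)  # {'词云': 30, '中文': 25}
```

Inside draw_word, the filtered dictionary would then be passed in place of the original one, i.e. wc.generate_from_frequencies(filtered), and the stopwords argument to WordCloud could be dropped since it has no effect on that code path.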