
Dictionary not sorting correctly in Python


My code should output the 10 most frequent words in the corpus. Instead, it prints 10 seemingly random words.

import nltk
from nltk.corpus import brown
import operator

brown_tagged_sentences = brown.tagged_sents(categories='news')
fd=nltk.FreqDist(brown.words(categories='news'))
sorted_fd = dict(sorted(fd.items(), key=operator.itemgetter(1), reverse=True))
print(sorted_fd)
most_freq_words=list(sorted_fd)[:10]
for word in most_freq_words:
    print(word,':',sorted_fd[word])

The current output, shown below, is wrong:

Rae : 1
discharge : 1
ignition : 1
contendere : 1
done : 24
meaning : 4
ashore : 1
Francesca : 1
Vietnamese : 1
data : 4

Kindly help


Solution

  • NLTK's FreqDist class can return its contents in descending order of frequency directly, via its most_common() method:

    import nltk
    from nltk.corpus import brown

    fd = nltk.FreqDist(brown.words(categories='news'))
    for w, f in fd.most_common(10):
        print(w + ":", f)
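
A likely reason the original attempt looks random: passing the sorted pairs back through dict() only keeps their order on Python 3.7 and later, where dicts are guaranteed to preserve insertion order. On older interpreters the order is lost again. The sketch below keeps the sorted result as a list instead. It uses collections.Counter (which FreqDist subclasses) and a made-up toy word list in place of the Brown corpus, so the names words and top_three are illustrative assumptions, not part of the original code.

```python
from collections import Counter
import operator

# Hypothetical toy word list standing in for brown.words(categories='news').
words = ["the", "of", "the", "jury", "the", "of", "said", "jury", "the", "of"]

# nltk.FreqDist is a Counter subclass, so the same calls apply to it.
fd = Counter(words)

# Sort the (word, count) pairs by count and keep the result as a list.
# A list always preserves order, so do not wrap it in dict() again.
sorted_pairs = sorted(fd.items(), key=operator.itemgetter(1), reverse=True)
top_three = sorted_pairs[:3]

for word, count in top_three:
    print(word, ":", count)
```

Slicing the sorted list gives the same pairs as fd.most_common(3) here, so most_common() remains the shorter option when a FreqDist or Counter is already available.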