I am trying to count word frequencies in a text and then sort them in descending order of frequency. So if two words have the same frequency, they should remain in the same order they first appeared in the original text (not alphabetical order). See the example below to understand:
Example input:
Hi I live in America I love cooking
After converting to lowercase, the expected output is:
{"i": 2, "hi": 1, "live": 1, "in": 1, "america": 1, "love": 1, "cooking": 1}
Current Approach:
words = input().lower().split()
d = {i: words.count(i) for i in words}
dk = list(d.keys())
dv = sorted(d.values())[::-1]
sd = sorted(d.items())
nd = dict()
for i in dv:
for j in dk:
if d[j] == i:
nd[j] = i
print(nd)
This works, but calling words.count() for every element makes the time complexity very poor, and the nested loops make it worse. I’m unable to figure out a better algorithm to replace this.
collections.Counter does what you want:
>>> from collections import Counter
>>> s = 'Hi I live in America I love cooking'
>>> Counter(s.lower().split())
Counter({'i': 2, 'hi': 1, 'live': 1, 'in': 1, 'america': 1, 'love': 1, 'cooking': 1})
Changed in version 3.7: As a
dictsubclass,Counterinherited the capability to remember insertion order. Math operations onCounterobjects also preserve order. Results are ordered according to when an element is first encountered in the left operand and then by the order encountered in the right operand.
More generically, a dict preserves order; if you're using older Python versions or want to make this even more explicit, use an OrderedDict. Then the algorithm is simply to insert items as encountered and do something with them; in your case, count up:
count = {}
for word in s.lower().split():
count[word] = count.get(word, 0) + 1
A defaultdict also helps here:
count = defaultdict(int)
for word in s.lower().split():
count[word] += 1
Then you can just sort the result, because sorts are stable and will preserve the order:
>>> from operator import itemgetter
>>> sorted(count.items(), key=itemgetter(1), reverse=True)
[('i', 2), ('hi', 1), ('live', 1), ('in', 1), ('america', 1), ('love', 1), ('cooking', 1)]
>>> dict(sorted(count.items(), key=itemgetter(1), reverse=True))
{'i': 2, 'hi': 1, 'live': 1, 'in': 1, 'america': 1, 'love': 1, 'cooking': 1}
The built-in
sorted()function is guaranteed to be stable. A sort is stable if it guarantees not to change the relative order of elements that compare equal — this is helpful for sorting in multiple passes (for example, sort by department, then by salary grade).