Let's say we're solving a simple word count problem. There's a list and we're trying to find the frequency of each word occurring in the list, using a dictionary to keep the counts.
Which pattern is faster here?
book_title = ['great', 'expectations','the', 'adventures', 'of', 'sherlock','holmes','the','great','gasby','hamlet','adventures','of','huckleberry','fin']
word_count_dict = {}
Pattern 1
for word in book_title:
if word in word_count_dict:
word_count_dict[word] += 1
else:
word_count_dict[word] = 1
Pattern 2
for word in book_title:
if word not in word_count_dict:
word_count_dict[word] = 1
else:
word_count_dict[word] += 1
Six of one, half a dozen of the other. They should be roughly equivalent to each other - in computational terms, a not
operation is nearly negligible (literally the cheapest possible operation), and the in
operation in a hashtable like a dictionary runs in constant time (either the hash is there, or it's not). If we were dealing with a list, it would run in linear time, but still the between in
and not in
. See also computational complexity of python data structures.
So basically, use whichever makes your code more understandable.
That said, have you considered using a collections.Counter
, a data structure specifically designed for this purpose?
import collections
book_title = ['great', 'expectations','the', 'adventures', 'of', 'sherlock','holmes','the','great','gasby','hamlet','adventures','of','huckleberry','fin']
word_counts = collections.Counter(book_title)
print(word_counts)
# Counter({'great': 2, 'the': 2, 'adventures': 2, 'of': 2, 'expectations': 1, 'sherlock': 1, 'holmes': 1, 'gasby': 1, 'hamlet': 1, 'huckleberry': 1, 'fin': 1})
You can typecast a collections.Counter
to a dict
if you need to, and in fact collections.Counter
is a subclass of dict
. It even has an .update()
method specifically designed to work with other counters - if you add another book title, just feed it into a Counter
and then .update()
the original with that.