pythonpython-3.xdictionary

Is using "not in" faster than using "in" for dictionaries?


Let's say we're solving a simple word count problem. There's a list and we're trying to find the frequency of each word occurring in the list, using a dictionary to keep the counts.

Which pattern is faster here?

book_title =  ['great', 'expectations','the', 'adventures', 'of', 'sherlock','holmes','the','great','gasby','hamlet','adventures','of','huckleberry','fin']
word_count_dict = {}

Pattern 1

for word in book_title:
    if word in word_count_dict:
        word_count_dict[word] += 1
    else:
        word_count_dict[word] = 1

Pattern 2

for word in book_title:
    if word not in word_count_dict:
        word_count_dict[word] = 1
    else:
        word_count_dict[word] += 1

Solution

  • Six of one, half a dozen of the other. They should be roughly equivalent to each other - in computational terms, a not operation is nearly negligible (literally the cheapest possible operation), and the in operation in a hashtable like a dictionary runs in constant time (either the hash is there, or it's not). If we were dealing with a list, it would run in linear time, but still the between in and not in. See also computational complexity of python data structures.

    So basically, use whichever makes your code more understandable.


    That said, have you considered using a collections.Counter, a data structure specifically designed for this purpose?

    import collections
    book_title = ['great', 'expectations','the', 'adventures', 'of', 'sherlock','holmes','the','great','gasby','hamlet','adventures','of','huckleberry','fin']
    word_counts = collections.Counter(book_title)
    print(word_counts)
    # Counter({'great': 2, 'the': 2, 'adventures': 2, 'of': 2, 'expectations': 1, 'sherlock': 1, 'holmes': 1, 'gasby': 1, 'hamlet': 1, 'huckleberry': 1, 'fin': 1})
    

    You can typecast a collections.Counter to a dict if you need to, and in fact collections.Counter is a subclass of dict. It even has an .update() method specifically designed to work with other counters - if you add another book title, just feed it into a Counter and then .update() the original with that.