pythonpython-2.7python-3.xdictionarydictview

dictionary lookup on Python 2.7 vs 3.4


This came up in a nitpick discussion about the prefered style for iterating over dictionary keys if you need to apply some test to the value. I was comparing the performance of [k for k in d if d[k] == 1] against [k for k, v in d.items() if v == 1].

Looks like dictionary lookups are more expensive in Python 3.4:

$ python -m timeit -n 1000000 \
  -s 'd={k:v for v, k in enumerate("abcdefghijhklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")}' \
  '[k for k in d if d[k] == 1]'
1000000 loops, best of 3: 2.17 usec per loop

$ python -m timeit -n 1000000 \
  -s 'd={k:v for v, k in enumerate("abcdefghijhklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")}' \
  '[k for k, v in d.items() if v == 1]'
1000000 loops, best of 3: 3.13 usec per loop

$ python3.4 -m timeit -n 1000000 \
  -s 'd={k:v for v, k in enumerate("abcdefghijhklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")}' \
  '[k for k in d if d[k] == 1]'
1000000 loops, best of 3: 3.25 usec per loop

$ python3.4 -m timeit -n 1000000 \
  -s 'd={k:v for v, k in enumerate("abcdefghijhklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")}' \
  '[k for k, v in d.items() if v == 1]'
1000000 loops, best of 3: 3.05 usec per loop

Are lookups more expensive in Python 3.4 compared to 2.7 and can you explain why?


Solution

  • It's not that lookups are more expensive1 in Python 3.4 than 2.7, but that dict.items() is cheaper. That's because dict.items() is a rather different beast in the two versions of the language:

    $ python2.7
    Python 2.7.9 (default, Dec 11 2014, 03:19:35) 
    [GCC 4.8.2] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> type({}.items())
    <type 'list'>
    >>> 
    
    $ python3.4
    Python 3.4.2 (default, Oct  8 2014, 08:07:42) 
    [GCC 4.8.2] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> type({}.items())
    <class 'dict_items'>
    >>> 
    

    In Python 2, dict.items() constructs and returns a list, whereas in Python 3 it returns a dictionary view, and it turns out that iterating over this dynamic view into the dictionary is faster than constructing a list and then iterating over that.

    Although dict.items() doesn't return a dictionary view in 2.7, it is possible to get one with dict.viewitems(), with similar performance benefits. Repeating your test, this time with viewitems() included, I get:

    $ python2.7 -m timeit -n 1000000 \
      -s 'd={k:v for v, k in enumerate("abcdefghijhklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")}' \
      '[k for k in d if d[k] == 1]'
    1000000 loops, best of 3: 3.38 usec per loop
    
    $ python2.7 -m timeit -n 1000000 \
      -s 'd={k:v for v, k in enumerate("abcdefghijhklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")}' \
      '[k for k, v in d.items() if v == 1]'
    1000000 loops, best of 3: 4.33 usec per loop
    
    $ python2.7 -m timeit -n 1000000 \
      -s 'd={k:v for v, k in enumerate("abcdefghijhklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")}' \
      '[k for k, v in d.viewitems() if v == 1]'
    1000000 loops, best of 3: 3.27 usec per loop
    

    As regards the discussion that prompted your investigation, I'd say that the d.items() or d.viewitems() approach is more explicit, and therefore more pythonic, but that's really more an aesthetic choice than anything else.


    1 Except inasmuch as Python 3.x is, as a general rule, slower than 2.x, but that's the price of progress ...