Why does setdefault not increment the count by 1 for every occurrence in the list a when used inside a dictionary comprehension, when it does in a loop? What's going on here?
Alternative solutions are great. I'm mostly interested in understanding why this doesn't work.
a = [1,1,2,2,2,3,3]
b = {}
for x in a:
    b[x] = b.setdefault(x, 0) + 1
b
Out[4]: {1: 2, 2: 3, 3: 2}
b = {}
b = {k: b.setdefault(k, 0) + 1 for k in a}
b
Out[7]: {1: 1, 2: 1, 3: 1}
Thanks for the answers. I wanted to try timing the solutions:
import timeit
from collections import Counter

def using_get(a):
    b = {}
    for x in a:
        b[x] = b.get(x, 0) + 1
    return b

def using_setdefault(a):
    b = {}
    for x in a:
        b[x] = b.setdefault(x, 0) + 1
    return b
timeit.timeit(lambda: Counter(a), number=1000000)
Out[3]: 15.19974103783569
timeit.timeit(lambda: using_get(a), number=1000000)
Out[4]: 3.1597984457950474
timeit.timeit(lambda: using_setdefault(a), number=1000000)
Out[5]: 3.231248461129759
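There is no dictionary yet inside the dict comprehension, at least not one you can reach by name. You are building a completely new dictionary, and b is only rebound to it after the comprehension has finished, replacing whatever b was bound to before.
In other words, in your dictionary comprehension, b.setdefault() is called on a totally different dictionary: the old one that b referred to before the expression ran. It has nothing to do with the object being built by the comprehension. The first b.setdefault(k, 0) call for each key stores 0 in that old dictionary, and since nothing ever updates the old dictionary, every later call for the same key returns 0 again, so every value ends up as 0 + 1 = 1.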
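To make this concrete, here is a small sketch that keeps a second name for the dictionary b refers to before the comprehension runs (old and old2 are hypothetical helper names, not from the question), followed by a rough hand-written loop equivalent of the comprehension. The loop version is a simplification: the real comprehension evaluates in its own scope and builds an unnamed dictionary rather than a variable called tmp.

```python
a = [1, 1, 2, 2, 2, 3, 3]

# Keep a second name for the dict that b is bound to before the comprehension.
old = {}
b = old

# While the comprehension runs, b still refers to `old`; the new dict has no name yet.
b = {k: b.setdefault(k, 0) + 1 for k in a}
comp_result = b

print(comp_result)  # {1: 1, 2: 1, 3: 1}
print(old)          # {1: 0, 2: 0, 3: 0}: setdefault stored 0 here and kept returning it

# Rough hand-written equivalent (simplified: `tmp` stands in for the
# comprehension's anonymous result dict).
old2 = {}
b = old2
tmp = {}
for k in a:
    tmp[k] = b.setdefault(k, 0) + 1  # b is still old2 on every iteration
b = tmp                              # b is rebound only after the loop ends
loop_result = b
```

Printing the old dictionary afterwards is a quick way to see where the setdefault() calls actually went.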
In fact, your dictionary comprehension only works if b was bound to an object with a .setdefault() method before you run the expression. If b is not yet defined, or is not bound to an object with such a method, it simply fails with an exception:
>>> a = [1,1,2,2,2,3,3]
>>> b = {k: b.setdefault(k, 0) + 1 for k in a}
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in <dictcomp>
NameError: global name 'b' is not defined
>>> b = 42
>>> b = {k: b.setdefault(k, 0) + 1 for k in a}
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in <dictcomp>
AttributeError: 'int' object has no attribute 'setdefault'
You cannot do what you want with a dictionary comprehension unless you group your numbers, which requires sorting and itertools.groupby(); this is not an efficient approach (it requires O(N log N) steps rather than O(N)):
>>> from itertools import groupby
>>> {k: sum(1 for _ in group) for k, group in groupby(sorted(a))}
{1: 2, 2: 3, 3: 2}
Note that the standard library already comes with a tool for counting; see the collections.Counter() object:
>>> from collections import Counter
>>> Counter(a)
Counter({2: 3, 1: 2, 3: 2})
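As a small sketch (assuming Python 3.7+, where dicts preserve insertion order), the following compares Counter with the get() loop from the question and shows two documented Counter conveniences: most_common() and the zero default for missing keys. The names counts and loop_counts are just illustrative.

```python
from collections import Counter

a = [1, 1, 2, 2, 2, 3, 3]

counts = Counter(a)

# The same counting done with the plain-dict loop from the question.
loop_counts = {}
for x in a:
    loop_counts[x] = loop_counts.get(x, 0) + 1

print(dict(counts) == loop_counts)  # True
print(dict(counts))                 # {1: 2, 2: 3, 3: 2}
print(counts.most_common(1))        # [(2, 3)]
print(counts[4])                    # 0: a missing key counts as zero (and is not inserted)
```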