To find the difference between two sets there is the -
operator and .difference()
. I'm using this code to time each of those:
import timeit
print(timeit.timeit('''
a.difference({b})
''', setup='''
a = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
b = 3
'''))
# => 0.2423834060318768
print(timeit.timeit('''
a - {b}
''', setup='''
a = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
b = 3
'''))
# => 0.2027170000365004
When I run this in CPython I get this:
0.24530324200168252
0.205820870003663
This made sense to me because .difference()
can take any iterable, not just sets. However, when I run it in PyPy I get this:
0.14613953093066812
0.23659668595064431
The times are completely flipped, so surely it can't be because .difference()
can take any interable. What is the difference between the implementations of .difference()
and -
? Is there any difference between CPython and PyPy's implementations?
In PyPy, there is an optimization that is invoked by difference()
but not by __sub__()
: when you use a.difference(b)
and b
is a smaller set than a
, then it copies a
completely and removes the items from b
. If it's not the case, it starts from an empty set and adds the items of a
that are not in b
. For some reason __sub__()
doesn't go through the path that selects between these two implementations, and always picks the second logic.
Please report it on https://foss.heptapod.net/pypy/pypy/-/issues and it will likely be fixed.