pythoncpythonpython-internals

How/why are {2,3,10} and {x,3,10} with x=2 ordered differently?


Sets are unordered, or rather their order is an implementation detail. I'm interested in that detail. And I saw a case that surprised me:

print({2, 3, 10})
x = 2
print({x, 3, 10})

Output (Attempt This Online!):

{3, 10, 2}
{10, 2, 3}

Despite identical elements written in identical order, they get ordered differently. How does that happen, and is that done intentionally for some reason, e.g., for optimizing lookup speed?

My sys.version and sys.implementation:

3.13.0 (main, Nov  9 2024, 10:04:25) [GCC 14.2.1 20240910]
namespace(name='cpython', cache_tag='cpython-313', version=sys.version_info(major=3, minor=13, micro=0, releaselevel='final', serial=0), hexversion=51183856, _multiarch='x86_64-linux-gnu')

Solution

  • It's a function of a couple things:

    1. Hash bucket collisions - For the smallest set size, 8 (implementation detail of CPython), 2 and 10 collide on their cutdown hash codes (which, again implementation detail, are 2 and 10; mod 8, they're both 2). Whichever one is inserted first "wins" and gets bucket index 2, the other gets moved by the probing operation. The probing operation (again, CPython implementation detail) initially checks linearly adjacent buckets for an empty bucket (because it usually finds one, and better memory locality improves cache performance), and only if it doesn't find one does it begin the randomized jumping about algorithm to find an empty bucket (it can't do pure linear probing, because that would make it far too easy to trigger pathological cases that change set operations from amortized average-case O(1) to O(n)).

    2. Compile-time optimizations: In modern CPython, sets and lists of constant literals that are at least three elements long are constructed at compile time as an immutable container (frozenset and tuple respectively). At runtime, it builds an empty set/list, then updates/extends it with the immutable container, rather than performing individual loads and adds/appends for each element. This means that when you build with s = {2, 3, 10}, you're actually doing s = set(), s.update(frozenset({2, 3, 10})) (with the frozenset pulled from cache), while s = {x, 3, 10} is building by loading x, 3 and 10 on the stack, then building the set as a single operation.

    The two of these mean that you're actually building it differently; {x, 3, 10} is inserting 2, then 3, then 10, so buckets 2 and 3 are filled, and 10 gets relocated (the probing strategy clearly puts it in bucket 0 or 1, before bucket 2). When you do {2, 3, 10}, at compile-time it's making a frozenset({3, 10, 2}), then at runtime, it's creating the empty set, then updating it by iterating that frozenset, which has already reordered the elements, so now they're no longer being added in 2, 3, 10 order, and the race for "preferred" buckets is won by different elements.

    In summary, the behavior of {x, 3, 10} is equivalent to:

    s = set()
    s.add(x)
    s.add(3)
    s.add(10)
    

    which predictably gives buckets 2 and 3 to 2 and 3 themselves, with 10 being displaced to bucket 0 or 1.

    By contrast, {2, 3, 10} builds a frozenset({3, 10, 2}) (note: it's in that order after conversion to frozenset; if you tried to run that exact line and print it, you'd see a different order), then updates an empty set with it. There is an optimized code path for populating an empty set from another set/frozenset that just copies the contents directly (rather than iterating and inserting piecemeal), so the {3, 10, 2} ordering in the cached frozenset is preserved in each set created from it, the same as as if you'd run:

    s = set()
    s.update(frozenset({2, 3, 10}))
    

    but more performant (because the frozenset is created once at compile time and loaded cheaply for each new set to initialize).