pythonsetiteration

Is iteration order of a set in Python preserved until that set is modified?


According to the Python documentation, set is a mutable unordered collection.

Usually, it's implemented as a hash table that stores references to objects as its keys. Comparing to dict (which is also usually a hash table), a dictionary's order of elements is guaranteed to be insertion order since Python 3.7. For set, it is not.

There is beautiful in-depth explanation about Python dicts and sets implementation behaviour in Why is the order in dictionaries and sets arbitrary?

But it remains unclear: can it be that a set will be internally rehashed or reallocated by Python interpreter during the execution of some code that doesn't modify that set explicitly? Or is it guaranteed to be stable within this run of a program? Does it depend on set size?

s = {1, 4, 3, 5}
iter_result = [i for i in s]

# some other code not modifying set

assert [i for i in s] == iter_result  # always true within this run???

Solution

  • can it be that a set will be internally rehashed or reallocated by Python interpreter during the execution of some code that doesn't modify that set explicitly?

    No.

    Hashable from python glossary:

    An object is hashable if it has a hash value which never changes during its lifetime

    Set docs

    The elements of a set must be hashable.

    Sets will not be rehashed during their lifetime. Lists preserve the insertion order so the order of a list built from a set is the order given from the set and that will not change since hashes don't change.

    As mentioned, hashes are randomized across executions as noted by this Bash one-liner that builds:

    for i in 1 2 3 4; do python3.12 -c 'import string;l=[f"ba{c}" for c in string.ascii_lowercase if hash(f"ba{c}") % 8 == 2 ];print(f"{l}\n{set(l)}\n{list(set(l))}\n")';done
    
    

    Result

    ['baa', 'bah', 'bai', 'bam', 'bap']
    {'baa', 'bap', 'bah', 'bai', 'bam'}
    ['baa', 'bap', 'bah', 'bai', 'bam']
    
    ['bag', 'bai', 'bal', 'bax']
    {'bai', 'bag', 'bal', 'bax'}
    ['bai', 'bag', 'bal', 'bax']
    
    ['bae', 'bai', 'baq', 'bax']
    {'baq', 'bae', 'bax', 'bai'}
    ['baq', 'bae', 'bax', 'bai']
    
    ['bae', 'bap', 'bax']
    {'bae', 'bap', 'bax'}
    ['bae', 'bap', 'bax']
    

    But, we can turn off hash randomization by setting PYTHONHASHSEED to a constant value

    for i in 1 2 3 4; do PYTHONHASHSEED=1 python3.12 -c 'import string;l=[f"ba{c}" for c in string.ascii_lowercase if hash(f"ba{c}") % 8 == 2 ];print(f"{l}\n{set(l)}\n{list(set(l))}\n")';done
    

    Result

    ['bac', 'bah', 'bak', 'baw']
    {'bac', 'bak', 'bah', 'baw'}
    ['bac', 'bak', 'bah', 'baw']
    
    ['bac', 'bah', 'bak', 'baw']
    {'bac', 'bak', 'bah', 'baw'}
    ['bac', 'bak', 'bah', 'baw']
    
    ['bac', 'bah', 'bak', 'baw']
    {'bac', 'bak', 'bah', 'baw'}
    ['bac', 'bak', 'bah', 'baw']
    
    ['bac', 'bah', 'bak', 'baw']
    {'bac', 'bak', 'bah', 'baw'}
    ['bac', 'bak', 'bah', 'baw']