According to the Python documentation, set
is a mutable unordered collection.
Usually, it's implemented as a hash table that stores references to objects as its keys. Comparing to dict
(which is also usually a hash table), a dictionary's order of elements is guaranteed to be insertion order since Python 3.7. For set
, it is not.
There is beautiful in-depth explanation about Python dicts and sets implementation behaviour in Why is the order in dictionaries and sets arbitrary?
But it remains unclear: can it be that a set
will be internally rehashed or reallocated by Python interpreter during the execution of some code that doesn't modify that set explicitly? Or is it guaranteed to be stable within this run of a program? Does it depend on set size?
s = {1, 4, 3, 5}
iter_result = [i for i in s]
# some other code not modifying set
assert [i for i in s] == iter_result # always true within this run???
can it be that a set will be internally rehashed or reallocated by Python interpreter during the execution of some code that doesn't modify that set explicitly?
No.
Hashable
from python glossary:
An object is hashable if it has a hash value which never changes during its lifetime
Set
docs
The elements of a set must be hashable.
Sets will not be rehashed during their lifetime. Lists preserve the insertion order so the order of a list built from a set is the order given from the set and that will not change since hashes don't change.
As mentioned, hashes are randomized across executions as noted by this Bash one-liner that builds:
for i in 1 2 3 4; do python3.12 -c 'import string;l=[f"ba{c}" for c in string.ascii_lowercase if hash(f"ba{c}") % 8 == 2 ];print(f"{l}\n{set(l)}\n{list(set(l))}\n")';done
Result
['baa', 'bah', 'bai', 'bam', 'bap']
{'baa', 'bap', 'bah', 'bai', 'bam'}
['baa', 'bap', 'bah', 'bai', 'bam']
['bag', 'bai', 'bal', 'bax']
{'bai', 'bag', 'bal', 'bax'}
['bai', 'bag', 'bal', 'bax']
['bae', 'bai', 'baq', 'bax']
{'baq', 'bae', 'bax', 'bai'}
['baq', 'bae', 'bax', 'bai']
['bae', 'bap', 'bax']
{'bae', 'bap', 'bax'}
['bae', 'bap', 'bax']
But, we can turn off hash randomization by setting PYTHONHASHSEED to a constant value
for i in 1 2 3 4; do PYTHONHASHSEED=1 python3.12 -c 'import string;l=[f"ba{c}" for c in string.ascii_lowercase if hash(f"ba{c}") % 8 == 2 ];print(f"{l}\n{set(l)}\n{list(set(l))}\n")';done
Result
['bac', 'bah', 'bak', 'baw']
{'bac', 'bak', 'bah', 'baw'}
['bac', 'bak', 'bah', 'baw']
['bac', 'bah', 'bak', 'baw']
{'bac', 'bak', 'bah', 'baw'}
['bac', 'bak', 'bah', 'baw']
['bac', 'bah', 'bak', 'baw']
{'bac', 'bak', 'bah', 'baw'}
['bac', 'bak', 'bah', 'baw']
['bac', 'bah', 'bak', 'baw']
{'bac', 'bak', 'bah', 'baw'}
['bac', 'bak', 'bah', 'baw']