pythondata-structuressetcpython

How is set() implemented?


I've seen people say that set objects in python have O(1) membership-checking. How are they implemented internally to allow this? What sort of data structure does it use? What other implications does that implementation have?

Every answer here was really enlightening, but I can only accept one, so I'll go with the closest answer to my original question. Thanks all for the info!


Solution

  • According to this thread:

    Indeed, CPython's sets are implemented as something like dictionaries with dummy values (the keys being the members of the set), with some optimization(s) that exploit this lack of values

    So basically a set uses a hashtable as its underlying data structure. This explains the O(1) membership checking, since looking up an item in a hashtable is an O(1) operation, on average.

    If you are so inclined you can even browse the CPython source code for set which, according to Achim Domma, was originally mostly a cut-and-paste from the dict implementation.

    Note: Nowadays, set and dict's implementations have diverged significantly, so the precise behaviors (e.g. arbitrary order vs. insertion order) and performance in various use cases differs; they're still implemented in terms of hashtables, so average case lookup and insertion remains O(1), but set is no longer just "dict, but with dummy/omitted values".