c++stlunordered-mapunordered-set

How are unordered_set and unordered_map implemented in the C++ STL?


I want to know, how are unordered_set and unordered_map implemented in the C++ STL?

To be more precise:

  1. Are individual elements a part of the node which together are connected as a list or an array of consecutive elements?

  2. Are all buckets allocated side-by-side in memory? If not, what is the logic towards iterating over multiple buckets?

  3. How are collisions handled in unordered_set and unordered_map? Or, do they not and let the user define the load factor, and based on that they create a completely new set of buckets?


Solution

  • Based on its interface, the expectation for unordered_(set|map) was something at least reasonably close to a hash table with direct chaining.

    That is, you'd start with an array of pointers. Each of those pointers is to a bucket, which is typically going to be a linked list of nodes.

    We can deduce most of this from the complexity requirements. In particular, the fact that it degenerates from O(1) to O(N) as the table gets overfilled indicates that under normal circumstances (load factor < 1) the complexity is based on hashing the key to find the right bucket. To support that, indexing into the array of buckets has to be an O(1) operation.

    As the table gets overfilled, complexity degenerates to O(N). This reflects the complexity of traversing the linked list of nodes that hashed to the same bucket.

    Pointer invalidation rules are revealing as well. For example, once we've inserted an item into the table, a pointer to that item remains valid (and continues to point to that node) even if we do things like inserting or deleting other nodes. As such, we can rearrange the table as a whole by having a different arrangement of pointers to the nodes, but the node itself has to "stay put". This means we can't (for example) implement a bucket as something like a std:vector<T>--if we did so, when that vector got reallocated, existing pointers would be invalidated.

    You might be able to deviate from this a little bit if you tried really hard, but doing so and continuing to meet all the requirements becomes quite difficult (as I originally wrote this, I speculated on a couple of minor variations maybe being possible, but after comments and more thought, I doubt either really meets the requirements).