A while back I ported some of the C++ stdlib containers to environment where stdlib was not available. While iterators for contiguous and node-based containers were easy, I was stumped how to implement iterator for unordered_map
without storing a reference to the map or the end
iterator.
The primary issue was operator++
, which has to skip over empty buckets while at the same time not overrunning the underlying buffer:
AB_C_
^ ++
^ non-empty, done
AB_C_
^ ++
^ empty, move over
^ non-empty, done
AB_C_
^ ++
^ empty, move over
^overrun, how can it know to stop?
This assumes that you can even determine a bucket is empty just from a pointer to it, which may not be the case if unordered_map
uses a separate bitset to optimize space (making bucket { bool, Key, Value }
wastes alignof(Key) - 1
bits).
I just ended giving up and making the iterator { bucket*, unordered_map& }
, so that operator++
can call bucket* unordered_map::get_next_occupied_bucket(bucket*)
, since the map knows both which buckets are empty, and how many of them are there. I run into similar issues trying to implement filter_view
, but did not find a good solution for that either.
How do stdlib vendors (and other implementers) tend to overcome this issue? The example above uses open addressing, but it seems to me like chaining will run into the same problem, since topmost container is still an array with some of the elements missing.
The may reason I don't want to store reference is that it looks unelegant and naive, on par with implementing std::vector::iterator
as { vector&, size_t index }
, suggesting a better solution should exist.
You asked how std::unordered_map::iterator
could be implemented without referencing the underlying map.
However, std::unordered_map
does not use open addressing — it is based on separate chaining.
In your question, you provided a diagram based on open addressing, and then asked how the separate chaining-based std::unordered_map
handles this issue.
To better understand how std::unordered_map
is implemented, I recommend reading the following article carefully a few more times:
Inside STL: The unordered_map, unordered_set, unordered_multimap, and unordered_multiset
Given that your diagram uses open addressing, I assume the unordered_map you’re implementing is also based on open addressing. Unfortunately, I’m not familiar with how iterators are implemented for open-addressing-based unordered_map.
Since you’ve chosen to go with open addressing, I’m guessing you're not strictly adhering to the standard’s guarantee that std::unordered_map::iterator
should never be invalidated when insert or rehash.
That said, let me share how I implemented my own unordered_set.
The container I built is a single-buffer, separate chaining-based hash_set.
All elements are stored in a single std::vector
, which effectively represents multiple linked lists.
Each node in the vector holds the index of the next node in the chain, or -1
if it is the end of the chain.
Each bucket stores the index of the first node in its corresponding chain, or -1
if the bucket is empty.
When I look up an element, I compute its hash to find the appropriate bucket, and then follow the chain using the stored indices.
When iterating, I don't care about the order or the bucket — I simply iterate from vector.begin() to vector.end().
When removing an element, I move the last element in the vector into the removed element’s position to maintain continuity.
You can find my single_buffer_hash_set
implementation here:
https://github.com/JamilHsu/single_buffer_hash_set
Disclaimer:
This is a set, not a map, as I haven’t implemented the map version yet.
I’m not a professional programmer, so I can’t guarantee it's bug-free.
Most of the comments are in Chinese, but since the structure is simple, you could throw the whole source code into ChatGPT and ask it to explain how it works.