c++stdmap

set a sentinel value for std::map.end() for gcc compiler


Here I specifically only care about the GCC compiler and run time code efficiency.

Consider the following code try me

#include <iostream>
#include <map>

char Find(const std::map<int, char>& map, int key) {
    auto iter = map.find(key);
    if (iter == map.end()) 
        return 'X';
    return iter->second;
}

char Find2(const std::map<int, char>& map, int key) {
    return map.find(key)->second;
}

int main()
{
    // part 1
    std::map<int, char> x{{0,'0'}, {4,'4'}};
    std::cout << Find(x, 3) << std::endl;
    std::cout << Find(x, 4) << std::endl;
    std::cout << (int)Find2(x, 3) << std::endl; // returns 0
    std::cout << Find2(x, 4) << std::endl;

    // part 2: Find2 is a shortcut
    std::map<int, char> y(x);
    y.end()->second = 'X';
    std::cout << Find2(y, 3) << std::endl;
    std::cout << Find2(y, 4) << std::endl;
}

The part 2 also works for a GCC compiler I tested in Godbolt, despite it uses the end() in a weird way.

In GCC, does map allocate a node std::pair to represent the end? Will it be changing when elements are added/deleted? This is related to how the map's end() is really implemented, and I am curious to know it.

As many people pointed out, the C++ standard defines it as UB if a the end() is dereferenced.

However, according to this answer, which GCC seems to have implemented the map in a way that the end() is pointing to a root node. With this I think setting the root node's value to X here seems to be a valid operation. Does this mean that the above code should work for GCC?


Solution

  • Since the question refers to how end() is implemented for the std::map container, I will deal with it, although the idea applies to almost all node-based containers in the libstdc++.

    The majority of the std::map interface, like the other associative containers, is a wrapper around a _Rb_tree object. In the header file, there is also the implementation of the base node, _Rb_tree_node_base, and the node,_Rb_tree_node. The node is a derived class of the base node, to which the storage member is added. To be precise, since C++11 the storage member is no more of type T, but is defined with the libstdc++ version of the std::aligned_storage type.

    The trick is to use an instance of the base node as a sentinel node, which represents both the past-the-last element (by definition, the sentinel node is a specifically designed node that is used as a path traversal terminator) and the before-the-first element. Essentially, the sentinel node is interpreted as both the position after the last element and the position before the first one. The libstdc++ implementation has designed the sentinel node of the std::map container to keep track of the root element, the leftmost element and the rightmost element using its pointers to the parent, the left child and the right child, respectively.

    The original implementation is the following:

    _Base_ptr&
           _M_root() _GLIBCXX_NOEXCEPT
           { return this->_M_impl._M_header._M_parent; }
    
    _Base_ptr&
           _M_leftmost() _GLIBCXX_NOEXCEPT
           { return this->_M_impl._M_header._M_left; }
    
    _Base_ptr&
          _M_rightmost() _GLIBCXX_NOEXCEPT
          { return this->_M_impl._M_header._M_right; }
    

    Thus, the _M_header object is the sentinel node. There are several advantages to using this approach for node-based containers, including the following two:

    A nice side effect of this implementation is that, if the iterator end() is incremented, it will land at the beginning of the container. As aforementioned, the std::forward_list container does not have the same features of the other node-based containers because it is not properly a sentinel-node container.

    For all other sentinel-node containers, such as std::list, the following code is valid:

    std::list<int> x{0, 1, 2, 3, 4, 5};
    if(++x.end() == x.begin())
      std::cout << "It works!\n";
    

    Trying to dereference the end() iterator is UB.