c++unordered-set

Emplace or merge on std::unordered_set


I am trying to implement this emplace or merge

template<typename T>
T& EmplaceOrMerge(std::unordered_set<T>& s,
                  T&& t,
                  std::function<void(T&& a, T& b)> merge)
{
    auto it = s.emplace(std::move(t));
    T& u = const_cast<T&>(*it.first);
    if (!it.second)
        merge(std::move(t), u);
    return u;
}

The merge function modifies its second argument in a way that preserves its hashed value. I am concerned with the use of std::move(t) in the merge case, because the emplace may have moved it already. I have read Microsoft's implementation of unordered_set and there is a very nice special case for that, if constexpr (_In_place_key_extractor::_Extractable). It recognizes that its argument std::move(t) can be hashed directly (and compared with operator== directly), without constructing another object T, and returns immediately the equivalent value in the unordered_set when there is one.

Does this special case occur in all implementations of the standard library ? If not I have undefined behaviour, and I wonder if there is another way to code this EmplaceOrMerge.


Solution

  • No, libstdc++ does not perform this optimization.

    struct A {
        A() = default;
        A(A&&) { std::format_to(std::ostreambuf_iterator<char>(std::cout), "A(A&&)\n"); }
        bool operator==(A const&) const = default;
    };
    template<> struct std::hash<A> { std::size_t operator()(A const&) const { return 0; } };
    int main() {
        std::unordered_set<A> s;
        A a;
        std::format_to(std::ostreambuf_iterator<char>(std::cout), "{}\n",
            s.emplace(std::move(a)).second);
        std::format_to(std::ostreambuf_iterator<char>(std::cout), "{}\n",
            s.emplace(std::move(a)).second);
    }
    

    This program prints:

    A(A&&)
    true
    false
    

    under libc++ (and, presumably, under MS-STL), but prints

    A(A&&)
    true
    A(A&&)
    false
    

    under libstdc++.

    Demo.


    I wonder if there is another way to code this EmplaceOrMerge.

    You can't get around the fact that libstdc++ will only call Hash on an already constructed node. If you can't change the data structure (e.g. to a std::unordered_map from an extracted key) one option would be to use the node-handle interface, which can avoid side effects on failed insertion. Using it may still pay the cost of a move and move-assign back, but hopefully that is relatively cheap:

    template<class T>
    auto try_emplace(std::unordered_set<T>& s, std::type_identity_t<T>&& t) {
        std::unordered_set<T> s2;
        auto nh = s2.extract(s2.insert(std::move(t)).first);
        auto const ins = s.insert(std::move(nh));
        if (not ins.inserted)
            t = std::move(ins.node.value());
        return std::pair(ins.position, ins.inserted);
    }
    

    Demo.

    And in your case, you can shortcut the move-assign back, so the overhead is only one move (and extra node allocation):

    template<typename T>
    T& EmplaceOrMerge(std::unordered_set<T>& s,
                      T&& t,
                      std::function<void(T&& a, T& b)> merge)
    {
        std::unordered_set<T> s2;
        auto nh = s2.extract(s2.insert(std::move(t)).first);
        auto const ins = s.insert(std::move(nh));
        T& u = const_cast<T&>(*ins.position);
        if (not ins.inserted)
            merge(std::move(ins.node.value()), u);
        return u;
    }