I am trying to implement this emplace or merge
template<typename T>
T& EmplaceOrMerge(std::unordered_set<T>& s,
T&& t,
std::function<void(T&& a, T& b)> merge)
{
auto it = s.emplace(std::move(t));
T& u = const_cast<T&>(*it.first);
if (!it.second)
merge(std::move(t), u);
return u;
}
The merge function modifies its second argument in a way that preserves its hashed value. I am concerned with the use of std::move(t)
in the merge case, because the emplace
may have moved it already. I have read Microsoft's implementation of unordered_set
and there is a very nice special case for that, if constexpr (_In_place_key_extractor::_Extractable)
. It recognizes that its argument std::move(t)
can be hashed directly (and compared with operator==
directly), without constructing another object T
, and returns immediately the equivalent value in the unordered_set
when there is one.
Does this special case occur in all implementations of the standard library ? If not I have undefined behaviour, and I wonder if there is another way to code this EmplaceOrMerge
.
No, libstdc++ does not perform this optimization.
struct A {
A() = default;
A(A&&) { std::format_to(std::ostreambuf_iterator<char>(std::cout), "A(A&&)\n"); }
bool operator==(A const&) const = default;
};
template<> struct std::hash<A> { std::size_t operator()(A const&) const { return 0; } };
int main() {
std::unordered_set<A> s;
A a;
std::format_to(std::ostreambuf_iterator<char>(std::cout), "{}\n",
s.emplace(std::move(a)).second);
std::format_to(std::ostreambuf_iterator<char>(std::cout), "{}\n",
s.emplace(std::move(a)).second);
}
This program prints:
A(A&&)
true
false
under libc++ (and, presumably, under MS-STL), but prints
A(A&&)
true
A(A&&)
false
under libstdc++.
Demo.
I wonder if there is another way to code this
EmplaceOrMerge
.
You can't get around the fact that libstdc++ will only call Hash
on an already constructed node. If you can't change the data structure (e.g. to a std::unordered_map
from an extracted key) one option would be to use the node-handle interface, which can avoid side effects on failed insertion. Using it may still pay the cost of a move and move-assign back, but hopefully that is relatively cheap:
template<class T>
auto try_emplace(std::unordered_set<T>& s, std::type_identity_t<T>&& t) {
std::unordered_set<T> s2;
auto nh = s2.extract(s2.insert(std::move(t)).first);
auto const ins = s.insert(std::move(nh));
if (not ins.inserted)
t = std::move(ins.node.value());
return std::pair(ins.position, ins.inserted);
}
Demo.
And in your case, you can shortcut the move-assign back, so the overhead is only one move (and extra node allocation):
template<typename T>
T& EmplaceOrMerge(std::unordered_set<T>& s,
T&& t,
std::function<void(T&& a, T& b)> merge)
{
std::unordered_set<T> s2;
auto nh = s2.extract(s2.insert(std::move(t)).first);
auto const ins = s.insert(std::move(nh));
T& u = const_cast<T&>(*ins.position);
if (not ins.inserted)
merge(std::move(ins.node.value()), u);
return u;
}