c++stlmultimap

Removing elements from std::multimap


I'm using std::multimap with tens of millions of elements. The key is a pointer, and the data is a shared pointer to a class. Sometimes I need to remove a few elements, sometimes many elements, and sometimes all elements. I'm currently using std::erase_if and I'm wondering about performance for the case where all elements are removed.

The first part of the question is: If all elements are being removed, would clear be substantially faster than erase_if?

I'm guessing the answer is "yes," which leads to the next question. A problem is that, at the point of the erase_if call, I don't know whether or not all elements will be removed. So, I could do something like count_if first, and if the count equals the map size then do clear, and otherwise do the erase_if. When removing a few elements, I'm guessing the use of count_if followed by erase_if would roughly double the time, which I think would be ok. But when removing all elements, how much would the count_if reduce the gain of using clear?

(I'm using gcc 12.3.0.)


Solution

  • If all elements are being removed, would clear be substantially faster than erase_if?

    The relative speed depends on the implementation of the standard library used by your compiler and on your predicate. It is reasonable (not guaranteed, but reasonable) to assume that clear will not be slower than any other approach for emptying a container. Maybe faster, maybe not.

    One such "other" approach would be using erase_if and a predicate that always returns truth, as in

    std::erase_if(map, [](auto){ return true; })
    

    You know your predicate; is your predicate substantially slower than return true? When invoked once for each element of your container? If so, then erase_if with your predicate will be substantially slower than erase_if with an always-true predicate, hence (by transitivity) substantially slower than clear. If not, then (since both clear and erase_if have linear complexity) the difference depends on implementation details, which cannot be predicted theoretically; the difference must be measured to be known.

    But when removing all elements, how much would the count_if reduce the gain of using clear?

    The gain of using clear is the cost of invoking your predicate N times (where N is the size of your container) plus an implementation-dependent amount. The cost of count_if includes invoking your predicate N times. That completely wipes out the theoretical gain, leaving only the implementation-dependent portion, which must be measured to be known.

    At a guess, don't try to outsmart the standard library. The fact that std::erase_if has a specialization for std::multimap tells me that a good implementation will be more efficient than erasing each element individually. This is probably the gain you are looking for, which you already have by using std::erase_if instead of writing your own loop. Go ahead and measure to be certain, but I would not expect a gain from using count_if and clear, certainly not a significant one.