c++time-complexitymultisetunordered-multiset

Why multiset keeps separate instances of repeated elements instead of their count?


I recently found out that multiset<T> implementation in STL actually keeps different copies of the same repeated elements in the tree. My expectation before was for it to internally use a map<T, int> and just keep the count of the repeated elements.

What is the scenario where this implementation can be beneficial compared to just keeping the count? Is there any use case for multiset where the code will break if the internal implementation changes? Or is there any operation which will increase in complexity if it changes?

I was wondering what is the thought process behind that choice?


Solution

  • By default std::multiset uses operator< to decide if two elements are equivalent (note: equiavent not equal!). Now consider this element type:

    struct foo { 
        int x;
        int y;
        bool operator<(const foo& other) { return x < other.x; }
        bool operator==(const foo& other) { return (x==other.x) && (y==other.y);
    };
    

    Two elements are considered equivalent when !(a < b) && !(b < a), that does in general not imply that a == b. Hence, storing only the count is not sufficient.

    Moreoever, even equality does not imply identity (a==b does not imply &a==&b). As the container owns its elements, two elements in the container cannot possibily be identical, they are always two different objects. Also, in the above foo we could add a member z that is neither considered for operator< nor for operator== but can be accessed via a getter.


    The reason sorted containers typically check equivalence not equality is that they need a way to compare elements anyhow (they are sorted). Being able to compare two elements via < is sufficient to check equivalence, while requiring == in addition would make the container less generic without apparent gain. If you like you can always use a comparator that is such that equivalence does imply equality (in the above example: compare also y in operator<).


    As already mentioned in a comment, if you want to have the behavior you describe you could use a std::map<foo,size_t> (uses operator< as default comparator for the keys as well). Instead of only storing the foos you can store a pair of foo and the count. Now, because two foos with same x are considered equivalent, they do map to the same key:

     foo f,g;
     f.x = 42; f.y = 0;
     g.x = 42; g.y = 100;
     std::map<foo,size_t> counter;
     ++counter[f];
     ++counter[g];
    

    As a result, counter will hold 1 element: a copy of f and the count for that key will be 2.