c++11stlhashsetunordered-multiset

Usecases for std::unordered_multiset


I'd like to know why one would ever use std::unordered_multiset. My guess is it has something to do with invalidations or non-invalidations of iterators after insert/erase, but maybe its something deeper? Very similar question is here: Use cases of std::multimap, but it's more of a discussion about maps.


Solution

  • With regards to your question, the most important features of unordered_multiset containers are that:

    Thus, a typical use-case for unordered_multiset would be when you need fast lookup and don't care that the data are unordered:

    Note that there are other situations where unordered containers cannot or should not be used.

    Regarding the uses cases where ordered containers should be prefered to unordered ones, you may want to read this answer. For general guidelines regarding container selection, you may want to read How can I efficiently select a Standard Library container in C++11?.

    EDIT

    Considering that an unordered multiset and a vector can often do very similar things, isn't it better to always use a vector? Don't vectors automatically outperform unordered multisets?

    Reproduced below are the results of a very simple benchmark (full code is provided at the end of this post):

    Here are the results obtained for containers of integers :

    Environment
    Compiler
    Windows 7
    VS Express 2013
    CygWin 64 bits
    gcc 4.9.3
    unordered_multiset<int> 0.75 s 0.8 s
    vector<int>, unsorted 7.9 s 11.0 s
    vector<int>, sorted 1.0 s 0.6 s

    In the example above, the unordered multiset is slightly better for the Windows benchmark, whereas the sorted vector is slightly better for the CygWin benchmark. For a multi-target development, there is no obvious choice here.

    Below are the results of a similar test with containers of strings:

    Environment
    Compiler
    Windows 7
    VS Express 2013
    CygWin 64 bits
    gcc 4.9.3
    unordered_multiset<string> 1 s 1 s
    vector<string>, unsorted 30 s 65 s
    vector<string>, sorted 130 s 32 s

    In this example, the unordered multisets outperforms by far the vectors.

    The exact figures don't matter much here, as they are specific to the specific conditions (hardware, OS, compiler, compiler options and so on) where these benchmarks were performed. What matters is that vectors sometimes outperform unordered multisets, but sometimes they don't. The only way to be sure whether unordered multisets or vectors should be used for a given application is to benchmark as realistically as feasible.

    Below is included the code for the integer container benchmark. As it was developed on the fly, all corrections and improvements are welcome!

    #include "stdafx.h"
    #include <iostream>
    #include <array>
    #include <unordered_set>
    #include <vector>
    #include <algorithm>
    #include <chrono>
    #include <cstdlib>
    #include <unordered_map>
    #include <string>
    
    using namespace std;
    
    const unsigned N = 100000;      // Number of test iterations (= insertions + lookups)
    typedef string Element;         // Type of data stored into the container to be tested
    array<Element, N> testData;     // Pseudo-random input sequence
    array<Element, N> lookupKeys;   // Pseudo-random lookup keys
    
    // Test action for an unordered_multiset (insert some random data then count some random key)
    struct unordered_multiset_action
    {
        typedef unordered_multiset<Element> Container;
        int operator()(Container& container, unsigned k)
        {
            container.insert(testData[k]);
            return container.count(lookupKeys[k]);
        }
    };
    
    // Test action for an unsorted vector (insert some random data then count some random key)
    struct unsorted_vector_action
    {
        typedef vector<Element> Container;
        int operator()(Container& container, unsigned k)
        {
            container.push_back(testData[k]);               
            return count(testData.cbegin(), testData.cend(), lookupKeys[k]);
        }
    };
    
    // Test action for a sorted vector (insert some random data then count some random key)
    struct sorted_vector_action
    {
        typedef vector<Element> Container;
        int operator()(Container& container, unsigned k)
        {
            container.insert(upper_bound(container.begin(), container.end(), testData[k]), testData[k]);
            auto range = equal_range(container.cbegin(), container.cend(), lookupKeys[k]);
            return range.second - range.first;
        }
    };
    
    // Builds an empty container to be tested
    // Then invokes N times the test action (insert some random key then count some other random key)
    template<class Action>
    long long container_test(Action action)
    {
        using Container = typename Action::Container;
        Container container;
        long long keyCount = 0;
        for (unsigned k = 0; k<N; ++k)
            keyCount += action(container, k);
        return keyCount;
    }
    
    int main(int nargs, char *args[])
    {
        using namespace chrono;
    
        // Parse user input to select which container should be tested
        enum SelectedContainer { UNORDERED_MULTISET, UNSORTED_VECTOR, SORTED_VECTOR };
        unordered_map<string, SelectedContainer> decoder{ { "unordered_multiset", UNORDERED_MULTISET },
                                                          { "unsorted_vector",    UNSORTED_VECTOR },
                                                          { "sorted_vector",      SORTED_VECTOR } };
        if ( nargs < 2 )
        {
            cerr << "Please provde an argument among those keywords: unordered_multiset, unsorted_vector, sorted_vector" << endl;
            return (-1);
        }
        auto it = decoder.find(args[1]);
        if (it == decoder.cend())
        {
            cerr << "Please enter one of the following arguments: unordered_multiset, unsorted_vector, sorted_vector" << endl;
            return (-1);
        }
        SelectedContainer selectedContainer = it->second;
    
        // Generate pseudo-random input data and input keys (no seeding for reproducibility)
        generate(testData.begin(),   testData.end(), []()   { return rand() % 256; });
        generate(lookupKeys.begin(), lookupKeys.end(), []() { return rand() % 256; });
    
        // Run test on container to be tested and measure elapsed time
        auto startTime = high_resolution_clock::now();
        long long keyCount;
        switch (selectedContainer)
        {
        case UNORDERED_MULTISET: 
            keyCount = container_test(unordered_multiset_action());
            break;
        case UNSORTED_VECTOR:
            keyCount = container_test(unsorted_vector_action());
            break;
        case SORTED_VECTOR:
            keyCount = container_test(sorted_vector_action());
            break;
        };
        
        auto endTime = high_resolution_clock::now();
    
        // Summarize test results
        duration<float> elaspedTime = endTime - startTime;
        cout << "Performed " << N << " insertions interleaved with " << N << " data lookups" << endl;
        cout << "Total key count = " << keyCount << endl;
        cout << "Elapsed time: " << duration_cast<milliseconds>(elaspedTime).count() << " milliseconds" << endl;
    }