c++unordered-map

Why does the performance of unordered_map and map differ between compilers like MSVC and GCC?


From what I know, unordered_map (hash table) should be faster than map (red-black tree). But when I tested them in Visual Studio, unordered_map was faster for both insertion and lookup. However, when I ran the same tests with GCC, the map insertion turned out faster than unordered_map. The test used 1 million random numbers generated by rand() after srand(time(nullptr)), and the environment was Debug/Win32.

#include <time.h>

#include <chrono>
#include <iostream>
#include <map>
#include <unordered_map>
#include <vector>

using namespace std;

int main() {
    int n = 1000000;  // 一百万数据
    vector<int> v;
    v.reserve(n);
    srand(time(0));

    auto clk = std::chrono::steady_clock{};

    for (size_t i = 0; i < n; i++) {
        v.push_back(rand());
    }

    unordered_map<int, int> um;
    auto begin1 = clk.now();
    for (auto e : v) um.insert(make_pair(e, e));
    auto end1 = clk.now();

    map<int, int> m;
    auto begin2 = clk.now();
    for (auto e : v) m.insert(make_pair(e, e));
    auto end2 = clk.now();

    cout << "insert:" << endl;
    cout << "unordered_map: "
         << chrono::duration_cast<chrono::milliseconds>(end1 - begin1).count()
         << " ms" << endl;
    cout << "map: "
         << chrono::duration_cast<chrono::milliseconds>(end2 - begin2).count()
         << " ms" << endl;

    std::vector<std::pair<int, int>> res;
    res.reserve(n);

    auto begin3 = clk.now();
    for (auto e : v) res.push_back(*um.find(e));
    auto end3 = clk.now();

    res.resize(0);
    res.reserve(n);

    auto begin4 = clk.now();
    for (auto e : v) res.push_back(*m.find(e));
    auto end4 = clk.now();
    cout << "find:" << endl;
    cout << "unordered_map: "
         << chrono::duration_cast<chrono::milliseconds>(end3 - begin3).count()
         << " ms" << endl;
    cout << "map: "
         << chrono::duration_cast<chrono::milliseconds>(end4 - begin4).count()
         << " ms" << endl;

    auto begin5 = clk.now();
    for (auto e : v) um.erase(e);
    auto end5 = clk.now();

    auto begin6 = clk.now();
    for (auto e : v) m.erase(e);
    auto end6 = clk.now();
    cout << "erase" << endl;
    cout << "unordered_map: "
         << chrono::duration_cast<chrono::milliseconds>(end5 - begin5).count()
         << " ms" << endl;
    cout << "map: "
         << chrono::duration_cast<chrono::milliseconds>(end6 - begin6).count()
         << " ms" << endl;

    std::cout << "Max load factor: " << um.max_load_factor() << std::endl;
    return 0;
}

Result (Godbolt, gcc 15.1, -O3 -std=c++20, https://godbolt.org/z/P93MG5svr)

insert:
unordered_map: 401 ms
map: 1898 ms
find:
unordered_map: 67 ms
map: 1880 ms
erase
unordered_map: 736 ms
map: 2065 ms
Max load factor: 1

Result (Godbolt, msvc 19.43, /O2 /std:c++20, https://godbolt.org/z/55aehW5rj)

insert:
unordered_map: 16 ms
map: 159 ms
find:
unordered_map: 19 ms
map: 162 ms
erase
unordered_map: 5 ms
map: 37 ms
Max load factor: 1

Result (Godbolt, clang 20.1, -O3 -std=c++20, https://godbolt.org/z/7oM83qM4d)

insert:
unordered_map: 205 ms
map: 1217 ms
find:
unordered_map: 52 ms
map: 1058 ms
erase
unordered_map: 151 ms
map: 2713 ms
Max load factor: 1

Solution

  • Note rand() on MSVC produces very small range of values. The RAND_MAX is only 0x7fff. This is result of history of MSVC reaching times when 16bits was most common.

    As a result your tests are invalid. Your size of container n = 1000000 is much bigger then then RAND_MAX=32767.

    On other compilers RAND_MAX is greater: 2147483647

    https://godbolt.org/z/dzvTbq6Wr

    So MSVC has less possible values to handle and less allocation is needed, so it is faster.

    Your example tweaked to log size of container.


    If you drop use of rand in favor of more modern std::mt19937 then it seem ok: https://godbolt.org/z/cf8rdnhv5

    Be aware that doing performance measurements in C++ is not so simple. Compiler is able to do such aggressive optimizations that this can corrupt result of this kind of test (didn't check if your example suffers form this problem too).