I’ve recently found out an odd thing. It seems that calculating Collatz sequence lengths with no caching at all is over 2 times faster than using std::unordered_map
to cache all elements.
Note I did take hints from question Is gcc std::unordered_map implementation slow? If so - why? and I tried to used that knowledge to make std::unordered_map
perform as well as I could (I used g++ 4.6, it did perform better than recent versions of g++, and I tried to specify a sound initial bucket count, I made it exactly equal to the maximum number of elements the map must hold).
In comparision, using std::vector
to cache a few elements was almost 17 times faster than no caching at all and almost 40 times faster than using std::unordered_map
.
Am I doing something wrong or is this container THAT slow and why? Can it be made performing faster? Or maybe hashmaps are inherently ineffective and should be avoided whenever possible in high-performance code?
The problematic benchmark is:
#include <iostream>
#include <unordered_map>
#include <cstdint>
#include <ctime>
std::uint_fast16_t getCollatzLength(std::uint_fast64_t val) {
static std::unordered_map <std::uint_fast64_t, std::uint_fast16_t> cache ({{1,1}}, 2168611);
if(cache.count(val) == 0) {
if(val%2 == 0)
cache[val] = getCollatzLength(val/2) + 1;
else
cache[val] = getCollatzLength(3*val+1) + 1;
}
return cache[val];
}
int main()
{
std::clock_t tStart = std::clock();
std::uint_fast16_t largest = 0;
for(int i = 1; i <= 999999; ++i) {
auto cmax = getCollatzLength(i);
if(cmax > largest)
largest = cmax;
}
std::cout << largest << '\n';
std::cout << "Time taken: " << (double)(std::clock() - tStart)/CLOCKS_PER_SEC << '\n';
}
It outputs: Time taken: 0.761717
Whereas a benchmark with no caching at all:
#include <iostream>
#include <unordered_map>
#include <cstdint>
#include <ctime>
std::uint_fast16_t getCollatzLength(std::uint_fast64_t val) {
std::uint_fast16_t length = 1;
while(val != 1) {
if(val%2 == 0)
val /= 2;
else
val = 3*val + 1;
++length;
}
return length;
}
int main()
{
std::clock_t tStart = std::clock();
std::uint_fast16_t largest = 0;
for(int i = 1; i <= 999999; ++i) {
auto cmax = getCollatzLength(i);
if(cmax > largest)
largest = cmax;
}
std::cout << largest << '\n';
std::cout << "Time taken: " << (double)(std::clock() - tStart)/CLOCKS_PER_SEC << '\n';
}
Outputs Time taken: 0.324586
The standard library's maps are, indeed, inherently slow. Google's Chandler Carruth explains this in his CppCon 2014 talk; in a nutshell:
std::map
maintains its data's sort order, leading to a tree-based implementation; and trees are slow: non-contiguous with a lot of pointer chain chasing.std::unoredered_map
is not as-bad, but it's still cache-unfriendly because it uses linked lists as buckets.This SO question mentioned some efficient hash map implementations - use one of those instead.