I have a strange behavior with an std::map (or std::set, they seem to behave the same in this scenario). It might be that I have a grave misunderstanding about how this should work. I'm using VS2010 SP1.
Take for example this function:
extern time_t g_nElapsed;
UINT Thread(LPVOID _param)
{
UINT nRuns = (UINT)_param;
for(UINT i=0; i<nRuns; ++i)
{
time_t _1 = time(NULL);
std::set<UINT> cRandomSet;
cRandomSet.insert(1);
cRandomSet.insert(2);
cRandomSet.insert(3);
cRandomSet.insert(4);
g_nElapsed += (time(NULL) - _1);
}
return 0;
}
Now, if I run 8 threads with 100,000 iterations each, g_nElapsed will be roughly 40 seconds. If I run 1 thread with 800,000 iterations, g_nElapsed will be about 5 seconds. I am under the impression that g_nElapsed should be about the same for any reasonable amount of threads. So to speak... the processor usage increases with the number of threads, even though the work stays the same. However, it seems that some sort of resource contention with the set causes the runtime to increase. But why? It's thread local...
I'm sure it's a simple misconception and a simple fix, but I am not quite sure what the problem is here.
The following code does not exhibit this behavior:
extern time_t g_nElapsed;
UINT Thread(LPVOID _param)
{
UINT nRuns = (UINT)_param;
for(UINT i=0; i<nRuns; ++i)
{
time_t _1 = time(NULL);
UINT n[4];
n[0] = 1;
n[1] = 1;
n[2] = 1;
n[3] = 1;
g_nElapsed += (time(NULL) - _1);
}
return 0;
}
You are creating and destroying many containers, and each one uses operator new
to allocate memory. On many systems, this requires synchronization to manage the free memory that is handed out on typical, small allocations like yours. So you are probably incurring quite a lot of inter-thread contention there.
You might try a different allocator, such as tcmalloc (http://goog-perftools.sourceforge.net/doc/tcmalloc.html). It is specifically designed to deal with this.
Another approach would be to use an object pool or other allocation strategy to avoid using the standard allocation mechanism completely. That would require some code changes, whereas using tcmalloc does not.