multithreadingthread-safetypthreadsmutexfractals

Updating large data matrix thread-safely: now using millions of mutexes?


I was revisting some code I wrote a long time ago, and decided to rewrite it to better make use of threads (and better use of programming in general..).

It is located here: https://github.com/buddhabrot/buddhabrot/blob/master/basic.c:

It is an application that renders a buddhabrot fractal. For reasons out of the scope of this question it is hard to use memoization to optimize this, and basically if you'd profile this, over 99% of the time is spent in the innermost loop that eventually does:

buddhabrot[col][row]++;

Multiple threads will execute this code. Since incrementing is not thread-safe, I used a specific mutex lock around this part of the memory. So, each addressable location in the buddhabrot memory has a separate mutex.

Now, this is more efficient than using one lock of course (which would definitely make all the threads wait for each other), but it is less memory efficient; it appears the mutexes take some data as well. I am also wondering about other repercussions in the pthreads implementations with millions of mutexes?

I now have two other strategies to consider:

So, is there anything else, anything better I could do?


Solution

  • You can use atomic increment functions like InterlockedIncrement from the intrin.h on Windows platforms.

    #include <intrin.h>
    
    #pragma intrinsic(_InterlockedExchangeAdd, _InterlockedIncrement, _InterlockedDecrement, _InterlockedCompareExchange, _InterlockedExchange)
    #define InterlockedExchangeAdd _InterlockedExchangeAdd
    #define InterlockedIncrement _InterlockedIncrement
    #define InterlockedDecrement _InterlockedDecrement
    #define InterlockedCompareExchange _InterlockedCompareExchange
    #define InterlockedExchange _InterlockedExchange
    
    #pragma intrinsic(abs, fabs, labs, memcmp, memcpy, memset, strcat, strcmp, strcpy, strlen)
    #pragma intrinsic(acos, cosh, pow, tanh, asin, fmod, sinh)
    #pragma intrinsic(atan, exp, log10, sqrt, atan2, log, sin, tan, cos) 
    

    This incrementation is atomic and there is no need to have millions of mutex or a global lock on your matrix.