Is a mutex needed in the following Parallel Computation of a Cellular Automaton?
I thought it was always advised to use a mutex when writing to a shared resource from parallel threads. The mutex is dramatically slowing the program down, but is it actually needed to ensure the data in the 2D array (`futureOcean`) remains safe/consistent?
When a mutex locks `futureOcean` during write operations:
2 threads: 163.352990 seconds
5 threads: 515.739237 seconds
10 threads: 1021.035517 seconds
Without the use of a mutex:
2 threads: 65.817534 seconds
10 threads: 93.822217 seconds
I'm implementing a cellular automaton model, specifically the fish, shark and ocean simulation.
I've got a 1000x600 2D array called `ocean`, obviously representing the ocean, and another 1000x600 2D array called `futureOcean`, which will store the state of each cell for `currentGeneration + 1`.
Initially the `ocean` is populated with fish, sharks, and empty cells.
Processing a single generation involves applying the rules to each cell within `ocean`.
Each time I compute the new value for the current cell, I store its future value in the second 2D array (`futureOcean`), at the same row/column position that the original value has inside `ocean`. So each cell in `futureOcean` will only be updated once per generation.
And I do this 20000 times (generations).
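Roughly, the per-generation update looks like the following sketch (C++; `applyRules` and the `Cell` states are simplified stand-ins for the actual fish/shark rules):

```cpp
#include <vector>

constexpr int ROWS = 1000;
constexpr int COLS = 600;

// Simplified cell state; the real model also tracks things like
// per-cell breed/starve counters.
enum Cell : char { EMPTY, FISH, SHARK };

using Grid = std::vector<std::vector<Cell>>;

// Stand-in for the actual transition rules.
Cell applyRules(const Grid& ocean, int r, int c) {
    return ocean[r][c]; // placeholder; the real rules move/breed/starve
}

void step(const Grid& ocean, Grid& futureOcean) {
    for (int r = 0; r < ROWS; ++r)
        for (int c = 0; c < COLS; ++c)
            // ocean is only read; each futureOcean cell is written
            // exactly once per generation.
            futureOcean[r][c] = applyRules(ocean, r, c);
}
```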
If `ocean` is solely being read from during construction of a new generation, there are no potential race conditions and no lock is needed. Multiple threads can read from a constant source without problems.
If each location in `futureOcean` is only updated once, then there are no competing writes for any location, and no lock is needed. Individual threads can write to unique locations untouched by other threads without problems.
Then, you'd have to wait for `futureOcean` to be completely updated before starting a new generation, to avoid reading from it while it is still being written to.
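Here's a minimal sketch of that scheme, assuming a `Grid` type and `applyRules` function like those sketched in the question: each thread gets its own contiguous band of rows, no mutex is taken, and `join()` serves as the barrier before the buffers are swapped.

```cpp
#include <thread>
#include <vector>

void stepParallel(const Grid& ocean, Grid& futureOcean, int numThreads) {
    std::vector<std::thread> workers;
    const int band = ROWS / numThreads;
    for (int t = 0; t < numThreads; ++t) {
        const int begin = t * band;
        const int end = (t == numThreads - 1) ? ROWS : begin + band;
        // Each thread writes only rows [begin, end): the write sets
        // are disjoint, so no mutex is required.
        workers.emplace_back([&ocean, &futureOcean, begin, end] {
            for (int r = begin; r < end; ++r)
                for (int c = 0; c < COLS; ++c)
                    futureOcean[r][c] = applyRules(ocean, r, c);
        });
    }
    // join() is the synchronization point: futureOcean is guaranteed
    // to be fully written before the caller reuses either grid.
    for (auto& w : workers)
        w.join();
}
```

After `stepParallel` returns, `std::swap(ocean, futureOcean)` makes the new generation current (an O(1) buffer swap for vector-backed grids), and the next generation can start.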
You can speed up multithreaded performance by splitting up the work so that each thread writes to a contiguous section of the array. Otherwise, multiple threads could write to locations that are close together in memory. If this happens within the distance of a cache line or two (64 to 128 bytes on an x86/64 processor), you could repeatedly invalidate that section of memory and force reloads into multiple cores' caches (see false sharing).
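As a concrete illustration, suppose the grid were stored as one flat array of one-byte cells (`next` below is a hypothetical per-cell update); a 64-byte cache line then covers 64 consecutive cells, and the two splits behave very differently:

```cpp
#include <cstddef>

// Hypothetical per-cell update; stands in for the real rules.
char next(std::size_t i);

// Interleaved split: thread `tid` touches cells tid, tid+N, tid+2N, ...
// Every cache line is written by many threads, so cores keep
// invalidating each other's copies (false sharing).
void writeInterleaved(char* future, std::size_t total, int tid, int numThreads) {
    for (std::size_t i = tid; i < total; i += numThreads)
        future[i] = next(i);
}

// Banded split: thread `tid` owns one contiguous block of cells; at
// most the single line straddling each band boundary is ever shared.
void writeBanded(char* future, std::size_t total, int tid, int numThreads) {
    const std::size_t band = total / numThreads;
    const std::size_t begin = tid * band;
    const std::size_t end = (tid == numThreads - 1) ? total : begin + band;
    for (std::size_t i = begin; i < end; ++i)
        future[i] = next(i);
}
```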