The following program uses the atomic lock implementation from 'CUDA by Example', but running it freezes my machine. Can someone tell me what's wrong with my program? Thanks a lot.
Yifei
#include <stdio.h>
__global__ void test()
{
    __shared__ int i, mutex;
    if (threadIdx.x == 0) {
        i = 0;
        mutex = 0;
    }
    __syncthreads();
    while( atomicCAS(&mutex, 0, 1) != 0);
    i++;
    printf("thread %d: %d\n", threadIdx.x, i);
    atomicExch(&mutex,0);
}
Here is a theory. I hope you are familiar with the concept of a warp. All threads in a warp execute the while loop in lockstep. Only one of them wins the atomicCAS and exits the loop; the rest keep spinning inside it. This creates a divergent branch, so the thread that exited the loop is stalled until the branch converges again. Because that thread is the only one that can release the mutex, the release never happens: it is waiting for the other threads to converge, and they are waiting for the mutex.
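If that theory is right, a commonly suggested workaround is to put the critical section and the unlock inside the same branch that acquired the lock, so the winning thread never has to wait at a reconvergence point while still holding the mutex. Below is a minimal sketch of that idea, not a guaranteed fix: on older GPUs it still depends on how the compiler lays out the branch. The names test_fixed, run_test_fixed, and the result parameter are made up for this illustration, and the __threadfence_block() calls are an added precaution that the original code does not have.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Sketch: same counter as in the question, but each thread does its work
// and releases the lock inside the branch where it acquired the lock.
__global__ void test_fixed(int *result)   // 'result' is a hypothetical output
{
    __shared__ int i, mutex;
    if (threadIdx.x == 0) {
        i = 0;
        mutex = 0;
    }
    __syncthreads();

    bool done = false;
    while (!done) {
        if (atomicCAS(&mutex, 0, 1) == 0) {   // this thread now holds the lock
            __threadfence_block();            // see earlier writes to i
            i++;
            printf("thread %d: %d\n", threadIdx.x, i);
            __threadfence_block();            // publish i before unlocking
            atomicExch(&mutex, 0);            // release in the same branch
            done = true;
        }
        // Threads that lost the CAS fall through and retry on the next pass.
    }

    __syncthreads();
    if (threadIdx.x == 0)
        *result = i;                          // final count for the host
}

// Hypothetical host helper: launches one block and returns the final count.
int run_test_fixed(int threads)
{
    int *d_result = nullptr;
    int h_result = -1;
    cudaMalloc(&d_result, sizeof(int));
    test_fixed<<<1, threads>>>(d_result);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&h_result, d_result, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_result);
    return h_result;
}

int main()
{
    printf("final count: %d\n", run_test_fixed(32));
    return 0;
}
```

With one warp of 32 threads, each thread should pass through the critical section exactly once, so the final count should equal the number of threads if the lock behaves as intended. Error checking on the CUDA calls is left out to keep the sketch short.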