cuda gpu cpu

How SIMD vs SIMT handle divergence


What exactly happens at the hardware level when a divergence occurs in SIMD and SIMT architectures, and how does each handle the execution of different instruction paths?

I found this question, but it doesn't make clear to me what exactly happens with SIMT.


Solution

  • Suppose we have eight threads (0 to 7), all executing the code below.

    __global__ void kernel(args...) {
       int tid = blockDim.x * blockIdx.x + threadIdx.x;
    
       if(tid % 2 == 0) 
         // do this
       else 
         // do that
    
       ...
    }
    

    Threads with an even index execute the "do this" code, while threads with an odd index execute the "do that" code.

    Internally, all eight threads step through both branches together. While the "do this" code is executing, the odd-indexed threads are masked off so that their results have no effect, and vice versa while the "do that" code is executing.

    Thread index          : 0 1 2 3 4 5 6 7
    Execution of "do this": 1 0 1 0 1 0 1 0
    Execution of "do that": 0 1 0 1 0 1 0 1
    

    Overall, the two groups of threads (even and odd) diverge, so the two branches are executed serially, one after the other.
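
The masking described here can be sketched as a small host-side emulation. This is plain C++, not CUDA, and it only models the idea for eight lanes; it is not how any particular GPU implements divergence (NVIDIA warps, for instance, are 32 threads wide). The names `BranchTrace` and `run_branch`, and the placeholder bodies `tid * 10` and `-tid` standing in for "do this" and "do that", are assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kLanes = 8;  // eight threads, matching the example

// Hypothetical record of what one emulated branch execution did.
struct BranchTrace {
    std::uint8_t then_mask = 0;      // bit i set: lane i runs "do this"
    std::uint8_t else_mask = 0;      // bit i set: lane i runs "do that"
    int passes = 0;                  // number of serialized passes issued
    std::array<int, kLanes> out{};   // per-lane result
};

// Emulates `if (pred(tid)) out = tid * 10; else out = -tid;`
// for one group of kLanes threads that share an instruction stream.
BranchTrace run_branch(bool (*pred)(int)) {
    assert(pred != nullptr);
    BranchTrace t;

    // Every lane evaluates the condition; the results form the active mask.
    for (int lane = 0; lane < kLanes; ++lane) {
        if (pred(lane)) {
            t.then_mask = static_cast<std::uint8_t>(t.then_mask | (1u << lane));
        }
    }
    t.else_mask = static_cast<std::uint8_t>(~t.then_mask & 0xFFu);

    // Pass 1: "do this". Skipped entirely if no lane took this path.
    if (t.then_mask != 0) {
        ++t.passes;
        for (int lane = 0; lane < kLanes; ++lane) {
            if (t.then_mask & (1u << lane)) t.out[lane] = lane * 10;
        }
    }

    // Pass 2: "do that". Lanes that ran pass 1 are masked off here.
    if (t.else_mask != 0) {
        ++t.passes;
        for (int lane = 0; lane < kLanes; ++lane) {
            if (t.else_mask & (1u << lane)) t.out[lane] = -lane;
        }
    }
    return t;
}
```

Calling `run_branch([](int tid) { return tid % 2 == 0; })` gives `then_mask == 0x55`, `else_mask == 0xAA`, and `passes == 2`, which matches the table. With a predicate that is true for every lane, `else_mask` is zero and only one pass is issued, so there is no divergence cost.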