
CUDA warp size and control divergence


I have a question about the following problem:

Suppose we have a 9*7 picture (7 pixels in the x direction and 9 pixels in the y direction). How many warps will have control divergence, assuming blocks of 4*4 threads and 8 threads per warp?

How will the blocks and warps be organized here? In the x (horizontal) direction, I assume 2 blocks per row. Similarly, in the y (vertical) direction, I assume 3 blocks per column. But how will the warps be organized? Can someone point out the thread IDs in each warp, and the cases where control divergence happens (with the thread IDs involved)?

Thanks.


Solution

  • Suppose we have a 9*7 picture (7 pixels in the x direction and 9 pixels in the y direction). How many warps will have control divergence, assuming blocks of 4*4 threads and 8 threads per warp?

    1. Divergence is a property of the program (the code), not of the block/warp layout itself. If your algorithm takes the same control path for every pixel in the image, there will be no divergence whatsoever, irrespective of the number of threads and their organization. If it branches only on warp boundaries, so that all threads of any given warp take the same path, there will be no divergence either. Therefore, without seeing your code, your question is technically unanswerable.
    2. A warp of 8 threads is not physically possible on CUDA hardware: warps are made of 32 threads and their size is not configurable. Besides, with blocks of only 16 threads you might as well run without a GPU at all; these numbers are far too small to benefit from any hardware acceleration.
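
To illustrate point 1, here is a minimal Python sketch (a host-side simulation, not CUDA code) that counts the warps in which threads disagree on a branch condition. The helper `count_divergent_warps`, the three example predicates, and the choice of 64 linearly numbered threads (4 blocks of 16) are assumptions made for illustration; only the image size and the 8-thread warp come from the question.

```python
WARP_SIZE = 8      # from the question; real CUDA warps have 32 threads
N_THREADS = 64     # assumption: 4 blocks of 16 threads cover the 63 pixels
N_PIXELS = 9 * 7   # 63 pixels in the image


def count_divergent_warps(predicate, n_threads=N_THREADS, warp_size=WARP_SIZE):
    """Count warps whose threads do not all take the same side of a branch."""
    divergent = 0
    for first in range(0, n_threads, warp_size):
        outcomes = {bool(predicate(tid)) for tid in range(first, first + warp_size)}
        if len(outcomes) > 1:  # both True and False occur inside one warp
            divergent += 1
    return divergent


# No data-dependent branch: every thread takes the same path.
uniform = count_divergent_warps(lambda tid: True)

# Branch aligned with warp boundaries: whole warps go one way or the other.
warp_aligned = count_divergent_warps(lambda tid: (tid // WARP_SIZE) % 2 == 0)

# Hypothetical bounds guard: only threads that map to a pixel do any work.
bounds_guard = count_divergent_warps(lambda tid: tid < N_PIXELS)

print(uniform, warp_aligned, bounds_guard)
```

Under these assumptions only the bounds guard yields a divergent warp: the last one, which contains the single thread that has no pixel to process. The same thread layout gives zero divergent warps for the other two predicates, which is why the count depends on the code.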

    How will the blocks and warps be organized here? In the x (horizontal) direction, I assume 2 blocks per row. Similarly, in the y (vertical) direction, I assume 3 blocks per column. But how will the warps be organized?

    I'll stick to your example and try to provide a schema of the thread IDs, block IDs, and warp IDs. The schema numbers the threads linearly, row by row, across the image. Keep in mind that this layout is, in practice, impossible on CUDA hardware.

    Image     Global Thread IDs      Block IDs              Local Thread IDs
    □□□□□□□ | 00 01 02 03 04 05 06 | 00 00 00 00 00 00 00 | 00 01 02 03 04 05 06
    □□□□□□□ | 07 08 09 10 11 12 13 | 00 00 00 00 00 00 00 | 07 08 09 10 11 12 13
    □□□□□□□ | 14 15 16 17 18 19 20 | 00 00 01 01 01 01 01 | 14 15 00 01 02 03 04
    □□□□□□□ | 21 22 23 24 25 26 27 | 01 01 01 01 01 01 01 | 05 06 07 08 09 10 11
    □□□□□□□ | 28 29 30 31 32 33 34 | 01 01 01 01 02 02 02 | 12 13 14 15 00 01 02
    □□□□□□□ | 35 36 37 38 39 40 41 | 02 02 02 02 02 02 02 | 03 04 05 06 07 08 09
    □□□□□□□ | 42 43 44 45 46 47 48 | 02 02 02 02 02 02 03 | 10 11 12 13 14 15 00
    □□□□□□□ | 49 50 51 52 53 54 55 | 03 03 03 03 03 03 03 | 01 02 03 04 05 06 07
    □□□□□□□ | 56 57 58 59 60 61 62 | 03 03 03 03 03 03 03 | 08 09 10 11 12 13 14
    ----------------------------------------------------------------------------
    Image     Global Warp IDs        Block IDs              Local Warp IDs
    □□□□□□□ | 00 00 00 00 00 00 00 | 00 00 00 00 00 00 00 | 00 00 00 00 00 00 00
    □□□□□□□ | 00 01 01 01 01 01 01 | 00 00 00 00 00 00 00 | 00 01 01 01 01 01 01
    □□□□□□□ | 01 01 02 02 02 02 02 | 00 00 01 01 01 01 01 | 01 01 00 00 00 00 00
    □□□□□□□ | 02 02 02 03 03 03 03 | 01 01 01 01 01 01 01 | 00 00 00 01 01 01 01
    □□□□□□□ | 03 03 03 03 04 04 04 | 01 01 01 01 02 02 02 | 01 01 01 01 00 00 00
    □□□□□□□ | 04 04 04 04 04 05 05 | 02 02 02 02 02 02 02 | 00 00 00 00 00 01 01
    □□□□□□□ | 05 05 05 05 05 05 06 | 02 02 02 02 02 02 03 | 01 01 01 01 01 01 00
    □□□□□□□ | 06 06 06 06 06 06 06 | 03 03 03 03 03 03 03 | 00 00 00 00 00 00 00
    □□□□□□□ | 07 07 07 07 07 07 07 | 03 03 03 03 03 03 03 | 01 01 01 01 01 01 01
    ----------------------------------------------------------------------------
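
The arithmetic behind the schema can be sketched in a few lines of Python. This is a host-side reconstruction, not CUDA code: the function names `ids_for_pixel` and `table_row` are hypothetical, and the mapping assumes, as the tables do, that threads are numbered linearly across the image, with 16 threads per block and 8 per warp.

```python
WIDTH, HEIGHT = 7, 9   # image size from the question
BLOCK_SIZE = 16        # 4*4 threads per block, flattened
WARP_SIZE = 8          # from the question; real CUDA warps have 32 threads


def ids_for_pixel(row, col):
    """Return the IDs of the thread handling pixel (row, col) under a linear mapping."""
    global_tid = row * WIDTH + col
    block_id = global_tid // BLOCK_SIZE
    local_tid = global_tid % BLOCK_SIZE
    return {
        "global_tid": global_tid,
        "block_id": block_id,
        "local_tid": local_tid,
        "global_warp": global_tid // WARP_SIZE,
        "local_warp": local_tid // WARP_SIZE,
    }


def table_row(row, key):
    """Format one image row of the schema for the given kind of ID."""
    return " ".join(f"{ids_for_pixel(row, col)[key]:02d}" for col in range(WIDTH))


for key in ("global_tid", "block_id", "local_tid", "global_warp", "local_warp"):
    print(key)
    for row in range(HEIGHT):
        print("  " + table_row(row, key))
```

For example, `table_row(2, "local_tid")` reproduces the third row of the Local Thread IDs column, where block 0 ends and block 1 begins in the middle of an image row.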
    

    and the cases where control divergence happens (with the thread IDs involved)

    As mentioned above, divergence is a property of the code, not of the thread layout, so this question cannot be answered without seeing the code.
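
To show how the answer would follow once the code is known, here is a Python sketch under one hypothetical kernel. Unlike the linear schema above, it uses the 2D organization described in the question (2 blocks per row, 3 per column). It assumes each thread handles one pixel and that the kernel's only branch is a guard of the form `if (col < width && row < height)`. That guard, and the helper name `divergent_warps`, are assumptions and not part of the question. Threads within a block are grouped into warps in linear order with x varying fastest, which is how CUDA linearizes 2D blocks.

```python
import math

WIDTH, HEIGHT = 7, 9      # image size from the question
BLOCK_X, BLOCK_Y = 4, 4   # 4*4 threads per block
WARP_SIZE = 8             # from the question; real CUDA warps have 32 threads

GRID_X = math.ceil(WIDTH / BLOCK_X)    # 2 blocks per row
GRID_Y = math.ceil(HEIGHT / BLOCK_Y)   # 3 blocks per column


def divergent_warps():
    """List (block_x, block_y, warp) triples whose threads disagree on the guard."""
    result = []
    for by in range(GRID_Y):
        for bx in range(GRID_X):
            outcomes = {}  # warp index within the block -> set of guard results
            for ty in range(BLOCK_Y):
                for tx in range(BLOCK_X):
                    # Threads of a block are linearized with x varying fastest.
                    warp = (ty * BLOCK_X + tx) // WARP_SIZE
                    col = bx * BLOCK_X + tx
                    row = by * BLOCK_Y + ty
                    # Assumed guard: if (col < width && row < height) { ... }
                    in_image = col < WIDTH and row < HEIGHT
                    outcomes.setdefault(warp, set()).add(in_image)
            result += [(bx, by, w) for w, seen in sorted(outcomes.items()) if len(seen) > 1]
    return result


warps = divergent_warps()
total = GRID_X * GRID_Y * (BLOCK_X * BLOCK_Y // WARP_SIZE)
print(len(warps), "of", total, "warps diverge")
for bx, by, w in warps:
    print(f"block ({bx}, {by}), warp {w}")
```

With this particular guard, 6 of the 12 warps diverge: both warps of the two upper right-hand blocks (column 7 is outside the image), and the first warp of each bottom block (row 8 is inside the image, row 9 is not). The second warp of each bottom block lies entirely outside the image, so all its threads skip the work together and it does not diverge. A different kernel would give a different count.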