c++ cuda atomic gpu-shared-memory

Is it possible for a thread to atomically update 4 different places of the shared memory?


Suppose a thread of a kernel is trying to update 4 different places in shared memory. Can I cause that operation to fail and be reversed if any other thread has overwritten any of those locations? Specifically, can this be performed atomically?

mem[a] = x;
mem[b] = y;
mem[c] = z;
mem[d] = w;

Solution

  • No, except for a special case.

    This can't be performed atomically in the general case, where a, b, c, and d are arbitrary (i.e., not necessarily adjacent) and/or x, y, z, and w are each 32 bits or larger.

    I'm using "atomically" to refer to an atomic read-modify-write (RMW) operation that the hardware provides.

    Such operations are limited to a maximum of 64 bits total, so four 32-bit or larger quantities cannot fit in one. Furthermore, all the data must be contiguous and naturally aligned, so independent locations cannot be accessed in a single atomic operation.

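    In the special case where the four quantities are 16-bit or 8-bit values that are adjacent and aligned (so that together they occupy one naturally aligned 64-bit or 32-bit word), you could use a custom atomic, typically built on atomicCAS.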
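    A minimal sketch of that special case follows, assuming four 16-bit values packed into one naturally aligned 64-bit shared-memory word (field 0 in bits 0-15, field 3 in bits 48-63). The names try_update4_u16, increment_all_four and run_increment_demo are hypothetical. The helper gives the "fail if anyone else changed it" semantics from the question: atomicCAS writes all four fields only if the word still matches the snapshot the thread read, so nothing ever needs to be reversed. For four 8-bit values, the same idea works with the 32-bit unsigned int overload of atomicCAS.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical helper: replace four adjacent 16-bit fields, packed into one
// naturally aligned 64-bit word, but only if the word still equals `expected`
// (i.e. no other thread has changed any of the four fields since it was read).
// Returns true if the update took effect, false if it was rejected.
__device__ bool try_update4_u16(unsigned long long* word,
                                unsigned long long expected,
                                unsigned short x, unsigned short y,
                                unsigned short z, unsigned short w)
{
    unsigned long long desired =
        (unsigned long long)x |
        ((unsigned long long)y << 16) |
        ((unsigned long long)z << 32) |
        ((unsigned long long)w << 48);
    // atomicCAS stores `desired` only if *word == expected, and always
    // returns the value that was in memory before the call.
    return atomicCAS(word, expected, desired) == expected;
}

// Demo: every thread adds 1 to all four fields, retrying whenever another
// thread changed the word between its read and its CAS.
__global__ void increment_all_four(unsigned long long* out)
{
    __shared__ unsigned long long packed;  // four 16-bit fields, start at 0
    if (threadIdx.x == 0) packed = 0ULL;
    __syncthreads();

    bool done = false;
    while (!done) {
        // Snapshot all four fields with one 64-bit read.
        unsigned long long seen = *(volatile unsigned long long*)&packed;
        unsigned short x = (unsigned short)((seen & 0xFFFF) + 1);
        unsigned short y = (unsigned short)(((seen >> 16) & 0xFFFF) + 1);
        unsigned short z = (unsigned short)(((seen >> 32) & 0xFFFF) + 1);
        unsigned short w = (unsigned short)(((seen >> 48) & 0xFFFF) + 1);
        done = try_update4_u16(&packed, seen, x, y, z, w);
    }
    __syncthreads();
    if (threadIdx.x == 0) *out = packed;
}

// Host-side launcher for the demo (a single block of `threads` threads).
unsigned long long run_increment_demo(int threads)
{
    unsigned long long* d_out = nullptr;
    unsigned long long h_out = 0;
    cudaMalloc(&d_out, sizeof(h_out));
    increment_all_four<<<1, threads>>>(d_out);
    cudaError_t err = cudaMemcpy(&h_out, d_out, sizeof(h_out),
                                 cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    cudaFree(d_out);
    return h_out;
}
```

    With 256 threads, each field should end up at 256 (0x0100), because every thread's four-field update lands exactly once. If you want the question's "fail" behaviour rather than a retry, call the helper once and act on a false return.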

    Alternatives to consider:

    You can use critical sections to achieve this, though probably at a considerable cost in performance, code complexity, and fragility.
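    A rough sketch of the critical-section approach, assuming a single lock per thread block held in shared memory. The kernel locked_four_updates, the launcher run_locked_demo, MEM_SLOTS and the indices used are all hypothetical. The four locations can be arbitrary and non-adjacent, but the lock only protects them from threads that also take the lock, and every thread is serialized through it. Acquisition and release sit in the same branch of the loop, a pattern commonly used to avoid intra-warp deadlock on GPUs without independent thread scheduling.

```cuda
#include <cassert>
#include <cuda_runtime.h>

#define MEM_SLOTS 64  // hypothetical size of the shared array

// Demo: each thread performs four non-atomic increments at arbitrary,
// non-adjacent shared-memory indices a, b, c, d inside a lock-protected
// critical section. Indices must be distinct and < MEM_SLOTS.
__global__ void locked_four_updates(int* out, int a, int b, int c, int d)
{
    __shared__ int lock;              // 0 = free, 1 = held
    __shared__ int mem[MEM_SLOTS];
    for (int i = threadIdx.x; i < MEM_SLOTS; i += blockDim.x) mem[i] = 0;
    if (threadIdx.x == 0) lock = 0;
    __syncthreads();

    volatile int* vmem = mem;  // force real loads/stores inside the section
    bool done = false;
    while (!done) {
        // Try to acquire; the winner runs the section and releases in the
        // same pass, so its warp-mates are not spinning on a held lock.
        if (atomicCAS(&lock, 0, 1) == 0) {
            __threadfence_block();    // acquire side: see prior holder's writes
            vmem[a] += 1;
            vmem[b] += 1;
            vmem[c] += 1;
            vmem[d] += 1;
            __threadfence_block();    // order the writes before the release
            atomicExch(&lock, 0);     // release
            done = true;
        }
    }
    __syncthreads();
    if (threadIdx.x == 0) {
        out[0] = mem[a]; out[1] = mem[b]; out[2] = mem[c]; out[3] = mem[d];
    }
}

// Host-side launcher: copies the four final counter values into `result`.
void run_locked_demo(int threads, int result[4])
{
    int* d_out = nullptr;
    cudaMalloc(&d_out, 4 * sizeof(int));
    locked_four_updates<<<1, threads>>>(d_out, 3, 17, 40, 63);
    cudaError_t err = cudaMemcpy(result, d_out, 4 * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    cudaFree(d_out);
}
```

    Each of the four slots should equal the thread count, since every thread's group of four increments runs without interleaving. Note that this gives mutual exclusion rather than "fail and reverse": a thread never sees a partial update from another lock holder, so there is nothing to roll back.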

    Another alternative is to recast your algorithm to use some form of parallel reduction. Since you appear to be operating at the threadblock level, this may be the best approach.
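    As an illustration of the reduction route, here is a sketch of a standard shared-memory tree reduction that combines per-thread contributions to four quantities at once. The kernel reduce_four, the launcher run_reduce_demo, MAX_THREADS and the per-thread values are hypothetical; it assumes blockDim.x is a power of two no larger than MAX_THREADS. Each thread writes only its own slot, and thread 0 alone writes the final four values, so no atomics are needed.

```cuda
#include <cassert>
#include <cuda_runtime.h>

#define MAX_THREADS 256  // hypothetical upper bound on blockDim.x

// Demo: instead of racing to update four shared locations, each thread
// writes its own slot, a tree reduction combines the slots, and thread 0
// writes the result. Assumes blockDim.x is a power of two <= MAX_THREADS.
__global__ void reduce_four(int4* out)
{
    __shared__ int4 partial[MAX_THREADS];
    int t = threadIdx.x;
    // Hypothetical per-thread contributions to the four quantities.
    partial[t] = make_int4(1, t, 2 * t, t % 3);
    __syncthreads();

    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (t < stride) {
            int4 o = partial[t + stride];
            partial[t].x += o.x;
            partial[t].y += o.y;
            partial[t].z += o.z;
            partial[t].w += o.w;
        }
        __syncthreads();  // reached by every thread, even when t >= stride
    }
    if (t == 0) *out = partial[0];  // single writer, so no atomics needed
}

// Host-side launcher for the demo (a single block of `threads` threads).
int4 run_reduce_demo(int threads)
{
    int4* d_out = nullptr;
    int4 h_out = make_int4(0, 0, 0, 0);
    cudaMalloc(&d_out, sizeof(int4));
    reduce_four<<<1, threads>>>(d_out);
    cudaError_t err = cudaMemcpy(&h_out, d_out, sizeof(int4),
                                 cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    cudaFree(d_out);
    return h_out;
}
```

    With 256 threads the sums are 256, 32640 (0 + 1 + ... + 255), 65280 and 255. Whether your four updates can be expressed as sums (or another associative combine) depends on your algorithm.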