
CUDA global atomic operations across concurrent kernel executions


My CUDA application performs an associative reduction over a volume. Essentially each thread computes values which are atomically added to overlapping locations of the same output buffer in global memory.

Is it possible to concurrently launch this kernel with different input parameters and the same output buffer? In other words, each kernel would share the same global buffer and write to it atomically.

All kernels are running on the same GPU.


Solution

  • Yes, it's possible. Atomic operations on global memory are device-wide: they are atomic with respect to any other code running on the same device, including kernels executing concurrently in other streams.
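
As a rough illustration of the point above, here is a minimal sketch that launches the same kernel into several streams, all accumulating into one shared global buffer with `atomicAdd`. The kernel `accumulate`, the helper `runConcurrentAdds`, and the sizes used are hypothetical names and values chosen for this example, not taken from the question. Error checking is omitted for brevity, and whether the kernels actually overlap in time depends on the hardware and available resources; the final sum should be correct either way.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread atomically adds `value` to one of
// `numBins` slots, so many threads (and many kernels) hit the same address.
__global__ void accumulate(unsigned int *out, int numBins, unsigned int value)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    atomicAdd(&out[tid % numBins], value);
}

// Launches one kernel per stream, all writing to the same output buffer,
// and returns the sum of every bin after all kernels have finished.
unsigned long long runConcurrentAdds(int numStreams, int blocks, int threads, int numBins)
{
    unsigned int *dOut = nullptr;
    cudaMalloc(&dOut, numBins * sizeof(unsigned int));
    cudaMemset(dOut, 0, numBins * sizeof(unsigned int));
    cudaDeviceSynchronize();  // make sure the buffer is zeroed before any launch

    std::vector<cudaStream_t> streams(numStreams);
    for (int s = 0; s < numStreams; ++s)
        cudaStreamCreate(&streams[s]);

    // Same kernel, different input parameter (s + 1), same output buffer.
    for (int s = 0; s < numStreams; ++s)
        accumulate<<<blocks, threads, 0, streams[s]>>>(dOut, numBins, s + 1);

    cudaDeviceSynchronize();  // wait for every stream to finish

    std::vector<unsigned int> hOut(numBins);
    cudaMemcpy(hOut.data(), dOut, numBins * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);

    unsigned long long total = 0;
    for (unsigned int v : hOut)
        total += v;

    for (int s = 0; s < numStreams; ++s)
        cudaStreamDestroy(streams[s]);
    cudaFree(dOut);
    return total;
}

int main()
{
    const int numStreams = 4, blocks = 64, threads = 256, numBins = 32;

    // Kernel s adds (s + 1) once per thread, so the expected total is
    // blocks * threads * (1 + 2 + ... + numStreams).
    unsigned long long expected =
        (unsigned long long)blocks * threads * numStreams * (numStreams + 1) / 2;
    unsigned long long total = runConcurrentAdds(numStreams, blocks, threads, numBins);

    printf("total = %llu, expected = %llu\n", total, expected);
    return total == expected ? 0 : 1;
}
```

If no update is lost, the total read back matches the expected value regardless of how the kernels were interleaved. The only synchronization the host needs is to wait for all streams before reading the buffer.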