Tags: cuda, atomic, unified-memory

Using atomic arithmetic operations on CUDA Unified Memory in a multi-GPU or multi-processor system


I am trying to implement a CUDA program that uses Unified Memory. I have two arrays allocated in Unified Memory, and some of the updates to them must be atomic.

The question below has an answer for a single-GPU environment, but I am not sure how to adapt that answer to a multi-GPU platform.

Question: cuda atomicAdd example fails to yield correct output

In case it matters, I have four Tesla K20 GPUs, and each of them updates parts of those arrays that must be updated atomically.

I would appreciate any help/recommendations.


Solution

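  • To summarize comments into an answer:

    An atomic operation that has to be atomic across several GPUs (and the host) needs the system-scope variants, such as atomicAdd_system. These are only available on devices of compute capability 6.x or newer, so the code must also be compiled for such an architecture (for example -arch=sm_60). A plain atomicAdd is only atomic with respect to other threads on the same device. The Tesla K20 is a compute capability 3.5 device, so it cannot do what you are asking.

    As always, this information is neatly summarized in the relevant section (Atomic Functions) of the CUDA Programming Guide.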
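For illustration, here is a minimal sketch of the system-scope approach, assuming a Pascal-or-newer GPU (compute capability 6.x+) on a platform that reports concurrentManagedAccess (for example Linux), compiled with something like -arch=sm_60. The kernel name increment_system and the helper count_on_all_gpus are hypothetical names chosen for this example. Every visible GPU increments the same managed counter concurrently with atomicAdd_system. The helper returns -1 on unsupported hardware such as the K20, and otherwise returns the final count, which should equal devices × blocks × threads × iters.

```cuda
#include <cuda_runtime.h>

// Each thread adds 1 to *counter `iters` times. atomicAdd_system is atomic
// with respect to all GPUs and the host (needs compute capability >= 6.0).
__global__ void increment_system(int *counter, int iters)
{
    for (int i = 0; i < iters; ++i) {
        atomicAdd_system(counter, 1);
    }
}

// Hypothetical helper: all visible GPUs update one managed counter at the
// same time. Returns the final count, or -1 if unsupported or on error.
int count_on_all_gpus(int blocks, int threads, int iters)
{
    int ndev = 0;
    if (cudaGetDeviceCount(&ndev) != cudaSuccess || ndev == 0) return -1;

    // Require CC >= 6.0 (system-scope atomics) and concurrent managed access
    // (several GPUs touching managed memory at once). A K20 (CC 3.5) fails here.
    for (int d = 0; d < ndev; ++d) {
        int major = 0, concurrent = 0;
        cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, d);
        cudaDeviceGetAttribute(&concurrent, cudaDevAttrConcurrentManagedAccess, d);
        if (major < 6 || !concurrent) return -1;
    }

    int *counter = nullptr;
    if (cudaMallocManaged(&counter, sizeof(int)) != cudaSuccess) return -1;
    *counter = 0;  // host initializes before any kernel is launched

    // Launch on every GPU without synchronizing in between, so kernels overlap.
    bool ok = true;
    for (int d = 0; d < ndev; ++d) {
        cudaSetDevice(d);
        increment_system<<<blocks, threads>>>(counter, iters);
        if (cudaGetLastError() != cudaSuccess) ok = false;
    }

    // Wait for every GPU before the host reads the managed counter.
    for (int d = 0; d < ndev; ++d) {
        cudaSetDevice(d);
        if (cudaDeviceSynchronize() != cudaSuccess) ok = false;
    }

    int result = ok ? *counter : -1;
    cudaFree(counter);
    return result;
}
```

Calling count_on_all_gpus(4, 128, 10) from main on four supported GPUs should give 4 × 4 × 128 × 10 = 20480. Replacing atomicAdd_system with plain atomicAdd would only guarantee atomicity among threads of the same device, which is exactly the limitation described above.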