I am trying to implement a CUDA program that uses Unified Memory. I have two arrays in unified memory, and sometimes they need to be updated atomically.
The question below has an answer for a single-GPU environment, but I am not sure how to extend that answer to multi-GPU platforms.
Question: cuda atomicAdd example fails to yield correct output
I have 4 Tesla K20s, in case that information is relevant, and all of them update parts of those arrays, which must be done atomically.
I would appreciate any help/recommendations.
To summarize comments into an answer:

Use `atomicAdd_system` (and the other `*_system` atomic functions), which perform atomics with system-wide scope, i.e. visible to all GPUs and the host. These require a device of compute capability 6.0 or newer, so you must compile with `-arch=sm_60` or similar. As always, this information is neatly summarized in the relevant section of the Programming Guide.
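
A minimal sketch of what this can look like (not your actual arrays; the kernel name and counter layout are made up for illustration, and it assumes devices of compute capability 6.0 or higher compiled with `-arch=sm_60`):

```cuda
// Hypothetical example: every GPU atomically accumulates into the same
// managed (unified memory) array using a system-wide atomic.
// Build with something like: nvcc -arch=sm_60 multi_gpu_atomic.cu
#include <cstdio>
#include <cuda_runtime.h>

__global__ void accumulate(int *counters, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // System-wide atomic: visible to all GPUs (and the CPU)
        // that access this managed allocation.
        atomicAdd_system(&counters[i % 4], 1);
    }
}

int main()
{
    const int n = 1 << 20;
    int *counters = nullptr;
    cudaMallocManaged(&counters, 4 * sizeof(int));  // unified memory
    for (int i = 0; i < 4; ++i) counters[i] = 0;

    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);

    // Launch the same kernel on every GPU; all of them update the
    // shared managed array atomically.
    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaSetDevice(dev);
        accumulate<<<(n + 255) / 256, 256>>>(counters, n);
    }
    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaSetDevice(dev);
        cudaDeviceSynchronize();
    }

    printf("%d %d %d %d\n", counters[0], counters[1], counters[2], counters[3]);
    cudaFree(counters);
    return 0;
}
```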