In CUDA, is the result of an atomic operation immediately visible to threads in other warps of the same block as the thread that performed it? For non-atomic operations, I know the result may not be visible until __syncthreads() is called.
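Yes. Atomic operations on the same address are serialized by definition: each one completes its read-modify-write before any other thread's atomic operation on that address can access the value, so every later atomic on that address sees the result of the earlier ones, regardless of which warp issued it.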
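A minimal sketch of that guarantee, assuming a single block of at most 1024 threads (the per-block limit on current GPUs). Every thread calls atomicAdd on one shared-memory counter and records the old value it returns. Because the atomics are serialized, threads from all warps receive distinct values 0..n-1. The names take_tickets and tickets_are_unique are hypothetical, chosen for illustration only.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Each thread atomically increments a shared-memory counter and stores the
// value atomicAdd returned (the counter's value just before this thread's
// increment). Serialization of atomics on one address means no two threads,
// in any warp of the block, can observe the same old value.
__global__ void take_tickets(int *tickets)
{
    __shared__ int counter;
    if (threadIdx.x == 0) counter = 0;
    __syncthreads();                   // make the initial 0 visible to all warps

    int old = atomicAdd(&counter, 1);  // returns the pre-increment value
    tickets[threadIdx.x] = old;
}

// Hypothetical host helper: launches one block of n threads (n <= 1024) and
// checks that each ticket 0..n-1 was handed out exactly once.
bool tickets_are_unique(int n)
{
    int *d_tickets = nullptr;
    if (cudaMalloc(&d_tickets, n * sizeof(int)) != cudaSuccess) return false;

    take_tickets<<<1, n>>>(d_tickets);

    std::vector<int> h(n);
    cudaError_t err = cudaMemcpy(h.data(), d_tickets, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_tickets);
    if (err != cudaSuccess) return false;

    std::vector<int> seen(n, 0);
    for (int i = 0; i < n; ++i) {
        if (h[i] < 0 || h[i] >= n || seen[h[i]]++) return false;  // out of range or duplicate
    }
    return true;
}
```

The order in which warps obtain their tickets is unspecified; only uniqueness is guaranteed.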
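However, this guarantee applies only among atomic accesses. If other threads read or write the same address with ordinary (non-atomic) loads and stores at the same time, you have a race condition, so you still need explicit synchronization, such as __syncthreads() within a block, and must be careful to write correct concurrent code.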
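A minimal sketch of mixing atomic and plain accesses safely, again assuming one block of at most 1024 threads. Every thread atomically increments a shared counter and then reads it with an ordinary load. The second __syncthreads() is what makes that plain read race-free: without it, a thread in one warp could read count before threads in other warps have done their atomicAdd, and so see a value smaller than the block size. The names count_then_read and all_threads_see_total are hypothetical.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Phase 1: all threads increment a shared counter atomically.
// Phase 2: after a barrier, all threads read it with a plain load.
__global__ void count_then_read(int *observed)
{
    __shared__ int count;
    if (threadIdx.x == 0) count = 0;
    __syncthreads();                  // initial value visible to every warp

    atomicAdd(&count, 1);             // concurrent atomics on one address are serialized

    __syncthreads();                  // wait until every thread's atomicAdd has completed
    observed[threadIdx.x] = count;    // plain load: safe only because of the barrier
}

// Hypothetical host helper: returns true if every thread read the full total n.
bool all_threads_see_total(int n)
{
    int *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(int)) != cudaSuccess) return false;

    count_then_read<<<1, n>>>(d);

    std::vector<int> h(n);
    cudaError_t err = cudaMemcpy(h.data(), d, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return false;

    for (int v : h) {
        if (v != n) return false;
    }
    return true;
}
```

Removing the second __syncthreads() does not make the atomics themselves incorrect; it only makes the plain read of count racy.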