multithreading cuda unified-memory

Can CUDA unified memory be written to by another CPU thread?


I am writing a program that retrieves images from a camera and processes them with CUDA. In order to gain the best performance, I'm passing a CUDA unified memory buffer to the image acquisition library, which writes to the buffer in another thread.

This causes all sorts of weird results, where the program hangs in library code that I do not have access to. If I use a normal memory buffer and then copy it to CUDA, the problem goes away. So I became suspicious that writing from another thread might not be allowed, but search as I might, I could not find a definitive answer.

So is accessing the unified memory buffer from another CPU thread allowed or not?


Solution

  • There should be no problem writing to a unified memory buffer from multiple threads.

    However, keep in mind the restrictions that apply when the concurrentManagedAccess device property is not true. In that case, once you launch a kernel, no CPU/host thread access of any kind is allowed, to the buffer the kernel uses or to any other managed buffer, until you call cudaDeviceSynchronize() after the kernel launch.

    In a multithreaded environment, this might take some explicit effort to enforce.
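
    As a rough illustration of that effort, the sketch below queries the property with cudaDeviceGetAttribute and uses a single mutex so that host writes never overlap the window between a kernel launch and the following cudaDeviceSynchronize(). The names hostMayAccessDuringKernel, hostWrite, processFrame, gManagedMutex and the invert kernel are hypothetical and not taken from the question. It also assumes you control every host-side access; a library that writes to the buffer from its own thread cannot be guarded this way.

    ```cuda
    #include <cuda_runtime.h>
    #include <cstddef>
    #include <cstring>
    #include <mutex>

    // Hypothetical global lock: guards ALL host access to ALL managed buffers,
    // because on devices without concurrentManagedAccess a running kernel
    // makes every managed buffer off-limits to the host.
    std::mutex gManagedMutex;

    // True if host threads may touch managed memory while a kernel is running.
    bool hostMayAccessDuringKernel(int device) {
        int concurrent = 0;
        cudaError_t err = cudaDeviceGetAttribute(
            &concurrent, cudaDevAttrConcurrentManagedAccess, device);
        return err == cudaSuccess && concurrent != 0;
    }

    // Placeholder processing step: inverts each byte of the image.
    __global__ void invert(unsigned char* img, size_t n) {
        size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
        if (i < n) img[i] = 255 - img[i];
    }

    // Host-side write (e.g. from an acquisition thread), taken under the lock.
    void hostWrite(unsigned char* managed, const unsigned char* src, size_t n) {
        std::lock_guard<std::mutex> lock(gManagedMutex);
        std::memcpy(managed, src, n);
    }

    // Launches the kernel and holds the lock until the device is idle again.
    cudaError_t processFrame(unsigned char* managed, size_t n) {
        if (n == 0) return cudaSuccess;
        std::lock_guard<std::mutex> lock(gManagedMutex);
        const unsigned int threads = 256;
        const unsigned int blocks =
            static_cast<unsigned int>((n + threads - 1) / threads);
        invert<<<blocks, threads>>>(managed, n);
        cudaError_t err = cudaGetLastError();
        if (err != cudaSuccess) return err;
        return cudaDeviceSynchronize();  // host access is legal again after this
    }
    ```

    When hostMayAccessDuringKernel returns true, the lock is no longer required by the rule above, although the program still has to avoid ordinary data races between the CPU and the GPU.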

    I think this is similar to this recital if that is also your posting. Note that TX2 should have this property set to false.

    Note that this general rule in the non-concurrent case can be modified through careful use of streams. However, the restrictions still apply to buffers attached to a stream that has a kernel launched in it, and to buffers not explicitly attached to any stream: when the property mentioned above is false, no CPU thread can access those buffers.
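
    A minimal sketch of the stream attach approach follows, assuming one stream per buffer. The buffer is allocated with cudaMemAttachHost and then attached to a single stream with cudaStreamAttachMemAsync, so the host only has to wait for that stream, not the whole device, before touching it. The helper names allocForStream and scaleInStream and the scale kernel are hypothetical.

    ```cuda
    #include <cuda_runtime.h>
    #include <cstddef>

    // Placeholder kernel: multiplies each element by a factor.
    __global__ void scale(float* data, size_t n, float factor) {
        size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
        if (i < n) data[i] *= factor;
    }

    // Allocates a managed buffer and attaches it to `stream` only.
    cudaError_t allocForStream(float** out, size_t n, cudaStream_t stream) {
        // Start host-attached so kernels in other streams do not lock it.
        cudaError_t err =
            cudaMallocManaged(out, n * sizeof(float), cudaMemAttachHost);
        if (err != cudaSuccess) return err;
        err = cudaStreamAttachMemAsync(stream, *out, 0, cudaMemAttachSingle);
        if (err != cudaSuccess) return err;
        return cudaStreamSynchronize(stream);  // wait for the attach to finish
    }

    // Runs the kernel in `stream` and waits for that stream only.
    cudaError_t scaleInStream(float* data, size_t n, float factor,
                              cudaStream_t stream) {
        if (n == 0) return cudaSuccess;
        const unsigned int threads = 256;
        const unsigned int blocks =
            static_cast<unsigned int>((n + threads - 1) / threads);
        scale<<<blocks, threads, 0, stream>>>(data, n, factor);
        cudaError_t err = cudaGetLastError();
        if (err != cudaSuccess) return err;
        // Once this stream is idle, the host may access `data` again,
        // even if kernels are still running in other streams.
        return cudaStreamSynchronize(stream);
    }
    ```

    While a kernel is in flight in that stream, the attached buffer is still off-limits to every CPU thread, which is the restriction described above.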

    The motivation for this behavior is roughly as follows. The CUDA runtime does not know the relationship between managed buffers, regardless of where those buffers were created. A buffer created in one thread could easily have objects in it with embedded pointers, and there is nothing to prevent or restrict those pointers from pointing to data in another managed buffer. Even a buffer that was created later. Even a buffer that was created in another thread. The safe assumption is that any linkages could be possible, and therefore, without any other negotiation, the managed memory subsystem in the CUDA runtime must move all managed buffers to the GPU when a kernel is launched. This makes all managed buffers, without exception, inaccessible to CPU threads (any thread, anywhere).

    In the normal program flow, access is restored at the next occurrence of a cudaDeviceSynchronize() call. Once the CPU thread that issues that call completes the call and moves on, managed buffers are once again visible to (all) CPU threads. Another kernel launch (anywhere) repeats the process and interrupts the accessibility. To repeat, this is the mechanism that is in effect when the concurrentManagedAccess property on the GPU is not true, and this behavior can be somewhat modified via the aforementioned stream attach mechanism.

    This posting may also be of interest.