
CUDA malloc, mmap/mremap


CUDA device memory can be allocated with cudaMalloc and freed with cudaFree, sure. This is fine, but primitive.
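For concreteness, this is the baseline I mean (a minimal sketch, error handling reduced to one macro):

```c
#include <cuda_runtime.h>
#include <stdio.h>

#define CHECK(call) do {                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            return 1;                                                      \
        }                                                                  \
    } while (0)

int main(void) {
    float *d_buf = NULL;
    size_t count = 1 << 20;

    CHECK(cudaMalloc((void **)&d_buf, count * sizeof(float))); /* fixed-size allocation */
    /* ... launch kernels that use d_buf ... */
    CHECK(cudaFree(d_buf)); /* the only "resize" here is free + malloc + copy */
    return 0;
}
```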

I'm curious to know: is device memory virtualised in some way? Are there operations equivalent to mmap and, more importantly, mremap for device memory? If device memory is virtualised, I'd expect these sorts of functions to exist. Modern GPU drivers seem to implement paging when multiple processes contend for limited video memory, which suggests it's virtualised in some way or another...
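To be concrete about what I mean by mremap semantics, here's the host-side pattern I'd love a device-memory equivalent of (plain Linux mremap, nothing CUDA-specific):

```c
#define _GNU_SOURCE           /* mremap is a Linux extension */
#include <sys/mman.h>

/* Grow a mapping in place if possible, otherwise let the kernel move it;
 * either way, no bytes are copied by hand. */
void *grow(void *p, size_t old_size, size_t new_size) {
    return mremap(p, old_size, new_size, MREMAP_MAYMOVE);
}
```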

Does anyone know where I can read more about this?

Edit:
Okay, my question was a bit general. I've read the parts of the manual that cover mapping system memory for device access; I was more interested in device-allocated memory, however.

Specific questions:
- Is there any possible way to remap device memory? (i.e., to grow a device allocation)
- Is it possible to map device allocated memory to system memory?
- Is there some performance hazard in using mapped pinned memory? Is the memory duplicated on the device as needed, or will every access fetch it across the PCIe bus? (See the sketch after this list.)
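To illustrate the third question, this is the kind of mapped pinned allocation I mean (a sketch using the runtime API; whether device accesses get serviced device-side or always traverse the bus is exactly what I'm unsure about):

```c
#include <cuda_runtime.h>

__global__ void scale(float *p, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p[i] *= 2.0f;          /* device touches host memory directly */
}

int main(void) {
    const int n = 1024;
    float *h_buf, *d_alias;

    cudaSetDeviceFlags(cudaDeviceMapHost);        /* before the context is created */
    cudaHostAlloc(&h_buf, n * sizeof(float), cudaHostAllocMapped);
    cudaHostGetDevicePointer(&d_alias, h_buf, 0); /* device-visible alias of h_buf */

    scale<<<(n + 255) / 256, 256>>>(d_alias, n);
    cudaDeviceSynchronize();
    /* results are already in h_buf; no cudaMemcpy */
    cudaFreeHost(h_buf);
    return 0;
}
```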

I have cases where the memory is used by the GPU 99% of the time, so it should be device-local, but it would be convenient to map device memory to system memory for occasional structured read-back without having to implement an awkward deep-copy.
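For example, the sort of structured read-back I have in mind (the Particle struct is purely illustrative):

```c
#include <cuda_runtime.h>
#include <stddef.h>                     /* offsetof */

/* Illustrative only; my real structs differ. */
typedef struct { float pos[3]; float vel[3]; int flags; } Particle;

/* d_particles lives in device memory (cudaMalloc); pull back one field
 * of one element instead of deep-copying the whole array. */
cudaError_t read_flags(const Particle *d_particles, size_t i, int *out) {
    const char *src = (const char *)(d_particles + i) + offsetof(Particle, flags);
    return cudaMemcpy(out, src, sizeof(*out), cudaMemcpyDeviceToHost);
}
```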

Yes, unified memory exists; however, I'm happy with explicit allocation, save for the odd moment when I want a sneaky read-back.
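For completeness, the managed route would make that sneaky read-back trivial (direct host reads after a sync), but I'd rather keep placement explicit:

```c
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void fill(int *p, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p[i] = i;
}

int main(void) {
    int *buf;
    const int n = 256;

    cudaMallocManaged(&buf, n * sizeof(int)); /* migrates between host and device */
    fill<<<1, n>>>(buf, n);
    cudaDeviceSynchronize();                  /* required before the host touches buf */
    printf("last = %d\n", buf[n - 1]);        /* direct host read, no cudaMemcpy */
    cudaFree(buf);
    return 0;
}
```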

I've found the manual fairly light on detail in general.


Solution

  • CUDA comes with a fine CUDA C Programming Guide as its main manual, which has sections on Mapped Memory as well as Unified Memory Programming.

    Responding to your additional posted questions, and following your cue to leave UM out of consideration: