Tags: cuda, gpgpu, remote-access

How can I flush GPU memory using CUDA (physical reset is unavailable)


My CUDA program crashed during execution, before memory was flushed. As a result, device memory remained occupied.

I'm running on a GTX 580, for which nvidia-smi --gpu-reset is not supported.

Placing cudaDeviceReset() at the beginning of the program only affects the context created by the current process; it doesn't free memory that was allocated before the program started.

I'm accessing a Fedora server with that GPU remotely, so physical reset is quite complicated.

So, the question is: is there any way to flush the device memory in this situation?


Solution

  • Although it should be unnecessary to do this in anything other than exceptional circumstances, the recommended way to do this on Linux hosts is to unload the nvidia driver by doing

    $ rmmod nvidia 
    

    with suitable root privileges and then reloading it with

    $ modprobe nvidia
    

    If the machine is running X11, you will need to stop it manually beforehand and restart it afterwards. The driver initialisation process should eliminate any prior state on the device.

    This answer has been assembled from comments and posted as a community wiki to get this question off the unanswered list for the CUDA tag.
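
    The steps above can be sketched as one script. This is a hedged sketch, not a definitive procedure: it assumes a systemd host where X11 is managed by a display-manager unit (the name "gdm" below is a placeholder, substitute your own), and it must be run as root. By default it only prints the commands (DRY_RUN=1) so you can review them; set DRY_RUN=0 to execute.

    ```shell
    #!/bin/sh
    # Sketch: clear leftover GPU state by reloading the nvidia kernel module.
    # DRY_RUN=1 (the default) prints each command instead of running it.
    DRY_RUN=${DRY_RUN:-1}

    run() {
        if [ "$DRY_RUN" = "1" ]; then
            echo "would run: $*"
        else
            "$@"
        fi
    }

    # Stop X11 first; the driver cannot be unloaded while something holds it.
    # "gdm" is an assumed display-manager unit name -- adjust to your setup,
    # or skip this step entirely on a headless server.
    run systemctl stop gdm

    # Unload and reload the driver. Re-initialisation clears all prior device
    # state, including memory left allocated by a crashed process.
    run rmmod nvidia
    run modprobe nvidia

    # Restart X11 if you stopped it.
    run systemctl start gdm
    ```

    If `rmmod nvidia` reports the module is in use, some process still holds the device open and must be stopped before the unload can succeed.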