Tags: process, cuda, lifetime, nvidia-smi

CUDA process lifetime


It seems I didn't understand something essential about CUDA. I am using a C++ GUI application to start some kernels on a dual-GPU card. When I start the host process, no process is listed by nvidia-smi. This is expected, because the host process waits until I click a button before it uses CUDA and starts the kernels. When I click the button, the two kernels run fine on both GPUs, exit, and return the expected results. The host process is then listed twice by nvidia-smi, once for each GPU. Both entries remain visible in nvidia-smi until I exit the host process.

I am a bit confused since there is no such thing as a cudaOpen() or cudaClose() function (or a similar function pair).

Which CUDA API call(s) cause a process to be listed by nvidia-smi? Which CUDA API call(s) cause a process to be dropped from the list?


Solution

  • This is explained in the CUDA documentation, Section 3.2.1: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#initialization (a small sketch illustrating the behaviour follows the quoted passage below).

    3.2.1. Initialization

    There is no explicit initialization function for the runtime; it initializes the first time a runtime function is called (more specifically any function other than functions from the error handling and version management sections of the reference manual). One needs to keep this in mind when timing runtime function calls and when interpreting the error code from the first call into the runtime.

    The runtime creates a CUDA context for each device in the system (see Context for more details on CUDA contexts). This context is the primary context for this device and is initialized at the first runtime function which requires an active context on this device. It is shared among all the host threads of the application. As part of this context creation, the device code is just-in-time compiled if necessary (see Just-in-Time Compilation) and loaded into device memory. This all happens transparently. If needed, e.g. for driver API interoperability, the primary context of a device can be accessed from the driver API as described in Interoperability between Runtime and Driver APIs.

    When a host thread calls cudaDeviceReset(), this destroys the primary context of the device the host thread currently operates on (i.e., the current device as defined in Device Selection). The next runtime function call made by any host thread that has this device as current will create a new primary context for this device.

    Note: The CUDA interfaces use global state that is initialized during host program initiation and destroyed during host program termination. The CUDA runtime and driver cannot detect if this state is invalid, so using any of these interfaces (implicitly or explicitly) during program initiation or termination (after main) will result in undefined behavior.
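
    Putting the two quoted paragraphs together: the process shows up in nvidia-smi (once per GPU) as soon as the primary context for a device is created, which happens implicitly at the first runtime call that needs an active context on that device; it disappears from a device's listing when that primary context is destroyed with cudaDeviceReset(), or when the process exits. The following is only a minimal sketch of that behaviour, not your application: the dummyKernel name, the two-device loop, and the getchar() pauses are illustrative assumptions, and depending on the CUDA version cudaSetDevice() alone may already trigger initialization.

        #include <cstdio>
        #include <cuda_runtime.h>

        // Trivial kernel; its only purpose is to force creation of the primary context.
        __global__ void dummyKernel() {}

        int main()
        {
            // No runtime call that needs a context yet -> the process is not listed by nvidia-smi.
            std::printf("before init, check nvidia-smi\n");
            std::getchar();

            for (int dev = 0; dev < 2; ++dev) {   // assumes a dual-GPU system, devices 0 and 1
                cudaSetDevice(dev);
                dummyKernel<<<1, 1>>>();          // first call needing an active context:
                cudaDeviceSynchronize();          // the primary context for this device is created
            }
            std::printf("contexts created, nvidia-smi lists the process once per GPU\n");
            std::getchar();

            // Destroying each device's primary context drops the process from
            // that GPU's nvidia-smi listing again.
            for (int dev = 0; dev < 2; ++dev) {
                cudaSetDevice(dev);
                cudaDeviceReset();
            }
            std::printf("contexts destroyed, process no longer listed\n");
            std::getchar();
            return 0;
        }

    So there is no cudaOpen()/cudaClose() pair: any context-creating runtime call plays the "open" role implicitly, and cudaDeviceReset() (per device) or process exit plays the "close" role. Error checking is omitted in the sketch for brevity.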