In this discussion of the runtime vs the driver API, it is said that
Primary contexts are created as needed, one per device per process, are reference-counted, and are then destroyed when there are no more references to them.
What counts as such a reference? And doesn't this imply that the primary context would often be destroyed right after being used, over and over? For example, you get the default device ID and then launch a kernel; what "references" remain after that? Surely it's not the integer variable holding the device ID...
None of the exact internal workings of the runtime API are documented, and there is empirical evidence that they have subtly changed over time. That said, if you inspect the host code boilerplate the toolchain emits and run some host-side API traces, it is possible to infer how it works. What follows is my understanding based on observations made in this way.
It is important to realize that primary context reference counting is internal to the driver. The runtime's "lazy context establishment" mechanism uses internal API hooks which either bind to an existing primary context that was created explicitly through the driver API (for example with cuDevicePrimaryCtxRetain), which increments the reference count, or create one themselves if none is available and then bind to it, which also increments the reference count. The routines which unbind from the primary context are registered via atexit, so they run when the application exits; they are also triggered when cudaDeviceReset() is called.
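One way to poke at this from user code is to take an explicit driver API reference on the primary context, let the runtime bind to it lazily, then drop our reference and ask the driver whether the context is still active. The sketch below assumes CUDA 11 or later, at least one GPU, and linking against both the runtime and the driver library; the helper names primaryCtxActive and runtimeSharesPrimaryCtx are hypothetical. It does not reveal the runtime's internals. It only checks the externally visible consequence: if the runtime really holds its own reference, the context should stay active after our release.

```cuda
#include <cstdio>
#include <cuda.h>
#include <cuda_runtime.h>

// Hypothetical helper: 1 if the device's primary context is active (refcount > 0),
// 0 if not, -1 if the query itself failed.
static int primaryCtxActive(CUdevice dev) {
    unsigned int flags = 0;
    int active = 0;
    if (cuDevicePrimaryCtxGetState(dev, &flags, &active) != CUDA_SUCCESS) return -1;
    return active;
}

// Hypothetical helper: retains the primary context via the driver API, lets the
// runtime establish its context lazily, then releases the driver reference.
// Returns true if the runtime bound to the same primary context and that context
// is still active after our release (i.e. the runtime holds its own reference).
static bool runtimeSharesPrimaryCtx() {
    CUdevice dev;
    if (cuInit(0) != CUDA_SUCCESS || cuDeviceGet(&dev, 0) != CUDA_SUCCESS) return false;
    std::printf("before any retain:    active=%d\n", primaryCtxActive(dev));

    CUcontext primary = nullptr;
    if (cuDevicePrimaryCtxRetain(&primary, dev) != CUDA_SUCCESS) return false; // our +1
    std::printf("after driver retain:  active=%d\n", primaryCtxActive(dev));

    // First runtime call on this thread: triggers lazy context establishment.
    if (cudaFree(0) != cudaSuccess) {
        cuDevicePrimaryCtxRelease(dev);
        return false;
    }
    CUcontext current = nullptr;
    cuCtxGetCurrent(&current);
    std::printf("runtime bound to the retained primary context: %s\n",
                current == primary ? "yes" : "no");

    cuDevicePrimaryCtxRelease(dev);    // our -1; only the runtime's reference should remain
    int stillActive = primaryCtxActive(dev);
    std::printf("after driver release: active=%d\n", stillActive);
    return current == primary && stillActive == 1;
}

int main() {
    bool shared = runtimeSharesPrimaryCtx();
    std::printf("runtime holds its own primary context reference: %s\n", shared ? "yes" : "no");
    return shared ? 0 : 1;
}
```

Something like `nvcc primary_ctx.cu -lcuda` should build it; nvcc links the runtime by default, and `-lcuda` pulls in the driver API.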
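This approach prevents the scenario you have posited, where contexts are continually destroyed when their reference count falls to zero and then recreated when another runtime API function is called. Because the runtime's own reference is held until the application exits or cudaDeviceReset() is called, the count does not drop to zero between calls, so that doesn't happen.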
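The "no teardown between calls" behaviour can be spot-checked with a loop of trivial kernel launches. This is a sketch under the same assumptions as before (CUDA 11 or later, a GPU, linked with `-lcuda`); contextStableAcrossLaunches and noop are hypothetical names. After every launch it confirms that the device's primary context is still active and that the thread's current context handle is unchanged. Handle equality on its own is only suggestive, since a recreated context could in principle reuse an address, which is why the active state is queried as well.

```cuda
#include <cstdio>
#include <cuda.h>
#include <cuda_runtime.h>

// A kernel that does nothing; it only exercises the runtime launch path.
__global__ void noop() {}

// Launches `iterations` trivial kernels through the runtime API and, between
// launches, checks that the primary context is still active and that the
// current context handle matches the one seen after the first launch.
static bool contextStableAcrossLaunches(int iterations) {
    if (cuInit(0) != CUDA_SUCCESS) return false;      // idempotent; enables driver queries
    int ordinal = 0;
    if (cudaGetDevice(&ordinal) != cudaSuccess) return false;
    CUdevice dev;
    if (cuDeviceGet(&dev, ordinal) != CUDA_SUCCESS) return false;

    CUcontext first = nullptr;
    for (int i = 0; i < iterations; ++i) {
        noop<<<1, 1>>>();
        if (cudaGetLastError() != cudaSuccess) return false;      // launch errors
        if (cudaDeviceSynchronize() != cudaSuccess) return false; // execution errors

        unsigned int flags = 0;
        int active = 0;
        CUcontext current = nullptr;
        if (cuDevicePrimaryCtxGetState(dev, &flags, &active) != CUDA_SUCCESS) return false;
        if (cuCtxGetCurrent(&current) != CUDA_SUCCESS) return false;
        if (!active || current == nullptr) return false;

        if (i == 0) {
            first = current;
        } else if (current != first) {
            std::printf("context handle changed at launch %d\n", i);
            return false;
        }
    }
    return true;
}

int main() {
    bool stable = contextStableAcrossLaunches(1000);
    std::printf("primary context stable across launches: %s\n", stable ? "yes" : "no");
    return stable ? 0 : 1;
}
```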