
Are CUDA streams device-associated? And how do I get a stream's device?


I have a CUDA stream which someone handed to me - a cudaStream_t value. The CUDA Runtime API does not seem to indicate how I can obtain the index of the device with which this stream is associated.

Now, I know that cudaStream_t is just a pointer to a driver-level stream structure, but I'm hesitant to delve into the driver too much. Is there an idiomatic way to do this? Or some good reason not to want to do it?

Edit: Another aspect of this question is whether the stream really is associated with a device, in the sense that the CUDA driver itself can determine that device's identity from the pointed-to structure.


Solution

  • Yes, streams are device-specific.

    In CUDA, streams are specific to a context, and contexts are specific to a device.

    Now, with the runtime API you don't "see" contexts: it uses a single (primary) context per device. But if you look at the driver API, you have:

    CUresult cuStreamGetCtx ( CUstream hStream, CUcontext* pctx );
    

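    CUstream and cudaStream_t are the same type (both are pointers to the opaque struct CUstream_st), so you can pass the runtime handle straight to cuStreamGetCtx and get the stream's context. Then you make that context current, e.g. by pushing it with cuCtxPushCurrent (and popping it with cuCtxPopCurrent afterwards), and finally you use: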

    CUresult cuCtxGetDevice ( CUdevice* device );
    

    to get the current context's device.

    So, a bit of a hassle, but quite doable.


    My approach to easily determining a stream's device

    My workaround is to have my (C++-style) stream wrapper class store the device ID (and the context handle) as member variables of the stream object, so that you can write:

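    auto my_device = cuda::device::get(1);
    // create_stream() takes a synchronization mode; async = no implicit sync with the default stream
    auto my_stream = my_device.create_stream(cuda::stream::async);
    // compare device objects directly; my_device is not callable
    assert(my_stream.device() == my_device);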
    

    and not have to worry about it. As a bonus, this avoids the extra API calls, since at construction time the wrapper already knows what the current context is and which device it belongs to.

    Note: The snippet above assumes a system with at least two CUDA devices; otherwise there is no device with index 1.