
Is it "worth it" to reuse events in CUDA?


When using events in CUDA, I typically create an event and immediately record it on some stream. After synchronizing, I don't hold on to that cudaEvent_t to use it elsewhere; I just destroy it.
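A minimal sketch of the pattern described above, written as a hypothetical helper `waitForStreamOneShot()` (the name is made up for illustration). The event is created, recorded, waited on, and destroyed on every call, so the creation and destruction cost is paid each time.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: the one-shot pattern from the question.
// Each call pays for one cudaEventCreate() and one cudaEventDestroy().
cudaError_t waitForStreamOneShot(cudaStream_t stream)
{
    cudaEvent_t ev;
    cudaError_t err = cudaEventCreate(&ev);
    if (err != cudaSuccess) return err;

    err = cudaEventRecord(ev, stream);      // mark the current point in the stream
    if (err == cudaSuccess)
        err = cudaEventSynchronize(ev);     // block the host until that point is reached

    cudaError_t destroyErr = cudaEventDestroy(ev);  // not kept around for reuse
    return err != cudaSuccess ? err : destroyErr;
}
```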

Other than avoiding the overhead of event creation and destruction, is there any other benefit to "recycling" events? If not, why did NVIDIA bother to separate cudaEventCreate() from cudaEventRecord()?
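For comparison, "recycling" events means creating them once and recording them repeatedly; calling cudaEventRecord() again on the same event overwrites its previously captured state. The sketch below uses a hypothetical kernel `scaleKernel` and a hypothetical helper `timeWithReusedEvents()` purely for illustration. It reuses one start/stop pair for every timed launch and destroys the events only at the end.

```cuda
#include <cuda_runtime.h>

// Placeholder kernel so there is some device work to time (hypothetical).
__global__ void scaleKernel(float* data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Times `iters` launches of scaleKernel on the default stream and writes each
// duration in milliseconds to timesMs[0..iters-1]. The start/stop events are
// created once and re-recorded on every iteration. Assumes n > 0.
cudaError_t timeWithReusedEvents(float* timesMs, int iters, int n)
{
    float* d = nullptr;
    cudaEvent_t start = nullptr, stop = nullptr;

    cudaError_t err = cudaMalloc(&d, n * sizeof(float));
    if (err == cudaSuccess) err = cudaMemset(d, 0, n * sizeof(float));
    if (err == cudaSuccess) err = cudaEventCreate(&start);   // created once...
    if (err == cudaSuccess) err = cudaEventCreate(&stop);

    const int block = 256;
    const int grid = (n + block - 1) / block;
    for (int k = 0; k < iters && err == cudaSuccess; ++k) {
        err = cudaEventRecord(start, 0);                      // ...recorded many times
        if (err != cudaSuccess) break;
        scaleKernel<<<grid, block>>>(d, n, 2.0f);
        err = cudaGetLastError();                             // catch launch errors
        if (err == cudaSuccess) err = cudaEventRecord(stop, 0);
        if (err == cudaSuccess) err = cudaEventSynchronize(stop);  // wait for stop
        if (err == cudaSuccess) err = cudaEventElapsedTime(&timesMs[k], start, stop);
    }

    if (start) cudaEventDestroy(start);
    if (stop) cudaEventDestroy(stop);
    if (d) cudaFree(d);
    return err;
}
```

Both events here are created with default flags, which is what cudaEventElapsedTime() requires; events created with cudaEventDisableTiming cannot be used for elapsed-time measurement.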


Solution

  • First, I'll try to answer the question "what could the overhead be?" Since we don't have the source code for CUDA events, everything here is based on reasonable guesses. You could make completely different design decisions and still implement CUDA events with the same or similar behavior.

    In the timing use case, we know that at minimum the event's timestamp is recorded somewhere. Since the event happens on the device side, I think the timestamp is written to device memory, to avoid going over PCIe (which has high overhead) at recording time. Because you eventually read the time on the host side, the recorded value must be transferred over PCIe at some point (probably in cudaEventSynchronize()).

    So throughout the whole procedure, you need some space in both host and device memory to store the time. To me, cudaEventCreate()/cudaEventDestroy() look like the perfect place to allocate and release that memory, just like malloc()/free(). That allocation is also exactly the kind of overhead you would want to avoid when recording the time repeatedly (i.e., by reusing the event).

    So there are two kinds of overhead here: allocating device and host space, and the PCIe transfer. This is my guess; there may be other ways to implement the timing functionality that avoid these overheads.

    Finally, avoiding these overheads seems like a good reason for NVIDIA to provide a separate cudaEventCreate().
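Because the reasoning above is explicitly a guess, one way to probe it is to measure. The hypothetical microbenchmark below (the function name `avgCycleMicros` and the iteration counts are assumptions) times, on the host, a record-and-synchronize cycle that either creates a fresh event each iteration or reuses one event created up front. The `flags` parameter lets you also try `cudaEventDisableTiming`, a real flag for events that do not record timing data. Results depend on the driver, GPU, and OS, so no particular numbers are implied here.

```cuda
#include <chrono>
#include <cuda_runtime.h>

// Hypothetical microbenchmark: average host-side time (microseconds) of one
// record + synchronize cycle on the default stream. If createEachTime is true,
// a new event is created and destroyed every iteration; otherwise a single
// event created before the timed loop is reused. Any error goes to *errOut.
double avgCycleMicros(int iters, bool createEachTime, unsigned int flags, cudaError_t* errOut)
{
    cudaError_t err = cudaFree(0);  // force runtime/context init so it is not timed
    cudaEvent_t reused = nullptr;
    if (err == cudaSuccess && !createEachTime)
        err = cudaEventCreateWithFlags(&reused, flags);

    auto t0 = std::chrono::steady_clock::now();
    for (int k = 0; k < iters && err == cudaSuccess; ++k) {
        cudaEvent_t ev = reused;
        if (createEachTime) {
            err = cudaEventCreateWithFlags(&ev, flags);  // per-iteration creation
            if (err != cudaSuccess) break;
        }
        err = cudaEventRecord(ev, 0);
        if (err == cudaSuccess) err = cudaEventSynchronize(ev);
        if (createEachTime) {
            cudaError_t destroyErr = cudaEventDestroy(ev);  // per-iteration destruction
            if (err == cudaSuccess) err = destroyErr;
        }
    }
    auto t1 = std::chrono::steady_clock::now();

    if (reused) cudaEventDestroy(reused);
    if (errOut) *errOut = err;
    std::chrono::duration<double, std::micro> total = t1 - t0;
    return iters > 0 ? total.count() / iters : 0.0;
}
```

Comparing `avgCycleMicros(10000, true, cudaEventDefault, &err)` with `avgCycleMicros(10000, false, cudaEventDefault, &err)` gives a rough per-event create/destroy cost on your system, and repeating both with `cudaEventDisableTiming` shows whether timestamp support changes that cost.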