I'm copying data to GPU memory using hipMemcpyAsync and then waiting for the device to complete the transfer with hipDeviceSynchronize. The description of the hipDeviceSynchronize function here states that "the host thread gets blocked until all the commands associated with streams associated with the device." Judging by the CPU utilization, I believe the host thread busy-waits inside this function until the device completes its tasks. Is there another function I can call, or a variable I can set, to switch this to interrupt-based blocking, where the host thread is actually blocked by the OS and later woken up by an interrupt from the device?
I figured it out myself. All that was needed was to add the following call:
hipSetDeviceFlags(hipDeviceScheduleBlockingSync)