
Equivalent of usleep() in CUDA kernel?


I'd like to call something like usleep() inside a CUDA kernel. The basic goal is to make all GPU cores sleep or busy-wait for a number of milliseconds; it's part of some sanity checks that I want to do for a CUDA application. My attempt at doing this is below:

#include <unistd.h>
#include <stdio.h>
#include <cuda.h>
#include <sys/time.h>

__global__ void gpu_uSleep(useconds_t wait_time_in_ms)
{
    usleep(wait_time_in_ms);
}

int main(void)
{
    //input parameters -- arbitrary
    //   TODO: set these exactly for full occupancy
    int m = 16;
    int n = 16;
    int block1D = 16;
    dim3 block(block1D, block1D);
    dim3 grid(m/block1D, n/block1D);

    useconds_t wait_time_in_ms = 1000;

    //execute the kernel
    gpu_uSleep<<< grid, block >>>(wait_time_in_ms);
    cudaDeviceSynchronize();

    return 0;
}

I get the following error when I try to compile this using NVCC:

error: calling a host function("usleep") from a __device__/__global__ 
       function("gpu_uSleep") is not allowed

Clearly, I'm not allowed to use a host function such as usleep() inside a kernel. What would be a good alternative to this?


Solution

  • You can busy wait with a loop that reads clock().

    To wait for at least 10,000 clock cycles:

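    // Wrapped in a kernel so global_now is defined; it must point to device memory.
    __global__ void gpu_busy_wait(clock_t *global_now)
    {
      clock_t start = clock();
      clock_t now;
      for (;;) {
        now = clock();
        // The overflow branch assumes clock() wraps around at 32 bits.
        clock_t cycles = now >= start ? now - start : now + (0xffffffff - start) + 1;
        if (cycles >= 10000) {
          break;
        }
      }
      // Store "now" in global memory to prevent the
      // compiler from optimizing away the entire loop.
      *global_now = now;
    }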
    

    Note: This is untested. The code that handles overflows was borrowed from this answer by @Pedro. See his answer and section B.10 in the CUDA C Programming Guide 4.2 for details on how clock() works. There is also a clock64() function, which returns a 64-bit counter and so makes wraparound practically a non-issue.
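    An untested sketch of a branch-free way to handle the 32-bit overflow, assuming the raw clock() counter is 32 bits wide: casting clock() to unsigned int keeps its low 32 bits, and unsigned subtraction is defined modulo 2^32, so `now - start` stays correct across a single wraparound. The names elapsed_cycles32, spin_cycles32 and gpu_spin32 are hypothetical.

```cuda
#include <cassert>

// Cycles elapsed between two reads of a 32-bit counter. Unsigned
// subtraction wraps modulo 2^32, so one wraparound needs no branch.
__host__ __device__ inline unsigned int elapsed_cycles32(unsigned int start,
                                                         unsigned int now)
{
    return now - start;
}

// Spin for at least `cycles` clock() cycles and return the last value read.
// Keeping `cycles` at or below half the counter range leaves a wide margin,
// so a long gap between two reads cannot carry the difference past the
// wrap point and restart the count.
__device__ unsigned int spin_cycles32(unsigned int cycles)
{
    assert(cycles <= 0x80000000u);
    unsigned int start = (unsigned int)clock();  // low 32 bits of the counter
    unsigned int now;
    do {
        now = (unsigned int)clock();
    } while (elapsed_cycles32(start, now) < cycles);
    return now;
}

// Hypothetical kernel: every thread spins; thread (0,0) of each block stores
// the last counter value so the loop has an observable effect.
__global__ void gpu_spin32(unsigned int cycles, unsigned int *global_now)
{
    unsigned int now = spin_cycles32(cycles);
    if (threadIdx.x == 0 && threadIdx.y == 0) {
        global_now[blockIdx.y * gridDim.x + blockIdx.x] = now;
    }
}
```

    For example, elapsed_cycles32(0xfffffff0u, 0x10u) is 32 even though the counter wrapped between the two reads.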
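    To wait for a number of milliseconds rather than cycles, one option is to derive the cycle count from the device clock rate. The sketch below assumes clock64() ticks at the rate reported by cudaDeviceGetAttribute with cudaDevAttrClockRate (in kHz); since that attribute is the peak clock, a GPU running below peak will wait somewhat longer than requested. It launches one block per multiprocessor to try to occupy every SM. The names spin_cycles64, gpu_spin64 and busy_wait_ms are hypothetical.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Spin until clock64(), a 64-bit per-multiprocessor cycle counter, has
// advanced by at least `cycles`; returns the cycles actually spent.
__device__ long long spin_cycles64(long long cycles)
{
    long long start = clock64();
    long long now;
    do {
        now = clock64();
    } while (now - start < cycles);
    return now - start;
}

// Every thread spins; thread 0 of each block stores its elapsed cycle
// count so the result of the loop is observable in global memory.
__global__ void gpu_spin64(long long cycles, long long *elapsed)
{
    long long spent = spin_cycles64(cycles);
    if (threadIdx.x == 0) {
        elapsed[blockIdx.x] = spent;
    }
}

// Busy-waits on device 0 for roughly `ms` milliseconds. Returns the
// wall-clock duration of the launch measured with CUDA events, or -1.0f
// if any CUDA call fails.
float busy_wait_ms(float ms)
{
    assert(ms >= 0.0f);
    int clock_khz = 0, num_sms = 0;
    if (cudaDeviceGetAttribute(&clock_khz, cudaDevAttrClockRate, 0) != cudaSuccess ||
        cudaDeviceGetAttribute(&num_sms, cudaDevAttrMultiProcessorCount, 0) != cudaSuccess)
        return -1.0f;

    // 1 kHz is one cycle per millisecond, so kHz * ms gives cycles.
    long long cycles = (long long)((double)ms * clock_khz);

    long long *d_elapsed = nullptr;
    if (cudaMalloc(&d_elapsed, num_sms * sizeof(long long)) != cudaSuccess)
        return -1.0f;

    cudaEvent_t t0, t1;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);

    // One block per SM; the scheduler usually, but not necessarily,
    // places one block on each multiprocessor.
    cudaEventRecord(t0);
    gpu_spin64<<<num_sms, 256>>>(cycles, d_elapsed);
    cudaEventRecord(t1);

    float elapsed_ms = -1.0f;
    if (cudaGetLastError() == cudaSuccess && cudaEventSynchronize(t1) == cudaSuccess)
        cudaEventElapsedTime(&elapsed_ms, t0, t1);

    cudaEventDestroy(t0);
    cudaEventDestroy(t1);
    cudaFree(d_elapsed);
    return elapsed_ms;
}
```

    Calling busy_wait_ms(1000.0f) from host code should keep the device busy for roughly one second. On a GPU that also drives a display, the OS watchdog may kill long-running kernels, so shorter waits are safer there.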