
CUDA streams not overlapping


I have code very similar to the following:

int k;
const int no_streams = 4;
float *g_in, *g_out;  // device buffers, one slice per stream
cudaStream_t stream[no_streams];
for (k = 0; k < no_streams; k++) cudaStreamCreate(&stream[k]);

cudaMalloc(&g_in,  size1*no_streams);
cudaMalloc(&g_out, size2*no_streams);

// Issue all H2D copies, one per stream
for (k = 0; k < no_streams; k++)
  cudaMemcpyAsync(g_in+k*size1/sizeof(float), h_ptr_in[k], size1, cudaMemcpyHostToDevice, stream[k]);

// Issue all kernels, one per stream
for (k = 0; k < no_streams; k++)
  mykernel<<<dimGrid, dimBlock, 0, stream[k]>>>(g_in+k*size1/sizeof(float), g_out+k*size2/sizeof(float));

// Issue all D2H copies, one per stream
for (k = 0; k < no_streams; k++)
  cudaMemcpyAsync(h_ptr_out[k], g_out+k*size2/sizeof(float), size2, cudaMemcpyDeviceToHost, stream[k]);

// Wait for all streams to finish
cudaThreadSynchronize();

cudaFree(g_in);
cudaFree(g_out);

h_ptr_in and h_ptr_out are arrays of pointers allocated with cudaMallocHost() (with no flags).

The problem is that the streams do not overlap. In the visual profiler I can see the kernel execution from the first stream overlapping with the copy (H2D) from the second stream but nothing else overlaps.

I may not have enough resources to run two kernels concurrently (though I think I do), but at least kernel execution and the copies should overlap, right? And if I put all three operations (H2D copy, kernel execution, D2H copy) inside the same for loop, none of them overlap.

What can be causing this?

I'm running on:

Ubuntu 10.04 x64

Device: "GeForce GTX 460"
  (CUDA Driver Version:                           3.20,
  CUDA Runtime Version:                          3.20,
  CUDA Capability Major/Minor version number:    2.1,
  Concurrent copy and execution:                 Yes,
  Concurrent kernel execution:                   Yes)

Solution

  • According to this post on the NVIDIA forums, the profiler serializes stream execution in order to get accurate timing data. If you think your timings are off, make sure you're timing with CUDA events...

    I've been experimenting with streaming lately, and I found the "simpleMultiCopy" example from the SDK really helpful, particularly for seeing the appropriate logic and synchronization.
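To check for overlap without the profiler, something along these lines could be used: it times the whole copy/kernel/copy pipeline with CUDA events, once with one stream per chunk and once with everything in stream 0. This is only a minimal sketch, not the SDK sample's code. `scale_kernel`, `run_pipeline_ms`, the `CHECK` macro, and the chunk size are hypothetical names and values made up for illustration, and the kernel is a trivial stand-in for `mykernel`. It also assumes the legacy default-stream behaviour (the nvcc default), where an event recorded in stream 0 waits for work issued earlier in the other streams.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a runtime call fails.
#define CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, cudaGetErrorString(e_)); \
    exit(1); } } while (0)

// Hypothetical stand-in for mykernel: out[i] = 2 * in[i].
__global__ void scale_kernel(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * in[i];
}

// Times H2D copy + kernel + D2H copy over several chunks with CUDA events.
// Returns elapsed milliseconds, or -1 if the results are wrong.
float run_pipeline_ms(bool use_streams)
{
    const int n_chunks = 4;
    const int n = 1 << 20;                      // floats per chunk (assumed size)
    const size_t bytes = n * sizeof(float);

    float *h_in, *h_out, *d_in, *d_out;
    CHECK(cudaMallocHost((void **)&h_in,  bytes * n_chunks));  // pinned host memory
    CHECK(cudaMallocHost((void **)&h_out, bytes * n_chunks));
    CHECK(cudaMalloc((void **)&d_in,  bytes * n_chunks));
    CHECK(cudaMalloc((void **)&d_out, bytes * n_chunks));
    for (int i = 0; i < n * n_chunks; i++) h_in[i] = 1.0f;

    cudaStream_t streams[n_chunks];
    for (int k = 0; k < n_chunks; k++) CHECK(cudaStreamCreate(&streams[k]));

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    CHECK(cudaEventRecord(start, 0));
    for (int k = 0; k < n_chunks; k++) {
        cudaStream_t s = use_streams ? streams[k] : 0;  // stream 0 serializes everything
        CHECK(cudaMemcpyAsync(d_in + k * n, h_in + k * n, bytes,
                              cudaMemcpyHostToDevice, s));
        scale_kernel<<<(n + 255) / 256, 256, 0, s>>>(d_in + k * n, d_out + k * n, n);
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpyAsync(h_out + k * n, d_out + k * n, bytes,
                              cudaMemcpyDeviceToHost, s));
    }
    // Assumes legacy default-stream semantics: this event completes only
    // after the work issued above in the other streams has finished.
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));

    bool ok = true;
    for (int i = 0; i < n * n_chunks; i++)
        if (h_out[i] != 2.0f) { ok = false; break; }

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    for (int k = 0; k < n_chunks; k++) CHECK(cudaStreamDestroy(streams[k]));
    CHECK(cudaFree(d_in));
    CHECK(cudaFree(d_out));
    CHECK(cudaFreeHost(h_in));
    CHECK(cudaFreeHost(h_out));
    return ok ? ms : -1.0f;
}

int main()
{
    run_pipeline_ms(true);  // warm-up run so context creation is not timed
    printf("single stream: %.3f ms\n", run_pipeline_ms(false));
    printf("four streams:  %.3f ms\n", run_pipeline_ms(true));
    return 0;
}
```

If the streamed time comes out noticeably lower than the single-stream time when run outside the profiler, the copies and kernels are probably overlapping. How much they overlap depends on the device and on the order in which the operations are issued, so the numbers are only a rough indication.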