
OpenCL time measurement issues with AMD GPU


I recently compared two ways of measuring kernel runtime and got some confusing results.

I'm using an AMD Bobcat CPU (E-350) with an integrated GPU, running Ubuntu Linux (CL_PLATFORM_VERSION is "OpenCL 1.2 AMD-APP (923.1)").

The basic gettimeofday approach looks like this:

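clFinish(...);                      // make sure all previously enqueued commands have finished
gettimeofday(&starttime, NULL);     // host wall-clock start
clEnqueueNDRangeKernel(...);        // enqueue the kernel and get its event
clFlush(...);                       // submit the queued commands to the device
clWaitForEvents(...);               // block until the kernel's event is complete
gettimeofday(&endtime, NULL);       // host wall-clock end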

This reports that the kernel takes around 5466 ms.

For the second measurement I used clGetEventProfilingInfo to read the QUEUED, SUBMIT, START and END timestamps.

From these four timestamps I can calculate the time spent in each state:

The times add up to the 5466 ms, but why does the command stay in the SUBMITTED state for half of that time?

And the odd things are:

Does anyone have a clue?

I suspect that either the GPU runs the kernel twice (so the gettimeofday measurement is double the actual execution time) or that clGetEventProfilingInfo does not work correctly on the AMD GPU.


Solution

  • I posted the problem on the AMD forums. The answer there was that it is a bug in the AMD profiler.

    http://devgurus.amd.com/thread/159809