I am using the JOCL library (by JogAmp) and I was wondering whether it is possible to measure separately the time it takes to transfer data from host to device, the time the kernel takes, and the time it takes to get the results back.
Currently I am invoking my kernels like this:
queue.putWriteBuffer(...).put1DKernel(...).putReadBuffer(...);
To answer my own question ;-) the procedure goes like this. First, create a CLEventList with the desired capacity. Since I only want to measure kernel execution, I set it to 1.
CLEventList list = new CLEventList(1);
Now, when you enqueue your kernel on the command queue, pass the list as an argument:
queue.putWriteBuffer(...).put1DKernel(..., list).putReadBuffer(...).finish();
Afterwards you can get the timing by calling:
long start = list.getEvent(0).getProfilingInfo(ProfilingCommand.START);
long end = list.getEvent(0).getProfilingInfo(ProfilingCommand.END);
long duration = end - start; // time in nanoseconds
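Don't forget to create your command queue with Mode.PROFILING_MODE enabled (e.g. device.createCommandQueue(Mode.PROFILING_MODE)); otherwise OpenCL has no profiling information to return for the events.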
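The question also asked about the transfers. Assuming the putWriteBuffer/putReadBuffer overloads that also accept a CLEventList, the same idea should extend to a list of capacity 3 with one event per command, in enqueue order. Turning those START/END pairs into a readable breakdown is then plain Java. The sketch below uses a hypothetical ProfilingReport class (not part of JOCL) and made-up timestamps; it keeps the sum of per-command durations separate from the first-START-to-last-END span, because the two differ by any idle gaps between commands.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical helper (not part of JOCL): turns OpenCL profiling timestamps,
 * in nanoseconds as returned by getProfilingInfo(ProfilingCommand.START/END),
 * into per-phase durations. No JOCL calls are made here.
 */
class ProfilingReport {

    /** One profiled command, e.g. the upload, the kernel, or the read-back. */
    static final class Phase {
        final String name;
        final long startNs;
        final long endNs;

        Phase(String name, long startNs, long endNs) {
            if (endNs < startNs) {
                throw new IllegalArgumentException("end before start for " + name);
            }
            this.name = name;
            this.startNs = startNs;
            this.endNs = endNs;
        }

        long durationNs() {
            return endNs - startNs;
        }

        double durationMs() {
            return durationNs() / 1_000_000.0;
        }
    }

    private final List<Phase> phases = new ArrayList<>();

    /** Records one phase from its START and END timestamps; returns this for chaining. */
    ProfilingReport add(String name, long startNs, long endNs) {
        phases.add(new Phase(name, startNs, endNs));
        return this;
    }

    /** Sum of the individual command durations (idle gaps between commands excluded). */
    long busyNs() {
        long sum = 0;
        for (Phase p : phases) {
            sum += p.durationNs();
        }
        return sum;
    }

    /** Time from the earliest START to the latest END, gaps included; 0 if empty. */
    long spanNs() {
        if (phases.isEmpty()) {
            return 0;
        }
        long first = Long.MAX_VALUE;
        long last = Long.MIN_VALUE;
        for (Phase p : phases) {
            first = Math.min(first, p.startNs);
            last = Math.max(last, p.endNs);
        }
        return last - first;
    }

    /** One line per phase, durations in milliseconds with three decimals. */
    String format() {
        StringBuilder sb = new StringBuilder();
        for (Phase p : phases) {
            sb.append(String.format(Locale.ROOT, "%s: %.3f ms\n", p.name, p.durationMs()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up timestamps for illustration only; with JOCL they would come from
        // list.getEvent(i).getProfilingInfo(ProfilingCommand.START) and (ProfilingCommand.END).
        ProfilingReport report = new ProfilingReport()
                .add("write (host -> device)", 1_000_000L, 1_500_000L)
                .add("kernel", 1_600_000L, 4_100_000L)
                .add("read (device -> host)", 4_200_000L, 4_450_000L);
        System.out.print(report.format());
        System.out.println("busy: " + report.busyNs() + " ns, span: " + report.spanNs() + " ns");
    }
}
```

Comparing busyNs() with spanNs() shows how much of the total is spent between commands rather than inside them.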