
OpenCL: Choosing the Optimal Device for Throughput


I am working with Cloo, an OpenCL library for C#, and I would like to know how best to determine, at runtime, which device to use for my kernels. What I really want to know is how many cores a GPU has (compute units * cores per compute unit). How do I determine this properly? Currently I can determine only the number of compute units and the clock frequency.

EDIT: I have considered profiling (running a speed test on) all devices and saving and comparing the results. However, as I understand it, this poses a problem too, because you can't write one program that uses every device optimally, so the comparison would not be fair.

Knowing this would also help me choose an optimal number of worker threads (work-items) to specify for each kernel call. Any help is greatly appreciated.


Solution

  • Judging performance by core count alone is very hard. Some cores are wider, some are faster. Even when the cores are identical, different combinations of register space and local memory make the guess even harder.

    You either need a database of each graphics card's performance per driver, per OS and per algorithm, scaled by the current clock frequency, or you should simply benchmark the devices before selecting one, or query the performance timers of all devices while they are doing the actual acceleration work.

    A GTX 680 and an HD 7950 have a similar number of cores, but some algorithms gain up to 200% extra performance on the HD 7950, while other code favors the GTX 680.

    You cannot query the number of cores. You can query the number of compute units (CL_DEVICE_MAX_COMPUTE_UNITS) and the maximum work-group size (CL_DEVICE_MAX_WORK_GROUP_SIZE), but these numbers only say something about relative performance when the devices share the same architecture.

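    You can query work-group size hints for each kernel (CL_KERNEL_WORK_GROUP_SIZE and, since OpenCL 1.1, CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, both via clGetKernelWorkGroupInfo), but the best value can change with the algorithm you use, so you should try as many values as possible. The same goes for vectorized versions of a scalar function: a CPU (or any VLIW GPU) can multiply 4 or 8 numbers at the same time.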

    Sometimes the driver's automatic compiler optimization is as good as hand-tuned optimization.

    https://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/clGetDeviceInfo.html
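The "simply benchmark them before selection" option can be sketched as a small timing harness. This is a minimal sketch, assuming each device is represented by a hypothetical zero-argument callable that enqueues a representative kernel on that device and blocks until it finishes (for example by calling clFinish on the device's queue, or your wrapper's equivalent). Timing the workload you actually intend to run, after a few untimed warm-up runs, avoids having to write a universally "fair" benchmark.

```python
import statistics
import time
from typing import Callable, Dict, Tuple


def median_runtime(run: Callable[[], None], warmup: int = 2, repeats: int = 5) -> float:
    """Median wall-clock seconds of `run` after `warmup` untimed calls."""
    for _ in range(warmup):
        run()  # untimed: first launches may include kernel compilation and buffer setup
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()  # `run` must block until the device has finished the work
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def pick_fastest_device(runners: Dict[str, Callable[[], None]]) -> Tuple[str, Dict[str, float]]:
    """Time every device's runner; return (fastest device name, all median timings)."""
    if not runners:
        raise ValueError("no devices to compare")
    timings = {name: median_runtime(run) for name, run in runners.items()}
    best = min(timings, key=timings.get)
    return best, timings


# Usage with stand-in workloads (hypothetical devices simulated by sleeping):
best, timings = pick_fastest_device({
    "device A (hypothetical)": lambda: time.sleep(0.03),
    "device B (hypothetical)": lambda: time.sleep(0.001),
})
print(best)  # device B (hypothetical)
```

Wall-clock timing includes host overhead. To read the device's own timers instead, OpenCL lets you create the queue with CL_QUEUE_PROFILING_ENABLE and read CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END from the kernel's event.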
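To make the point about compute units concrete, here is a worked comparison of the two cards mentioned above. The figures are taken from published specifications and hard-coded as assumptions, because OpenCL cannot report cores per compute unit: GTX 680 with 8 SMX units of 192 cores at a 1006 MHz base clock, HD 7950 with 28 compute units of 64 stream processors at 800 MHz. The peak figure assumes one fused multiply-add (2 FLOPs) per core per cycle.

```python
# Spec-sheet figures, assumed for illustration (not queried from OpenCL).
SPECS = {
    # name: (compute units, cores per compute unit, base clock in MHz)
    "GTX 680": (8, 192, 1006),
    "HD 7950": (28, 64, 800),
}


def naive_metrics(compute_units: int, cores_per_cu: int, clock_mhz: int):
    """Return (compute units * MHz, total cores, theoretical peak single-precision GFLOPS)."""
    cores = compute_units * cores_per_cu
    peak_gflops = 2 * cores * clock_mhz / 1000  # 2 FLOPs per FMA per core per cycle
    return compute_units * clock_mhz, cores, peak_gflops


for name, spec in SPECS.items():
    cu_mhz, cores, gflops = naive_metrics(*spec)
    print(f"{name}: CU*MHz={cu_mhz}, cores={cores}, peak={gflops:.0f} GFLOPS")
# GTX 680: CU*MHz=8048, cores=1536, peak=3090 GFLOPS
# HD 7950: CU*MHz=22400, cores=1792, peak=2867 GFLOPS
```

The compute-units-times-clock score, which is what the question can already compute, rates the HD 7950 about 2.8 times higher, while the core counts differ by about 17% and the theoretical peaks by about 8%, in opposite directions. None of these numbers predicts which card wins on a particular kernel.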
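The advice to try as many work-group sizes as possible can be narrowed using two per-kernel values from clGetKernelWorkGroupInfo: CL_KERNEL_WORK_GROUP_SIZE (the largest usable work-group size) and CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE. This minimal sketch assumes those two numbers have already been queried, and that `time_kernel` is a hypothetical callable that launches the kernel with a given 1-D local size and returns its measured runtime. It keeps only sizes that evenly divide the global size, as OpenCL 1.x requires.

```python
from typing import Callable, List


def candidate_local_sizes(global_size: int, max_wg_size: int, preferred_multiple: int) -> List[int]:
    """1-D local sizes that are multiples of the preferred multiple, fit the
    kernel's work-group limit, and evenly divide the global size."""
    return [n for n in range(preferred_multiple, max_wg_size + 1, preferred_multiple)
            if global_size % n == 0]


def tune_local_size(global_size: int, max_wg_size: int, preferred_multiple: int,
                    time_kernel: Callable[[int], float]) -> int:
    """Time each candidate with `time_kernel` and return the fastest local size."""
    candidates = candidate_local_sizes(global_size, max_wg_size, preferred_multiple)
    if not candidates:
        raise ValueError("no valid local size; consider letting the runtime choose (NULL local size)")
    return min(candidates, key=time_kernel)


print(candidate_local_sizes(1024, 256, 64))  # [64, 128, 256]
# Made-up cost model standing in for real measurements:
print(tune_local_size(1024, 256, 64, lambda n: abs(n - 128)))  # 128
```

The same sweep idea applies to the vectorization point: clGetDeviceInfo reports CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT as a hint, but compiling and timing scalar and vector variants of the kernel is the reliable check.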