I'm doing a project where I have to compare various GPU cards for performance analysis.
I ran the same CUDA code for Canny edge detection on both GPUs and found that the GTX 965 is much faster (200%) than the Tesla K20. I also observed that the Tesla C2075 runs about as fast as the Tesla K20.
As far as I know, the K20 has 2496 cores, the 965 has 1024 cores, and the C2075 has 448 cores. The K20 is NVIDIA's Kepler architecture, the C2075 is Fermi, and the 965 is Maxwell.
Am I doing something wrong, or is there a hardware difference that is causing this?
Also, can we check the power consumed by the graphics card, either with a program or by theoretical calculation?
More cores do not necessarily mean shorter execution times. If your CUDA app used only a single thread and you ran it on:
... then obviously the GTX 965 can work faster. In theory, as long as your app uses fewer than 1024 cores, the GTX can outperform the K20, provided that memory is not the bottleneck, since the K20 actually has:
So, to sum up, it is quite easy to "tailor" a CUDA app to suit one GPU better than the others once you take hardware limitations into account. Start with simple things such as the kernel launch parameters, i.e. grid size and block size.
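To see how much the launch parameters matter on a given card, something like the following sketch could be run on each GPU. It times one kernel under a few grid/block configurations, including the single-thread case. The `scale` kernel, the `gridSizeFor` helper and the chosen sizes are hypothetical stand-ins, not the Canny code from the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel (not the Canny code): scales each element.
__global__ void scale(float *data, int n, float factor) {
    // Grid-stride loop, so any grid/block configuration covers all n elements.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        data[i] *= factor;
}

// Number of blocks needed to cover n elements with blockSize threads per block.
int gridSizeFor(int n, int blockSize) { return (n + blockSize - 1) / blockSize; }

int main() {
    const int n = 1 << 20;  // assumed problem size, for illustration only
    float *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) {
        printf("cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // {blocks, threads per block}; the first entry uses a single thread.
    const int configs[4][2] = {
        {1, 1},
        {1, 256},
        {gridSizeFor(n, 64), 64},
        {gridSizeFor(n, 256), 256},
    };

    for (const auto &c : configs) {
        scale<<<c[0], c[1]>>>(d, n, 1.0f);  // warm-up launch, not timed
        cudaEventRecord(start);
        scale<<<c[0], c[1]>>>(d, n, 1.0f);  // timed launch
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("grid=%d block=%d: %.3f ms\n", c[0], c[1], ms);
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    return 0;
}
```

With the small configurations, only a handful of cores do any work, so the timings should mostly reflect per-core speed. With the larger grids, core count and memory bandwidth start to matter.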
Also, the same goes for the C2075: according to its spec, its core clock is 1.15 GHz, which is higher than that of both the K20 and the GTX 965.
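Rather than relying on spec sheets, the clocks and multiprocessor counts can be read from each card at run time. This is a minimal sketch using the CUDA runtime's device attributes; the `khzToGhz` helper is a hypothetical convenience function. Note that the runtime reports the number of multiprocessors, not CUDA cores; the cores per multiprocessor depend on the architecture.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: convert a clock reported in kHz to GHz.
double khzToGhz(int khz) { return khz / 1.0e6; }

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("No CUDA device found\n");
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);

        int smCount = 0, coreKhz = 0, memKhz = 0, busBits = 0;
        cudaDeviceGetAttribute(&smCount, cudaDevAttrMultiProcessorCount, dev);
        cudaDeviceGetAttribute(&coreKhz, cudaDevAttrClockRate, dev);        // kHz
        cudaDeviceGetAttribute(&memKhz, cudaDevAttrMemoryClockRate, dev);   // kHz
        cudaDeviceGetAttribute(&busBits, cudaDevAttrGlobalMemoryBusWidth, dev);

        printf("Device %d: %s (compute capability %d.%d)\n",
               dev, prop.name, prop.major, prop.minor);
        printf("  multiprocessors:  %d\n", smCount);
        printf("  core clock:       %.3f GHz\n", khzToGhz(coreKhz));
        printf("  memory clock:     %.3f GHz\n", khzToGhz(memKhz));
        printf("  memory bus width: %d bits\n", busBits);
    }
    return 0;
}
```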