c++ c cuda gdb gpu

Calculating FLOPS (Floating-point Operations per Second)


How can I calculate the FLOPS of my application? As I understand it, I get FLOPS by dividing the total number of executed floating-point operations by the execution time. How do I then count the number of executed floating-point operations?

While my question is general and an answer for any other programming language would be highly appreciated, I am looking for a solution for my application, which is developed in C, C++ and CUDA.


Solution

  • If the number of floating-point operations is not easily modeled, I produce two executables: a production version that gives me the execution time, and an instrumented one that counts all floating-point operations while performing them (that will certainly be slow, but it doesn't matter for our purpose). I then compute the FLOP/s value by dividing the number of floating-point operations from the second executable by the time from the first one.

    This could probably even be automated, but I haven't had a need for this so far.
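
    To make the two-executable idea concrete, here is a minimal sketch for host-side C++ only (it does not cover CUDA kernels). It assumes the numeric kernel can be written as a template over its scalar type, so the same source builds once with plain double (production, timed) and once with a counting wrapper (instrumented). The names CountedDouble, g_flop_count, dot, count_flops, time_production_seconds and flops_per_second are all made up for this illustration, and the dot product merely stands in for the real computation.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical global counter used only by the instrumented build.
static std::uint64_t g_flop_count = 0;

// Hypothetical wrapper that behaves like a double but counts each
// arithmetic operation performed on it.
struct CountedDouble {
    double value;
    CountedDouble(double v = 0.0) : value(v) {}
};

inline CountedDouble operator+(CountedDouble a, CountedDouble b) {
    ++g_flop_count;
    return CountedDouble(a.value + b.value);
}
inline CountedDouble operator-(CountedDouble a, CountedDouble b) {
    ++g_flop_count;
    return CountedDouble(a.value - b.value);
}
inline CountedDouble operator*(CountedDouble a, CountedDouble b) {
    ++g_flop_count;
    return CountedDouble(a.value * b.value);
}
inline CountedDouble operator/(CountedDouble a, CountedDouble b) {
    ++g_flop_count;
    return CountedDouble(a.value / b.value);
}

// Stand-in for the application kernel: one multiply and one add per element.
template <typename T>
T dot(const std::vector<T>& x, const std::vector<T>& y) {
    T sum = T(0.0);
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum = sum + x[i] * y[i];
    }
    return sum;
}

// "Instrumented executable": run the kernel on CountedDouble and return
// the number of floating-point operations it performed.
std::uint64_t count_flops(std::size_t n) {
    std::vector<CountedDouble> x(n, CountedDouble(1.5));
    std::vector<CountedDouble> y(n, CountedDouble(2.0));
    g_flop_count = 0;
    dot(x, y);
    return g_flop_count;
}

// "Production executable": run the same kernel on plain double and
// return the elapsed wall-clock time in seconds.
double time_production_seconds(std::size_t n, double& result_out) {
    std::vector<double> x(n, 1.5);
    std::vector<double> y(n, 2.0);
    auto start = std::chrono::steady_clock::now();
    result_out = dot(x, y);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// FLOP/s = operation count (instrumented run) / time (production run).
double flops_per_second(std::uint64_t ops, double seconds) {
    return static_cast<double>(ops) / seconds;
}
```

    Calling count_flops(n) and time_production_seconds(n, result) from main with the same n and passing both values to flops_per_second gives the FLOP/s figure. For a real measurement the problem size should be large enough that the timed run is well above the clock resolution, and the result of the production run should actually be used (for example printed) so the compiler cannot optimize the kernel away.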