Tags: c, performance, clock, rdtsc

What is the most reliable way to measure the number of cycles of my program in C?


I am familiar with two approaches, but both of them have their limitations.

The first is to use the RDTSC instruction. The problem is that it does not count my program's cycles in isolation, so it is sensitive to noise from concurrent processes.

The second option is to use the clock library function. I expected this approach to be reliable, since I assumed it counts the cycles of my program only (which is what I want to achieve). However, it turns out that in my case it measures the elapsed time and then multiplies it by CLOCKS_PER_SEC. This is not only unreliable but also wrong, since CLOCKS_PER_SEC is set to 1,000,000, which does not correspond to the actual frequency of my processor.

Given the limitations of these approaches, is there a better and more reliable alternative that produces consistent results?


Solution

  • A lot here depends on how long a span of time you're trying to measure.

    RDTSC can be (almost) 100% reliable when used correctly. It is, however, of use primarily for measuring truly microscopic pieces of code. If you want to measure two sequences of, say, a few dozen or so instructions apiece, there's probably nothing else that can do the job nearly as well.

    Using it correctly is somewhat challenging though. Generally speaking, to get good measurements you want to do at least the following:

    1. Set the code to only run on one specific core.
    2. Set the code to execute at maximum priority so nothing preempts it.
    3. Use CPUID liberally to ensure serialization where needed.
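
    The three steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes x86-64, GCC or Clang inline assembly, and Linux with glibc (for sched_setaffinity and sched_setscheduler). The names pin_to_core, raise_priority, serialized_rdtsc, min_cycles, and sample_work are made up for illustration. Note also that on most modern x86 processors the TSC ticks at a constant reference rate, so the result is in TSC ticks, which need not equal core clock cycles.

```c
#define _GNU_SOURCE   /* exposes sched_setaffinity and CPU_SET on Linux/glibc */
#include <assert.h>
#include <sched.h>
#include <stdint.h>

/* Step 1: pin the calling thread to one core. Returns 0 on success, -1 on error. */
int pin_to_core(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return sched_setaffinity(0, sizeof set, &set);
}

/* Step 2 (best effort): request the highest real-time priority.
 * This usually needs elevated privileges and returns -1 if refused. */
int raise_priority(void)
{
    struct sched_param p = {0};
    p.sched_priority = sched_get_priority_max(SCHED_FIFO);
    return sched_setscheduler(0, SCHED_FIFO, &p);
}

/* Step 3: CPUID is a serializing instruction, so issuing it right before
 * RDTSC keeps earlier instructions from drifting past the counter read. */
static inline uint64_t serialized_rdtsc(void)
{
    uint32_t lo = 0;  /* EAX = 0 selects CPUID leaf 0; later holds the low TSC bits */
    uint32_t hi;
    __asm__ __volatile__("cpuid\n\t"
                         "rdtsc"
                         : "+a"(lo), "=d"(hi)
                         :
                         : "rbx", "rcx", "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical stand-in for the short code sequence being measured. */
void sample_work(void)
{
    volatile unsigned int sink = 0;
    for (unsigned int i = 0; i < 32; i++)
        sink += i;
}

/* Time fn() reps times and keep the smallest reading; the minimum is the
 * run least disturbed by interrupts and cache misses. The result still
 * includes the overhead of the CPUID/RDTSC pair itself. */
uint64_t min_cycles(void (*fn)(void), int reps)
{
    assert(reps > 0);
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < reps; i++) {
        uint64_t start = serialized_rdtsc();
        fn();
        uint64_t end = serialized_rdtsc();
        if (end - start < best)
            best = end - start;
    }
    return best;
}
```

    A caller would typically invoke pin_to_core(0) and raise_priority() once at startup (checking the return values), then compare min_cycles(sample_work, 1000) against the same measurement of an empty function to subtract the timing overhead.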

    If, on the other hand, you're trying to measure something that takes anywhere from, say, 100 ms on up, RDTSC is pointless. It's like trying to measure the distance between cities with a micrometer. For this, it's generally best to ensure that the code in question runs for (at least) the better part of a second or so. clock isn't particularly precise, but for a length of time on this order, the fact that it might only be accurate to, say, 10 ms or so is more or less irrelevant.
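
    For the longer-running case, one way to apply this advice is to repeat the workload until a sizeable fraction of a second has accumulated and then divide by the number of calls. The sketch below uses only standard C; cpu_seconds_per_call and busy_loop are hypothetical names chosen for illustration. Keep in mind that clock returns processor time in units of 1/CLOCKS_PER_SEC seconds, so dividing by CLOCKS_PER_SEC yields seconds, not cycles, and that some implementations (Microsoft's C runtime, for example) report elapsed wall-clock time instead of CPU time.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for the code being measured. */
void busy_loop(void)
{
    volatile unsigned long sink = 0;
    for (unsigned long i = 0; i < 1000000UL; i++)
        sink += i;
}

/* Call fn() repeatedly until at least min_seconds have accumulated
 * according to clock(), then return the average seconds per call.
 * clock() is called once per iteration, so this suits workloads that
 * are much slower than a clock() call. */
double cpu_seconds_per_call(void (*fn)(void), double min_seconds)
{
    assert(min_seconds > 0.0);
    long calls = 0;
    clock_t start = clock();
    clock_t now;
    do {
        fn();
        calls++;
        now = clock();
    } while ((double)(now - start) / CLOCKS_PER_SEC < min_seconds);
    return (double)(now - start) / CLOCKS_PER_SEC / (double)calls;
}
```

    Calling cpu_seconds_per_call(busy_loop, 0.5), for instance, keeps the total run near half a second, at which point a clock granularity of around 10 ms contributes only a few percent of error.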