
Why isn't RDTSC a serializing instruction?


The Intel manuals for the RDTSC instruction warn that out-of-order execution can change when RDTSC is actually executed, so they recommend inserting a CPUID instruction in front of it, because CPUID serializes the instruction stream (CPUID is never executed out of order). My question is simple: if they had the ability to make instructions serializing, why didn't they make RDTSC serializing? The entire point of it appears to be to get cycle-accurate timings. Is there a situation in which you would not want to precede it with a serializing instruction?

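Newer Intel CPUs have a separate RDTSCP instruction that waits for all earlier instructions to execute before it reads the counter (it is often described as serializing, although it is not a fully serializing instruction, since later instructions may still start before the read). Intel opted to introduce a separate instruction rather than change the behavior of RDTSC, which suggests to me that there has to be some situation where a potentially out-of-order timing is what you want. What is it?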


Solution

  • If you are trying to use rdtsc to detect whether a branch was mispredicted, the non-serializing version is what you want.

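    ; math here (its result sets or clears ZF)
    rdtsc               ; first timestamp in edx:eax (flags are unchanged)
    mov   esi, eax      ; save the low 32 bits; mov does not touch flags
    jz    done          ; the branch whose prediction is being tested
    nop                 ; stand-in for work that always takes 1 cycle
    done:
    rdtsc               ; second timestamp in edx:eax
    sub   eax, esi      ; eax = delta between the two reads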
    

    If the branch is predicted correctly, the delta will be small (possibly even negative). If the branch is mispredicted, the delta will be large.

    With the serializing version, the first rdtsc waits for the math to finish, so the branch condition is already resolved by the time the branch is reached. That hides the effect the test is trying to measure.
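
For comparison, the two ways of reading the counter discussed here can be sketched in C. This is only a minimal sketch, assuming GCC or Clang on an x86 target: `__rdtsc()` comes from `<x86intrin.h>`, while `serialize_with_cpuid`, `tsc_plain`, `tsc_serialized`, `sample_work`, `time_region_plain` and `time_region_serialized` are hypothetical helper names invented for illustration, not anything from the Intel manuals.

```c
#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc(); GCC/Clang on x86 only */

/* Hypothetical helper: run CPUID (leaf 0) purely for its serializing
   side effect. The register outputs are discarded. */
static inline void serialize_with_cpuid(void) {
    unsigned int eax = 0, ebx, ecx = 0, edx;
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
    (void)ebx;
    (void)edx;
}

/* Plain read: the CPU may execute RDTSC before earlier instructions finish. */
static inline uint64_t tsc_plain(void) {
    return __rdtsc();
}

/* Serialized read: CPUID makes earlier instructions complete first. */
static inline uint64_t tsc_serialized(void) {
    serialize_with_cpuid();
    return __rdtsc();
}

/* Hypothetical workload so that there is something to time. */
static void sample_work(void) {
    volatile unsigned int sink = 0;
    for (unsigned int i = 0; i < 1000; ++i) {
        sink += i;
    }
}

/* Time fn() with non-serializing reads; out-of-order effects stay visible. */
static inline uint64_t time_region_plain(void (*fn)(void)) {
    uint64_t start = tsc_plain();
    fn();
    return tsc_plain() - start;
}

/* Time fn() with a CPUID in front of each read. */
static inline uint64_t time_region_serialized(void (*fn)(void)) {
    uint64_t start = tsc_serialized();
    fn();
    return tsc_serialized() - start;
}
```

Calling `time_region_plain(sample_work)` and `time_region_serialized(sample_work)` in a loop and comparing the spread of results is one way to see how much the CPUID changes the measurement. The serialized variant also includes the cost of CPUID itself, so its absolute numbers will be higher.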