Tags: c, cpu-registers, rdtsc

Why should I use 'rdtsc' differently on x86 and x86-64?


I know that rdtsc loads the current value of the processor's time-stamp counter into two registers, EDX and EAX. To read it on x86, I need to do this (assuming Linux):

    unsigned long lo, hi;
    asm( "rdtsc" : "=a" (lo), "=d" (hi));
    return lo;

and for x86-64:

    unsigned long lo, hi;
    asm( "rdtsc" : "=a" (lo), "=d" (hi) );
    return( lo | (hi << 32) );

Why is that? Can anybody explain it to me?


Solution

  • RDTSC always writes its 64-bit result as two 32-bit halves, the high half in EDX and the low half in EAX, even in 64-bit mode (see the manual). Unfortunately, it does not pack the 64-bit TSC into RAX alone, which is why extra work is needed after the asm statement.

    To make a single 64-bit integer from it, you need to shift hi into the place it belongs as the upper 32 bits of an unsigned long. lo is already in the right place, and writing those 32-bit registers zeroed the upper bits of the full 64-bit registers (RAX and RDX), so we can just OR the (shifted) halves together without having to AND the low half first.

    In x86-64 Linux, unsigned long is a 64-bit type so the kernel actually uses both halves of the RDTSC return value.

    The only reason the 32-bit version is simpler is that the kernel is truncating the result to 32 bits by throwing away the high half. If you do want a 64-bit TSC in 32-bit mode, the same C source works there, too (with uint64_t or unsigned long long), although it wouldn't compile to shift and OR instructions. The compiler would just know that it has a 64-bit integer whose halves are in EDX and EAX.

    See also "How to get the CPU cycle count in x86_64 from C++?". For real use, don't forget to make these asm statements volatile (asm volatile). Otherwise the compiler can assume that repeated executions produce the same output and, for example, optimize end - start to 0.
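
The points above can be put together in a minimal sketch, assuming GCC or Clang targeting x86 or x86-64. The names combine_halves, read_tsc and self_check are illustrative only and do not come from the kernel source. Fixed-width types are used so that the same source yields a full 64-bit value in both 32-bit and 64-bit builds.

```c
#include <assert.h>
#include <stdint.h>

/* Join the two 32-bit halves that RDTSC leaves in EDX (hi) and EAX (lo).
 * hi is widened to 64 bits before the shift so no bits are lost,
 * even where unsigned long is only 32 bits wide. */
static inline uint64_t combine_halves(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical helper: read the time-stamp counter.
 * volatile stops the compiler from merging or hoisting repeated reads.
 * Only builds for x86 / x86-64 with GNU C extended asm. */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ volatile ("rdtsc" : "=a" (lo), "=d" (hi));
    return combine_halves(lo, hi);
}

/* Small sanity check of the bit arithmetic, independent of the TSC. */
static void self_check(void)
{
    assert(combine_halves(0x89abcdefu, 0x01234567u) == 0x0123456789abcdefULL);
    assert(combine_halves(0xffffffffu, 0u) == 0xffffffffULL);
    assert(combine_halves(0u, 1u) == 0x100000000ULL);
}
```

Calling read_tsc() before and after a region and subtracting the two values gives the elapsed TSC ticks. Because the asm statement is volatile, the compiler will not fold the two reads into one.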