I know that `rdtsc` loads the current value of the processor's time-stamp counter into the register pair EDX:EAX (high 32 bits in EDX, low 32 bits in EAX). To get it on x86 (assuming Linux), I need to do it like this:
```c
unsigned long lo, hi;
asm( "rdtsc" : "=a" (lo), "=d" (hi) );
return lo;
```
and for x86-64:
```c
unsigned long lo, hi;
asm( "rdtsc" : "=a" (lo), "=d" (hi) );
return( lo | (hi << 32) );
```
Why is that? Can anybody explain it to me?
RDTSC always writes its 64-bit result split into hi/lo halves in EDX and EAX, even in 64-bit mode (see the manual); unfortunately, it does not pack the 64-bit TSC into RAX alone. That's why extra work is needed after the asm statement.
To make a single 64-bit integer from it, you need to shift `hi` into the upper 32 bits of an `unsigned long`. `lo` is already in the right place, and writing a 32-bit register zero-extends into the full 64-bit register, so the upper 32 bits of both RAX and RDX are already zero. That means we can just OR the (shifted) halves together without first having to AND the low half with `0xFFFFFFFF`.
In x86-64 Linux, `unsigned long` is a 64-bit type, so the x86-64 version actually uses both halves of the RDTSC result.

The only reason the 32-bit version is simpler is that it truncates the result to 32 bits by throwing away the high half. If you do want the full 64-bit TSC in 32-bit mode, the same C source works there too (with `uint32_t` halves, `hi` cast to `uint64_t` or `unsigned long long` before the shift, and a 64-bit result type), although it wouldn't compile to shift and OR instructions. The compiler would just know that it has a 64-bit integer whose halves are in EDX and EAX.
See also How to get the CPU cycle count in x86_64 from C++?, and for real use, don't forget to make these `asm volatile`. Otherwise the compiler can assume that repeated executions of the statement produce the same output, so that, for example, `end - start` is optimized to 0.