I'm trying to time a subroutine using rdtscp. This is my procedure:
; Setting up time
rdtscp ; Getting time
push rax ; Saving timestamp
; for(r9=0; r9<LOOP_SIZE; r9++)
mov r9, 0
lup0:
call subr
inc r9
cmp r9, LOOP_SIZE
jnz lup0
; Calculating time taken
pop rbx ; Loading old time
rdtscp ; Getting time
sub rax, rbx ; Calculating difference
If LOOP_SIZE is small enough, I get consistent and expected results. However, when I make it big enough (around 10^9), the result jumps from about 10^9 to about 10^20.
; Result with "LOOP_SIZE equ 100000000"
971597237
; Result with "LOOP_SIZE equ 1000000000"
18446744072281657066
The routine I use to display the numbers prints them as unsigned, so I imagine that the large number is actually a negative number and an overflow happened. However, 971597237 is nowhere near the 64-bit integer limit, so, assuming that the problem is an overflow, why is it happening?
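The problem is that, per the documentation, rdtscp does not store its result in rax but in edx:eax (the high 32 bits in edx and the low 32 bits in eax), even in 64-bit mode. The upper halves of rax and rdx are zeroed, so rax on its own holds only the low 32 bits of the counter. Those wrap around every 2^32 ticks, and if they wrap between your two readings, the second value is smaller than the first and the subtraction goes negative. That is why the short loop looked fine and the long one did not.
So, if you want the full 64-bit value in rax, you have to move the high bits over from edx: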
; Setting up time
rdtscp ; Getting time
shl rdx, 32 ; Shifting rdx to the correct bit position
add rax, rdx ; Adding both to make timestamp
push rax ; Saving timestamp
; [...stuff...]
; Calculating time taken
rdtscp ; Getting time
pop rbx ; Loading old time (below rdtscp)
shl rdx, 32 ; Shifting rdx to the correct bit position
add rax, rdx ; Adding both to make timestamp
sub rax, rbx ; Calculating difference
Edit: Moved pop rbx one line down, below rdtscp. As pointed out by Peter, rdtscp clobbers rax, rdx and rcx. In your example that's not a problem, but if you decided to pop rcx there instead, it would get overwritten by rdtscp, so it's good practice to pop from the stack only after it.
Also, you can avoid the two stack accesses (the push and the pop) by saving the old timestamp in a register that your subroutine doesn't use:
; Setting up time
rdtscp ; Getting time
shl rdx, 32 ; Shifting rdx to the correct bit position
lea r12, [rdx + rax] ; Adding both to make timestamp, and saving it
; [...stuff (that doesn't use r12)...]
; Calculating time taken
rdtscp ; Getting time
shl rdx, 32 ; Shifting rdx to the correct bit position
add rax, rdx ; Adding both to make timestamp
sub rax, r12 ; Calculating difference
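To see the arithmetic behind the huge number, here is a minimal sketch in C rather than assembly. The function names and the timestamp values in the usage note are made up for illustration; they are not your actual readings. It only mimics what the two versions of the code compute from a pair of 64-bit counter values.

```c
#include <stdint.h>

/* Rebuild the 64-bit timestamp from the two halves rdtscp leaves in edx:eax
   (same result as "shl rdx, 32" followed by "add rax, rdx"). */
uint64_t combine_tsc(uint32_t hi, uint32_t lo) {
    return ((uint64_t)hi << 32) | lo;
}

/* What the original code effectively computed: rax holds only the
   zero-extended low 32 bits of each reading, and the subtraction is 64-bit. */
uint64_t elapsed_low_only(uint64_t start, uint64_t end) {
    uint64_t start_lo = (uint32_t)start; /* high half discarded */
    uint64_t end_lo = (uint32_t)end;     /* high half discarded */
    return end_lo - start_lo;            /* wraps modulo 2^64 if end_lo < start_lo */
}

/* What the fixed code computes: a plain 64-bit difference. */
uint64_t elapsed_full(uint64_t start, uint64_t end) {
    return end - start;
}
```

With a hypothetical start of 0x1F0000000 and end of 0x210000000, elapsed_full returns 0x20000000, while elapsed_low_only returns 0xFFFFFFFF20000000, because the low half went from 0xF0000000 to 0x10000000 in between. Printed as unsigned, that is the same kind of value as your 18446744072281657066, which is 2^64 - 1427894550.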