assembly, x86-64, rdtsc

x86_64 - Why is timing a program with rdtsc/rdtscp giving unreasonably large numbers?


I'm trying to time a subroutine using rdtscp. This is my procedure:

; Setting up time
rdtscp                      ; Getting time
push rax                    ; Saving timestamp

; for(r9=0; r9<LOOP_SIZE; r9++)
mov r9, 0
lup0:
call subr
inc r9
cmp r9, LOOP_SIZE
jnz lup0

; Calculating time taken
pop rbx                     ; Loading old time
rdtscp                      ; Getting time
sub rax, rbx                ; Calculating difference

If LOOP_SIZE is small enough, I get consistent and expected results. However, when I make it big enough (around 10^9), the result jumps from about 10^9 to about 10^19.

; Result with "LOOP_SIZE equ 100000000"
971597237
; Result with "LOOP_SIZE equ 1000000000"
18446744072281657066

My print routine displays numbers as unsigned, so I imagine the large number is actually a negative number and an overflow happened. However, 971597237 is nowhere near the 64-bit integer limit, so, assuming the problem is an overflow, why is it happening?


Solution

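  • The problem is that, per the documentation, rdtscp does not return its value in rax but in edx:eax (high 32 bits in edx, low 32 bits in eax), even in 64-bit mode; the upper halves of rax and rdx are zeroed. Your code therefore subtracts only the low 32 bits of the two timestamps. Once the loop runs long enough for those low bits to wrap around between the two readings, the second value can be smaller than the first and the difference goes negative: 18446744072281657066 is 2^64 - 1427894550, i.e. -1427894550 when read as a signed number.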
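
    A minimal sketch of that wrap-around, using made-up timestamps rather than values from the question (0x5F0000000 and 0x610000000 are hypothetical, and the Linux exit syscall and build line are assumptions about the environment). It only contrasts the two subtractions; it does not read the TSC:

    ```nasm
    ; Assumed build (Linux x86-64): nasm -f elf64 wrap.asm -o wrap.o && ld wrap.o -o wrap
    global _start

    section .text
    _start:
        ; Hypothetical full timestamps: 0x5F0000000, then 0x610000000.
        ; True elapsed count: 0x20000000.

        ; What the question's code subtracts: only eax, zero-extended into rax.
        mov rbx, 0xF0000000         ; low half of the first reading
        mov rax, 0x10000000         ; low half of the second reading (it wrapped)
        sub rax, rbx                ; rax = 0xFFFFFFFF20000000: negative if signed, huge if unsigned

        ; What the subtraction gives once edx is merged in.
        mov rbx, 0x5F0000000        ; full first reading
        mov rcx, 0x610000000        ; full second reading
        sub rcx, rbx                ; rcx = 0x20000000, the true difference

        mov eax, 60                 ; sys_exit (Linux x86-64, assumed)
        xor edi, edi                ; exit status 0
        syscall
    ```

    Stepping through this in a debugger shows the first subtraction producing a value just below 2^64, the same pattern as the number in the question.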

    So, if you want the full 64-bit value in rax, you have to shift the high bits up in rdx and merge them in:

    ; Setting up time
    rdtscp                      ; Getting time
    shl rdx, 32                 ; Shifting rdx to the correct bit position
    add rax, rdx                ; Adding both to make timestamp
    push rax                    ; Saving timestamp
    
    ; [...stuff...]
    
    ; Calculating time taken
    rdtscp                      ; Getting time
    pop rbx                     ; Loading old time (below rdtscp)
    shl rdx, 32                 ; Shifting rdx to the correct bit position
    add rax, rdx                ; Adding both to make timestamp
    sub rax, rbx                ; Calculating difference
    

    Edit: Moved pop rbx one line down, below rdtscp. As Peter pointed out, rdtscp clobbers rax, rdx and rcx (ecx receives IA32_TSC_AUX). In your example that's not a problem, but if you had popped into rcx before the second rdtscp instead, the value would be overwritten, so it's good practice to pop only after it.


    Also, you can avoid the push/pop pair entirely by keeping the old timestamp in a register that your subroutine doesn't modify:

    ; Setting up time
    rdtscp                      ; Getting time
    shl rdx, 32                 ; Shifting rdx to the correct bit position
    lea r12, [rdx + rax]        ; Adding both to make timestamp, and saving it
    
    ; [...stuff (that doesn't use r12)...]
    
    ; Calculating time taken
    rdtscp                      ; Getting time
    shl rdx, 32                 ; Shifting rdx to the correct bit position
    add rax, rdx                ; Adding both to make timestamp
    sub rax, r12                ; Calculating difference
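
    Putting the pieces together, here is one possible complete skeleton. It is a sketch under assumptions, not a definitive harness: READ_TSC is a hypothetical macro name, subr is an empty placeholder, the LOOP_SIZE value is arbitrary, and the build line and exit syscall assume Linux x86-64 with nasm and ld.

    ```nasm
    ; Assumed build (Linux x86-64): nasm -f elf64 tsc.asm -o tsc.o && ld tsc.o -o tsc
    LOOP_SIZE equ 1000000           ; hypothetical iteration count

    ; READ_TSC: hypothetical helper macro, not part of the original post.
    ; Leaves the full 64-bit timestamp in rax; clobbers rdx and rcx.
    %macro READ_TSC 0
        rdtscp                      ; edx:eax = timestamp, ecx = IA32_TSC_AUX
        shl rdx, 32                 ; move the high half into bits 63..32
        or  rax, rdx                ; upper half of rax is zero, so OR merges cleanly
    %endmacro

    global _start

    section .text

    subr:                           ; placeholder for the routine being timed
        ret

    _start:
        READ_TSC
        mov r12, rax                ; keep the start time where subr must not touch it

        xor r9d, r9d                ; r9 = 0, loop counter as in the question
    .lup0:
        call subr                   ; assumes subr leaves r9 and r12 unchanged
        inc r9
        cmp r9, LOOP_SIZE
        jnz .lup0

        READ_TSC
        sub rax, r12                ; rax = elapsed timestamp-counter ticks

        ; rax holds the result here; print or store it before it is overwritten.
        mov eax, 60                 ; sys_exit (Linux x86-64, assumed)
        xor edi, edi                ; exit status 0
        syscall
    ```

    The macro uses or instead of add to merge the halves; since the upper 32 bits of rax are zero after rdtscp, both give the same result.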