[SOLVED] Parallel execution of a piece of x64 code

Consider the following x64 code, produced by clang:

    mov     rax, qword ptr [rsi]                        ; 1
    mov     rdx, qword ptr [rip + hash_mult]            ; 1
    imul    rdx, qword ptr [rax - 8]                    ; 2
    movzx   ecx, byte ptr [rip + hash_shift]            ; 1
    shr     rdx, cl                                     ; 3
    mov     rax, qword ptr [rip + vptrs]                ; 2 - or 3 ???
    mov     rax, qword ptr [rax + 8*rdx]                ; 4
    mov     rcx, qword ptr [rip + slots_strides]        ; 1
    mov     rax, qword ptr [rax + 8*rcx]                ; 2
    jmp     rax                                         ; 5

I have two questions.

Can I assume that the lines marked with the same ordinal can and will be executed in parallel?
What's with the byte ptr used by the movzx instruction? Does it indicate poor alignment?

Solution

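Your analysis is almost correct. The instruction mov rax, qword ptr [rip + vptrs] can execute as early as group 1: it reads no register, and register renaming decouples it from the earlier instructions that happen to use the name rax. Conversely, mov rax, qword ptr [rax + 8*rcx] reads the rax produced by the load from [rax + 8*rdx] just before it, so it cannot be in group 2; it has to come after group 4, with the final jmp one step later still. Note that the actual order in which the instructions execute depends on the specific microarchitecture, including its instruction latencies and available execution ports. A tool such as uiCA (https://uica.uops.info/) can analyse this for a given microarchitecture.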
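
As a rough illustration of the renaming argument, the sketch below computes the earliest possible group for each instruction in a deliberately idealized model: unit latency, unlimited execution ports, perfect register renaming, and no flag or memory-ordering dependencies. The function name dependency_levels and the read/write sets are my own transcription of the listing in the question, not the output of any tool.

```python
# Idealized dataflow model (an assumption, not a simulation of real hardware):
# every instruction takes one step, ports are unlimited, renaming is perfect,
# and flags and memory ordering are ignored.
# Each entry: (text, registers read, registers written).
INSTRUCTIONS = [
    ("mov   rax, [rsi]",                   {"rsi"},        {"rax"}),
    ("mov   rdx, [rip + hash_mult]",       set(),          {"rdx"}),
    ("imul  rdx, [rax - 8]",               {"rdx", "rax"}, {"rdx"}),
    ("movzx ecx, byte [rip + hash_shift]", set(),          {"rcx"}),
    ("shr   rdx, cl",                      {"rdx", "rcx"}, {"rdx"}),
    ("mov   rax, [rip + vptrs]",           set(),          {"rax"}),
    ("mov   rax, [rax + 8*rdx]",           {"rax", "rdx"}, {"rax"}),
    ("mov   rcx, [rip + slots_strides]",   set(),          {"rcx"}),
    ("mov   rax, [rax + 8*rcx]",           {"rax", "rcx"}, {"rax"}),
    ("jmp   rax",                          {"rax"},        set()),
]


def dependency_levels(instructions):
    """Return the earliest group (1-based) in which each instruction can run."""
    ready = {}  # register name -> group of the instruction that last wrote it
    levels = []
    for _text, reads, writes in instructions:
        # An instruction runs one step after its latest input is produced.
        # Registers never written in the listing (rsi) count as ready at 0.
        level = 1 + max((ready.get(reg, 0) for reg in reads), default=0)
        levels.append(level)
        # Renaming: a write only matters to later readers of that register.
        for reg in writes:
            ready[reg] = level
    return levels


if __name__ == "__main__":
    for (text, _, _), level in zip(INSTRUCTIONS, dependency_levels(INSTRUCTIONS)):
        print(f"{level}  {text}")
```

Under these assumptions the load from vptrs lands in group 1, while the load from [rax + 8*rcx] lands in group 5 and the jmp in group 6. Real hardware has differing latencies (imul and loads take several cycles each), so read the numbers as ordering constraints rather than cycle counts.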

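The byte ptr phrase indicates the size of the memory operand: a single byte, as opposed to the quadwords read by the other instructions. movzx loads that byte and zero-extends it into ecx. It says nothing about alignment; a one-byte object cannot be misaligned.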
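
To make the operand-size point concrete, here is a small Python model of the two kinds of load. The memory contents and the helper names load_byte_zero_extended and load_qword are made up for illustration; they are not taken from the program in the question.

```python
import struct

# Hypothetical 8 bytes of memory; the first byte plays the role of hash_shift.
MEMORY = bytes([0x2A, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF])


def load_byte_zero_extended(memory, address):
    """Model of a movzx byte load: read one byte, all upper bits become zero."""
    return struct.unpack_from("<B", memory, address)[0]


def load_qword(memory, address):
    """Model of a qword load: read eight bytes in little-endian order."""
    return struct.unpack_from("<Q", memory, address)[0]


if __name__ == "__main__":
    print(hex(load_byte_zero_extended(MEMORY, 0)))  # 0x2a
    print(hex(load_qword(MEMORY, 0)))               # 0xffffffffffffff2a
```

The byte load ignores the neighbouring bytes and works at any address, which is why byte ptr by itself tells you nothing about how the variable is aligned.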