Consider the following x64 code, produced by clang:
mov rax, qword ptr [rsi] ; 1
mov rdx, qword ptr [rip + hash_mult] ; 1
imul rdx, qword ptr [rax - 8] ; 2
movzx ecx, byte ptr [rip + hash_shift] ; 1
shr rdx, cl ; 3
mov rax, qword ptr [rip + vptrs] ; 2 - or 3 ???
mov rax, qword ptr [rax + 8*rdx] ; 4
mov rcx, qword ptr [rip + slots_strides] ; 1
mov rax, qword ptr [rax + 8*rcx] ; 2
jmp rax ; 5
I have two questions:
1. Can I assume that the lines marked with the same ordinal can and will be executed in parallel?
2. What's with the byte ptr used by the movzx instruction? Does it indicate poor alignment?
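Your analysis is almost correct. The instruction mov rax, qword ptr [rip + vptrs] can also execute as early as group 1: register renaming decouples instructions from the architectural register names they use, so this new write to rax does not have to wait for the earlier instructions that read or write rax. Only true (read-after-write) dependencies constrain the order. One more correction: mov rax, qword ptr [rax + 8*rcx] reads the rax produced by mov rax, qword ptr [rax + 8*rdx], so it belongs in group 5 rather than 2, and jmp rax then moves to group 6.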
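Instructions in the same group can execute in parallel, but nothing guarantees that they will: the actual schedule depends on the specific microarchitecture (how many loads it can start per cycle, which execution ports are free, whether the loads hit in cache, and so on). A tool such as uiCA (https://uica.uops.info/) can analyse this for a given microarchitecture.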
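To see where each instruction can land, here is a minimal Python sketch that assigns each instruction a dependency level under a deliberately simplified model (an assumption, not how any real CPU is specified): the read/write register sets are written out by hand, sub-registers such as ecx and cl are folded into rcx, rip, flags and memory ordering are ignored (nothing here reads flags or stores to memory), every instruction takes one level, and execution ports are unlimited. PROGRAM and levels are hypothetical names. With renaming=True only read-after-write dependencies count; with renaming=False a write must also wait for earlier readers and writers of the same register.

```python
# Hypothetical dependency-level model of the snippet from the question.
# Simplifications (assumptions):
#   * sub-registers (ecx, cl) are folded into their 64-bit parent (rcx);
#   * rip, flags and memory dependencies are ignored (no flags are read,
#     and the snippet only loads, never stores);
#   * every instruction takes exactly one level and ports are unlimited.

# (instruction text, registers read, registers written)
PROGRAM = [
    ("mov rax, qword ptr [rsi]",                 {"rsi"},        {"rax"}),
    ("mov rdx, qword ptr [rip + hash_mult]",     set(),          {"rdx"}),
    ("imul rdx, qword ptr [rax - 8]",            {"rdx", "rax"}, {"rdx"}),
    ("movzx ecx, byte ptr [rip + hash_shift]",   set(),          {"rcx"}),
    ("shr rdx, cl",                              {"rdx", "rcx"}, {"rdx"}),
    ("mov rax, qword ptr [rip + vptrs]",         set(),          {"rax"}),
    ("mov rax, qword ptr [rax + 8*rdx]",         {"rax", "rdx"}, {"rax"}),
    ("mov rcx, qword ptr [rip + slots_strides]", set(),          {"rcx"}),
    ("mov rax, qword ptr [rax + 8*rcx]",         {"rax", "rcx"}, {"rax"}),
    ("jmp rax",                                  {"rax"},        set()),
]


def levels(program, renaming=True):
    """Return the earliest level (1-based) at which each instruction could run."""
    ready = {}      # register -> level of the instruction that last wrote it
    last_read = {}  # register -> latest level of an earlier instruction reading it
    result = []
    for _text, reads, writes in program:
        # True (read-after-write) dependencies always apply.
        level = 1 + max((ready.get(r, 0) for r in reads), default=0)
        if not renaming:
            # Without renaming, a write must also wait for earlier readers
            # (write-after-read) and earlier writers (write-after-write) of
            # the same architectural register.
            for r in writes:
                level = max(level, last_read.get(r, 0) + 1, ready.get(r, 0) + 1)
        for r in reads:
            last_read[r] = max(last_read.get(r, 0), level)
        for r in writes:
            ready[r] = level
        result.append(level)
    return result


if __name__ == "__main__":
    renamed = levels(PROGRAM)
    no_renaming = levels(PROGRAM, renaming=False)
    for (text, _, _), a, b in zip(PROGRAM, renamed, no_renaming):
        print(f"{text:<42} renaming: {a}   no renaming: {b}")
```

In the renaming column the levels come out as 1, 1, 2, 1, 3, 1, 4, 1, 5, 6: the vptrs load joins the other independent loads in level 1, while mov rax, qword ptr [rax + 8*rcx] needs the level-4 load first. Without renaming, the vptrs load is pushed to level 3 because it must wait for imul to finish reading rax. Real loads and imul take several cycles each, so these levels are an ordering, not a cycle count.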
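The byte ptr in movzx ecx, byte ptr [rip + hash_shift] is an operand-size specifier: the memory operand is a single byte, as opposed to the 8-byte qword ptr operands used elsewhere. movzx loads that one byte and zero-extends it into ecx (and writing ecx also clears the upper 32 bits of rcx). It does not indicate poor alignment: a one-byte access can never be misaligned.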
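A minimal Python sketch of what the imul, movzx and shr steps do with the data, assuming hypothetical inputs: memory stands in for the bytes at hash_shift, and word and hash_mult for the two 64-bit values multiplied by imul. It shows that movzx reads exactly one byte, and that a 64-bit shr only uses the low 6 bits of cl, so a one-byte shift count loses nothing. The function names are illustrative only.

```python
MASK64 = (1 << 64) - 1


def movzx_byte(memory, addr):
    """movzx ecx, byte ptr [addr]: read exactly one byte and zero-extend it.

    Writing ecx also zeroes bits 32..63 of rcx, so rcx ends up equal to
    the byte value (0..255). A 1-byte access is always aligned.
    """
    return memory[addr]


def shr_cl(value, rcx):
    """shr r64, cl: a 64-bit shift uses only the low 6 bits of cl."""
    return (value & MASK64) >> (rcx & 0x3F)


def hash_index(word, hash_mult, memory, shift_addr):
    """Mirror of the imul / movzx / shr sequence (hypothetical inputs).

    word and hash_mult are unsigned 64-bit values; the low 64 bits of the
    product are the same whether imul treats them as signed or not.
    """
    rdx = (hash_mult * word) & MASK64     # imul keeps the low 64 bits
    rcx = movzx_byte(memory, shift_addr)  # byte-sized load of the shift count
    return shr_cl(rdx, rcx)               # shift count taken from cl


if __name__ == "__main__":
    shift_bytes = bytes([0x3A, 0x99])  # hypothetical contents; only byte 0 is read
    print(hash_index(0x0123456789ABCDEF, 0x9E3779B97F4A7C15, shift_bytes, 0))
```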