This is an example code from Computer Systems: A Programmer's Perspective.
short foo(short a, short b) {
short result;
result = b;
while(b > 0) {
result *= a;
b -= a;
}
return result;
}
Compiling this with gcc14.2
using -O1
and -O3
gives the following assembly (godbolt)
O1
foo:
movl %esi, %eax
movl %esi, %ecx
testw %si, %si
jle .L1
.L3:
imull %edi, %eax
subl %edi, %ecx
testw %cx, %cx
jg .L3
.L1:
ret
O3
foo:
movl %esi, %eax
movl %esi, %edx
testw %si, %si
jle .L1
.L3:
subl %edi, %edx
imull %edi, %eax
testw %dx, %dx
jg .L3
.L1:
ret
With O3
, sub
comes before imul
- why is this reordering helpful?
Just as you note, subl
and imul
can be executed in parallel (neither depends on the other), but testw
does depend on subl
's result (and theoretically, whether or not to use the result of the next imul
depends on testw
) As such, you want to complete, and therefore start, subl
as soon as possible so that testw
can start whether or not imul
has completed.