Consider the following snippet of code:
for(i = 0; i < 10; i++)
{
    int n = a[i];//first loop statement
    //other statements
}
Clearly, the compiler will not hoist the first statement out of the loop. But would a compiler be able to hoist only the declaration of n above the loop? In other words, can a compiler optimize the above code to:
int n;
for(i = 0; i < 10; i++)
{
    n = a[i];//first loop statement
}
Actually, most compilers will do this even at -O0:
~ $ cat t.c
volatile int v;
int a[10];
void f(void)
{
    int n;
    int i;
    for(i = 0; i < 10; i++) {
        n = a[i];
        v = n;
    }
}
~ $ clang -S -O0 t.c
~ $ cat t.s
…
_f: ## @f
.cfi_startproc
## BB#0:
pushq %rbp
Ltmp2:
.cfi_def_cfa_offset 16
Ltmp3:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp4:
.cfi_def_cfa_register %rbp
movl $0, -8(%rbp)
LBB0_1: ## =>This Inner Loop Header: Depth=1
cmpl $10, -8(%rbp)
jge LBB0_4
## BB#2: ## in Loop: Header=BB0_1 Depth=1
movq _v@GOTPCREL(%rip), %rax
movq _a@GOTPCREL(%rip), %rcx
movslq -8(%rbp), %rdx
movl (%rcx,%rdx,4), %esi
movl %esi, -4(%rbp)
movl -4(%rbp), %esi
movl %esi, (%rax)
## BB#3: ## in Loop: Header=BB0_1 Depth=1
movl -8(%rbp), %eax
addl $1, %eax
movl %eax, -8(%rbp)
jmp LBB0_1
LBB0_4:
popq %rbp
ret
…
~ $
Note how, above, there are no instructions inside the body of the loop to reserve n. Instead the same stack slot -4(%rbp) is seamlessly reused. If I compiled with the slightest level of optimization, there wouldn't even be a stack slot for n: a register would be enough to hold its value for the short time span it has:
~ $ clang -S -O1 t.c
~ $ cat t.s
…
_f: ## @f
.cfi_startproc
## BB#0:
pushq %rbp
Ltmp2:
.cfi_def_cfa_offset 16
Ltmp3:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp4:
.cfi_def_cfa_register %rbp
xorl %eax, %eax
movq _a@GOTPCREL(%rip), %rcx
movq _v@GOTPCREL(%rip), %rdx
.align 4, 0x90
LBB0_1: ## =>This Inner Loop Header: Depth=1
movl (%rcx,%rax,4), %esi
movl %esi, (%rdx)
incq %rax
cmpq $10, %rax
jne LBB0_1
## BB#2:
popq %rbp
ret
In this new compiled version, %esi is n.
The way compilers achieve the “lifting variable declaration outside of loop” optimization even at the lowest level of optimization is by lifting the declarations of all block-scope automatic variables to function scope. There is absolutely nothing to it. Also, no discussion of compiler optimization makes much sense without a minimal understanding of the target language, in which a variable declaration need not result in any code.
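For readers who would rather check this from C than from assembly, here is a minimal sketch. The function names sum_inner, sum_outer, slot_changes and report are made up for this illustration, and the array a is given arbitrary contents. The first two functions differ only in where n is declared and must return the same result. The third records the address of the block-scope n on each iteration and counts how often it moves.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Arbitrary contents, chosen for this illustration only. */
const int a[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};

/* n is declared inside the loop body, as in the question. */
int sum_inner(void)
{
    int sum = 0;
    for (int i = 0; i < 10; i++) {
        int n = a[i];
        sum += n;
    }
    return sum;
}

/* n is declared once, above the loop. */
int sum_outer(void)
{
    int sum = 0;
    int n;
    for (int i = 0; i < 10; i++) {
        n = a[i];
        sum += n;
    }
    return sum;
}

/* Counts how many times the address of the block-scope n differs
   from its address in the previous iteration. The C standard does
   not promise a particular answer; this only observes what the
   compiler at hand happens to do. */
int slot_changes(void)
{
    uintptr_t prev = 0;
    int changes = 0;
    for (int i = 0; i < 10; i++) {
        int n = a[i];
        uintptr_t addr = (uintptr_t)(void *)&n;
        if (i > 0 && addr != prev)
            changes++;
        prev = addr;
        (void)n; /* n is only needed for its address here */
    }
    return changes;
}

/* Prints the observations; call this from main. */
void report(void)
{
    assert(sum_inner() == sum_outer());
    printf("sum_inner = %d, sum_outer = %d\n", sum_inner(), sum_outer());
    printf("address of n changed %d time(s)\n", slot_changes());
}
```

Calling report() from main should show identical sums. With the compilers discussed above I would expect the address of n to stay put on every iteration, but that part is an observation about one implementation, not a guarantee of the language.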