c, optimization

Can the compiler optimize this snippet of code?


Consider the following snippet of code:

for(i = 0; i<10; i++)
{
    int n = a[i];//first loop statement

    //other statements
}

Clearly, the compiler will not hoist the first statement out of the loop, since the value it reads depends on i. But would a compiler be able to hoist only the declaration of n above the loop? In other words, can a compiler transform the above code into this:

int n;

for(i = 0; i < 10; i++)
{
    n = a[i];//first loop statement

    //other statements
}
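To make the question concrete, here is a minimal sketch assuming a 10-element int array and using a hypothetical accumulation `sum += n` as a stand-in for "other statements" (the names `sum_inner` and `sum_outer` are also hypothetical). Both versions compute the same result: `n` is overwritten at the start of every iteration and never read after the loop, so where it is declared has no observable effect on the program.

```c
/* Version 1: n declared inside the loop body (block scope). */
int sum_inner(const int a[10])
{
    int sum = 0;
    for (int i = 0; i < 10; i++) {
        int n = a[i];   /* first loop statement */
        sum += n;       /* hypothetical stand-in for "other statements" */
    }
    return sum;
}

/* Version 2: only the declaration of n is hoisted above the loop. */
int sum_outer(const int a[10])
{
    int n;
    int sum = 0;
    for (int i = 0; i < 10; i++) {
        n = a[i];       /* same assignment, no declaration in the body */
        sum += n;
    }
    return sum;
}
```

For example, with `a` holding 1 through 10, both functions return 55.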

Solution

  • Actually, most compilers will do this even at -O0:

    ~ $ cat t.c
    volatile int v;
    
    int a[10];
    
    void f(void)
    {
      int n;
      int i;
      for(i = 0; i < 10; i++) {
    
        n = a[i];
        v = n;
      }
    }
    ~ $ clang -S -O0 t.c
    ~ $ cat t.s
    …
    _f:                                     ## @f
        .cfi_startproc
    ## BB#0:
        pushq   %rbp
    Ltmp2:
        .cfi_def_cfa_offset 16
    Ltmp3:
        .cfi_offset %rbp, -16
        movq    %rsp, %rbp
    Ltmp4:
        .cfi_def_cfa_register %rbp
        movl    $0, -8(%rbp)
    LBB0_1:                                 ## =>This Inner Loop Header: Depth=1
        cmpl    $10, -8(%rbp)
        jge LBB0_4
    ## BB#2:                                ##   in Loop: Header=BB0_1 Depth=1
        movq    _v@GOTPCREL(%rip), %rax
        movq    _a@GOTPCREL(%rip), %rcx
        movslq  -8(%rbp), %rdx
        movl    (%rcx,%rdx,4), %esi
        movl    %esi, -4(%rbp)
        movl    -4(%rbp), %esi
        movl    %esi, (%rax)
    ## BB#3:                                ##   in Loop: Header=BB0_1 Depth=1
        movl    -8(%rbp), %eax
        addl    $1, %eax
        movl    %eax, -8(%rbp)
        jmp LBB0_1
    LBB0_4:
        popq    %rbp
        ret
    …
    ~ $ 
    

    Note how, above, there are no instructions inside the body of the loop to reserve space for n. Instead, the same stack slot -4(%rbp) is seamlessly reused on every iteration. If I compiled with even the slightest level of optimization, there wouldn't even be a stack slot for n: a register is enough to hold its value for the short time it is live:

    ~ $ clang -S -O1 t.c
    ~ $ cat t.s
    …
    _f:                                     ## @f
        .cfi_startproc
    ## BB#0:
        pushq   %rbp
    Ltmp2:
        .cfi_def_cfa_offset 16
    Ltmp3:
        .cfi_offset %rbp, -16
        movq    %rsp, %rbp
    Ltmp4:
        .cfi_def_cfa_register %rbp
        xorl    %eax, %eax
        movq    _a@GOTPCREL(%rip), %rcx
        movq    _v@GOTPCREL(%rip), %rdx
        .align  4, 0x90
    LBB0_1:                                 ## =>This Inner Loop Header: Depth=1
        movl    (%rcx,%rax,4), %esi
        movl    %esi, (%rdx)
        incq    %rax
        cmpq    $10, %rax
        jne LBB0_1
    ## BB#2:
        popq    %rbp
        ret
    

    In this new compiled version, %esi is n.
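    As a rough illustration, here is a hand-written C rendering of the -O1 loop above. It is a sketch, not compiler output; the function name `f_o1` is hypothetical. The local `n` plays the role of %esi, the index `i` plays the role of %rax, and the loop is written as a do/while because the assembly tests the condition at the bottom (the compiler knows the loop runs at least once).

```c
volatile int v;
int a[10];

/* Hand-written C mirroring the -O1 assembly of f(). */
void f_o1(void)
{
    long i = 0;              /* %rax, zeroed by xorl %eax, %eax */
    do {
        int n = a[i];        /* movl (%rcx,%rax,4), %esi : n lives in %esi */
        v = n;               /* movl %esi, (%rdx) : volatile store */
        i++;                 /* incq %rax */
    } while (i != 10);       /* cmpq $10, %rax ; jne LBB0_1 */
}
```

    After a call, `v` holds the last element stored, `a[9]`, just as with the original f().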


    The way compilers achieve the “lifting variable declaration outside of loop” optimization even with optimizations disabled is by lifting the declarations of all block-scope automatic variables to function scope. There is absolutely nothing to it. Also, no discussion of compiler optimization makes much sense without a minimal understanding of the target language, in which a variable declaration need not result in any code.
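    A minimal sketch of that lifting done by hand, with hypothetical function names: `g_nested` declares automatic variables in inner blocks, and `g_lifted` is the same function after every block-scope automatic variable has been moved to function scope, with initializers turned into plain assignments at the original declaration points. Both return the same value; the declarations themselves correspond to no instructions, only the assignments do.

```c
/* Before: automatic variables declared in nested blocks. */
int g_nested(const int *a, int len)
{
    int total = 0;
    for (int i = 0; i < len; i++) {
        int n = a[i];
        if (n > 0) {
            int sq = n * n;
            total += sq;
        }
    }
    return total;
}

/* After: every automatic variable declared once at function scope. */
int g_lifted(const int *a, int len)
{
    int total;
    int i;
    int n;
    int sq;

    total = 0;                  /* initializer became an assignment */
    for (i = 0; i < len; i++) {
        n = a[i];               /* was: int n = a[i]; */
        if (n > 0) {
            sq = n * n;         /* was: int sq = n * n; */
            total += sq;
        }
    }
    return total;
}
```

    The sketch deliberately avoids shadowed names and variable-length arrays, which a straightforward textual lift like this would not handle.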