
Why does this code generate much more assembly than equivalent C++/Clang?


I wrote a simple C++ function in order to check compiler optimization:

bool f1(bool a, bool b) {
    return !a || (a && b);
}

After that I checked the equivalent in Rust:

fn f1(a: bool, b: bool) -> bool {
    !a || (a && b)
}

I used Godbolt to check the assembly output.

The result for the C++ code (compiled by clang with the -O3 flag) is the following:

f1(bool, bool):                                # @f1(bool, bool)
    xor     dil, 1
    or      dil, sil
    mov     eax, edi
    ret

And the result for the Rust equivalent is much longer:

example::f1:
  push rbp
  mov rbp, rsp
  mov al, sil
  mov cl, dil
  mov dl, cl
  xor dl, -1
  test dl, 1
  mov byte ptr [rbp - 3], al
  mov byte ptr [rbp - 4], cl
  jne .LBB0_1
  jmp .LBB0_3
.LBB0_1:
  mov byte ptr [rbp - 2], 1
  jmp .LBB0_4
.LBB0_2:
  mov byte ptr [rbp - 2], 0
  jmp .LBB0_4
.LBB0_3:
  mov al, byte ptr [rbp - 4]
  test al, 1
  jne .LBB0_7
  jmp .LBB0_6
.LBB0_4:
  mov al, byte ptr [rbp - 2]
  and al, 1
  movzx eax, al
  pop rbp
  ret
.LBB0_5:
  mov byte ptr [rbp - 1], 1
  jmp .LBB0_8
.LBB0_6:
  mov byte ptr [rbp - 1], 0
  jmp .LBB0_8
.LBB0_7:
  mov al, byte ptr [rbp - 3]
  test al, 1
  jne .LBB0_5
  jmp .LBB0_6
.LBB0_8:
  test byte ptr [rbp - 1], 1
  jne .LBB0_1
  jmp .LBB0_2

I also tried the -O option, but then the output is empty (the unused function is deleted).

I am intentionally NOT using any library, in order to keep the output clean. Please note that both clang and rustc use LLVM as a backend. What explains this huge difference in output? And if it is only a matter of optimization being disabled, how can I see optimized output from rustc?


Solution

  • Compiling with the -O flag (and marking the function pub, so that it is not removed as unused), I get this output (Link to Godbolt):

    push    rbp
    mov     rbp, rsp
    xor     dil, 1
    or      dil, sil
    mov     eax, edi
    pop     rbp
    ret
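
For reference, here is a minimal sketch of the source that corresponds to the output above, assuming it is compiled with -O. The only change to the question's function is the added pub. The second function, f1_simplified, is a hypothetical helper that is not part of the original code; it spells out what the xor/or pair in the assembly computes, and main checks that both agree on all four inputs.

```rust
// The function from the question, marked `pub` so that it is exported
// and therefore not removed as unused code.
pub fn f1(a: bool, b: bool) -> bool {
    !a || (a && b)
}

// Hypothetical helper (not in the original post): the simplified form
// that the optimized assembly computes, i.e. `xor a, 1` then `or a, b`.
pub fn f1_simplified(a: bool, b: bool) -> bool {
    !a | b
}

fn main() {
    // Exhaustively compare both versions over the four possible inputs.
    for a in [false, true] {
        for b in [false, true] {
            assert_eq!(f1(a, b), f1_simplified(a, b));
            println!("a={}, b={} -> {}", a, b, f1(a, b));
        }
    }
}
```

Since the two forms agree on every input, the optimizer can replace the branching version with two bitwise instructions.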
    

    A few things: