c if-statement assembly optimization comparison

Under what conditions does an "if" / "else if" chain in C compile to a single comparison, as in assembly?


In x86 assembly, the "cmp" instruction sets the status flags (including "ZF" and "CF", plus "SF" and "OF", which signed comparisons use), so a single comparison is enough to determine whether two integers are equal, greater, or less. How should the code be written in C so that only one comparison is performed for all three cases? Six orderings are possible:

if (x > y) { /*code1*/ } else if (x < y) { /*code2*/ } else { /*code3*/ }

if (x < y) { /*code2*/ } else if (x > y) { /*code1*/ } else { /*code3*/ }

if (x > y) { /*code1*/ } else if (x == y) { /*code3*/ } else { /*code2*/ }

if (x < y) { /*code2*/ } else if (x == y) { /*code3*/ } else { /*code1*/ }

if (x == y) { /*code3*/ } else if (x < y) { /*code2*/ } else { /*code1*/ }

if (x == y) { /*code3*/ } else if (x > y) { /*code1*/ } else { /*code2*/ } 

Solution

  • Generally, in C we write a regular, naive, seemingly inefficient if/else-if/else statement and expect the compiler to optimize it.

    So, if the compiler can tell that both x and y are simple values that do not require re-evaluation, we can write the following construct in C:

    if( x > y )
        { /* code1 */ }
    else if( x < y )
        { /* code2 */ }
    else /* x == y */
        { /* code3 */ }
    

    and the generated optimized assembly should look more or less like the following:

        mov eax, [x]
        cmp eax, [y]
        jg code1
        jl code2
        /* code3 */
        jmp after
    code1:
        /* code1 */
        jmp after
    code2:
        /* code2 */
    after:
    

    Note that in the naive C code the variables x and y are accessed twice each, and compared twice, whereas in the optimized assembly code they are loaded and compared only once.
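
    To try this yourself, a minimal self-contained sketch like the one below can be pasted into a compiler explorer with optimization enabled. The function name classify and the return values standing in for code1, code2 and code3 are assumptions made for illustration. They are not taken from the godbolt link in this answer. Because the three branches only return constants, the compiler may well choose a branchless sequence instead of cmp/jg/jl.

```c
/* Hypothetical helper: the return values stand in for code1, code2, code3. */
int classify(int x, int y)
{
    if (x > y)
        return 1;    /* stands in for code1 */
    else if (x < y)
        return -1;   /* stands in for code2 */
    else             /* x == y */
        return 0;    /* stands in for code3 */
}
```

    The source compares x and y twice, but both comparisons use the same unchanged operands, which is what allows an optimizer to reuse the flags from a single cmp.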

    Here is the source code and the generated assembly on godbolt:

    https://godbolt.org/z/8YxT5Kh7P

    The instructions of interest are the following:

        cmp     eax, ebp
        jg      .L7
        jl      .L8
    

    (Here, eax contains y, and ebp contains x.)


    The technique described above applies to all 6 cases listed in the question, as long as x and y are simple values.

    If x or y requires re-evaluation (for example, if they are function calls), then we need a slightly different technique:

    int xx = x(), yy = y();
    if( xx > yy )
        { /* code1 */ }
    else if( xx < yy )
        { /* code2 */ }
    else /* xx == yy, meaning that x() == y() */
        { /* code3 */ }
    

    Note that the variables xx and yy are likely to be completely optimized away by the compiler, resulting in optimized assembly code very similar to what was shown above.
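
    The difference between the two forms can be made visible with a small sketch. Everything here is hypothetical: x() and y() are stand-ins that return fixed values and count their own calls, and compare_naive / compare_cached are names invented for this illustration.

```c
int x_calls = 0;   /* number of times x() has been called */
int y_calls = 0;   /* number of times y() has been called */

/* Hypothetical stand-ins for expressions that require re-evaluation. */
int x(void) { ++x_calls; return 3; }
int y(void) { ++y_calls; return 7; }

/* Calls the functions directly in every condition. */
int compare_naive(void)
{
    if (x() > y())      return 1;
    else if (x() < y()) return -1;
    else                return 0;
}

/* Evaluates each function once, then branches on the stored values. */
int compare_cached(void)
{
    int xx = x(), yy = y();
    if (xx > yy)      return 1;
    else if (xx < yy) return -1;
    else              return 0;
}
```

    With these stand-in values, compare_naive() calls x() and y() twice each before returning -1, while compare_cached() calls each of them once. Because the calls have a visible side effect (the counters), the compiler cannot merge them in the naive version, so the single-comparison optimization is only available for the cached form.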


    These were examples of the widespread and well-established practice of writing naive constructs in C and expecting the compiler to optimize them in a certain way. However, in many cases the compiler decides to do something different from what we expected.

    So, if you get into the habit of checking whether the compiler did in fact do exactly as you expected it to do, be prepared to sometimes be surprised.