Tags: c, assembly, switch-statement, x86-64, gcc4.4

GCC 4.4: Avoid range check on switch/case statement in gcc?


This is only an issue on GCC versions prior to 4.4; it was fixed in GCC 4.4.

Is it possible to tell the compiler that the variable used in a switch always fits within the provided case statements? This matters in particular when the range is small and a jump table is generated.

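extern int a;
/* Handlers for each case; defined elsewhere. */
void f0(void), f1(void), f2(void), f3(void), f4(void), f5(void), f6(void), f7(void);

int main(void)
{
        switch (a & 0x7) {   // 0x7 == binary 111, so the value is always 0-7
        case 0: f0(); break;
        case 1: f1(); break;
        case 2: f2(); break;
        case 3: f3(); break;
        case 4: f4(); break;
        case 5: f5(); break;
        case 6: f6(); break;
        case 7: f7(); break;
        }
        return 0;
}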

I tried masking to the low bits (as in the example), using enums, and using gcc_unreachable(), all to no avail. The generated code always checks whether the variable is inside the range, adding a pointless conditional branch and pushing the jump table calculation code further away.

Note: this is in the innermost loop of a decoder, performance matters significantly.

It seems I'm not the only one:

"There is no way to tell gcc that the default branch is never taken, although it will omit the default branch if it can prove that the value is never out of range based on earlier conditional checks."

So, how do you help gcc prove the variable fits and there's no default branch in the example above? (Without adding a conditional branch, of course.)
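For reference, on compilers newer than the ones discussed here (GCC 4.5 and later, and Clang), a commonly used hint is __builtin_unreachable() in a default case. It is not available on GCC 4.2, and whether the range check actually disappears depends on the compiler version and optimization level, so treat this as a sketch to verify against your own assembly output. The name decode_step, the extra parameter x, and the case bodies are hypothetical stand-ins for the inlined f0..f7 statements.

```c
/* Hypothetical stand-in for the switch in the question: each case does a
 * small, different computation instead of calling f0()..f7(). */
int decode_step(int a, int x)
{
    switch (a & 0x7) {            /* masked value is always 0..7 */
    case 0: return x + 1;
    case 1: return x - 1;
    case 2: return x * 2;
    case 3: return x / 2;
    case 4: return -x;
    case 5: return x ^ 0xFF;
    case 6: return x & 0xF;
    case 7: return 0;
    default:
        /* Assumption: GCC >= 4.5 or Clang. This tells the compiler the
         * path cannot be reached; actually reaching it is undefined
         * behavior, so it is only safe because of the mask above. */
        __builtin_unreachable();
    }
}
```

Compiling with -O2 -S and looking for a cmp/ja pair ahead of the indirect jump shows whether a given compiler took the hint.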

Updates

  1. This was on OS X 10.6 Snow Leopard with GCC 4.2 (the default from Xcode). It didn't happen with GCC 4.4/4.3 on Linux (reported by Nathon and Jens Gustedt).

  2. The functions in the example are there for readability; think of them as inlined or as plain statements. Making a function call on x86 is expensive.

    Also, as mentioned in the note, the example belongs inside a loop over a large amount of data.

    The generated code with gcc 4.2/OS X is:

    [...]
    andl    $7, %eax
    cmpl    $7, %eax
    ja  L11
    mov %eax, %eax
    leaq    L20(%rip), %rdx
    movslq  (%rdx,%rax,4),%rax
    addq    %rdx, %rax
    jmp *%rax
    .align 2,0x90
    L20:
    .long   L12-L20
    .long   L13-L20
    .long   L14-L20
    .long   L15-L20
    .long   L16-L20
    .long   L17-L20
    .long   L18-L20
    .long   L19-L20
    L19:
    [...]
    

    The problem lies in the cmpl $7, %eax; ja L11 pair: after andl $7, %eax the value can never exceed 7, so the branch is never taken.

  3. OK, I'm going with the ugly solution: a special case for GCC versions below 4.4 that replaces the switch with goto and GCC's &&label (labels as values) extension.

    static void *jtb[] = { &&c_1, &&c_2, &&c_3, &&c_4, &&c_5, &&c_6, &&c_7, &&c_8 };
    [...]
    goto *jtb[a & 0x7];
    [...]
    while(0) {
    c_1:
    // something
    break;
    c_2:
    // something
    break;
    [...]
    }
    

    Note that the array of labels is static, so it's not rebuilt on every call.
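The fragment above can be fleshed out into a small compilable function. This is only a sketch: it relies on the GNU C labels-as-values extension (GCC or Clang, not standard C), the name dispatch and the values stored in result are hypothetical placeholders for the real case bodies, the labels are numbered from 0 rather than 1, and it uses goto done instead of the while(0)/break arrangement.

```c
/* Computed-goto dispatch over the low three bits of a.
 * Requires GNU C (labels as values); not portable standard C. */
int dispatch(int a)
{
    /* static: the table is set up once, not on every call */
    static void *const jtb[] = {
        &&c_0, &&c_1, &&c_2, &&c_3, &&c_4, &&c_5, &&c_6, &&c_7
    };
    int result = -1;

    /* a & 0x7 is always 0..7, so the index is in bounds and no
     * range check is emitted before the indirect jump. */
    goto *jtb[a & 0x7];

c_0: result = 0; goto done;   /* placeholder bodies: record which */
c_1: result = 1; goto done;   /* label was reached                */
c_2: result = 2; goto done;
c_3: result = 3; goto done;
c_4: result = 4; goto done;
c_5: result = 5; goto done;
c_6: result = 6; goto done;
c_7: result = 7; goto done;

done:
    return result;
}
```

The table must have exactly as many entries as the mask allows (eight for 0x7); nothing checks the index, so a shorter table would jump through garbage.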


Solution

  • I tried compiling something simple and comparable with -O5 (which GCC treats the same as -O3) and -fno-inline (my f0-f7 functions were trivial), and it generated this:

    
     8048420:   55                      push   %ebp ;; function preamble
     8048421:   89 e5                   mov    %esp,%ebp ;; Yeah, yeah, it's a function.
     8048423:   83 ec 04                sub    $0x4,%esp ;; do stuff with the stack
     8048426:   8b 45 08                mov    0x8(%ebp),%eax ;; x86 sucks, we get it
     8048429:   83 e0 07                and    $0x7,%eax ;; Do the (a & 0x7)
     804842c:   ff 24 85 a0 85 04 08    jmp    *0x80485a0(,%eax,4) ;; Jump table!
     8048433:   90                      nop
     8048434:   8d 74 26 00             lea    0x0(%esi,%eiz,1),%esi
     8048438:   8d 45 08                lea    0x8(%ebp),%eax
     804843b:   89 04 24                mov    %eax,(%esp)
     804843e:   e8 bd ff ff ff          call   8048400 
     8048443:   8b 45 08                mov    0x8(%ebp),%eax
     8048446:   c9                      leave  
    

    Have you tried playing with the optimization levels?