cgccjump-table

Is it faster in C to use a written jump table or switch statement?


So, I am trying to see if there is any difference between using a jump table of function pointers versus a switch statements for performing many, one command operations like these.

This is the code to assembly link i made

Here is my actual code as well

enum code {
    ADD,
    SUB,
    MUL,
    DIV,
    REM
};

typedef struct {
    int val;
} Value;


typedef struct {
    enum code ins;
    int operand;
} Op;


void run(Value* arg, Op* func)
{
   switch(func->ins)
   {
     case ADD: arg->val += func->operand; break;
     case SUB: arg->val -= func->operand; break;
     case MUL: arg->val *= func->operand; break;
     case DIV: arg->val /= func->operand; break;
     case REM: arg->val %= func->operand; break;
   }
}

My question is, based on the generated assembly in that link or the code, would there be any difference from making a bunch of small functions to complete the operations in the cases of the switch statement, and making an array of pointers to those functions and calling them with the same enum?

Using gcc x86_64 7.1

void add(Value* arg, Op* func)
{
   arg->val += func->operand;
}

static void (*jmptable)(Value*, Op*)[] = {
     &add
}

Assembly code paste:

run(Value*, Op*):
        push    rbp
        mov     rbp, rsp
        mov     QWORD PTR [rbp-8], rdi
        mov     QWORD PTR [rbp-16], rsi
        mov     rax, QWORD PTR [rbp-16]
        mov     eax, DWORD PTR [rax]
        cmp     eax, 4
        ja      .L9
        mov     eax, eax
        mov     rax, QWORD PTR .L4[0+rax*8]
        jmp     rax
.L4:
        .quad   .L3
        .quad   .L5
        .quad   .L6
        .quad   .L7
        .quad   .L8
.L3:
        mov     rax, QWORD PTR [rbp-8]
        mov     edx, DWORD PTR [rax]
        mov     rax, QWORD PTR [rbp-16]
        mov     eax, DWORD PTR [rax+4]
        add     edx, eax
        mov     rax, QWORD PTR [rbp-8]
        mov     DWORD PTR [rax], edx
        jmp     .L2
.L5:
        mov     rax, QWORD PTR [rbp-8]
        mov     edx, DWORD PTR [rax]
        mov     rax, QWORD PTR [rbp-16]
        mov     eax, DWORD PTR [rax+4]
        sub     edx, eax
        mov     rax, QWORD PTR [rbp-8]
        mov     DWORD PTR [rax], edx
        jmp     .L2
.L6:
        mov     rax, QWORD PTR [rbp-8]
        mov     edx, DWORD PTR [rax]
        mov     rax, QWORD PTR [rbp-16]
        mov     eax, DWORD PTR [rax+4]
        imul    edx, eax
        mov     rax, QWORD PTR [rbp-8]
        mov     DWORD PTR [rax], edx
        jmp     .L2
.L7:
        mov     rax, QWORD PTR [rbp-8]
        mov     eax, DWORD PTR [rax]
        mov     rdx, QWORD PTR [rbp-16]
        mov     esi, DWORD PTR [rdx+4]
        cdq
        idiv    esi
        mov     edx, eax
        mov     rax, QWORD PTR [rbp-8]
        mov     DWORD PTR [rax], edx
        jmp     .L2
.L8:
        mov     rax, QWORD PTR [rbp-8]
        mov     eax, DWORD PTR [rax]
        mov     rdx, QWORD PTR [rbp-16]
        mov     ecx, DWORD PTR [rdx+4]
        cdq
        idiv    ecx
        mov     rax, QWORD PTR [rbp-8]
        mov     DWORD PTR [rax], edx
        nop
.L2:
.L9:
        nop
        pop     rbp
        ret

Solution

  • A catchall answer to all these questions: you should measure.

    Practically though, I'm betting on the switch version. Function calls have overhead (and they can be hardly inlined in this context), which you could eliminate with labels as values, which is a common compiler extension*, but you should really try all your options and measure if the performance of this piece of code matters to you greatly.

    Otherwise, use whatever's most convenient to you.


    *a switch is likely to generate a jump table equivalent to what you could compose from labels as values but it could switch between different implementations depending on the particular case values and their number