c assembly arm system-calls inline-assembly

gcc arm optimizes away parameters before System Call

I'm trying to implement some "OSEK-Services" on an arm7tdmi-s using gcc arm. Unfortunately, turning up the optimization level results in "wrong" code generation. The main thing I don't understand is that the compiler seems to ignore the procedure call standard, i.e. passing parameters to a function by moving them into registers r0-r3. I understand that function calls can be inlined, but the parameters still need to be in the registers to perform the system call.

Consider the following code to demonstrate my problem:

unsigned SysCall(unsigned param)
{
    volatile unsigned ret_val;
    __asm __volatile
    (
        "swi 0          \n\t"    /* perform SystemCall */
        "mov %[v], r0   \n\t"    /* move the result into ret_val */
        : [v]"=r"(ret_val) 
        :: "r0" 
    );

    return ret_val;              /* return the result */
}

int main()
{
    unsigned retCode;
    retCode = SysCall(5); // expect retCode to be 6 when returning back to usermode
}

I wrote the Top-Level software interrupt handler in assembly as follows:

.type   SWIHandler, %function
.global SWIHandler
SWIHandler:

    stmfd   sp! , {r0-r2, lr}        @save regs

    ldr     r0  , [lr, #-4]          @load sysCall instruction and extract sysCall number
    bic     r0  , #0xff000000

    ldr     r3  , =DispatchTable     @load dispatchTable 
    ldr     r3  , [r3, r0, LSL #2]   @load sysCall address into r3 

    ldmia   sp, {r0-r2}              @load parameters into r0-r2
    mov     lr, pc
    bx      r3 

    stmia   sp ,{r0-r2}              @store the result back on the stack
    ldr     lr, [sp, #12]            @restore return address
    ldmfd   sp! , {r0-r2, lr}        @load result into register
    movs    pc  , lr                 @back to next instruction after swi 0

The dispatch table looks like this:

DispatchTable:
    .word activateTaskService
    .word getTaskStateService

The SystemCall function looks like this:

unsigned activateTaskService(unsigned tID)
{
    return tID + 1; /* only for demonstration */
}

Without optimization everything works fine and the parameters are in the registers as expected. See the following code, compiled with -O0:

00000424 <main>:
 424:   e92d4800    push    {fp, lr}
 428:   e28db004    add fp, sp, #4
 42c:   e24dd008    sub sp, sp, #8
 430:   e3a00005    mov r0, #5          @move param into r0
 434:   ebffffe1    bl  3c0 <SysCall>

000003c0 <SysCall>:
 3c0:   e52db004    push    {fp}        ; (str fp, [sp, #-4]!)
 3c4:   e28db000    add fp, sp, #0
 3c8:   e24dd014    sub sp, sp, #20
 3cc:   e50b0010    str r0, [fp, #-16]
 3d0:   ef000000    svc 0x00000000
 3d4:   e1a02000    mov r2, r0
 3d8:   e50b2008    str r2, [fp, #-8]
 3dc:   e51b3008    ldr r3, [fp, #-8]
 3e0:   e1a00003    mov r0, r3
 3e4:   e24bd000    sub sp, fp, #0
 3e8:   e49db004    pop {fp}        ; (ldr fp, [sp], #4)
 3ec:   e12fff1e    bx  lr

Compiling the same code with -O3 results in the following assembly code:

00000778 <main>:
 778:   e24dd008    sub sp, sp, #8
 77c:   ef000000    svc 0x00000000         @Inline SystemCall without passing params into r0
 780:   e1a02000    mov r2, r0
 784:   e3a00000    mov r0, #0
 788:   e58d2004    str r2, [sp, #4]
 78c:   e59d3004    ldr r3, [sp, #4]
 790:   e28dd008    add sp, sp, #8
 794:   e12fff1e    bx  lr

Notice how the system call gets inlined without assigning the value 5 to r0.

My first approach is to move those values manually into the registers by adapting the function SysCall from above as follows:

unsigned SysCall(volatile unsigned p1)
{
    volatile unsigned ret_val;
    __asm __volatile
    (
        "mov r0, %[p1]      \n\t"
        "swi 0              \n\t"
        "mov %[v], r0       \n\t" 
        : [v]"=r"(ret_val) 
        : [p1]"r"(p1)
        : "r0"
    );
    return ret_val;
}

It seems to work in this minimal example, but I'm not sure whether this is the best practice. Why does the compiler think it can omit the parameters when inlining the function? Does anybody have suggestions on whether this approach is okay or what should be done differently?

Thank you in advance

Solution

A function call in C source code does not instruct the compiler to call the function according to the ABI. It instructs the compiler to call the function according to the model in the C standard, which means the compiler must pass the arguments to the function in a way of its choosing and execute the function in a way that has the same observable effects as defined in the C standard.

Those observable effects do not include setting any processor registers. When a C compiler inlines a function, it is not required to set any particular processor registers. If it calls a function using an ABI for external calls, then it would have to set registers. Inline calls do not need to obey the ABI.

So merely putting your system request inside a function built of C source code does not guarantee that any registers will be set.

For ARM, what you should do is define register variables assigned to the required register(s) and use those as input and output to the assembly instructions:

unsigned SysCall(unsigned param)
{
    register unsigned Parameter __asm__("r0") = param;
    register unsigned Result    __asm__("r0");
    __asm__ volatile
    (
        "swi 0"
        : "=r" (Result)
        : "r"  (Parameter)
        : // "memory"    // If any inputs are pointers.
    );
    return Result;
}

(This is a major kludge by GCC; it is ugly, and the documentation is poor. But see also https://stackoverflow.com/tags/inline-assembly/info for some links. GCC for some ISAs has convenient specific-register constraints you can use instead of r, but not for ARM.) The register variables do not need to be volatile; the compiler knows they will be used as input and output for the assembly instructions.
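The same pattern extends to several parameters. The following is a minimal sketch, not a definitive implementation: SysCall3 is a hypothetical name, and the operand list is written against the SWIHandler shown in the question, on the assumption that the handler writes r0-r2 back after the service returns and that r3 and r12 may come back modified (the handler uses r3 as scratch, and the service is an ordinary C function). The #else branch is only a host-side stand-in so the snippet compiles off-target; it mimics activateTaskService from the question.

```c
/* Hypothetical three-argument system call wrapper (name not from the question). */
static inline unsigned SysCall3(unsigned p0, unsigned p1, unsigned p2)
{
#if defined(__arm__)
    /* Pin each value to the register the handler expects it in. */
    register unsigned a0 __asm__("r0") = p0;   /* argument 0, also the result */
    register unsigned a1 __asm__("r1") = p1;   /* argument 1 */
    register unsigned a2 __asm__("r2") = p2;   /* argument 2 */

    __asm__ volatile
    (
        "swi 0"
        /* Assumption: the question's handler stores r0-r2 back after the
           service runs, so all three are read-write operands. */
        : "+r" (a0), "+r" (a1), "+r" (a2)
        :
        /* Assumption: r3 and r12 (ip) are not preserved across the call. */
        : "r3", "ip"
    );
    return a0;
#else
    /* Host-side stand-in, only so the sketch builds and runs off-target. */
    (void)p1;
    (void)p2;
    return p0 + 1;
#endif
}
```

With the question's dispatch table, SysCall3(5, 0, 0) would be expected to come back as 6, since service 0 is activateTaskService. If a different handler preserves r1-r3 and r12, the extra outputs and clobbers are unnecessary but harmless.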

The asm statement itself should be volatile if it has side effects other than producing a return value. (For example, getpid() doesn't need to be volatile... unless used before and after fork().)

A non-volatile asm statement with outputs can be optimized away if the output is unused, or hoisted out of loops if it's used with the same input (like a pure function call). This is almost never what you want for a system call.

You also need a "memory" clobber if any of the inputs are pointers to memory that the kernel will read or modify. See "How can I indicate that the memory *pointed* to by an inline ASM argument may be used?" for more details (and a way to use a dummy memory input or output to avoid a "memory" clobber).
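As a rough illustration of the dummy memory input, the sketch below passes a pointer to a service that only reads the pointed-to object. Everything here is hypothetical: struct Message, SysSend, and service number 2 do not appear in the question. The "m" (*msg) operand is never referenced in the template; it only tells the compiler that the asm reads *msg, so pending stores to it must be completed first. The clobber list again assumes the handler from the question. On a non-ARM host the #else branch uses an empty template with the same dummy operand so the snippet still compiles.

```c
/* Hypothetical message type passed to the kernel by pointer. */
struct Message {
    unsigned id;
    unsigned payload[4];
};

/* Hypothetical wrapper: the service reads *msg and returns a status in r0. */
static inline unsigned SysSend(const struct Message *msg)
{
#if defined(__arm__)
    register unsigned a0 __asm__("r0") = (unsigned)msg;  /* pointer in, status out */

    __asm__ volatile
    (
        "swi 2"                      /* hypothetical service number */
        : "+r" (a0)
        : "m" (*msg)                 /* dummy input: the asm reads *msg */
        : "r1", "r2", "r3", "ip"     /* assumption: not preserved by the handler */
    );
    return a0;
#else
    /* Host-side stand-in: empty template, same dummy memory input. */
    __asm__ volatile ("" : : "m" (*msg));
    return msg->id;
#endif
}
```

If the service also wrote through the pointer, the dummy operand would have to be an output ("=m" or "+m") on a non-const object, or a plain "memory" clobber could be used instead at the cost of a broader barrier.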

A "memory" clobber on mmap/munmap or other system calls that affect what memory means would also be wise; you don't want the compiler to decide to do a store after munmap instead of before.