Special treatment of setjmp/longjmp by compilers


In "Why volatile works for setjmp/longjmp", user greggo comments:

Actually modern C compilers do need to know that setjmp is a special case, since there are, in general, optimizations where the change of flow caused by setjmp could badly corrupt things, and these need to be avoided. Back in K&R days, setjmp did not need special handling, and didn't get any, and so the caveat about locals applied. Since that caveat is already there and (should be!) understood - and of course, setjmp use is pretty rare - there is no incentive for modern compilers to go to any extra lengths to fix the 'clobber' issue -- it would still be in the language.

Are there any references that elaborate on this? And if it is true, can custom-made implementations of setjmp/longjmp under different names (e.g., because I'd like to save some extra thread-local context) exist safely, with behavior no more error-prone than that of the standard setjmp/longjmp? In other words, is there any way to tell compilers "this function is effectively setjmp/longjmp"?


Solution

  • The C language defines setjmp to be a macro and places strict limitations on the contexts in which it may appear without invoking undefined behavior. It is not a normal function: you cannot take its address and expect a call via the resulting pointer to behave as a proper setjmp invocation.

    In particular, it is not true in general that assembly code invoked by setjmp obeys the same calling conventions as normal functions. SPARC on Linux and Solaris provides a counterexample: its setjmp does not restore all call-preserved registers (nor does vfork). It took GCC by surprise as recently as 2018 (gcc-patches thread, bugzilla entry).
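    To make the restriction on contexts concrete, here is a minimal sketch of two forms that the C standard's list of permitted contexts (7.13.1.1 in C99 and C11) allows. The names env, fail, permitted_contexts and compared_with_constant are made up for this illustration.

```c
#include <setjmp.h>

static jmp_buf env;

/* Hypothetical helper: jumps back to the most recent setjmp(env),
   which then returns `code` (which must be nonzero to be passed through). */
static void fail(int code)
{
    longjmp(env, code);
}

int permitted_contexts(void)
{
    /* Permitted: setjmp is the entire controlling expression of a switch. */
    switch (setjmp(env)) {
    case 0:                 /* direct return from setjmp */
        fail(42);
        return -1;          /* not reached */
    case 42:                /* second return, via longjmp */
        return 42;
    default:
        return -1;
    }
}

int compared_with_constant(void)
{
    /* Permitted: setjmp compared with an integer constant expression,
       the comparison being the entire controlling expression of an if. */
    if (setjmp(env) != 0)
        return 1;           /* reached via longjmp */
    fail(7);
    return 0;               /* not reached */
}

/* Not in the standard's list, hence undefined as far as the standard is
   concerned, even though it often appears to work:

       int rc = setjmp(env);
*/
```

    Calling permitted_contexts() returns 42 and compared_with_constant() returns 1; in both cases the value is produced on the second return from setjmp.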

    But even on "compiler-friendly" platforms where the setjmp entry point obeys the usual conventions, the compiler still has to recognize it as a function that "returns twice". GCC recognizes setjmp-like functions (including vfork) by name, and offers __attribute__((returns_twice)) for annotating such functions in custom code.

    The reason for that is longjmp'ing back to setjmp can transfer control from a point where some variable or temporary appears dead (and the compiler reused its storage for something unrelated) back to where it was live (but its storage is "clobbered" now, oops).
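    As a rough sketch of how a custom pair could be declared with that attribute: the declarations below assume a GNU-compatible compiler, and struct my_jmp_buf as well as the RETURNS_TWICE and NORETURN macros are hypothetical names introduced only for this illustration. The functions themselves would have to be implemented elsewhere, typically in assembly; only the declarations are shown.

```c
/* Fall back to nothing on compilers without GNU attributes; on such
   compilers a custom setjmp-like function cannot be annotated this way. */
#if defined(__GNUC__)
#  define RETURNS_TWICE __attribute__((returns_twice))
#  define NORETURN      __attribute__((noreturn))
#else
#  define RETURNS_TWICE
#  define NORETURN
#endif

/* Hypothetical buffer type; it could hold the saved registers plus any
   extra (e.g. thread-local) context the implementation wants to keep. */
struct my_jmp_buf;

/* May return more than once: first directly, then again each time
   my_longjmp is called with the same buffer. */
RETURNS_TWICE int my_setjmp(struct my_jmp_buf *buf);

/* Transfers control back into my_setjmp and never returns to its caller. */
NORETURN void my_longjmp(struct my_jmp_buf *buf, int value);
```

    The attribute has to be visible at every call site, so it belongs on the declaration in the header rather than only on the definition.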

    Constructing an example that demonstrates how this happens is a bit tricky. The clobbered storage cannot be a register: a call-clobbered register would not hold a live value across the call to setjmp, and a call-saved register would be restored by longjmp (SPARC exception aside). So the value needs to be forced onto the stack, without exposing the addresses of both variables in a way that makes their lifetimes overlap (which would prevent reuse of the stack slot), and without making one of them go out of scope before longjmp.

    With a bit of luck I managed to arrive at the following testcase, which when compiled with -O2 -mtune-ctrl=^inter_unit_moves_from_vec (view on Compiler Explorer):

    //__attribute__((returns_twice))
    int my_setjmp(void);
    
    __attribute__((noreturn))
    void my_longjmp(int);
    
    static inline
    int float_as_int(float x)
    {
        return (union{float f; int i;}){x}.i;
    }
    
    float f(void);
    
    int g(void)
    {
        int ret = float_as_int(f());
    
        if (__builtin_expect(my_setjmp(), 1)) {
            int tmp = float_as_int(f());
            my_longjmp(tmp);
        }
        return ret;
    }
    

    produces the following assembly:

    g:
            sub     rsp, 24
            call    f
            movss   DWORD PTR [rsp+12], xmm0
            call    my_setjmp
            test    eax, eax
            je      .L2
            call    f
            movss   DWORD PTR [rsp+12], xmm0
            mov     edi, DWORD PTR [rsp+12]
            call    my_longjmp
    .L2:
            mov     eax, DWORD PTR [rsp+12]
            add     rsp, 24
            ret
    

    The -mtune-ctrl=^inter_unit_moves_from_vec flag causes GCC to implement SSE-to-GPR moves via the stack, and both moves use the same stack slot because, as far as the compiler can tell, there is no conflict: computing 'tmp' leads to a noreturn function, so the temporary used for computing 'ret' is no longer needed. However, if my_longjmp transfers control back to my_setjmp, then after branching to label .L2 we try to read the value of 'ret' from the overwritten slot.
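
    For comparison, here is a sketch of the same shape written against the standard setjmp/longjmp, which the compiler recognizes by name. It is an adaptation, not the original testcase: it uses the standard convention that setjmp returns 0 on the direct call, and env, calls and this definition of f are assumptions added so that the example can run on its own. Because 'ret' is not modified between the setjmp and the longjmp, the standard guarantees that its value is intact after the second return.

```c
#include <setjmp.h>

static jmp_buf env;
static int calls;

/* Stand-in for the external f() of the testcase: alternates 1.0f, 2.0f. */
static float f(void)
{
    return (calls++ % 2 == 0) ? 1.0f : 2.0f;
}

/* Reinterprets the bits of a float as an int (assumes equal sizes). */
static inline int float_as_int(float x)
{
    return (union { float f; int i; }){ x }.i;
}

int g(void)
{
    int ret = float_as_int(f());        /* bits of 1.0f */

    if (setjmp(env) == 0) {             /* direct return: take this branch */
        int tmp = float_as_int(f());    /* bits of 2.0f, which are nonzero */
        longjmp(env, tmp);              /* setjmp now returns tmp */
    }
    return ret;                         /* reached only after the longjmp */
}
```

    Here g() returns the bit pattern of 1.0f, i.e. float_as_int(1.0f), whatever slot the compiler chooses for the temporaries.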