c++ llvm riscv

LLVM generates stack usage in a simple RISC-V function where GCC doesn't


I have a simple function:

extern "C" Variant test_bool(bool arg) {
    return arg;
}

Built with mostly standard settings (I removed part of the paths):

zig c++ -target riscv64-linux-musl -I/tests -I/tests/.zig -O3 -DNDEBUG -mcpu=baseline_rv64+rva22u64 -mabi=lp64d -O3 -std=gnu++23 -fno-stack-protector -fno-threadsafe-statics -MD -MT /test_basic.cpp.o -MF /test_basic.cpp.o.d -o /test_basic.cpp.o -c /test_basic.cpp

I need some help to understand why LLVM generates this:

000000000013a56a <test_bool>:
test_bool():
  13a56a:       1141                    addi    sp,sp,-16
  13a56c:       e406                    sd      ra,8(sp)
  13a56e:       e022                    sd      s0,0(sp)
  13a570:       0800                    addi    s0,sp,16
  13a572:       4605                    li      a2,1
  13a574:       c110                    sw      a2,0(a0)
  13a576:       00b50423                sb      a1,8(a0)
  13a57a:       60a2                    ld      ra,8(sp)
  13a57c:       6402                    ld      s0,0(sp)
  13a57e:       0141                    addi    sp,sp,16
  13a580:       8082                    ret

When GCC correctly emits this:

0000000000012156 <test_bool>:
test_bool():
   12156:       4705                    li      a4,1
   12158:       c118                    sw      a4,0(a0)
   1215a:       00b50423                sb      a1,8(a0)
   1215e:       8082                    ret

The Variant constructor is constexpr:

template <typename T>
inline constexpr Variant::Variant(T value)
{
    if constexpr (std::is_same_v<T, bool>) {
        m_type = BOOL;
        v.b = value;
    }
    else if constexpr (std::is_integral_v<T>) {
       ...
    } ...

The Variant is a simple union with a type:

private:
    Type m_type = NIL;
    union {
        int64_t  i;
        bool     b;
        double   f;
        real_t   v4[4];
        int32_t  v4i[4];
    } v;
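For anyone who wants to reproduce this locally, here is a minimal self-contained sketch of such a Variant, stitched together from the snippets above. The `Type` enum values and the accessors are assumptions (the original elides them), and only the `bool` path of the constructor is filled in:

```cpp
#include <cstdint>
#include <type_traits>

// Minimal sketch of the Variant from the question.
// Type/NIL/BOOL values and the accessors are assumptions;
// only the bool branch of the constructor is implemented.
struct Variant {
    enum Type : int32_t { NIL = 0, BOOL = 1 };

    template <typename T>
    constexpr Variant(T value)
    {
        if constexpr (std::is_same_v<T, bool>) {
            m_type = BOOL;
            v.b = value;
        }
        // other branches elided, as in the original
    }

    Type type() const { return m_type; }
    bool as_bool() const { return v.b; }

private:
    Type m_type = NIL;
    union {
        int64_t i;
        bool    b;
        double  f;
    } v;
};

extern "C" Variant test_bool(bool arg) {
    return arg;
}
```

With this layout the two stores in the listings (`sw` of the tag at offset 0, `sb` of the bool at offset 8) line up with `m_type` and `v.b`.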

Any ideas on what could be the cause of the additional instructions?


Solution

  • After tinkering a bit, I discovered that adding -fomit-frame-pointer was enough to make both compilers emit the same code. The extra instructions are the prologue/epilogue that save and restore the return address (ra) and frame pointer (s0) and set up s0 for the frame. Looking at the history of this option, there has been a push to disable frame-pointer omission by default on many distros (keeping frame pointers helps profilers and debuggers unwind the stack), which is a good thing of course. But in this case I depend on reducing instruction count to the absolute minimum, so I enable the omission explicitly.
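A minimal sketch of the fix applied to a build command, assuming the same zig c++ toolchain as in the question (paths shortened; only the added -fomit-frame-pointer flag matters):

```shell
# -fomit-frame-pointer tells LLVM it may drop the frame-pointer
# save/restore, removing the sp/s0/ra shuffling in the listing above.
zig c++ -target riscv64-linux-musl \
    -O3 -std=gnu++23 -fomit-frame-pointer \
    -c test_basic.cpp -o test_basic.cpp.o
```

Note that -O3 alone does not imply this on targets/toolchains where frame pointers are kept by default, which is why the flag has to be passed explicitly here.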