I have a simple function:
extern "C" Variant test_bool(bool arg) {
    return arg;
}
Built with mostly standard settings (I removed part of the paths):
zig c++ -target riscv64-linux-musl -I/tests -I/tests/.zig -O3 -DNDEBUG -mcpu=baseline_rv64+rva22u64 -mabi=lp64d -O3 -std=gnu++23 -fno-stack-protector -fno-threadsafe-statics -MD -MT /test_basic.cpp.o -MF /test_basic.cpp.o.d -o /test_basic.cpp.o -c /test_basic.cpp
I need some help to understand why LLVM generates this:
000000000013a56a <test_bool>:
test_bool():
13a56a: 1141 addi sp,sp,-16
13a56c: e406 sd ra,8(sp)
13a56e: e022 sd s0,0(sp)
13a570: 0800 addi s0,sp,16
13a572: 4605 li a2,1
13a574: c110 sw a2,0(a0)
13a576: 00b50423 sb a1,8(a0)
13a57a: 60a2 ld ra,8(sp)
13a57c: 6402 ld s0,0(sp)
13a57e: 0141 addi sp,sp,16
13a580: 8082 ret
GCC, on the other hand, emits the minimal sequence I expected:
0000000000012156 <test_bool>:
test_bool():
12156: 4705 li a4,1
12158: c118 sw a4,0(a0)
1215a: 00b50423 sb a1,8(a0)
1215e: 8082 ret
The Variant constructor is constexpr:
template <typename T>
inline constexpr Variant::Variant(T value)
{
    if constexpr (std::is_same_v<T, bool>) {
        m_type = BOOL;
        v.b = value;
    }
    else if constexpr (std::is_integral_v<T>) {
        ...
    } ...
The Variant is a simple union with a type:
private:
    Type m_type = NIL;
    union {
        int64_t i;
        bool b;
        double f;
        real_t v4[4];
        int32_t v4i[4];
    } v;
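
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the pieces above. It is not the real class: the nesting and values of Type (NIL = 0, BOOL = 1, INT = 2), the real_t alias, the body of the integral branch, and the type()/as_bool()/as_int() accessors are all assumptions added for illustration. The only thing taken from the disassembly is that BOOL is stored as a 4-byte 1 and the bool lands at offset 8.

```cpp
#include <cassert>      // only for the usage checks shown after the block
#include <cstdint>
#include <type_traits>

using real_t = float;   // assumption: the real typedef is not shown in the post

class Variant {
public:
    // Assumption: enumerator values guessed; the disassembly only shows BOOL == 1.
    enum Type { NIL = 0, BOOL = 1, INT = 2 };

    template <typename T>
    constexpr Variant(T value);

    // Hypothetical accessors, added so the result can be inspected.
    Type type() const { return m_type; }
    bool as_bool() const { return v.b; }
    int64_t as_int() const { return v.i; }

private:
    Type m_type = NIL;
    union {
        int64_t i;
        bool b;
        double f;
        real_t v4[4];
        int32_t v4i[4];
    } v;
};

template <typename T>
inline constexpr Variant::Variant(T value)
{
    if constexpr (std::is_same_v<T, bool>) {
        m_type = BOOL;
        v.b = value;
    }
    else if constexpr (std::is_integral_v<T>) {
        // Assumption: the original elides this branch; this body is a guess.
        m_type = INT;
        v.i = value;
    }
    // Other types are elided in the original and not handled in this sketch.
}

// Same function as in the question.
extern "C" Variant test_bool(bool arg) {
    return arg;
}
```

With this sketch, test_bool(true).type() should be Variant::BOOL and test_bool(true).as_bool() should be true. Compiling it with the command line above, with and without -fomit-frame-pointer, should be enough to compare the two code sequences.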
Any ideas on what could be the cause of the additional instructions?
After tinkering a bit, I discovered that adding -fomit-frame-pointer was enough to make both compilers produce the same code. The extra instructions are the frame-pointer prologue and epilogue: ra and s0 are saved on the stack, s0 is set up as the frame pointer, and both are restored before ret. Looking at the history of this optimization, there seems to have been a push to disable it by default on all distros (that is, to keep frame pointers), which is of course a good thing. But in this case I depend heavily on keeping the instruction count to the absolute minimum.