cgccatomicriscvspinlock

Why __sync_or_and_fetch builtin in a loop renders as an endless loop with -O2 and -O3?


[SOLVED] Use __sync_fetch_and_or instead of __sync_or_and_fetch! The double amoor instruction maybe is still a bug.

[Disclaimer] This could be a bug in GCC but I am still new to RISC-V assembly to be sure!

I am trying to implement a "fast" spinlock function in RISC-V (64bit FWIW) with mixed C and asm() by using cross compilation with GCC v14.2.0 and GCC 15.1.0. This (very stripped down) code:

#define LOCK_BIT ((unsigned long)(1ul<<(8*sizeof(long)-1)))

void lock_hold(long* lock) {
  while(__sync_or_and_fetch(lock,LOCK_BIT) < 0);
  __sync_synchronize();
}

Is rendered by both versions with either -O3 and -O2 as:

lock_hold:
    li  a5,-1
    slli    a5,a5,63
.L2:
    amoor.d.aqrl    a4,a5,0(a0)
    amoor.d.aqrl    a4,a5,0(a0)
    j   .L2

which seems to be an infinite loop with 2 consecutive identical amoor instructions.

First of all, I am not expecting an infinite loop there!

If I switch to -O1, I get this code:

lock_hold:
    li  a4,-1
    slli    a4,a4,63
.L2:
    amoor.d.aqrl    a5,a4,0(a0)
    or  a5,a5,a4.   #USELESS?
    blt a5,zero,.L2
    fence   rw,rw
    ret

which looks more like what I would expect, while -Os produces this code:

lock_hold:
    li  a5,-1
    slli    a5,a5,63
.L2:
    amoor.d.aqrl    a4,a5,0(a0)
    j   .L2

again with an infinite loop.

Finally, using -O0 I get basically the same as with -O1 with some extra instructions.

Have I hit a bug or am I missing something? In case, what am I missing?

On top of that, I would like to get also an answer from "someone who easily knows much more than me".

In the code produced by -O1 I have labelled an or instruction as #USELESS?. Do I really need to perform some operation on a5 in order to set the "sign flag" after an amoor instruction that writes on a5 itself?

In case, wouldn't something like or a5,zero,a5 be enough?


Solution

  • __sync_or_and_fetch(lock,LOCK_BIT) sets the highest bit in the long - the sign bit. Since you use __sync_or_and_fetch, and not __sync_fetch_and_or you receive the new value that is saved in lock, after setting the bit.

    So therefore, since you always set the sign bit and check the value with the sign bit set, your check for the value of lock after this operation being negative will always be true. That is why you end up with an infinite loop.