[SOLVED] Use __sync_fetch_and_or
instead of __sync_or_and_fetch
! The double amoor
instruction maybe is still a bug.
[Disclaimer] This could be a bug in GCC but I am still new to RISC-V assembly to be sure!
I am trying to implement a "fast" spinlock function in RISC-V (64bit FWIW) with mixed C and asm()
by using cross compilation with GCC v14.2.0 and GCC 15.1.0.
This (very stripped down) code:
#define LOCK_BIT ((unsigned long)(1ul<<(8*sizeof(long)-1)))
void lock_hold(long* lock) {
while(__sync_or_and_fetch(lock,LOCK_BIT) < 0);
__sync_synchronize();
}
Is rendered by both versions with either -O3
and -O2
as:
lock_hold:
li a5,-1
slli a5,a5,63
.L2:
amoor.d.aqrl a4,a5,0(a0)
amoor.d.aqrl a4,a5,0(a0)
j .L2
which seems to be an infinite loop with 2 consecutive identical amoor
instructions.
First of all, I am not expecting an infinite loop there!
If I switch to -O1
, I get this code:
lock_hold:
li a4,-1
slli a4,a4,63
.L2:
amoor.d.aqrl a5,a4,0(a0)
or a5,a5,a4. #USELESS?
blt a5,zero,.L2
fence rw,rw
ret
which looks more like what I would expect, while -Os
produces this code:
lock_hold:
li a5,-1
slli a5,a5,63
.L2:
amoor.d.aqrl a4,a5,0(a0)
j .L2
again with an infinite loop.
Finally, using -O0
I get basically the same as with -O1
with some extra instructions.
Have I hit a bug or am I missing something? In case, what am I missing?
On top of that, I would like to get also an answer from "someone who easily knows much more than me".
In the code produced by -O1
I have labelled an or
instruction as #USELESS?
.
Do I really need to perform some operation on a5
in order to set the "sign flag" after an amoor
instruction that writes on a5
itself?
In case, wouldn't something like or a5,zero,a5
be enough?
__sync_or_and_fetch(lock,LOCK_BIT)
sets the highest bit in the long - the sign bit. Since you use __sync_or_and_fetch
, and not __sync_fetch_and_or
you receive the new value that is saved in lock
, after setting the bit.
So therefore, since you always set the sign bit and check the value with the sign bit set, your check for the value of lock
after this operation being negative will always be true. That is why you end up with an infinite loop.