In the RISC-V unprivileged manual, it is written that there is a pseudoinstruction called li:
li rd, immediate | Myriad sequences | Load immediate
But it only says that the base instruction is "Myriad sequences", and googling that gives no promising answer as to what it means.
Does anyone know what "Myriad sequences" is and what li gets expanded to?
Also, after playing with godbolt for a while, I see that the following function:
int func(void) {
    return 0x12345LL;
}
produces the following asm output (with the -O2 flag):
func:
li a0,73728
addi a0,a0,837
ret
But if I tick the "Compile to binary" option, godbolt gives me:
main:
lui a0,0x12
addi a0,a0,837 # 12345 <__BSS_END__+0x30d>
ret
Does this mean that li gets expanded to addi? Or was it the linker doing some optimizations?
If you're programming in assembly, or writing a compiler that generates assembly code, you can choose to use the li pseudoinstruction or avoid it, as you like.
For 32-bit integers, it will generate some combination of lui + addi, though either one may be omitted depending on the constant's value.
If the immediate value given to li is small and fits in a signed 12-bit field, an addi alone will suffice, though the assembler would not be remiss in using ori instead.
If the immediate value given to li is a multiple of 0x1000, then lui alone can handle the job as well.
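The three 32-bit cases described above can be sketched in C. This is only an illustration of the decision an assembler might make, not the logic of any particular assembler; the names li32_form and li32_classify are hypothetical.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
enum li32_form { LI_ADDI_ONLY, LI_LUI_ONLY, LI_LUI_ADDI };

/* Decide which base instructions a 32-bit constant needs. */
enum li32_form li32_classify(int32_t value)
{
    /* Fits the signed 12-bit immediate of "addi rd, x0, imm". */
    if (value >= -2048 && value <= 2047)
        return LI_ADDI_ONLY;

    /* Low 12 bits are all zero, so a single lui is enough. */
    if (((uint32_t)value & 0xFFFu) == 0)
        return LI_LUI_ONLY;

    /* Everything else needs lui followed by addi. */
    return LI_LUI_ADDI;
}
```

For the constant from the question, li32_classify(0x12345) returns LI_LUI_ADDI, which matches the lui + addi pair in the disassembly.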
For constants larger than 32 bits, the choice of sequence can be 3 or more instructions. We have seen compilers generate working but suboptimal code sequences for certain large constants.
There is a trade-off here between the number of registers used to construct a large immediate and the number of instructions needed to do so: sometimes using an extra register to hold a reusable intermediate value can shorten the code sequence. Other optimizations are possible as well, such as reusing an intermediate value (as in common subexpression elimination or loop-invariant code motion) in the generation of two separate larger immediates.
However, while the option of using more registers so as to use fewer instructions is available to compilers and assembly programmers (since they are aware of other register usage in the compilation), this trade-off is not available to assemblers themselves, as they don't track register usage within or between functions. Assemblers are thus limited to using one scratch register, namely the target of the li, as a temporary to construct the constant.
Many clever sequences are possible to construct an immediate. An addi followed by a shift left may be appropriate for constants that have 12 or fewer significant bits followed only by zero bits as LSBs (even though lui + addi, at the same instruction count, will work as well). These sequence variations become more relevant for larger (>32-bit) constant values.
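As a rough sketch of the addi + shift idea, the following C function checks whether a 64-bit constant can be written as a signed 12-bit value shifted left. The helper name fits_addi_slli is hypothetical, and shift amounts above 31 assume an RV64 target; real assemblers may use a different search.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: can `value` be built as
 *     addi rd, x0, imm
 *     slli rd, rd, shamt
 * On success, *imm and *shamt receive the operands.
 * A shamt of 0 means the slli can be omitted. */
bool fits_addi_slli(int64_t value, int32_t *imm, unsigned *shamt)
{
    if (value == 0)
        return false;              /* zero needs no shift at all */

    int64_t base = value;
    unsigned s = 0;

    /* Strip trailing zero bits; each division is exact because base is even. */
    while (base % 2 == 0) {
        base /= 2;
        s++;
    }

    /* What is left must fit the signed 12-bit addi immediate. */
    if (base < -2048 || base > 2047)
        return false;

    *imm = (int32_t)base;
    *shamt = s;
    return true;
}
```

For example, 0x7FF000000000 is accepted with imm = 2047 and shamt = 36, while 0x12345 is rejected because its significant bits do not fit in 12 bits.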
There's no consumer need for the assembler to use a well-defined sequence to generate the immediate, so the RISC-V specification leaves the sequence unspecified and defers to the assembler implementation.
In fact, the RISC-V specification goes beyond most ISA specs by rolling in (and standardizing) many useful pseudoinstructions like ret and call that are not literally part of the ISA, but that help with assembly programming and also facilitate hardware optimizations like a shadow call stack by cementing certain conventions.
The lui + addi sequence has an oddity: the addi uses a 12-bit signed immediate, which means that in order to construct a 32-bit immediate that has bit 11 set (the 12th bit, which is the sign bit of the addi immediate), the constant provided to the lui must be biased by +1. This is because the 12-bit immediate of the addi is sign-extended and becomes negative, so the +1 bias in the lui is needed to compensate.
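The bias can be worked through in C. This is a minimal sketch assuming RV32 arithmetic (wrapping modulo 2^32); the names hi_lo, split_hi_lo and combine_hi_lo are hypothetical and do not come from any toolchain.

```c
#include <stdint.h>

/* Hypothetical helper type: operands for "lui rd, hi20; addi rd, rd, lo12". */
struct hi_lo {
    uint32_t hi20;  /* 20-bit lui operand */
    int32_t  lo12;  /* signed 12-bit addi operand, -2048..2047 */
};

struct hi_lo split_hi_lo(uint32_t value)
{
    struct hi_lo r;

    /* Adding 0x800 before the shift applies the +1 bias exactly when
     * bit 11 of value is set. */
    r.hi20 = ((value + 0x800u) >> 12) & 0xFFFFFu;

    /* Sign-extend the low 12 bits the same way addi does. */
    r.lo12 = (int32_t)(value & 0xFFFu);
    if (r.lo12 >= 0x800)
        r.lo12 -= 0x1000;

    return r;
}

/* Recombine as RV32 hardware would: lui, then addi, modulo 2^32. */
uint32_t combine_hi_lo(struct hi_lo p)
{
    return (p.hi20 << 12) + (uint32_t)p.lo12;
}
```

With 0x12345, bit 11 is clear and the split is hi20 = 0x12, lo12 = 837, as in the disassembly in the question. With 0x12800, bit 11 is set, so the split is hi20 = 0x13 (biased by +1) and lo12 = -2048, and 0x13000 - 2048 gives back 0x12800.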
One might ask: why did the RISC-V designers use an addi that sign-extends?
For background, let's note that the MIPS designers chose two forms of immediate extension: sign extension for addi and zero extension for ori. So for certain sequences (such as lui + ori), MIPS could avoid the +1/-1 issue. However, they also wanted to be able to put the lower part of the immediate into the offset available in lw and sw, and these have sign-extending immediates (which is desirable for other reasons). So all the machinery needed to use immediates in lui + lw is needed anyway, including the bias of lui by +1 in certain cases.
RISC-V embraces this, and even eliminated zero-extending immediates altogether to simplify decoding, so it must contend with the +1/-1 biasing for all lui + ... combinations anyway.