assembly riscv instruction-encoding

Why does the RISC-V pseudo-instruction li expand to four instructions instead of two?


Dear RISCV enthusiasts,

My question is about encoding

li t1, 0xFF00F007

With the online assembler at https://riscvasm.lucasteske.dev/# the line above assembles to:

   0:   000ff337            lui t1,0xff
   4:   00f3031b            addiw   t1,t1,15
   8:   00c31313            slli    t1,t1,0xc
   c:   00730313            addi    t1,t1,7 # ff007 <_sstack+0xee607>

In my naive attempt to encode the instructions, I came up with

[0] lui sp, 0xFF00F         # --pseudo--> I_LI -> FF00F137
[4] addiw t1, sp, 0x7       # --pseudo--> I_LI -> 0071031B

I must be missing something.

My questions are:

1. Is my own attempt correct?
2. If it is correct, why does https://riscvasm.lucasteske.dev/# output 4 instructions instead of 2? Four instructions would seem to be slower than two.

Thanks for your input.

I tried my own code in the RISCV interpreter https://www.cs.cornell.edu/courses/cs3410/2019sp/riscv/interpreter/ and it seems that the correct immediate ends up in the register.

So I am not seeing what I did wrong.


Solution

  • Both code sequences use addiw, so you are targeting a 64-bit machine (RV64).

    The question is whether 0xFF00F007 should be considered signed or not.

    Apparently, that online assembler treats such a constant as unsigned. Note that RARS treats the same constant as signed, so it produces a sequence like yours.

    Their sequence loads 0x00000000FF00F007 into t1, while your sequence loads 0xFFFFFFFFFF00F007.

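    The reason is that on 64-bit machines lui sign-extends its 32-bit result to 64 bits. 0xFF00F000 has bit 31 set, so the upper 32 bits of the register become all ones.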

    To run it on the Cornell simulator, you would have had to change addiw to addi, because it simulates a 32-bit machine, and 32-bit RISC-V machines don't have addiw.
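
The difference between the two sequences can be checked with a small model of the four instructions involved. This is only a sketch in Python, not taken from any real simulator: the helper names (sext, lui, addi, addiw, slli) are made up for illustration, and registers are modelled as unsigned 64-bit integers following the RV64I semantics described above.

```python
# Minimal model of the RV64I instructions used in the two sequences.
# All function names here are hypothetical helpers for illustration.

MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def sext(value, bits):
    """Sign-extend the low `bits` bits of `value` to a 64-bit pattern."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value & MASK64


def lui(imm20):
    # RV64: the 32-bit value (imm20 << 12) is sign-extended to 64 bits.
    return sext((imm20 & 0xFFFFF) << 12, 32)


def addi(rs1, imm12):
    # 64-bit add of a sign-extended 12-bit immediate.
    return (rs1 + sext(imm12, 12)) & MASK64


def addiw(rs1, imm12):
    # 32-bit add; the low 32 bits of the sum are sign-extended to 64 bits.
    return sext((rs1 + sext(imm12, 12)) & MASK32, 32)


def slli(rs1, shamt):
    # Logical left shift, truncated to 64 bits.
    return (rs1 << shamt) & MASK64


def assembler_sequence():
    """The four-instruction expansion from the online assembler."""
    t1 = lui(0xFF)
    t1 = addiw(t1, 15)
    t1 = slli(t1, 12)
    t1 = addi(t1, 7)
    return t1


def two_instruction_sequence():
    """The two-instruction attempt from the question."""
    tmp = lui(0xFF00F)  # the question uses sp as the temporary register
    return addiw(tmp, 7)


if __name__ == "__main__":
    print(f"{assembler_sequence():#018x}")        # 0x00000000ff00f007
    print(f"{two_instruction_sequence():#018x}")  # 0xffffffffff00f007
```

Masking the second result with MASK32 gives 0xFF00F007, which is why the two-instruction form looks correct on a 32-bit simulator: the two results differ only in the upper 32 bits.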