I am confused about the behavior of the RISC-V assembler (rv64). Note that this question concerns how human-readable assembly is meant to behave, not how the machine instructions behave (that is clear).
It seems like beq rs1, rs2, x sets PC ← x if R[rs1] = R[rs2], as opposed to PC ← PC + x. This is fine. However, there seems to be an inconsistency in whether j x sets PC ← x or PC ← PC + x.
Here is the sequence which led to this question:
Consider the following assembly program:
addi x1, x1, 1
beq x1, x1, bob
addi x2, x2, 3
bob:
addi x3, x3, 4
addi x4, x4, 5
If we assemble and then disassemble it, we get:
addi ra,ra,1
beq ra,ra,0xc
addi sp,sp,3
addi gp,gp,4
addi tp,tp,5 # 0x5
This is how I reached the conclusion that beq uses absolute addressing. Now, if we take this disassembled output as a program and assemble and disassemble it again, we get:
addi ra,ra,1
bne ra,ra,0xc
j 0x8
addi sp,sp,3
addi gp,gp,4
addi tp,tp,5 # 0x5
Therefore, it seems that j 0x8 must be using a relative offset. It must skip two instructions forward to reach the addi gp,gp,4 instruction we want. If it were using absolute addressing, it would be stuck in an infinite loop, jumping to itself.
I was satisfied here until I looked at this program here:
addi x1, x1, 1
j bob
addi x2, x2, 3
bob:
addi x3, x3, 4
addi x4, x4, 5
If we assemble and disassemble this, we get:
addi ra,ra,1
j 0xc
addi sp,sp,3
addi gp,gp,4
addi tp,tp,5 # 0x5
If j 0xc is indeed using a relative offset as before, then jumping to the label bob: would skip the instruction that the label is on and just do addi tp,tp,5.
This doesn't make sense. To confirm that it doesn't, I used gcc inline assembly, put the values of these registers in variables, and printed them out. In fact, the addi x3, x3, 4 instruction does get executed, implying that j 0xc is using an absolute address, contradicting what we saw earlier.
I'm sorry if this is a lot, but can anyone explain this apparent discrepancy? (Practically, I understand the answer is "Just use labels," but I'd still like to understand)
Edit:
Here is the result of disassembling, with addresses, the .s file containing beq ra,ra,0xc.
0000000000000000 <.text>:
0: 00108093 addi ra,ra,1
4: 00109463 bne ra,ra,c <.text+0xc>
8: 0000006f j 8 <.text+0x8>
c: 00310113 addi sp,sp,3
10: 00418193 addi gp,gp,4
14: 00520213 addi tp,tp,5 # 5 <.text+0x5>
It seems like the inconsistency is still there.
You seem to be confused about this instruction:
8: 0000006f j 8 <.text+0x8>
If you look at the RISC-V instruction decoding tables, you'll see that this value (0x0000006f) is in fact a JAL instruction (opcode 0b1101111) with a destination of x0 (so the link address is discarded) and an offset of 0. So if you actually executed that instruction as is, it would be an infinite loop, jumping to itself.
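To check the decoding by hand, here is a minimal Python sketch that pulls the fields out of a JAL instruction word following the J-type layout from the RISC-V specification. The function name decode_jal and the returned dictionary are my own choices for illustration, not part of any tool.

```python
def decode_jal(word: int, pc: int) -> dict:
    """Decode a 32-bit RISC-V JAL instruction word located at address pc."""
    opcode = word & 0x7F
    if opcode != 0b1101111:
        raise ValueError("not a JAL instruction")
    rd = (word >> 7) & 0x1F
    # J-type immediate is scattered: imm[20|10:1|11|19:12] in bits 31..12.
    imm = (
        (((word >> 31) & 0x1) << 20)
        | (((word >> 12) & 0xFF) << 12)
        | (((word >> 20) & 0x1) << 11)
        | (((word >> 21) & 0x3FF) << 1)
    )
    # Sign-extend the 21-bit immediate.
    if imm & (1 << 20):
        imm -= 1 << 21
    return {"rd": rd, "offset": imm, "target": pc + imm}


# The instruction from the listing: word 0x0000006f at address 0x8.
print(decode_jal(0x0000006F, pc=0x8))  # {'rd': 0, 'offset': 0, 'target': 8}
```

The encoded offset is 0, and the target 8 is simply the instruction's own address plus that offset, which is why the disassembler prints j 8.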
What is going on here is that this is actually a "partial" instruction that needs to have some bits filled in by the linker. Somewhere in the object file's symbol table or relocations, there will be a reference to this instruction address (.text + 8) that specifies a symbol that this branch should jump to, and a relocation type of relative branch. When the linker links this code, it will fill in the upper bits of the instruction with the offset of that symbol in order to create a final instruction to run.
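As a rough illustration of what "filling in the bits" means, here is a Python sketch that patches the immediate field of a JAL placeholder with a PC-relative offset. This is only a model of the idea: a real linker works from relocation entries in the object file (for JAL, the R_RISCV_JAL type), and the function name patch_jal is hypothetical. The addresses in the usage line are taken from the question's j bob listing, where the jump sits at 0x4 and bob is at 0xc.

```python
def patch_jal(word: int, pc: int, target: int) -> int:
    """Return the JAL word with its immediate set to (target - pc)."""
    offset = target - pc
    if offset % 2 != 0 or not -(1 << 20) <= offset < (1 << 20):
        raise ValueError("offset not encodable in a JAL instruction")
    imm = offset & 0x1FFFFF  # 21-bit two's complement
    # Scatter the immediate into the J-type layout imm[20|10:1|11|19:12].
    encoded = (
        (((imm >> 20) & 0x1) << 31)
        | (((imm >> 1) & 0x3FF) << 21)
        | (((imm >> 11) & 0x1) << 20)
        | (((imm >> 12) & 0xFF) << 12)
    )
    # Keep opcode and rd (low 12 bits), replace the immediate (high 20 bits).
    return (word & 0xFFF) | encoded


# Placeholder jal x0, 0 at address 0x4, resolved to a label at 0xc.
print(hex(patch_jal(0x0000006F, pc=0x4, target=0xC)))  # 0x80006f
```

The machine instruction always stores the relative offset (8 here), while the assembly and disassembly text shows the absolute target (0xc).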