
RISC-V assembly: global pointer set to a weird value


I am experimenting with RISC-V assembly language on an emulator (qemu64, Ubuntu for RISC-V).

Here is a simple program that converts the string instr to uppercase and stores the result in outstr.

.global _start

_start:
    la x5, outstr
    la x6, instr

loop:
    lb x7, 0(x6)
    addi x6, x6, 1
    li x28, 'z'
    bgt x7, x28, cont
    li x28, 'a'
    blt x7, x28, cont
    addi x7, x7, ('A'-'a')

cont:
    sb x7, 0(x5)
    addi x5, x5, 1
    li x28, 0
    bne x7, x28, loop

    li a0, 1
    la a1, outstr
    sub a2, x5, a1

    li a7, 64
    ecall

    li a0, 0
    li a7, 93
    ecall

.data
instr: .asciz "String to conVErt xYz.\n"
outstr: .fill 255, 1, 0

For now I am looking at the very first two instructions, where the address of outstr is loaded into x5/t0 and the address of instr into x6/t1.

The disassembly for these two instructions, given by GDB, is the following:

   0x00000000000100e8 <+0>: addi    t0,gp,-2024
   0x00000000000100ec <+4>: auipc   t1,0x1
   0x00000000000100f0 <+8>: addi    t1,t1,84 # 0x11140

So according to the first instruction, we expect t0 = (gp-2024)

Let's get the address of the outstr variable:

(gdb) info variables
All defined variables:

Non-debugging symbols:
0x0000000000011140  __DATA_BEGIN__
0x0000000000011140  instr
0x0000000000011158  outstr
0x0000000000011257  __SDATA_BEGIN__
0x0000000000011257  __bss_start
0x0000000000011257  _edata
0x0000000000011258  __BSS_END__
0x0000000000011258  _end

outstr is stored at address 0x11158.

Let's get the value of t0, which is supposed to be the address of outstr:

(gdb) info registers x5
x5             0x55555567c3ac      93824993444780

Something is wrong. What happened? Let's get the value of gp:

(gdb) info register gp
gp             0x55555567cb94   0x55555567cb94

This value is weird.

As expected, we have t0 = (gp-2024): 0x55555567cb94-2024 = 0x55555567c3ac, so the addi instruction returns a correct result. For t0 to equal the address of outstr, gp would have to be 0x11158+2024 = 0x11940.

But t0 is not the address of outstr! As a result, any access to outstr through the address stored in t0 causes a segmentation fault (which makes sense). The issue arises because the gp register holds an unexpected value, but I don't understand why. Does anyone have a clue?

Thanks.

EDIT: adding the Makefile

OBJS = chapter5_ToUppercase.o
DEBUGFLAGS = -g

%.o : %.S
        as $(DEBUGFLAGS) $< -o $@


chapter5_ToUppercase: $(OBJS)
        ld -o chapter5_ToUppercase $(OBJS)

Solution

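  • SOLUTION: the issue was that the global pointer gp was never initialized. The program defines its own _start and is linked with plain ld, so no startup code sets gp before the first instruction runs.

    To solve that, I first had to edit the linker script and define the value that the register will be initialized with: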

      .data           :
      {
        __DATA_BEGIN__ = .;
        PROVIDE_HIDDEN (__my_gp = . + 0x800);
        *(.data .data.* .gnu.linkonce.d.*)
        SORT(CONSTRUCTORS)
      }
    

    Because RISC-V immediates are 12-bit signed values (range -0x800 to +0x7FF), we set the gp value to (.data + 0x800), so that gp-relative offsets can reach the first 4 KiB of .data.

    At this stage we have only defined what the initial value of gp will be; gp itself is still not initialized. To do that, the program has to load gp with the value defined in the linker script:

    _start:
    .option norelax
            la gp, __my_gp
    .option relax
            la x5, outstr
            la x6, instr
    

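    Note that it is required to disable linker relaxation (with .option norelax) before writing the gp register. Otherwise the linker may relax la gp, __my_gp into a gp-relative instruction, which computes the new gp from the still uninitialized gp. It took me an hour to figure out why the la gp, __my_gp instruction was not working...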
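
    For comparison, here is a minimal sketch of the same initialization without a custom linker script. It assumes the default GNU ld linker script is in use, which normally provides the __global_pointer$ symbol (the value the linker assumes gp holds when it relaxes la). The msg label and its contents are made up for the example; I have not checked this against every toolchain version.

    ```asm
    # Sketch only: assumes the default GNU ld linker script, which provides
    # the __global_pointer$ symbol (so no custom __my_gp is needed).
    .global _start

    _start:
    .option push
    .option norelax                  # keep the next la as a PC-relative auipc/addi pair
        la gp, __global_pointer$     # gp = value the linker assumes when relaxing
    .option pop                      # restore the previous relaxation setting

        la a1, msg                   # may now safely be relaxed to a gp-relative addi
        li a0, 1                     # fd 1 = stdout
        li a2, 6                     # length of msg in bytes
        li a7, 64                    # Linux write syscall
        ecall

        li a0, 0                     # exit status 0
        li a7, 93                    # Linux exit syscall
        ecall

    .data
    msg: .ascii "gp ok\n"            # hypothetical test string, 6 bytes
    ```

    Using .option push / .option pop instead of .option relax restores whatever relaxation setting was active before, rather than unconditionally turning relaxation back on.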

    Thank you everyone for your help.