assembly gcc riscv gnu-assembler

Correct way to add labels in a RISC-V data section (so the assembler can pick them up)


The assembler translates an access to the first label in the .data section with no problem, but it turns accesses to the subsequent labels into gp-relative addresses, and I don't understand the purpose of that.

I'm trying to write a simple RISC-V assembly program that does some array operations (allocated, I think, in the .data section).

It's as simple as it gets. The allocated space needs a .data section and a .rodata section, so I have:

.section .rodata     <---- **okay access**
.align 2
sentence1:
  .string "Sentence number one\n"
.align 2
sentence2:
  .string "Sentence number two\n"

.section .data
.align 2
num1:               <---- **okay access**
  .word 0
num2:               <---- **can't find**
  .word 0
.align 2
arr:                <---- **can't find**
  .space 1024

.section .text
__start:
<my program logic>

I'm using the assembler from the RISC-V toolchain:

riscv64-unknown-elf-as -v
GNU assembler version 2.40.0 (riscv64-unknown-elf) using BFD version (GNU Binutils) 2.40.0.20230214

The problem is that the program always segfaults when it starts to access anything after the first label in the .data section. The first label, num1 in this case, is translated fine, and anything in .rodata, such as sentence1 and sentence2, can also be accessed.

By "accessed" I mean the address can be found through instructions such as:

la   a0, num1

or

1: auipc  a0, %pcrel_hi(num1)
   addi   a0, a0, %pcrel_lo(1b)

After that, loads and stores such as lw or sw through a0 work with no problem.

The problem arises when trying to access anything in .data (not .rodata, .rodata has no problem!) that is after the first label, such as num2 or arr.

When I run the program in gdb and step through each assembly instruction, I see that the first address load (for num1) was translated into an auipc and addi pair that produces the correct address, but each subsequent address load was translated into a single addi instruction that uses gp, the global pointer.

I link either with plain ld, with no library and no crt0.o or crtstart.o, or through gcc with -nostartfiles and -nostdlib. Could that be why gp is never set up? Or does the gp reference just mean there is something the assembler can't pick up?

I also wrote some C code with similar intent and structure and ran gcc -S on it. In the output file, the space for those labels is allocated with more than just a name and a size; it uses a combination of .local and .comm, such as:

.local sum1
.comm sum1, 4, 4

(I had sum1 as a file-scope static variable. If I removed the static, I think it would use .globl, .comm, .type and .size instead.)

Are those directives in charge of emitting the labels (I don't really know what "emitting" means here) so that the assembler can pick up the symbol and translate it into the correct address? If I don't want to use .comm and just write .word or .byte, how should I emit (?) the address?

Or is my theory wrong somewhere else?


Solution

  • I think you're having trouble with an uninitialized global pointer.

    The default linker script (which is used when you do not pass one to ld with the -T argument) defines __global_pointer$, whose value is supposed to be used to initialize gp. This initialization is usually done in the startup files, which don't get linked when you pass -nostartfiles.

    So you can either drop this flag, or set up gp yourself right at the start of .text, like so:

    .option push
    .option norelax
    la gp, __global_pointer$
    .option pop
    

    Using gp instead of an auipc + addi pair to access symbols in .data is called linker relaxation. ld does it because it encountered a definition of __global_pointer$ in the linker script.
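
    Putting this together with the data section from the question, a minimal sketch could look like the following. It assumes the default linker script (so that __global_pointer$ is defined), an entry symbol named _start (ld's default, whereas the question uses __start), and a Linux-style exit syscall such as the one qemu user mode provides. The stored value 42 is only there for illustration.

    ```asm
        .section .data
        .align 2
    num1:
        .word 0
    num2:
        .word 0
        .align 2
    arr:
        .space 1024

        .section .text
        .globl _start              # assumption: default ld entry symbol
    _start:
        # Initialize gp before any .data access. norelax keeps the linker
        # from relaxing this la into a gp-relative form itself.
        .option push
        .option norelax
        la   gp, __global_pointer$
        .option pop

        la   t0, num2              # may be relaxed to addi t0, gp, <offset>
        li   t1, 42                # illustrative value
        sw   t1, 0(t0)             # num2 = 42

        la   t0, arr
        sw   t1, 4(t0)             # second word of arr = 42

        # exit(0); assumes the Linux RISC-V syscall ABI (a7 = 93)
        li   a0, 0
        li   a7, 93
        ecall
    ```

    If you would rather not depend on gp at all, linking with ld's --no-relax option should keep the original auipc + addi pairs, at the cost of slightly larger code.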