The assembler translated the access to the first label in the .data
section with no problem, but it translated accesses to the subsequent labels into
gp-relative addresses, and I don't understand what the purpose of that is.
I'm trying to write a simple RISC-V assembly program that does some array operations (allocated, I think, in the .data section).
It's as simple as it could get. For the allocated space I need a .data section and a .rodata section, so I'd have:
.section .rodata <---- **okay access**
.align 2
sentence1:
.string "Sentence number one\n"
.align 2
sentence2:
.string "Sentence number two\n"
.section .data
.align 2
num1: <---- **okay access**
.word 0
num2: <---- **can't find**
.word 0
.align 2
arr: <---- **can't find**
.space 1024
.section .text
__start:
<my program logic>
I'm using the RISCV-toolchain's assembler:
riscv64-unknown-elf-as -v
GNU assembler version 2.40.0 (riscv64-unknown-elf) using BFD version (GNU Binutils) 2.40.0.20230214
The problem is that the program always segfaults when it begins to access anything after the first label in the .data section. The first label, num1 in this case, is translated fine, and anything in .rodata, such as sentence1 and sentence2, can also be accessed.
By "accessed" I mean that the address can be obtained through instructions such as:
la a0, num1
or
1: auipc a0, %pcrel_hi(num1)
addi a0, a0, %pcrel_lo(1b)
A subsequent load or store, such as lw or sw, on a0 would then have no problem.
The problem arises when trying to access anything in .data that comes after the first label, such as num2 or arr (not .rodata; .rodata has no problem!).
When I run the program in gdb to see what's going on, stepping through each assembly instruction, the first access to an address (for num1) has been translated into an auipc and addi pair that yields the correct address, but the subsequent address computations have been translated into a single addi instruction using gp, the global pointer.
I link either with plain ld, with no library and no crt0.o or crtstart.o, or with gcc passing -nostartfiles and -nostdlib, so I think something that sets up gp may be missing there? Or does the gp reference just mean something that the assembler can't pick up?
I tried writing some C code with similar intent and structure and ran gcc -S. In its output file, the space for those labels is allocated using more than just a name and a size, namely a combination of .local and .comm, such as:
.local sum1
.comm sum1, 4, 4
(I had sum1 as a file-scope static variable; if I removed the static, I think it would have .globl, .comm, .type and .size things.)
Are they in charge of emitting (I don't actually know what this word means) the labels so that the assembler can pick up the symbol and translate it into the correct address? If I don't want to use .comm but just write .word or .byte, how should I emit (?) the address?
Or is something else in my theory wrong?
I think you're having trouble with an uninitialized global pointer.
The default linker script (which is used when you do not pass one to ld with the -T argument) defines __global_pointer$, whose value is supposed to be used to initialize gp. This initialization is usually done in the startup files, which don't get linked when you pass -nostartfiles.
So you can either do away with this flag, or set up gp yourself right at the start of .text, like so:
.option push
.option norelax
la gp, __global_pointer$
.option pop
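To show where that snippet sits in a whole program, here is a minimal sketch built from the data layout in the question. It makes a few assumptions that are not in the original: the entry label is named _start (the name ld looks for by default), the file is assembled with riscv64-unknown-elf-as and linked with plain ld using the default linker script, and the program runs in a Linux-style user-mode environment where syscall 93 is exit. The value 42 and the element index are only illustrative.

```asm
    .section .data
    .align 2
num1:
    .word 0
num2:
    .word 0
    .align 2
arr:
    .space 1024

    .section .text
    .globl _start               # assumed entry name; the question used __start
_start:
    # Set up gp first, before any access the linker may relax to gp-relative form.
    # norelax keeps this la as a real auipc + addi pair instead of "addi gp, gp, 0".
    .option push
    .option norelax
    la   gp, __global_pointer$
    .option pop

    la   t0, num2               # the linker may relax this to: addi t0, gp, <offset>
    li   t1, 42                 # illustrative value
    sw   t1, 0(t0)              # num2 = 42

    la   t0, arr
    sw   t1, 4(t0)              # arr[1] = 42, treating arr as 4-byte words

    # Assumption: Linux-style exit syscall (a7 = 93), status in a0.
    li   a0, 0
    li   a7, 93
    ecall
```

With gp initialized this way, the gp-relative addi instructions seen in gdb should resolve to the intended addresses of num2 and arr rather than to an offset from whatever gp happened to contain.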
Using gp instead of an auipc + addi pair to access symbols in .data is called linker relaxation; ld performs it because it encountered a definition of __global_pointer$ in the linker script. You can read more about it here
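As a point of comparison, relaxation can also be switched off around a particular access, so that it stays an auipc + addi pair and does not depend on gp at all. The following is only a sketch under the same assumptions as the question's setup (GNU as, default linker script); the _start entry name and the Linux-style exit syscall are assumptions added for illustration, and this is an alternative to initializing gp, not something the answer above prescribes.

```asm
    .section .data
    .align 2
num1:
    .word 0
num2:
    .word 0

    .section .text
    .globl _start               # assumed entry name
_start:
    .option push
    .option norelax             # instructions up to .option pop are not relaxed
    la   t0, num2               # remains auipc + addi, independent of gp
    .option pop
    lw   t1, 0(t0)              # load num2 through the pc-relative address

    # Assumption: Linux-style exit syscall (a7 = 93), status in a0.
    li   a0, 0
    li   a7, 93
    ecall
```

The trade-off is that every such access costs two instructions instead of one, which is the saving relaxation exists to provide, so setting up gp once is usually the simpler choice.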