I have tried to write an ARM LEGv8 assembly program that calculates the average of two adjacent values in an array (v[i] and v[i-1]). It runs on a Raspberry Pi with Armbian.
The pseudo code should look like this:
int average(int v[], int i){
    int average = (v[i] + v[i-1])/2;
    return average;
}
The address of the array is passed in X0 and i in X1.
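For reference, here is a minimal compilable version of the pseudocode, which gives the value the assembly should produce for myArray and i = 5. It is only a sketch: the names average_ref and my_array are hypothetical, and the parameter types follow the question's int choice.

```c
/* Same data as myArray in the assembly (.word = 32-bit elements). */
static const int my_array[] = {0, 1, 2, 3, 35, 5};

/* Reference version of the pseudocode: average of v[i] and v[i-1].
   Requires i >= 1. C integer division truncates toward zero. */
int average_ref(const int v[], int i) {
    return (v[i] + v[i - 1]) / 2;
}
```

average_ref(my_array, 5) is (5 + 35) / 2 = 20, so once the assembly works, the program's exit status (for example $? in the shell) should be 20.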
My assembly code looks like this:
.globl _start
.data
myArray: .word 0, 1, 2, 3, 35, 5
.text
average:
    LSL X9, X1, #2
    ADD X9, X9, X0
    LDR X10, [X9, #0]     // I guess the segmentation fault happens here
    LDUR X11, [X9, #-4]
    ADD X3, X10, X11
    LSR X3, X3, #1
    BR X30
_start:
    LDR X0, myArray
    MOV X1, #5
    BL average
    MOV X8, #0x5d         // 93 = exit system call
    MOV X0, X3
    SVC 0
I used these commands to build it:
as average.s -o average.o
gcc average.o -o average -nostdlib -static
When I run my program I get a Segmentation Fault. Why?
(Disclaimer: the following is based on the actual ARMv8-A instruction set. I'm not sure what changes LEGv8 may have made.)
LDR X0, myArray doesn't load X0 with the address of the label myArray. It loads a doubleword from that address (ARM calls this the "literal" form of the load instruction). So after this instruction, X0 contains 0x0000000100000000 (the first two .word elements, 0 and 1, read as one little-endian 64-bit value), which naturally results in an invalid pointer by the time you do LDR X10, [X9, #0].
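To see where 0x0000000100000000 comes from, this small C sketch copies the first 8 bytes of an array laid out like myArray into a 64-bit integer, which is what the literal LDR X0, myArray does. It assumes a little-endian machine, the normal configuration for AArch64 Linux; the names my_array_words and literal_load_x0 are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Same layout as `.word 0, 1, 2, 3, 35, 5`: six consecutive 32-bit values. */
static const uint32_t my_array_words[] = {0, 1, 2, 3, 35, 5};

/* Models LDR X0, myArray: read 8 bytes starting at the label.
   On a little-endian machine the first word lands in the low half
   of the result and the second word in the high half. */
uint64_t literal_load_x0(void) {
    uint64_t x0;
    memcpy(&x0, my_array_words, sizeof x0);
    return x0;
}
```

The LSL/ADD in average then adds i * 4 = 20, giving X9 = 0x100000014, an address that is almost certainly not mapped in your process, so the first load from it faults.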
You may have meant LDR X0, =myArray, which places a pointer to myArray into the literal pool and then assembles a literal load of that pointer from its address in the pool. That would work, assuming your system can handle that type of relocation. However, for the position-independent executables used by common modern operating systems, the preferred method is:
ADRP X0, myArray
ADD X0, X0, #:lo12:myArray
The first instruction sets X0 to the address of the 4 KB page containing myArray (the high 52 bits of its address, with the low 12 bits zero), computed as an offset from PC. The second adds in the low 12 bits. See also Understanding ARM relocation (example: str x0, [tmp, #:lo12:zbi_paddr]).
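The arithmetic behind this page-plus-low-12-bits pair (ADRP followed by ADD with #:lo12:) can be modeled in C. This is an illustrative sketch, not how the assembler or linker is implemented: the function names are hypothetical and the addresses in the example are made up. ADRP adds a signed number of 4 KB pages (encoded in the instruction) to the address of the current instruction with its low 12 bits cleared, and the ADD supplies the remaining low 12 bits of the symbol's address.

```c
#include <stdint.h>

/* Immediate the linker would encode in ADRP: distance in 4 KB pages
   between the page holding the instruction and the page holding sym.
   Assumes both addresses are below 2^63, as user-space addresses are. */
int64_t adrp_imm(uint64_t pc, uint64_t sym) {
    return ((int64_t)(sym & ~UINT64_C(0xFFF)) -
            (int64_t)(pc & ~UINT64_C(0xFFF))) / 4096;
}

/* ADRP Xd, sym: page of the current instruction plus imm pages.
   The low 12 bits of the result are always zero. */
uint64_t adrp(uint64_t pc, int64_t imm) {
    /* unsigned arithmetic wraps, so a negative imm moves backwards */
    return (pc & ~UINT64_C(0xFFF)) + (uint64_t)imm * 4096;
}

/* ADD Xd, Xd, #:lo12:sym adds the low 12 bits of sym's address. */
uint64_t add_lo12(uint64_t xd, uint64_t sym) {
    return xd + (sym & UINT64_C(0xFFF));
}
```

For example, with the instruction at 0x400120 and myArray at 0x411038, the immediate is 17 pages, ADRP yields 0x411000, and the ADD brings it to 0x411038. Only the PC-relative distance is encoded, which is why the pair keeps working wherever the executable is loaded.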
A couple of other bugs and remarks:
Your LDR X10, [X9, #0] and LDUR X11, [X9, #-4] are 64-bit loads, because you used an X register as the destination. But the elements of myArray were defined with .word, which is 32 bits. So the high 32 bits of each register will contain garbage (the next element, or whatever follows the array), and a load may even fault if it extends past the end of the array into an unmapped page. To be consistent with 32-bit elements, load them into W registers (LDR W10, [X9, #0] and LDUR W11, [X9, #-4]), and then do your arithmetic on the W registers instead.
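Here is a C sketch of the difference, again assuming a little-endian machine: an 8-byte load at &v[4] picks up v[5] in its upper half, while a 4-byte load (what LDR W10, [X9] does; writing a W register also zeroes the upper half of the X register) gets exactly v[4]. It uses index 4 rather than 5 so that the 8-byte read stays inside the array; the names v_words, load_x and load_w are hypothetical.

```c
#include <stdint.h>
#include <string.h>

static const int32_t v_words[] = {0, 1, 2, 3, 35, 5};

/* Like LDR X10, [X9]: 8 bytes starting at &v[k], i.e. two elements. */
uint64_t load_x(const int32_t *v, size_t k) {
    uint64_t r;
    memcpy(&r, &v[k], sizeof r);
    return r;
}

/* Like LDR W10, [X9]: exactly one 32-bit element. */
uint32_t load_w(const int32_t *v, size_t k) {
    uint32_t r;
    memcpy(&r, &v[k], sizeof r);
    return r;
}
```

load_x(v_words, 4) is 0x0000000500000023 (35 in the low half, 5 in the high half), whereas load_w(v_words, 4) is just 35.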
You are thinking of your array elements as type int, which is signed, but your code currently would not correctly handle negative values (hint: what is the L in LSR?). Think about how to fix this, or change the type to unsigned.
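The hint can be checked with a C model of the 32-bit arithmetic. In this sketch (function names hypothetical), lsr_average mimics ADD W3, W10, W11 followed by LSR W3, W3, #1, asr_average uses an arithmetic shift instead, and c_average is what the pseudocode computes. Right-shifting a negative signed value is implementation-defined in C; GCC and Clang define it as an arithmetic shift, which asr_average assumes.

```c
#include <stdint.h>

/* Models ADD W3, W10, W11 ; LSR W3, W3, #1. */
int32_t lsr_average(int32_t a, int32_t b) {
    uint32_t sum = (uint32_t)a + (uint32_t)b;  /* wraps like the hardware ADD */
    return (int32_t)(sum >> 1);                /* logical shift: a 0 enters the sign bit */
}

/* Same sequence with ASR, which copies the sign bit.
   Assumes a + b does not overflow and that >> on a negative
   int32_t is an arithmetic shift (true for GCC and Clang). */
int32_t asr_average(int32_t a, int32_t b) {
    int32_t sum = a + b;
    return sum >> 1;
}

/* What the pseudocode computes: C division truncates toward zero.
   Assumes a + b does not overflow. */
int32_t c_average(int32_t a, int32_t b) {
    return (a + b) / 2;
}
```

For 35 and 5 all three give 20, but for -3 and -1 the LSR version gives 2147483646 instead of -2. Note also that ASR rounds toward minus infinity while C's / truncates toward zero, so an odd negative sum such as -3 gives -2 with ASR but -1 in C.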
Likewise, i is declared in C as int, but you access X1 as a 64-bit register. If you call this function from C, the ARM64 ABI allows the high 32 bits of X1 to be garbage. You probably want to declare it as size_t or unsigned long instead. If you do keep it as a 32-bit type, most likely unsigned is what you want, and then you need to zero-extend W1 into X1 before using it.
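A C sketch of why the upper half of X1 matters: if the caller leaves garbage in bits 32 to 63, using X1 directly as an index gives a huge byte offset, while zero-extending W1 first (for example with MOV W1, W1, since writing a W register clears the upper 32 bits) recovers the intended one. The garbage value used below and the function names are made up for illustration.

```c
#include <stdint.h>

/* Byte offset as the original code computes it: LSL X9, X1, #2 uses all 64 bits. */
uint64_t offset_from_x1(uint64_t x1) {
    return x1 << 2;
}

/* Byte offset after zero-extending W1 into X1 first, which is the right
   treatment for an unsigned 32-bit parameter. */
uint64_t offset_from_w1_zext(uint64_t x1) {
    return (uint64_t)(uint32_t)x1 << 2;
}
```

With X1 = 0xDEADBEEF00000005, offset_from_x1 produces a meaningless offset, while offset_from_w1_zext gives 20, the offset of element 5.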
When returning from a function, prefer RET to BR X30: RET tells the processor that this branch is a function return, so its branch predictor can predict the target better. (Though maybe LEGv8 doesn't have RET?)