It must be staring right into my face, but I fail to see it.
I'm learning assembler for Apple Silicon (ARM) and want to print out integers to the screen. My code works, but I don't understand the content of X3 in the instruction STRB W5, [X3, #-1]!
(W5 holds the digit to store.)
Register X3 is pointing to the address of the label buffer. Let's assume that is 0x0010. The length of this label is 12 bytes, so it runs through 0x001C.
The first iteration of the code separates the last digit from my integer number to print and stores it at the end of buffer, at address 0x001C.
What I fail to see is how this instruction 'knows' to store it at location 0x001C, as X3 is pointing to 0x0010.
Any thoughts?
Here is my code snippet that does the trick. (again, this code is working...)
.data
matrix: .quad 15,2,3,7
buffer: .byte 12 // say in runtime this address is 0x0010
.text
.global _start // Provide program starting address to linker
.align 4 // Make sure everything is aligned properly
_start:
mov X0, #9565 // The number we want to print
mov x1, #10 // Base 10 (decimal)
adrp X2,buffer@PAGE // Load the address of the page where buffer lives
add X2,X2,buffer@PAGEOFF // load the buffer address into X2 including the offset
mov X3, X2 // Copy the buffer address into X3 this will be 0x0010
convert_loop:
UDIV X4, X0, X1 // Divide X0 by 10 (result in X4); the quotient loses the last digit of the number
MSUB X5, X4, X1, X0 // Multiply-subtract: X5 = X0 - X4*X1, i.e. the remainder (the last digit)
ADD X5, X5, #'0' // Convert the remainder to its ASCII value
STRB W5, [X3, #-1]! // Store register byte the character in the buffer, moving backward
MOV X0, X4 // Update x0 with the quotient
CBZ X0, print_number // If x0 is 0, we're done
B convert_loop // Loop again
print_number:
mov X0, #1 // File descriptor for stdout
mov X1, X3 // Address of the buffer
sub X2, X2, X3 // Calculate the length of the string
// Exit the program (specific to systems like macOS/Linux)
mov X16, #4 // System call number 4 writes to a file descriptor
svc #0x80 // Call kernel to write the string
MOV X16, #1
mov X0, #0
svc #0x80 // Call kernel to terminate the program
What I fail to see is how this instruction 'knows' to store it at location 0x001C as X3 is pointing to 0x0010.
It doesn't. No such thing occurs.
When I run it, the code does exactly what it says: it starts with x3 pointing to buffer, then pre-decrements it on storing each byte, and so the bytes of the formatted decimal number are stored before the label buffer. Since that's where the data of matrix was located, it gets overwritten. But the program still "works" in that it successfully prints out the decimal number, just not from the intended buffer, but instead from memory intended for matrix. Since your program doesn't use the actual matrix data for anything, you aren't (yet) encountering any problems from it having been overwritten.
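To spell out the addressing mode: the trailing ! requests pre-indexed addressing with writeback, so the offset is added to X3 first, the store uses the updated address, and X3 keeps the new value. As a rough sketch (the comments are mine), the single instruction behaves like this two-instruction pair:

```asm
// STRB W5, [X3, #-1]!   has the same effect as:
SUB  X3, X3, #1          // update the base register first: X3 = X3 - 1
STRB W5, [X3]            // then store the low byte of W5 at the new address

// For contrast, the post-indexed form stores first and updates afterwards:
// STRB W5, [X3], #-1    // store at the old X3, then X3 = X3 - 1
```

So if X3 starts out equal to buffer, the first digit lands at buffer - 1, the next at buffer - 2, and so on. Nothing in the instruction refers to the end of the buffer.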
Here's some output from lldb upon reaching print_number:
(lldb) reg read x3
x3 = 0x000000010000401c matrix + 28
(lldb) p &buffer
(void **) 0x0000000100004020
(lldb) mem read $x3
0x10000401c: 39 35 36 35 0c 00 00 00 00 00 00 00 00 00 00 00 9565............
0x10000402c: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
To fix this, note another bug: .byte 12 doesn't reserve 12 bytes of space; rather, it reserves 1 byte and initializes it with the value 12. What you want here is .space 12 or something equivalent.
Then you could do:
buffer:
.space 12
buffer_end:
// ...
adrp X2, buffer_end@PAGE
add X2, X2, buffer_end@PAGEOFF
and keep everything else the same. This actually will initialize X2 to point to the end of the buffer, as was your goal.
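Putting both fixes together, the whole program might look roughly like the sketch below. It assumes the same macOS system call convention your original code already uses (call number in X16, then svc #0x80) and keeps your register assignments; only the buffer definition and the label loaded into X2 change, and the comments are rewritten by me.

```asm
.data
matrix:     .quad 15,2,3,7             // no longer overwritten
buffer:     .space 12                  // reserve 12 zero-filled bytes
buffer_end:                            // address one past the last byte of buffer

.text
.global _start                         // Provide program starting address to linker
.align 4
_start:
    mov   X0, #9565                    // The number we want to print
    mov   X1, #10                      // Base 10 (decimal)
    adrp  X2, buffer_end@PAGE          // Page where buffer_end lives
    add   X2, X2, buffer_end@PAGEOFF   // X2 = address of buffer_end
    mov   X3, X2                       // X3 = write pointer, starts at the end
convert_loop:
    udiv  X4, X0, X1                   // X4 = X0 / 10
    msub  X5, X4, X1, X0               // X5 = X0 - X4*10 (the last digit)
    add   X5, X5, #'0'                 // Convert the digit to ASCII
    strb  W5, [X3, #-1]!               // X3 = X3 - 1, then store the byte at [X3]
    mov   X0, X4                       // Continue with the quotient
    cbz   X0, print_number             // No digits left: done
    b     convert_loop
print_number:
    mov   X0, #1                       // File descriptor for stdout
    mov   X1, X3                       // Address of the first digit
    sub   X2, X2, X3                   // Length = buffer_end - first digit
    mov   X16, #4                      // System call 4: write
    svc   #0x80
    mov   X0, #0                       // Exit status 0
    mov   X16, #1                      // System call 1: exit
    svc   #0x80
```

Note that 12 bytes is enough for 9565, but a full 64-bit unsigned value can have up to 20 decimal digits, so a larger buffer would be needed to print arbitrary values of X0.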
Other code review comments:
Using .quad on ARM can look confusing. It assembles a 64-bit integer (8 bytes), but the name comes from "quad-word". This seems wrong because ARM nomenclature is that a "word" is 32 bits (4 bytes), so one would guess that .quad means a 128-bit value. In fact .quad makes more sense on an architecture like x86 where "word" is defined as 16 bits. I would use .xword or .8byte to assemble a 64-bit value on ARM64.
You have some unnecessary register-register MOVs, which could be avoided if you plan ahead. For instance the MOV X1, X3 just before the system call would not be needed if you used X1 for your pointer above instead of X3.
Your "exit the program" comment is in the wrong place.