assembly apple-m1 arm64

Some explanation needed for simple assembler running on M1 Apple Silicon


It must be staring right into my face, but I fail to see it.

I'm learning assembler for Apple Silicon (ARM) and want to print out integers to the screen. My code works, but I don't understand the content of X3 in the instruction STRB W5, [X3, #-1]! (W5 holds the digit to store.)

Register X3 is pointing to the address of the label buffer. Let's assume that is 0x0010. The length of this label is 12 bytes, so it runs through 0x001C. The first iteration of the code separates the last digit from my integer number to print and stores it at the end of buffer at address 0x001C.
What I fail to see is how this instruction 'knows' to store it at location 0x001C as X3 is pointing to 0x0010.
Any thoughts?

Here is my code snippet that does the trick. (again, this code is working...)

.data
matrix: .quad 15,2,3,7
buffer: .byte 12        // say in runtime this address is 0x0010


.text
.global _start          // Provide program starting address to linker
.align 4                // Make sure everything is aligned properly

_start:
    mov     X0, #9565       // The number we want to print
    mov     x1, #10         // Base 10 (decimal)
    adrp    X2,buffer@PAGE       // Load the address of the page where buffer lives
    add     X2,X2,buffer@PAGEOFF // load the buffer address into X2 including the offset
    mov     X3, X2           // Copy the buffer address into X3 this will be 0x0010


convert_loop:
    UDIV    X4, X0, X1          // Divide X0 by 10 (quotient in X4); this drops the last digit
    MSUB    X5, X4, X1, X0      // Multiply-subtract: X5 = X0 - (X4 * X1), i.e. the remainder,
                                // which is the last digit of the number
    ADD     X5, X5, #'0'        // Convert the remainder to its ASCII value
    STRB    W5, [X3, #-1]!      // Store the character as a single byte in the buffer, moving backward
    MOV     X0, X4              // Update x0 with the quotient
    CBZ     X0, print_number    // If x0 is 0, we're done

    B convert_loop       // Loop again

print_number:
    mov     X0, #1          // File descriptor for stdout
    mov     X1, X3          // Address of the buffer
    sub     X2, X2, X3      // Calculate the length of the string
                            // Write the string to stdout
    mov     X16, #4         // System call number 4 is write
    svc     #0x80           // Call kernel to write the string

    MOV     X16, #1         // System call number 1 is exit
    mov     X0, #0          // Exit status 0
    svc     #0x80           // Call kernel to terminate the program

Solution

  • What I fail to see is how this instruction 'knows' to store it at location 0x001C as X3 is pointing to 0x0010.

    It doesn't. No such thing occurs.

    When I run it, the code does exactly what it says: it starts with x3 pointing to buffer, then pre-decrements it on storing each byte, and so the bytes of the formatted decimal number are stored before the label buffer. Since that's where the data of matrix was located, it gets overwritten. But the program still "works" in that it successfully prints out the decimal number - just not from the intended buffer, but instead from memory intended for matrix. Since your program doesn't use the actual matrix data for anything, you aren't (yet) encountering any problems from it having been overwritten.

    Here's some output from lldb upon reaching print_number:

    (lldb) reg read x3
          x3 = 0x000000010000401c  matrix + 28
    (lldb) p &buffer
    (void **) 0x0000000100004020
    (lldb) mem read $x3
    0x10000401c: 39 35 36 35 0c 00 00 00 00 00 00 00 00 00 00 00  9565............
    0x10000402c: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
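
    The pre-decrement follows from the addressing mode itself. The trailing ! in [X3, #-1]! selects pre-indexed addressing: the offset is added to the base register first, the result is written back to X3, and only then is the byte stored at the new address. A minimal sketch of the related forms, written as fragments rather than a complete program, with W5 and X3 used as in the question:

    ```asm
        // Pre-indexed (the form used in the question):
        strb    w5, [x3, #-1]!      // x3 = x3 - 1, then store the byte at [x3]

        // The same effect written as two instructions:
        sub     x3, x3, #1          // x3 = x3 - 1
        strb    w5, [x3]            // store the byte at [x3]

        // Plain offset: stores at address x3 - 1 but leaves x3 unchanged
        strb    w5, [x3, #-1]

        // Post-indexed: stores at [x3] first, then x3 = x3 - 1
        strb    w5, [x3], #-1
    ```

    So a cursor that starts at the first byte of buffer writes its first digit one byte below that address, which matches the debugger output above.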
    

    To fix this, note another bug: .byte 12 doesn't reserve 12 bytes of space; rather, it reserves 1 byte and initializes it with the value 12. What you want here is .space 12 or something equivalent.
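
    As a small illustration of the difference, here is a sketch of a data section using both directives. The labels one_byte, twelve_bytes and twelve_filled are made up for this example, and .fill is shown only as one candidate for "something equivalent", assuming the assembler in use accepts it:

    ```asm
    .data
    one_byte:       .byte 12            // 1 byte of storage holding the value 12 (0x0c)
    twelve_bytes:   .space 12           // 12 bytes of storage, zero-filled
    twelve_filled:  .fill 12, 1, 0      // 12 items of 1 byte each, all set to 0
    ```

    The 0c byte that follows the digits 39 35 36 35 in the memory dump above is that single initialized byte.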

    Then you could do:

    buffer:
        .space 12
    buffer_end:
    // ...
        adrp    X2, buffer_end@PAGE
        add     X2, X2, buffer_end@PAGEOFF
    

    and keep everything else the same. This actually will initialize X2 to point to the end of the buffer, as was your goal.
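
    Putting the two fixes together, the whole program might look like the sketch below. It is the code from the question with only the buffer definition and the address load changed. The loop branch is also folded into a single cbnz, which is a stylistic choice and not part of the fix. It assumes the same macOS system call convention the question already uses (call number in X16, svc #0x80).

    ```asm
    .data
    matrix:     .quad 15, 2, 3, 7
    buffer:     .space 12               // reserve 12 zero-filled bytes
    buffer_end:                         // address one past the last byte of buffer

    .text
    .global _start
    .align 4

    _start:
        mov     x0, #9565               // the number we want to print
        mov     x1, #10                 // base 10 (decimal)
        adrp    x2, buffer_end@PAGE     // page containing buffer_end
        add     x2, x2, buffer_end@PAGEOFF  // x2 = one past the end of buffer
        mov     x3, x2                  // x3 = write cursor, moves toward lower addresses

    convert_loop:
        udiv    x4, x0, x1              // x4 = x0 / 10
        msub    x5, x4, x1, x0          // x5 = x0 - x4 * 10, the last digit
        add     x5, x5, #'0'            // convert the digit to ASCII
        strb    w5, [x3, #-1]!          // x3 = x3 - 1, then store the digit at [x3]
        mov     x0, x4                  // continue with the quotient
        cbnz    x0, convert_loop        // repeat while digits remain

    print_number:
        mov     x0, #1                  // file descriptor for stdout
        mov     x1, x3                  // address of the first digit
        sub     x2, x2, x3              // length = buffer_end - first digit
        mov     x16, #4                 // system call number 4 is write
        svc     #0x80

        mov     x0, #0                  // exit status 0
        mov     x16, #1                 // system call number 1 is exit
        svc     #0x80
    ```

    With this layout the digits land inside buffer and matrix is left untouched. Note that 12 bytes is enough for 9565 but not for every 64-bit value, which can need up to 20 decimal digits.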


    Other code review comments: