assembly arm64

Why does LDP crash when using w registers but not x registers?


In AArch64 assembly on an Apple M1, I get an EXC_BAD_ACCESS error when using LDP with 32-bit w registers. The same problem does not happen when I use 64-bit x registers instead.

Example using x which works fine:

.global _start

_start:
    mov x0, #1 // arg1
    mov x1, #2 // arg2

    stp x0, x1, [sp, #-16]! // push these values to the stack before we branch to another location.

    bl add_nums
    mov x2, x0   // save x0 to x2, so we don't lose it once we restore the original x0 from the stack

    ldp x0, x1, [sp], #16 // pop x0, x1 back to the values they had before calling add_nums

    // Exit program
    mov x0, 0       // 0 status code
    mov x16, 1
    svc 0

add_nums:
    // stores the result back to x0, that is why we need to store the original
    // x0 on _start into the stack, so we could restore it later.
    add x0, x0, x1
    ret

The problematic code. It is the same as the previous one; I just changed everything to use w instead of x registers, and with that the amount by which the stack pointer is adjusted is also half the size:

_start:
    mov w0, #1 // arg1
    mov w2, #2 // arg2

    stp w0, w2, [sp, #-8]! // push these values to the stack before we branch to another location.

    bl add_nums
    mov w2, w0   // save w0 to w2, so we don't lose it once we restore the original w0 from the stack

    ldp w0, w2, [sp], #8 // pop w0, w2 back to the values they had before calling add_nums

    // Exit program
    mov x0, 0       // 0 status code
    mov x16, 1
    svc 0

add_nums:
    // stores the result back to w0, that is why we need to store the original
    // w0 on _start into the stack, so we could restore it later.
    add w0, w0, w2
    ret

The full error is:

thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=259, address=0x16fdfe948)

According to the Arm documentation, A64 should support LDP with w registers: https://developer.arm.com/documentation/dui0801/l/A64-Data-Transfer-Instructions/LDP--A64-?lang=en.


Solution

  • Thanks to Jester's and Frank's comments, it makes sense now.

    For AArch64, sp must be 16-byte aligned whenever it is used to access memory. This is enforced by AArch64 hardware - reference.

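    This means that on AArch64 the stack pointer must always be a multiple of 16 bytes whenever it is used for a memory access, even if the data being stored is smaller than that. Otherwise, an EXC_BAD_ACCESS error like the one above will occur. In the failing code, adjusting sp by 8 leaves it misaligned, so the stp and the ldp both need to adjust it by 16 instead.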
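
As a minimal sketch of that fix (not tested here, and assuming the same macOS exit sequence as in the question), the w-register version can keep storing only 8 bytes of data while still moving sp by 16. The faulting address in the error, 0x16fdfe948, ends in 8, which is consistent with sp being 8 bytes off a 16-byte boundary after the `#-8` pre-index. The use of w3 to hold the sum is my own choice for illustration: in the question's code the result is copied into w2, which the following ldp then overwrites.

```asm
.global _start
.align 2

_start:
    mov w0, #1              // arg1
    mov w2, #2              // arg2

    // Reserve a full 16 bytes so sp stays 16-byte aligned,
    // even though the two w registers only occupy 8 of them.
    stp w0, w2, [sp, #-16]!

    bl add_nums
    mov w3, w0              // keep the sum in w3 (hypothetical choice), since w0 and w2 are restored next

    ldp w0, w2, [sp], #16   // restore w0 and w2, then release the 16 bytes

    // Exit program
    mov x0, #0              // 0 status code
    mov x16, #1
    svc #0

add_nums:
    add w0, w0, w2          // result is returned in w0
    ret
```

The upper 8 bytes of the reserved slot are simply unused padding. An equivalent layout would be to subtract 16 from sp once and then store with a plain `[sp]` or `[sp, #8]` offset; either way, the point is that every adjustment of sp is a multiple of 16.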