I have a global constant, p256one, defined in arm64 Go assembly like this:
DATA p256one<>+0x00(SB)/8, $0x0000000000000001
DATA p256one<>+0x08(SB)/8, $0xffffffff00000000
DATA p256one<>+0x10(SB)/8, $0xffffffffffffffff
DATA p256one<>+0x18(SB)/8, $0x00000000fffffffe
GLOBL p256one<>(SB), 8, $32
I need to load p256one<>(SB) into the V0 and V1 registers. Currently I do it like this:
LDP p256one<>+0x00(SB), (R0, R1)
LDP p256one<>+0x10(SB), (R2, R3)
VMOV R0, V0.D[0]
VMOV R1, V0.D[1]
VMOV R2, V1.D[0]
VMOV R3, V1.D[1]
That takes six instructions in total. When the address is already in a register, the data can be loaded with a single instruction:
VLD1 (R0), [V0.B16, V1.B16]
But it seems a global symbol can't be used as the memory operand in the same way.
So, is there a more efficient way to load global data into NEON registers in Go assembler code?
Load the symbol's address into a register first, then load from that address. That is two instructions instead of six:
MOVD $p256one<>(SB), R0
VLD1 (R0), [V0.B16, V1.B16]
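If you want to check that this leaves the same values in V0 and V1 as the original LDP/VMOV sequence, here is a minimal sketch in plain Go that models the 32-byte memory image and reads it back as two 128-bit registers. It assumes little-endian byte order (which is what Go's arm64 port uses); the helper names memoryImage and lanes are made up for illustration and are not part of any library.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// p256one mirrors the four 64-bit words from the DATA directives, in offset order.
var p256one = [4]uint64{
	0x0000000000000001, // +0x00
	0xffffffff00000000, // +0x08
	0xffffffffffffffff, // +0x10
	0x00000000fffffffe, // +0x18
}

// memoryImage (hypothetical helper) lays the words out as the 32 bytes
// they occupy in little-endian memory.
func memoryImage(words [4]uint64) [32]byte {
	var buf [32]byte
	for i, w := range words {
		binary.LittleEndian.PutUint64(buf[i*8:], w)
	}
	return buf
}

// lanes (hypothetical helper) models VLD1 (R0), [V0.B16, V1.B16]:
// the first 16 bytes go to V0, the next 16 to V1. Each register is then
// viewed as two 64-bit lanes, D[0] (low half) and D[1] (high half).
func lanes(img [32]byte) (v0, v1 [2]uint64) {
	v0[0] = binary.LittleEndian.Uint64(img[0:8])
	v0[1] = binary.LittleEndian.Uint64(img[8:16])
	v1[0] = binary.LittleEndian.Uint64(img[16:24])
	v1[1] = binary.LittleEndian.Uint64(img[24:32])
	return v0, v1
}

func main() {
	v0, v1 := lanes(memoryImage(p256one))
	fmt.Printf("V0.D[0]=0x%016x V0.D[1]=0x%016x\n", v0[0], v0[1])
	fmt.Printf("V1.D[0]=0x%016x V1.D[1]=0x%016x\n", v1[0], v1[1])
}
```

The lane values match what R0..R3 hold after the two LDP instructions, so the VMOV moves and the single VLD1 should be interchangeable here. This only models the byte layout; it does not execute any assembly.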