
Do AArch64 SIMD instructions zero/sign extend results?


I'm maintaining the Reko decompiler and working on bugfixes in its support for AArch64. I've been asked to fix an issue in an AArch64 binary that contains the following instruction:

0EA0B9BF abs v31.2s,v13.2s

I've compared the output above (which comes from Reko's AArch64 disassembler) with objdump and it matches. I've consulted the ARM documentation for the abs instruction to understand what this instruction is doing.

Note that the .2s suffixes imply that the instruction is operating on two 32-bit signed integers, but v31 and v13 are 128-bit. My initial guess was that the instruction leaves the upper 64 bits of v31 untouched, but I'm not certain my interpretation is correct. Consulting scripture reveals the following pseudocode for abs:

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
integer element;

for e = 0 to elements-1
    element = SInt(Elem[operand, e, esize]);
    if neg then
        element = -element;
    else
        element = Abs(element);
    Elem[result, e, esize] = element<esize-1:0>;

V[d] = result;

In the pseudocode, the result variable is not initialized with the original value of V[d], yet the final pseudo-statement appears to write back the whole 128 bits of the register.

So: is result actually zero-initialized, meaning that the upper 64 bits are cleared after execution of this instruction? And will this apply to all SIMD instructions whose outputs do not "cover" the full 128 bits of the destination SIMD register?

Unfortunately I don't have the appropriate silicon to test this myself, and available AArch64 emulators are crashing when I try using them with SIMD instructions.


Solution

  • As per ARM Architecture Reference Manual (Armv8, for A-profile architecture), section C1.2.5:

    SIMD and floating-point scalar register names

    SIMD and floating-point instructions that operate on scalar data only access the lower bits of a SIMD and floating-point register. The unused high bits are ignored on a read and cleared to 0 on a write.

    (...)

    SIMD vector register names

    If a register holds multiple data elements on which arithmetic is performed in a parallel, SIMD, manner, then a qualifier describes the vector shape. The vector shape is the element size and the number of elements or lanes. If the element size in bits multiplied by the number of lanes does not equal 128, then the upper 64 bits of the register are ignored on a read and cleared to zero on a write.

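    This confirms that writes to 64-bit ASIMD vectors zero out the upper 64 bits of the register. Per the quoted text, this is a property of the register write rather than of abs specifically, so it applies to any SIMD instruction whose result does not cover the full 128 bits. Note that the behaviour is different in AArch32 state, where 64-bit and 128-bit vectors have different numbering schemes and merely overlap. There, no other half exists to be zeroed out, as 64-bit operations target 64-bit vector registers.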
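For a decompiler or emulator, the rule quoted above can be modelled roughly as follows. This is only a minimal Python sketch, not Reko's implementation and not Arm's pseudocode: the helper names `read_vreg`, `write_vreg` and `abs_vector` are hypothetical, and the register file is assumed to be a plain list of 128-bit integers. The point it illustrates is that the zeroing happens in the register write, so a stale upper half of the destination never survives a 64-bit vector write.

```python
# Hypothetical model: 32 SIMD registers, each held as a 128-bit Python int.

def read_vreg(regs, n, datasize):
    """Read the low `datasize` bits of V[n]; higher bits are ignored."""
    return regs[n] & ((1 << datasize) - 1)

def write_vreg(regs, d, value, datasize):
    """Write `datasize` bits to V[d]; all bits above datasize become 0."""
    regs[d] = value & ((1 << datasize) - 1)

def abs_vector(regs, d, n, esize, elements):
    """Sketch of `abs Vd.<T>, Vn.<T>` for `elements` lanes of `esize` bits."""
    datasize = esize * elements          # 64 for .2s, 128 for .4s
    lane_mask = (1 << esize) - 1
    operand = read_vreg(regs, n, datasize)
    result = 0
    for e in range(elements):
        raw = (operand >> (e * esize)) & lane_mask
        # Reinterpret the lane as a signed two's-complement integer.
        signed = raw - (1 << esize) if raw >> (esize - 1) else raw
        # Truncate back to esize bits, like element<esize-1:0>.
        result |= (abs(signed) & lane_mask) << (e * esize)
    write_vreg(regs, d, result, datasize)

regs = [0] * 32
regs[13] = 0x11111111_22222222_FFFFFFFE_FFFFFFFF   # low lanes hold -2 and -1
regs[31] = (1 << 128) - 1                          # stale data in the destination

abs_vector(regs, 31, 13, esize=32, elements=2)     # abs v31.2s, v13.2s
print(hex(regs[31]))                               # upper 64 bits are now zero
```

With `elements=4` (the `.4s` shape) the same helper writes all 128 bits, so nothing is left over to clear.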