I'm looking at the following disassembled AArch64 instruction:
65 6E 20 2B adds w5, w19, w0, uxtx #3
According to the ARM manual, uxtx
zero-extends w0
to an unsigned 64-bit value before adding it to the value in w19
. But w19
is a 32-bit "slice" of x19
, and the result is stored in a 32-bit slice of x5
. That is, the sizes of the operation's values differ.
The question is not restricted to adds
; other AArch64 instructions like add
or sub
exhibit the same encoding. The question also applies to the 64-bit sxtx
signed extension, which due to sign extension issues might very well be expected to not behave the same as the 32-bit sxtw
.
Are uxtx
and sxtx
acting exactly like uxtw
and sxtw
respectively when used with 32-bit register slices? If so, what value is ARM providing by supporting both [us]xtw
and [us]xtx
extension encodings for these apparently identical operations? If not, is there a difference that would be visible to the user program?
They all do the same thing, i.e. nothing.
As you say, logically, sign- or zero-extending a value to a width larger than the operand size should not actually affect the value used, and that's correct. You can confirm it with a careful reading of the pseudocode in the Architecture Reference Manual. In the code for ExtendReg
, note the line len = Min(len, N - shift)
. Here N
is 32, so it makes no difference whether len
is 32 or 64.
Similarly, uxtx
and sxtx
are both no-ops for either 32-bit or 64-bit instructions.
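To make the ExtendReg argument concrete, here is a rough Python model of it. It paraphrases the manual's pseudocode rather than quoting it, and the names extend_reg and WIDTHS are my own, so treat it as a sketch and not as the reference definition.

```python
# Hypothetical model of the ExtendReg pseudocode; names are illustrative only.
WIDTHS = {"b": 8, "h": 16, "w": 32, "x": 64}

def extend_reg(val, ext, shift, n):
    """Extend-and-shift an operand for an n-bit operation.

    ext is a mnemonic such as 'uxtw' or 'sxtx'; shift is 0..4.
    """
    assert 0 <= shift <= 4
    unsigned = ext[0] == "u"
    length = WIDTHS[ext[3]]
    length = min(length, n - shift)       # the clamp that makes [us]xtx moot
    field = val & ((1 << length) - 1)     # val<len-1:0>
    bits = field << shift                 # val<len-1:0> : Zeros(shift)
    width = length + shift
    if not unsigned and (bits >> (width - 1)) & 1:
        bits -= 1 << width                # sign-extend from bit len+shift-1
    return bits & ((1 << n) - 1)

def all_agree(val, shift, n, exts):
    """True if every extension in exts yields the same n-bit operand."""
    return len({extend_reg(val, e, shift, n) for e in exts}) == 1
```

With n = 32, all_agree(v, s, 32, ["uxtw", "sxtw", "uxtx", "sxtx"]) should hold for any v and s, because the clamp makes len + shift equal to 32 in all four cases. With n = 64 only uxtx and sxtx coincide; uxtw and sxtw genuinely differ there.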
So the following instructions all have exactly the same architectural effect, performing the operation w0 = w1 + (w2 << 3)
. I actually tested them with a selection of chosen and random inputs, verifying that the results and flags are identical for all five.
0: 2b224c20 adds w0, w1, w2, uxtw #3
4: 2b22cc20 adds w0, w1, w2, sxtw #3
8: 2b226c20 adds w0, w1, w2, uxtx #3
c: 2b22ec20 adds w0, w1, w2, sxtx #3
10: 2b020c20 adds w0, w1, w2, lsl #3
However, note that their encodings are different.
And that is also why they use different mnemonics for the extension operation: one of the principles of the ARM64 assembly language is that every legal binary encoding should have its own unambiguous assembly. So if for some obscure reason you care whether you get the encoding 0x2b224c20
or 0x2b226c20
-- say you are trying to write shellcode where certain bytes are forbidden -- you can specify uxtw
or uxtx
to select the one you want. This also means that if you disassemble and reassemble a section of code, you will get back the identical binary that you put in.
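As a rough illustration of how the mnemonic picks the encoding, the sketch below builds the words by hand. The field positions follow the "ADDS (extended register)" and "ADDS (shifted register)" layouts as I read them; the function names are hypothetical and only the 32-bit forms are covered.

```python
# Sketch only: hand-rolled encoders for the 32-bit ADDS forms discussed above.
OPTION = {"uxtb": 0, "uxth": 1, "uxtw": 2, "uxtx": 3,
          "sxtb": 4, "sxth": 5, "sxtw": 6, "sxtx": 7}

def adds_w_extended(rd, rn, rm, ext, amount):
    """Encode `adds w<rd>, w<rn>, w<rm>, <ext> #amount` (extended register)."""
    assert 0 <= amount <= 4
    return 0x2B200000 | rm << 16 | OPTION[ext] << 13 | amount << 10 | rn << 5 | rd

def adds_w_shifted_lsl(rd, rn, rm, amount):
    """Encode `adds w<rd>, w<rn>, w<rm>, lsl #amount` (shifted register)."""
    assert 0 <= amount <= 31
    return 0x2B000000 | rm << 16 | amount << 10 | rn << 5 | rd

def as_bytes(word):
    """Little-endian byte dump, the order a hex editor would show."""
    return " ".join(f"{b:02x}" for b in word.to_bytes(4, "little"))

for ext in ("uxtw", "sxtw", "uxtx", "sxtx"):
    print(ext, hex(adds_w_extended(0, 1, 2, ext, 3)))
print("lsl ", hex(adds_w_shifted_lsl(0, 1, 2, 3)))
```

The same helper reproduces the bytes from the question: as_bytes(adds_w_extended(5, 19, 0, "uxtx", 3)) gives 65 6e 20 2b.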
(Contrast the situation in x86 assembly language, where redundant encodings do not get distinct mnemonics. So add edx, ecx
may assemble to either 01 ca
(the "store form") or 03 d1
("load form"), and assemblers often don't give you any way to pick which one. Likewise both encodings will disassemble to add edx, ecx
, so if you disassemble and reassemble you may not end up with the same binary you started with. See How to resolve ambivalence in x64 assembly? and its duplicate links.)
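For comparison, the two x86 encodings can be derived from the ModRM byte. This is a minimal sketch for register-to-register 32-bit add only; the helper name add_reg_reg is made up for the example.

```python
# Sketch: the two redundant encodings of a 32-bit register-to-register ADD.
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

def add_reg_reg(dst, src):
    """Return (store_form, load_form) byte strings for `add dst, src`."""
    d, s = REG32[dst], REG32[src]
    # Opcode 01 is ADD r/m32, r32: destination sits in the ModRM r/m field.
    store_form = bytes([0x01, 0xC0 | s << 3 | d])
    # Opcode 03 is ADD r32, r/m32: destination sits in the ModRM reg field.
    load_form = bytes([0x03, 0xC0 | d << 3 | s])
    return store_form, load_form

store, load = add_reg_reg("edx", "ecx")
print(store.hex(), load.hex())  # 01ca 03d1
```

Both byte strings perform the same addition, yet the textual assembly gives no handle for choosing between them.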
The mnemonics for the extension operators reflect the encoding structure, which also helps to explain why the redundant encodings exist in the first place. The extension type is encoded in a 3-bit "option" field, bits 13-15 of the instruction. Bits 13-14 specify the width of the value to be extended:
00 = 8-bit byte B
01 = 16-bit halfword H
10 = 32-bit word W
11 = 64-bit doubleword X
Note that X
is always effectively "no extension". Then bit 15 specifies the signedness: 0 = unsigned U
, 1 = signed S
. So 010 = uxtw
and 011 = uxtx
since that is what they logically specify, even though for a 32-bit operation, both have the same actual effect (i.e. none).
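The field layout just described can be sketched as a small decoder. It assumes the word really is an add/sub (extended register) instruction and does no checking of the other bits; decode_extension is a name I made up.

```python
# Sketch: pull the extension operator out of an "extended register" encoding.
def decode_extension(word):
    """Return (mnemonic, shift_amount) from bits 13-15 and 10-12."""
    option = (word >> 13) & 0b111
    width = "bhwx"[option & 0b11]          # bits 13-14: B, H, W or X
    sign = "s" if option >> 2 else "u"     # bit 15: signedness
    amount = (word >> 10) & 0b111          # imm3 left-shift amount
    return f"{sign}xt{width}", amount

for word in (0x2B224C20, 0x2B22CC20, 0x2B226C20, 0x2B22EC20):
    print(hex(word), decode_extension(word))
```

Applied to the word from the question, 0x2b206e65 (the bytes 65 6E 20 2B read little-endian), it returns ("uxtx", 3).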
This might seem like a waste of the instruction space, but presumably it allows the decoder hardware to be simpler than if the otherwise redundant encodings were to select some different operation.
The last option listed above, adds w0, w1, w2, lsl #3
has a different encoding altogether because it selects the "Add (shifted register)" opcode, instead of the "Add (extended register)" opcode as the first four do. So this is another redundancy; an add without extension, with a left shift of 0-4 bits, can be done with either opcode. However, this is not entirely useless, because the "extended register" form can use the stack pointer register sp
as an operand, while the "shifted register" can use the zero register xzr/wzr
. Both registers are encoded as "register 31", so each opcode has to specify whether it interprets "register 31" as the stack pointer or as the zero register. So the fact that the two opcodes have overlapping effect lets the instruction set provide addition using either the stack pointer or the zero register, where otherwise only one or the other could be supported.
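Here is a simplified sketch of the register-31 point, using the 64-bit, non-flag-setting add. It assumes, as I understand the manual, that in the extended-register form Rd and Rn treat 31 as sp while Rm treats it as xzr, and that in the shifted-register form all three treat it as xzr. The helper names are hypothetical, and the describer deliberately ignores the extension and shift fields.

```python
# Sketch: the same register number 31 reads as sp or xzr depending on opcode.
def add_x_extended(rd, rn, rm, option=0b011, amount=0):
    """Encode 64-bit ADD (extended register); option 011 is the plain form."""
    return 0x8B200000 | rm << 16 | option << 13 | amount << 10 | rn << 5 | rd

def add_x_shifted(rd, rn, rm, amount=0):
    """Encode 64-bit ADD (shifted register) with an LSL shift."""
    return 0x8B000000 | rm << 16 | amount << 10 | rn << 5 | rd

def reg_name(num, name31):
    return name31 if num == 31 else f"x{num}"

def describe(word):
    """Name the three registers only; extension and shift are not printed."""
    rd, rn, rm = word & 31, (word >> 5) & 31, (word >> 16) & 31
    if word & (1 << 21):                   # bit 21 set: extended-register form
        return f"add {reg_name(rd, 'sp')}, {reg_name(rn, 'sp')}, {reg_name(rm, 'xzr')}"
    return f"add {reg_name(rd, 'xzr')}, {reg_name(rn, 'xzr')}, {reg_name(rm, 'xzr')}"

print(describe(add_x_extended(0, 31, 1)))  # add x0, sp, x1
print(describe(add_x_shifted(0, 31, 1)))   # add x0, xzr, x1
```

The two words differ only in bit 21 and the option field, yet one adds the stack pointer and the other adds zero.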
The sxt/uxt
syntax shows up in a few other places in the ARM64 assembly language, with slightly different details in each case.
The sxt*/uxt*
instructions, which simply sign- or zero-extend one register into another. They are aliases for special cases of the sbfm/ubfm
bitfield move instructions. sxtb, sxth, uxtb, uxth
work with either a 32- or 64-bit destination, and sxtw x0, w1
with a 64-bit destination only.
The GNU assembler at least also supports uxtw w0, w1
and uxtw x0, w1
, although the official Architecture Reference Manual does not document them. But they are both just aliases for mov w0, w1
, since writes to 32-bit registers always zero the high half of the corresponding 64-bit register. (And a fun fact is that mov w0, w1
is itself an alias for orr w0, wzr, w1
, a bitwise OR with the zero register.)
There are no mnemonics for the trivial uxtx, sxtx
which would just be a 64-bit move. I suppose logically uxtx x0, x1
could be an alias of ubfm x0, x1, #0, #63
, encoded as 0xd340fc20
, but they didn't bother to support it. The uxtx
operator to adds
is needed because otherwise there would be no way to assemble 0x2b226c20
, but since 0xd340fc20
can already be obtained with ubfm
it doesn't need another redundant name. (Actually it seems ubfm x0, x1, #0, #63
disassembles as lsr x0, x1, #0
, since the immediate shift instructions are also aliases for bitfield move.) Likewise, the useless sxtw w0, w1
is also rejected by the assembler.
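The alias relationship can be sketched by encoding the bitfield moves directly. The layout below is my reading of the SBFM/UBFM encodings, and bfm and extend_alias are made-up helper names; the alias is taken to be immr = 0 with imms set to the source width minus one.

```python
# Sketch: sxt*/uxt* expressed as special cases of sbfm/ubfm.
def bfm(sf, signed, rd, rn, immr, imms):
    """Encode sbfm (signed=True) or ubfm; sf=1 selects the 64-bit form."""
    opc = 0b00 if signed else 0b10
    n = sf                                  # the N bit must match sf
    return (sf << 31 | opc << 29 | 0b100110 << 23 | n << 22 |
            immr << 16 | imms << 10 | rn << 5 | rd)

SRC_WIDTH = {"b": 8, "h": 16, "w": 32}

def extend_alias(mnemonic, sf, rd, rn):
    """Encode e.g. 'sxtw' as a bitfield move with immr=0, imms=width-1."""
    width = SRC_WIDTH[mnemonic[3]]
    return bfm(sf, mnemonic[0] == "s", rd, rn, 0, width - 1)

print(hex(extend_alias("sxtw", 1, 0, 1)))   # sxtw x0, w1
print(hex(bfm(1, False, 0, 1, 0, 63)))      # ubfm x0, x1, #0, #63
```

The second line prints 0xd340fc20, the would-be uxtx x0, x1 mentioned above.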
The extended-register addressing modes for the load, store, and prefetch instructions. They normally take 64-bit base and index registers, as in ldr x0, [x1, x2]
, but the index can also be specified as a 32-bit register with either zero or sign extension: ldr x0, [x1, w2, uxtw]
or ldr x0, [x1, w2, sxtw]
.
Here again a redundant encoding appears. These instructions contain a 3-bit "option" field with the same position and format as for add
and friends, but here the byte and half-word versions are unsupported, so the encodings with bit 14 = 0 are undefined. Of the remaining four combinations, uxtw (010)
and sxtw (110)
make perfect sense. The other two use a 64-bit index with no extension, and so have the same effect as each other, but they need to be assigned distinct assembly syntax. The 011
encoding, which might logically be uxtx
, is designated the "preferred" encoding and is written with no operator as ldr x0, [x1, x2]
, or ldr x0, [x1, x2, lsl #3]
for the shifted version. The redundant 111
encoding is then selected with ldr x0, [x1, x2, sxtx]
or ldr x0, [x1, x2, sxtx #3].
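A rough sketch of how the four legal option values map to index syntax for the 64-bit ldr with a register offset. The layout is my reading of the "LDR (register)" encoding, with 011 as the preferred plain form; ldr_x_reg and index_operand are hypothetical names, and the scaled shift is hard-coded to #3 because only 8-byte loads are modelled.

```python
# Sketch: index-register syntax for 64-bit LDR (register offset).
INDEX_EXT = {0b010: "uxtw", 0b110: "sxtw", 0b011: "lsl", 0b111: "sxtx"}

def ldr_x_reg(rt, rn, rm, option=0b011, scaled=False):
    """Encode `ldr x<rt>, [x<rn>, <index>]`; option picks the extension."""
    if option not in INDEX_EXT:
        raise ValueError("byte/halfword index extensions are not encodable")
    return (0xF8600800 | rm << 16 | option << 13 |
            int(scaled) << 12 | rn << 5 | rt)

def index_operand(word):
    """Render the index part of the address as assembly text."""
    option = (word >> 13) & 0b111
    scaled = (word >> 12) & 1
    rm = (word >> 16) & 31
    reg = ("x" if option & 1 else "w") + str(rm)   # bit 13 set: 64-bit index
    ext = INDEX_EXT[option]
    if ext == "lsl" and not scaled:
        return reg                                  # plain [x1, x2]
    return f"{reg}, {ext}" + (" #3" if scaled else "")

for opt in (0b010, 0b110, 0b011, 0b111):
    print(index_operand(ldr_x_reg(0, 1, 2, opt)),
          "|", index_operand(ldr_x_reg(0, 1, 2, opt, scaled=True)))
```

The last two option values load from the same address; only the encoding, and therefore the spelling, differs.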
The uxtl/sxtl
Extend Long SIMD instructions, which zero- or sign-extend the elements of a vector to double their original width. These are actually aliases for the ushll/sshll
long shift instructions, with a shift count of 0. But otherwise there is nothing unusual about their encodings.
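The alias can be sketched at the level of element values. This is a plain-Python model over lists of integers, not the real register layout, and the function names simply mirror the mnemonics for readability.

```python
# Sketch: element-wise model of ushll/sshll and their shift-0 aliases.
def ushll(elems, esize, shift):
    """Zero-extend each esize-bit element to 2*esize bits, then shift left."""
    assert 0 <= shift < esize
    return [(e & ((1 << esize) - 1)) << shift for e in elems]

def sshll(elems, esize, shift):
    """Sign-extend each esize-bit element to 2*esize bits, then shift left."""
    assert 0 <= shift < esize
    out = []
    for e in elems:
        e &= (1 << esize) - 1
        if e >> (esize - 1):               # top bit set: value is negative
            e -= 1 << esize
        out.append((e << shift) & ((1 << 2 * esize) - 1))
    return out

def uxtl(elems, esize):
    return ushll(elems, esize, 0)

def sxtl(elems, esize):
    return sshll(elems, esize, 0)

print([hex(v) for v in uxtl([0xFF, 0x01], 8)])  # ['0xff', '0x1']
print([hex(v) for v in sxtl([0xFF, 0x7F], 8)])  # ['0xffff', '0x7f']
```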