[SOLVED] How to get NASM to encode `push` with a sign-extended 16-bit immediate?

When assembling the following with NASM:

BITS 64
push 32767

I get 68 ff 7f 00 00. That surprised me, since it's the 32-bit immediate encoding (push dword). Why doesn't NASM pick a 16-bit immediate encoding, which would be shorter and use less stack memory?

I've tried various immediates, but NASM never seems to use the 16-bit encoding.

Solution

A 16-bit immediate is only available with 16-bit operand-size, which means RSP -= 2 and a 2-byte store: push word 0x7fff.
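To make the sizes concrete, here's a rough sketch of the three immediate forms of push in 64-bit mode. The byte comments are what I'd expect NASM to emit; the 6a form assumes NASM's default optimization, which picks the shortest immediate that fits.

```nasm
BITS 64
push 127          ; 6a 7f           imm8, sign-extended to 64 bits; RSP -= 8
push word 0x7fff  ; 66 68 ff 7f     imm16 with the 66 prefix; RSP -= 2, 2-byte store
push 32767        ; 68 ff 7f 00 00  imm32, sign-extended to 64 bits; RSP -= 8
```

Note that the imm16 form is one byte shorter than the imm32 form, but it leaves RSP misaligned by 2, which is rarely what you want.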

As with most x86 instructions, such as add, push has no sign_extended_imm16 encoding. (The special thing about push and other stack ops is that their default operand-size is 64 in 64-bit mode. It can't be overridden to 32, even with a REX prefix, only to 16-bit with a 66 prefix. Other opcodes default to 32-bit operand-size, with 64-bit selected by REX.W. 8-bit has its own opcodes, since that dates back to 8086; 32 and 64 were added by later extensions, and there wasn't room for new opcodes, just prefixes.)

In 32 and 64-bit mode, without an operand-size prefix to specify 16-bit operand-size, immediates are either 8-bit sign-extended or 32-bit (sign-extended if the operand-size is 64-bit). This applies to all normal opcodes; there are a couple of special cases like mov r64, imm64 and enter imm16,imm8.
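The same pattern shows up with add, for example. This is just a sketch; the byte comments are what I'd expect from NASM, and the imm8 form assumes its default optimization.

```nasm
BITS 64
add cx,  1234     ; 66 81 c1 d2 04        imm16, only because operand-size is 16-bit
add ecx, 1234     ; 81 c1 d2 04 00 00     imm32
add rcx, 1234     ; 48 81 c1 d2 04 00 00  imm32, sign-extended to 64 bits
add rcx, 1        ; 48 83 c1 01           imm8, sign-extended to 64 bits
```

In every case the immediate width follows from the opcode plus the operand-size; there is no way to ask for a 16-bit immediate that gets extended to a wider operand.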

In NASM, push 1234 in 64-bit mode defaults to push qword 1234, not push word 1234, because that's what people normally want. (See "How many bytes does the push instruction push onto the stack when I don't specify the operand size?" Not a duplicate, since my answer there doesn't, IIRC, go into detail about the fact that push imm16 only exists for 16-bit operand-size.)

The table at the top of Intel's manual entry for push (HTML extract: https://www.felixcloutier.com/x86/push) isn't super clear about that, with the "description" column not mentioning an operand-size. The text Description section doesn't mention it either.

68 iw and 68 id are the same opcode, so the only way the CPU could know which you wanted is by the operand-size attribute. This, and the general rules for encoding operand-size (default implied by current mode, overridable with prefixes) imply that there's no way to encode a 68 iw that will get sign-extended to a wider width; 68 only takes an iw (immediate word) when the operand-size is 16-bit.