Should the assembler not honor my request for ret imm16?

I understand that the ret imm16 (C2 imm16) instruction with an operand of zero is no different than the operandless ret (C3) in its effect. However, when I explicitly give my assembler ret 0, should it not encode that as the ret imm16 instruction since I explicitly provided the operand?

I assemble the following code with the version of ml.exe that ships with VS2019, using the command ml file.asm /link /SUBSYSTEM:CONSOLE /ENTRY:stdMain

.386
.MODEL FLAT, STDCALL
.CODE
    stdMain PROC
        xor eax, eax
        ret 0
    stdMain ENDP
END

When I then open the executable in a disassembler, I see that ret was encoded as C3:

  00401000: 33 C0              xor         eax,eax
  00401002: C3                 ret

I can force the C2 encoding manually by hard-coding its bytes:

.386
.MODEL FLAT, STDCALL
.CODE
    stdMain PROC
        xor eax, eax
        db 0c2h, 0, 0 ; ret imm16=0
    stdMain ENDP
END

Now I see the C2 instruction in the disassembled output:

  00401000: 33 C0              xor         eax,eax
  00401002: C2 00 00           ret         0

Is it correct for an assembler to 'optimize' like that?

Solution

You don't need 3 separate db lines; one db with 3 operands is equivalent:

db  0c2h, 0, 0     ; ret  imm16=0

Is it correct for an assembler to 'optimize' like that?

In general, yes: it's accepted that an assembler can pick the shortest encoding of an instruction, as long as it has exactly the same architectural effect and the same mnemonic.

For example, NASM optimizes mov rax, 123 into mov eax, 123, although some other assemblers (like YASM or GAS) don't by default. (GAS has a -Os option, which GCC doesn't pass to it by default.) NASM also optimizes lea eax, [rax*2 + 123] to lea eax, [rax + rax + 123] unless you write [NOSPLIT 123 + rax*2], which spends more code size on a disp32 in exchange for avoiding a slower 3-component LEA.
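As a rough sketch of those two cases in NASM syntax (64-bit mode): the byte sequences in the comments are worked out by hand from the instruction encodings, not pasted from a listing, so it's worth confirming them with nasm -l. The strict overrides are only there to show the longer forms side by side.

```nasm
bits 64

    ; Default optimization: NASM emits the 5-byte mov r32, imm32 form,
    ; which zero-extends into RAX and so has the same architectural effect.
    mov  rax, 123                   ; B8 7B 00 00 00

    ; strict pins the immediate width, so the longer forms can still be requested.
    mov  rax, strict dword 123      ; 48 C7 C0 7B 00 00 00            (7 bytes)
    mov  rax, strict qword 123      ; 48 B8 7B 00 00 00 00 00 00 00   (10 bytes)

    ; index*2 gets split into base + index, which allows a disp8.
    lea  eax, [rax*2 + 123]         ; 8D 44 00 7B                     (4 bytes)

    ; nosplit keeps the scaled index; with no base register a disp32 is required.
    lea  eax, [nosplit rax*2 + 123] ; 8D 04 45 7B 00 00 00            (7 bytes)
```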

NASM doesn't optimize xor rax,rax to xor eax,eax, though; I guess it doesn't check for zeroing idioms (both regs the same) with XOR.

NASM has a -O0 option to disable optimization, but using it produces much worse code: mov rax, -1 becomes 10 bytes (imm64) instead of 7 (sign-extended imm32), add ecx, 123 uses an imm32 instead of an imm8, and jmp foo uses rel32 instead of rel8 even if the label is nearby. (This used to be the default in old NASM versions. https://nasm.us/doc/nasmdoc2.html#section-2.1.24)
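If you only want the long form of one particular instruction, a per-instruction override is probably a better tool than -O0 for the whole file. A minimal sketch, assuming 64-bit mode and a hypothetical label named top; the bytes in the comments are hand-derived from the encodings, so check them against a listing:

```nasm
bits 64

top:
    add  ecx, 123                ; 83 C1 7B                        (imm8)
    add  ecx, strict dword 123   ; 81 C1 7B 00 00 00               (imm32)

    mov  rax, -1                 ; 48 C7 C0 FF FF FF FF            (sign-extended imm32)
    mov  rax, strict qword -1    ; 48 B8 FF FF FF FF FF FF FF FF   (imm64)

    jmp  top                     ; EB rel8: the target is within short range
    jmp  near top                ; E9 rel32: near forces the long form
```

Every other instruction in the file still gets the normal short encodings.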

MSVC always emits ret as ret 0 in asm listings, so if you're ever assembling code like that you definitely want the assembler to optimize it to normal ret. Apparently this optimization is one that MS thinks it's normal to rely on.

Seems like a dumb design to ever write or emit ret 0 when you want ret, but that's what MSVC does. (Not that MSVC works by feeding asm to MASM; it emits machine code directly unless you ask for an asm listing.)

NASM does happen to assemble ret 0 to ret imm16=0, so you might prefer using it. I know I'd pick NASM over MASM any time I had a choice: its syntax is simple, and it's free from MASM's magic rules about memory operands implying operand sizes, and about [] sometimes not meaning anything...
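For comparison, a minimal NASM fragment with both forms, which can be assembled as a flat binary to inspect the bytes (for example nasm -f bin -l file.lst file.asm; the file names are placeholders):

```nasm
bits 32

stdMain:
    xor  eax, eax
    ret  0          ; C2 00 00  (ret imm16, kept because an operand was given)
    ret             ; C3        (plain near return)
```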