[SOLVED] Why does some Windows bootloader code zero registers with `sub` instead of `xor`?

Given considerations such as detailed in https://stackoverflow.com/a/33668295, it seems xor reg, reg is the best way to zero a register. But when I examine real-world assembly code (such as Windows bootloader code, IIRC), I see both xor reg, reg and sub reg, reg used.

Why is sub used at all for this purpose? Are there any reasons to prefer sub in some special cases? For example, does it set flags differently from xor?

Solution

Differences:

- sub reg,reg is documented to set AF=0 (the BCD half-carry flag, carry from bit 3 to bit 4). xor leaves AF undefined. The architectural effect is otherwise exactly identical, leaving only possible performance differences. AF almost never matters, usually only if the next instruction is a BCD-adjust instruction like aaa, or something that reads the flags wholesale.
- sub-zeroing is slower than xor-zeroing on a few CPUs (e.g. Silvermont, as pointed out in my answer you linked), but performs the same on most. And of course both have the same 2-byte size.
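
To see why sub reg,reg always ends up with AF=0, here is a rough sketch that models the status flags of a 32-bit sub in portable C. It is not from any CPU vendor's reference code; the helper names sub32_flags and check_sub_zeroing are made up for illustration, and it only models the documented flag results, not performance.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions of the status flags in EFLAGS. */
enum { CF = 1u << 0, PF = 1u << 2, AF = 1u << 4,
       ZF = 1u << 6, SF = 1u << 7, OF = 1u << 11 };

/* Hypothetical helper: models the status flags produced by a 32-bit
   `sub dst, src`. This is a software model, not a measurement. */
static uint32_t sub32_flags(uint32_t dst, uint32_t src)
{
    uint32_t res = dst - src;
    uint32_t flags = 0;

    if (dst < src) flags |= CF;                        /* unsigned borrow */
    if (((dst ^ src ^ res) & 0x10) != 0) flags |= AF;  /* borrow out of the low nibble */
    if (res == 0) flags |= ZF;
    if (res & 0x80000000u) flags |= SF;
    if (((dst ^ src) & (dst ^ res)) & 0x80000000u) flags |= OF;  /* signed overflow */

    /* PF is set when the low byte of the result has an even number of 1 bits. */
    uint32_t low = res & 0xFF;
    low ^= low >> 4;
    low ^= low >> 2;
    low ^= low >> 1;
    if ((low & 1) == 0) flags |= PF;

    return flags;
}

/* With dst == src the result is 0 and dst ^ src ^ res is 0, so there is
   never a nibble borrow: only ZF and PF end up set, whatever x was. */
static void check_sub_zeroing(uint32_t x)
{
    assert(sub32_flags(x, x) == (ZF | PF));
}
```

Calling check_sub_zeroing with any value passes, while something like sub32_flags(0x10, 0x01) does report AF set, since 0x10 - 1 borrows out of the low nibble.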

I'd guess it's just different authors of hand-written asm, some of them preferring sub, probably without realizing that some CPUs only special-case xor. The exception would be cases where they want to guarantee that AF is cleared, where sub might be intentional: for example, initialization code that wants a fully known state for EFLAGS before something that might use pushf.

XOR leaving AF undefined still means it will be either 0 or 1, you just don't know which. (Not like C undefined behaviour). The actual result could depend on the CPU model, the input values, or possibly even some stray bits somewhere.
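
If you want to check what a particular machine does, something like the following sketch could work. It assumes x86-64 and a compiler with GNU-style inline asm (GCC or Clang), and returns -1 anywhere else. It uses lahf rather than pushf to read the flags, only because pushing inside inline asm would step on the x86-64 red zone; that swap and the helper names are my own choices for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Layout of AH after `lahf`: SF ZF 0 AF 0 PF 1 CF (bit 7 down to bit 0). */
enum { LAHF_CF = 0x01, LAHF_PF = 0x04, LAHF_AF = 0x10,
       LAHF_ZF = 0x40, LAHF_SF = 0x80 };

/* Hypothetical helper: zero a register with xor (use_xor != 0) or sub,
   then return AH as loaded by lahf. Returns -1 if this build is not
   x86-64 with GNU-style inline asm. */
static int flags_after_zeroing(int use_xor)
{
#if defined(__x86_64__) && defined(__GNUC__)
    uint32_t eax;
    uint32_t reg = 0x0F;

    if (use_xor) {
        __asm__ volatile(
            "addl $1, %1\n\t"   /* 0x0F + 1 carries out of bit 3: AF=1 going in */
            "xorl %1, %1\n\t"   /* xor-zeroing: AF architecturally undefined */
            "lahf"              /* copy SF, ZF, AF, PF, CF into AH */
            : "=&a"(eax), "+r"(reg) : : "cc");
    } else {
        __asm__ volatile(
            "addl $1, %1\n\t"   /* same setup, AF=1 going in */
            "subl %1, %1\n\t"   /* sub-zeroing: AF documented as 0 */
            "lahf"
            : "=&a"(eax), "+r"(reg) : : "cc");
    }
    return (int)((eax >> 8) & 0xFF);
#else
    (void)use_xor;
    return -1;
#endif
}

/* After sub-zeroing the whole byte is known: ZF=1, PF=1, bit 1 reads as 1,
   everything else 0, i.e. 0x46. After xor-zeroing only AF is left open. */
static void check_known_flags(void)
{
    int s = flags_after_zeroing(0);
    int x = flags_after_zeroing(1);
    assert(s == -1 || s == 0x46);
    assert(x == -1 || (x & ~LAHF_AF) == 0x46);
}
```

Printing flags_after_zeroing(1) & LAHF_AF on a few different CPUs (or under an emulator) is the interesting part; the check deliberately doesn't assert anything about that bit.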

On modern CPUs that recognize sub as a zeroing idiom, AF after xor-zeroing will be 0, so the CPU can handle xor-zeroing and sub-zeroing exactly identically, including the FLAGS result.