assembly, x86, x86-64, eflags

Arithmetic identities and EFLAGS


Since −x = not(x) + 1, which implies a − b = a + not(b) + 1, would

sub rax, rcx

be equivalent to

mov temp, rcx
not temp
add rax, temp
add rax, 1

where temp is some register considered to be volatile?

In other words, does the latter affect EFLAGS in the exact same way? If not, how can it be forced to?


Solution

  • Yes, that gets the same integer result in RAX.

    In other words, does the latter affect EFLAGS in the exact same way?

    Of course not. ZF, SF, and PF depend only on the integer result, but CF and OF [1] depend on how you get there. x86's carry flag (CF) is the borrow output of a subtraction. (Unlike some ISAs such as ARM, where subtraction sets the carry flag if there was no borrow.)

    Trivial counterexample you could check in your head:
    0 - 1 with sub sets CF=1. But your way clears CF.

    mov temp, rcx        # no effect on FLAGS
    not temp             # no effect on FLAGS, unlike most other x86 ALU instructions
    add rax, temp        # temp = ~1 = 0xFF..FE.  0 + anything clears CF
    add rax, 1           # 0xFF..FE + 1 = 0xFF..FF = -1.  clears CF
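
The 0 - 1 counterexample can be checked with a small C model of the carry flag. This is only a sketch, not a test on real hardware: the `AluOut` type and the `model_*` / `check_*` names are hypothetical helpers introduced here, and the model assumes just the documented rules that `sub` sets CF on a borrow and `add` sets CF on a carry-out of bit 63.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: integer result plus the modeled carry flag. */
typedef struct { uint64_t result; bool cf; } AluOut;

/* sub dst, src: CF is the borrow out, i.e. dst < src as unsigned. */
static AluOut model_sub(uint64_t dst, uint64_t src) {
    AluOut o = { dst - src, dst < src };
    return o;
}

/* add dst, src: CF is the carry out of bit 63 (the sum wrapped). */
static AluOut model_add(uint64_t dst, uint64_t src) {
    uint64_t sum = dst + src;
    AluOut o = { sum, sum < dst };
    return o;
}

/* mov temp, rcx / not temp / add rax, temp / add rax, 1
   The CF left behind is whatever the final add produced. */
static AluOut model_not_add_add1(uint64_t rax, uint64_t rcx) {
    uint64_t temp = ~rcx;                /* mov + not: FLAGS untouched */
    AluOut step = model_add(rax, temp);  /* this CF is overwritten next */
    return model_add(step.result, 1);    /* CF set only on a wrap to 0 */
}

/* The 0 - 1 case from the text: same integer result, different CF. */
static void check_counterexample(void) {
    AluOut s = model_sub(0, 1);
    AluOut a = model_not_add_add1(0, 1);
    assert(s.result == a.result);
    assert(s.cf && !a.cf);
}
```

Calling `check_counterexample()` exercises the case above: both paths leave -1 in the model's result, but only `model_sub` reports CF set.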
    

    (Fun fact: not doesn't affect FLAGS, unlike most other ALU instructions including neg. neg sets flags the same as sub from 0. A strange quirk of x86 history. https://www.felixcloutier.com/x86/not#flags-affected)

    Footnote 1: so does AF, the half-carry flag (auxiliary) from the low to high nibble in the low byte. You can't branch on it directly, and x86-64 removed the BCD instructions like aaa that read it, but it's still there in RFLAGS where you can read it with pushf / pop rax for example.

    If not, how can it be forced to?

    Use different instructions. The easiest and most efficient way to get the desired effect on EFLAGS would be to optimize it back to sub rax, rcx. That's why x86 has sub and sbb instructions. If that's what you want, use it.


    If you want an alternative, you definitely need to avoid something like add rax,1 as the last step. That would set CF only if the final result is zero, wrapping from ULONG_MAX = -1.

    Doing x -= y as x += -y works for OF in most cases. (But not the most-negative number y=LONG_MIN (1UL<<63), where neg rcx would overflow).

    But CF tells you about the 65-bit full result of 64 + 64-bit addition or subtraction. 64-bit negation isn't sufficient: x += -y doesn't always set CF opposite of what x -= y would.

    Possibly something involving neg / sbb could be useful? But no, that treats carry-out from negation as -0 / -1, not -(1<<64).

    # Broken attempt: CF is wrong whenever rcx=0 (x + 0 never carries, so cmc leaves CF=1, but sub doesn't borrow).
    # Also fails for OF for rcx=0x8000000000000000 = LONG_MIN
    mov temp, rcx        # no effect on FLAGS
    neg temp             # or NOT + INC  if you insist on avoiding sub-like operations
    add rax, temp        # x += -y
    cmc                  # complement carry.  CF = !CF
    

    Notice that this attempt combines x and y in a single step. Your add rax, 1 at the end overwrites the CF result of the earlier add, so the final CF can't reflect the whole operation.

    Signed-overflow (OF) has a corner case. It would be the same for most inputs, where the signed arithmetic operation is the same for x -= y or x += -y. But if -y overflows to still be negative (the most-negative 2's complement number has no inverse), it's adding a negative instead of subtracting a negative.

    e.g. -LONG_MIN == LONG_MIN because of signed overflow. (C notation; signed overflow is UB in ISO C, but in asm it wraps).
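
The OF corner case can be illustrated with a C model of the overflow flag. This is a sketch under stated assumptions: the `of_*` and `check_*` names are hypothetical, unsigned 64-bit arithmetic stands in for the wrapping registers, and the overflow formulas are the usual sign-bit rules (for `add`, operands share a sign and the result's sign differs; for `sub`, operands differ in sign and the result's sign differs from the destination).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* OF after sub dst, src. */
static bool of_sub(uint64_t dst, uint64_t src) {
    uint64_t r = dst - src;
    return (((dst ^ src) & (dst ^ r)) >> 63) != 0;
}

/* OF after add dst, src. */
static bool of_add(uint64_t dst, uint64_t src) {
    uint64_t r = dst + src;
    return ((~(dst ^ src) & (dst ^ r)) >> 63) != 0;
}

/* neg temp / add rax, temp: OF left by the add. */
static bool of_neg_add(uint64_t rax, uint64_t rcx) {
    uint64_t temp = 0 - rcx;   /* neg wraps, so -LONG_MIN == LONG_MIN */
    return of_add(rax, temp);
}

static void check_of_corner_case(void) {
    const uint64_t long_min = UINT64_C(1) << 63;
    assert(0 - long_min == long_min);             /* negation wraps */
    assert(of_sub(5, 3) == of_neg_add(5, 3));     /* ordinary inputs agree */
    /* 0 - LONG_MIN overflows, but 0 + LONG_MIN does not. */
    assert(of_sub(0, long_min) && !of_neg_add(0, long_min));
    /* -1 - LONG_MIN fits, but -1 + LONG_MIN overflows. */
    assert(!of_sub(UINT64_MAX, long_min) && of_neg_add(UINT64_MAX, long_min));
}
```

In this model the two approaches disagree on OF only when the subtrahend is LONG_MIN, since that is the one value whose negation is not exact.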

    Counterexample for this attempt for CF:

    -1 - 0 doesn't borrow, so CF=0. -1 + -0 = -1 + 0 doesn't carry either, and then CMC will flip CF to 1.

    But -1 (0xff...ff) plus any nonzero number does carry out, while -1 minus any number doesn't borrow.
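
To see how far the neg / add / cmc attempt gets for CF, here is a brute-force C sketch over 8-bit operands, where every input pair can be tried. The 8-bit width is an assumption made only to keep the search exhaustive, and `cf_sub8`, `cf_neg_add_cmc8`, `count_cf_mismatches`, and `check_cf_emulation` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* sub a, c on 8-bit registers: CF is the borrow out. */
static bool cf_sub8(uint8_t a, uint8_t c) {
    return a < c;
}

/* neg temp / add a, temp / cmc on 8-bit registers: CF after the cmc. */
static bool cf_neg_add_cmc8(uint8_t a, uint8_t c) {
    uint8_t temp = (uint8_t)(0u - c);     /* neg temp (wraps for c = 0) */
    unsigned wide = (unsigned)a + temp;   /* add, kept as a 9-bit sum */
    bool cf = (wide >> 8) != 0;           /* carry out of bit 7 */
    return !cf;                           /* cmc */
}

/* Count operand pairs where the emulated CF disagrees with sub, and
   report whether any disagreement has a nonzero subtrahend. */
static int count_cf_mismatches(bool *any_with_nonzero_c) {
    int mismatches = 0;
    *any_with_nonzero_c = false;
    for (unsigned a = 0; a < 256; a++) {
        for (unsigned c = 0; c < 256; c++) {
            if (cf_sub8((uint8_t)a, (uint8_t)c) !=
                cf_neg_add_cmc8((uint8_t)a, (uint8_t)c)) {
                mismatches++;
                if (c != 0) *any_with_nonzero_c = true;
            }
        }
    }
    return mismatches;
}

static void check_cf_emulation(void) {
    bool nonzero = true;
    int n = count_cf_mismatches(&nonzero);
    assert(n == 256);    /* one mismatch per value of a ... */
    assert(!nonzero);    /* ... and all of them have c == 0 */
}
```

In this model the emulated CF is wrong for every `a` when `c` is 0 and correct otherwise, because for nonzero `c` the sum `a + (256 - c)` carries out exactly when `a >= c`.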


    So it's not easy, and probably not very interesting to emulate the borrow output of sub accurately.

    Note that hardware ALUs often use something like a binary Adder–subtractor that muxes A or ~A as an input to full-adders in a carry/borrow aware way to implement A + B or A - B with a correct borrow output for subtraction.

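    It should be possible to use stc / adc dst, inverted_src in asm to replicate what hardware like that actually does: addition of the inverse with a carry-in of 1, rather than separately adding 1. The carry-out of that addition is the inverse of x86's borrow (it is the ARM-style carry), so a cmc afterwards would be needed to match sub's CF.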
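
The stc / adc idea can be sketched as a C model and compared against sub for every pair of 8-bit operands. This is an illustration under assumptions, not a measurement on hardware: the modeled sequence is not temp / stc / adc a, temp / cmc (the trailing cmc is added here because the adder's carry-out is the inverse of a borrow), the 8-bit width is chosen only so the check is exhaustive, and `Flags8`, `Mismatches`, and the `model_*` / `compare_all` / `check_emulation` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint8_t result; bool cf, of, af; } Flags8;
typedef struct { int result, cf, of, af; } Mismatches;

/* sub a, c on 8-bit operands. */
static Flags8 model_sub8(uint8_t a, uint8_t c) {
    Flags8 f;
    f.result = (uint8_t)(a - c);
    f.cf = a < c;                                     /* borrow out of bit 7 */
    f.af = (a & 0xF) < (c & 0xF);                     /* borrow out of bit 3 */
    f.of = (((a ^ c) & (a ^ f.result)) & 0x80) != 0;  /* signed overflow */
    return f;
}

/* adc a, b with an explicit carry-in. */
static Flags8 model_adc8(uint8_t a, uint8_t b, bool carry_in) {
    Flags8 f;
    unsigned cin = carry_in ? 1u : 0u;
    unsigned wide = (unsigned)a + b + cin;
    f.result = (uint8_t)wide;
    f.cf = (wide >> 8) != 0;                             /* carry out of bit 7 */
    f.af = (((a & 0xFu) + (b & 0xFu) + cin) >> 4) != 0;  /* carry out of bit 3 */
    f.of = ((~(a ^ b) & (a ^ f.result)) & 0x80) != 0;    /* signed overflow */
    return f;
}

/* not temp / stc / adc a, temp / cmc */
static Flags8 model_not_stc_adc_cmc8(uint8_t a, uint8_t c) {
    uint8_t temp = (uint8_t)~c;            /* not temp: FLAGS untouched */
    Flags8 f = model_adc8(a, temp, true);  /* stc, then adc a, temp */
    f.cf = !f.cf;                          /* cmc flips CF only */
    return f;
}

/* Count disagreements with sub over all 65536 operand pairs. */
static Mismatches compare_all(void) {
    Mismatches m = {0, 0, 0, 0};
    for (unsigned a = 0; a < 256; a++) {
        for (unsigned c = 0; c < 256; c++) {
            Flags8 s = model_sub8((uint8_t)a, (uint8_t)c);
            Flags8 e = model_not_stc_adc_cmc8((uint8_t)a, (uint8_t)c);
            m.result += (s.result != e.result);
            m.cf += (s.cf != e.cf);
            m.of += (s.of != e.of);
            m.af += (s.af != e.af);
        }
    }
    return m;
}

static void check_emulation(void) {
    Mismatches m = compare_all();
    assert(m.result == 0 && m.cf == 0 && m.of == 0);
    assert(m.af == 256 * 256);  /* AF comes out inverted for every pair */
}
```

In this model the result, CF, and OF match sub for every input, and ZF, SF, and PF follow because they depend only on the result. AF is the exception: the carry out of bit 3 is the inverse of the borrow, and cmc doesn't touch AF, so the sequence is still not a bit-exact replacement for sub.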

    (TODO: rewrite more of this answer to show using not / stc / adc instead of multiple operations that potentially need to propagate carry all the way through the number).

    Related: