Tags: assembly, hex, x86-64, att, twos-complement

Subtraction producing a negative number in assembly - what is the hex value?


I'm very new to assembly, so I don't know if this is a stupid question, but I'm being asked to read the following line of assembly and say what value is stored in %rdi afterwards (%rdi currently holds 0x100):

subq 0x228, %rdi

My solution is 0xFFFFFFFFFFFFFF80.

I first got -128 hex by subtracting 0x228 from 0x100 and then took the 2's complement to get 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1000 0000, which I converted to hex.


My first question is if this is the right size value: the instruction has "q" at the end of it, meaning a quadword, so the answer should be 8 bytes, right?

My second question is if this is how assembly deals with a negative result: does it store the 2's complement form of the value?

I'm mainly unconfident in my answer because in every single other question I had an answer that was between 1 and 3 hex digits, so this one seems out of place.


Solution

  • In AT&T syntax, sub 0x228, %rdi has a memory source operand with an absolute addressing mode. You haven't said what's in the 8 bytes of memory at address 0x228, or whether that address is even mapped. (If it isn't, taking the #PF exception transfers control to the kernel's page-fault handler without modifying any registers except loading RSP from the TSS and loading some segment registers.)
    What's difference between number with $ or without $ symbol in at&t assembly syntax?
    https://stackoverflow.com/tags/att/info


    When a subtraction produces a negative number

    If this were sub $0x228, %rdi (an immediate source operand), then you'd have the question you were trying to answer.

    Try it yourself in a debugger if you're not sure, or even in C: uint64_t val = 0x100; val -= 0x228; then print the result as hex.

    It's just a binary subtraction: unsigned and 2's complement interpretations of the bits will give the same result bit-pattern. That's kinda the point of 2's complement, that it doesn't need separate signed and unsigned add instructions.

    So no, the instruction doesn't negate the result (which is what taking the 2's complement of it would do) or anything like that. Internally it would typically handle subtraction like an adc $~0x228, %rdi with a carry-in of 1, and set CF to the inverse of the carry-out from that. (On x86, unlike ARM, the carry flag after a subtraction is a borrow flag; ARM stores the carry-out directly, which is the inverse of a borrow.) Here ~ is bitwise NOT, so this adds 0xfffffffffffffdd7 plus an extra 1, with the +1 supplied via the carry-in so it's still a single ALU operation.

    I first got -128 hex

    Correct.

    But then it looks like you reinterpreted your -0x128 as decimal -128 (-0x80) and found the bit-pattern for that.

    Wrapped to 64-bit, 0 - 0x128 is the same as (1<<64) - 0x128. That's what I usually do with a calculator program. (Like calc on my Linux desktop, which does arbitrary precision so I have to manually truncate to 64-bit or whatever.)

    Another way to get the 2's complement bit-pattern for -0x128 is to do a 2's complement negation of +0x128, e.g. using -x = ~x + 1 (How to prove that the C statement -x, ~x+1, and ~(x-1) yield the same results?). (Confusingly, "2's complement" is used as both the name for the format that can store signed numbers, and sometimes for the operation of negating.)

    The correct result is 0xFFFFFFFFFFFFFED8. The high bits of your result (the leading 0xFF... bytes) are correct, but the low 2 bytes are wrong: FED8, not FF80. The result is a small negative number, so, sign-extended to 64 bits, it has a lot of high bits set. That's normal.

    If your assignment wants the bit-pattern, the conventional way to write that is in binary or unsigned hex. The mathematical value of 0x100 - 0x228 = -0x128 is of course -296, which you could also get by converting the inputs to decimal and doing 256 - 552 = -296 = -0x128. So that's an equally valid way to express the result in RDI.

    But note that it's unusual to use bases other than 10 with a minus sign for negative numbers, especially in computing where hex, octal, and binary are usually only used for bit-patterns / hexdumps rather than for mathematical values represented by some bits. Mathematically there's nothing wrong with -0x128, but it's only useful if you know what it means, and this assignment probably wants you to demonstrate that. :P


    if this is the right size value: the instruction has "q" at the end of it, meaning a quadword, so the answer should be 8 bytes, right?

    Correct.

    The q suffix is actually redundant: the 64-bit register RDI implies the operand size, so you could write sub $0x228, %rdi.

    Suffixes are only necessary to specify an operand size when no operand is directly a register, e.g. sub $123, (%rbx,%rax,4) has immediate and memory operands, neither of which implies an operand-size. (RBX and RAX only imply a 64-bit address-size.) Likewise, neg (%rdi) has only a memory operand.
    Not-so-fun fact: the GNU assembler will default to 32-bit operand-size for these, with only recent versions even printing a warning, except with mov, where it's an error. This is dumb. Clang's built-in assembler correctly treats the ambiguity as an error, like GAS does in .intel_syntax noprefix mode.

    The most precise way to describe the contents of a 64-bit register is to show the full bit-pattern in hex, including any leading zeros; for example, your starting value before this subtraction is RDI = 0x0000000000000100. This represents a value of 0x100 = 256.
    0xFFFFFFFFFFFFFFFF represents a value of -0x1 = -1 as 2's complement, or 2^64 - 1 as unsigned.

    In terms of tracing program logic, it's often more convenient to just write the represented value without actually caring about its bit-pattern, e.g. just writing -296. We know that will have its sign-bit set, and is representable in RDI, EDI, and DI, but not DIL because 8-bit DIL can only hold a signed value-range of -128..+127.