
Representation of negative numbers in C?


How does C represent negative integers?

Is it by two's complement representation or by using the MSB (most significant bit)?

-1 in hexadecimal is ffffffff.

So please clarify this for me.


Solution

  • ISO C (C99 section 6.2.6.2/2 in this case, but it carries forward to C11 and C17(a)) states that an implementation must choose one of three representations for signed integer types: two's complement, ones' complement or sign/magnitude (although two's complement implementations almost certainly far outnumber the others).

    In all those representations, positive numbers are encoded identically; only the negative numbers differ.
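
If you want to see which of the three your own compiler uses, one commonly used check looks at the low two bits of -1: they are 11 in two's complement, 10 in ones' complement and 01 in sign/magnitude. A minimal sketch, where the function name is made up for this example:

```c
/* Report how this implementation represents negative ints.
 * The name signed_representation is hypothetical, chosen for illustration. */
const char *signed_representation(void)
{
    /* Only the two lowest bits of -1 are examined:
     *   two's complement: ...1111 -> 3
     *   ones' complement: ...1110 -> 2
     *   sign/magnitude:   1...001 -> 1 */
    switch (-1 & 3) {
    case 3:  return "two's complement";
    case 2:  return "ones' complement";
    case 1:  return "sign/magnitude";
    default: return "unknown";
    }
}
```

On virtually every current platform this returns "two's complement", which is also why -1 shows up as ffffffff with a 32-bit int.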

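    To get the negative representation for a positive number, you:

      • invert all bits then add one for two's complement.
      • invert all bits for ones' complement.
      • invert just the sign bit for sign/magnitude.

    You can see this in the table below: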

    number | two's complement    | ones' complement    | sign/magnitude
    =======|=====================|=====================|====================
         5 | 0000 0000 0000 0101 | 0000 0000 0000 0101 | 0000 0000 0000 0101
        -5 | 1111 1111 1111 1011 | 1111 1111 1111 1010 | 1000 0000 0000 0101
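
As a rough illustration, each of the three negations can be written as a bit operation on a 16-bit pattern: invert all bits and add one (two's complement), invert all bits (ones' complement), or flip only the top bit (sign/magnitude). This is only a sketch. The function names are invented for the example, and it works on `uint16_t` bit patterns rather than real signed values, so the result doesn't depend on which representation your compiler uses.

```c
#include <stdint.h>
#include <stdio.h>

/* Two's complement: invert all bits, then add one. */
uint16_t neg_twos_complement(uint16_t n)
{
    return (uint16_t)(~(unsigned)n + 1u);
}

/* Ones' complement: invert all bits. */
uint16_t neg_ones_complement(uint16_t n)
{
    return (uint16_t)~(unsigned)n;
}

/* Sign/magnitude: flip only the sign bit (bit 15). */
uint16_t neg_sign_magnitude(uint16_t n)
{
    return (uint16_t)(n ^ 0x8000u);
}

/* Print a 16-bit pattern in groups of four bits, as in the table. */
void print_bits16(uint16_t v)
{
    for (int bit = 15; bit >= 0; --bit) {
        putchar(((v >> bit) & 1u) ? '1' : '0');
        if (bit % 4 == 0 && bit != 0)
            putchar(' ');
    }
}
```

Passing 5 to the three functions gives 0xFFFB, 0xFFFA and 0x8005, which are the bit patterns in the -5 row of the table.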
    

    Keep in mind that ISO doesn't mandate that all bits are used in the representation. The standard introduces the concepts of a sign bit, value bits and padding bits. I've never actually seen an implementation with padding bits but the C99 rationale document gives this explanation:

    Suppose a machine uses a pair of 16-bit shorts (each with its own sign bit) to make up a 32-bit int and the sign bit of the lower short is ignored when used in this 32-bit int. Then, as a 32-bit signed int, there is a padding bit (in the middle of the 32 bits) that is ignored in determining the value of the 32-bit signed int. But, if this 32-bit item is treated as a 32-bit unsigned int, then that padding bit is visible to the user’s program. The C committee was told that there is a machine that works this way, and that is one reason that padding bits were added to C99.

    I believe the machine they were referring to may have been the Datacraft 6024 (and its successors from Harris Corp). In those machines, you had a 24-bit word used for the signed integer but, if you wanted the wider type, it strung two of them together as a 47-bit value with the sign bit of one of the words ignored:

    +---------+-----------+--------+-----------+
    | sign(1) | value(23) | pad(1) | value(23) |
    +---------+-----------+--------+-----------+
    \____________________/ \___________________/
          upper word            lower word
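
To make the layout concrete, here is a rough sketch of how such a double-word value could be decoded on a modern machine. Everything in it is an assumption made for illustration: the function name `combine_words` is invented, each 24-bit word is held in the low bits of a `uint32_t`, and the 47-bit result is treated as two's complement. It is not taken from any Datacraft or Harris documentation.

```c
#include <stdint.h>

/* Combine two 24-bit words into one signed value, skipping the pad bit.
 * Hypothetical helper; assumes a two's complement 47-bit result. */
int64_t combine_words(uint32_t upper, uint32_t lower)
{
    uint64_t hi  = upper & 0xFFFFFFu;   /* sign(1) + value(23)         */
    uint64_t lo  = lower & 0x7FFFFFu;   /* drop pad(1), keep value(23) */
    uint64_t raw = (hi << 23) | lo;     /* 47-bit pattern              */

    /* Sign-extend from bit 46, the sign bit of the upper word. */
    if (raw & (UINT64_C(1) << 46))
        return (int64_t)raw - (INT64_C(1) << 47);
    return (int64_t)raw;
}
```

With these assumptions, `combine_words(0, 5)` and `combine_words(0, 0x800005)` both give 5, because the top bit of the lower word is padding and takes no part in the value.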
    

    (a) Interestingly, given the scarcity of modern implementations that actually use the other two methods, there has been a push to have two's complement accepted as the one true method. WG21, the working group responsible for C++, made it mandatory in C++20, and WG14 has since done the same for C in C23.