Tags: c, language-lawyer, c11, c17

Which sections of the C standard prove the relation between the integer type sizes?


In the late draft of C11 [C11_N1570] and C17 [C17_N2176] I fail to find the proof of the following (which, I believe, is commonly known):
sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)
Can anybody refer me to the particular sections?

I'm aware of this reply for C++11. The second part of the reply talks about C, but only touches on the ranges of the values. It does not prove the relation between the type sizes.


Solution

  • Thank you very much to everybody who participated in the search for the answer. Most of the replies shared what I had already learned, but some of the comments provided very interesting insight.
    Below I will summarize what I learned so far (for my own future reference).


    Conclusion

    Looks like C (as of late draft of C17 [C17_N2176]) does not guarantee that
    sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)
    (as opposed to C++).

    What is Guaranteed

    Below is my own interpretation/summary of what C does guarantee regarding the integer types (sorry if my terminology is not strict enough).

    Multiple Aliases For the Same Type

    First, let me move the multiple aliases for the same type out of the way ([C17_N2176], 6.2.5/4, the parenthesized sentence referring to 6.7.2/2; thanks @M.M for the reference).

    The Number of Bits in a Byte

    The number of bits in a byte is implementation-defined and is >= 8. It is given by the CHAR_BIT macro.
    5.2.4.2.1/1 Sizes of integer types <limits.h>

    Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.

    number of bits for smallest object that is not a bit-field (byte)
    CHAR_BIT 8

    The text below assumes that the byte is 8 bits (keep that in mind on the implementations where byte has a different number of bits).

    The sizeof([[un]signed] char)

    sizeof(char), sizeof(unsigned char), and sizeof(signed char) are all 1 byte.
    6.5.3.4/2 The sizeof and _Alignof operators

    The sizeof operator yields the size (in bytes) of its operand

    6.5.3.4/4:

    When sizeof is applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1.

    The Range of the Values and the Size of the Type

    Objects may not use all of their bits to store a value.
    The object representation has value bits, may have padding bits, and for the signed types has exactly one sign bit (6.2.6.2/1/2 Integer types). E.g. a variable can have a size of 4 bytes, but only the 2 least significant bytes may be used to store a value (the object representation has only 16 value bits), similar to how the _Bool type has at least 1 value bit while all other bits are padding bits.

    The correspondence between the range of the values and the size of the type (or the number of value bits) is arguable.
    On the one hand @eric-postpischil refers to 3.19/1:

    value
    precise meaning of the contents of an object when interpreted as having a specific type

    This makes an impression that every value has a unique bit representation (bit pattern).

    On the other hand @language-lawyer states

    different values don't have to be represented by different bit patterns. Thus, there can be more values than possible bit patterns.

    when there is contradiction between the standard and a committee response (CR), committee response is chosen by implementors.

    from DR260 Committee Response follows that a bit pattern in object representation doesn't uniquely determine the value. Different values may be represented by the same bit pattern. So I think an implementation with CHAR_BIT == 8 and sizeof(int) == 1 is possible.

    I didn't claim that an object has multiple values at the same time

    @language-lawyer's statements give the impression that multiple values (e.g. 5, 23, -1), probably at different moments of time, can correspond to the same bit pattern (e.g. 0xFFFF) of the value bits of a variable. If that's true, then every integer type other than [[un]signed] char (see "The sizeof([[un]signed] char)" section above) can have any byte size >= 1. Each must have at least one value bit, which (strictly speaking) rules out a byte size of 0 and so requires a size of at least one byte; and the whole range of values mandated by <limits.h> (see below) could then correspond to that single value bit.

    To summarize, the relation between sizeof(short), sizeof(int), sizeof(long) and sizeof(long long) can be anything
    (any of these, in byte size, can be greater than or equal to any of the others; again, speaking somewhat paranoidly strictly).

    What Does Not Seem Arguable
    What has not been mentioned is 6.2.6.2/1/2 Integer types:

    For unsigned integer types .. If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^(N-1), so that objects of that type shall be capable of representing values from 0 to 2^N - 1 using a pure binary representation ..

    For signed integer types .. Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type ..

    This makes me believe that each value bit adds a unique value to the overall value of the object. E.g. the least significant value bit (I'll call it value bit number 0), regardless of where in the byte(s) it is located, adds a value of 2^0 == 1, and no other value bit adds that value, i.e. the value is added uniquely. Value bit number 1, again regardless of its position in the byte(s) (as long as that position differs from that of every other value bit), uniquely adds a value of 2^1 == 2.
    These two value bits together sum up to the overall absolute value of 1 + 2 == 3.

    Here I won't dig into whether they add a value when set to 1, when cleared to 0, or some combination of the two. In the text below I assume that they add value when set to 1.

    Just in case I'll also quote 6.2.6.2/2 Integer types:

    If the sign bit is one, the value shall be modified in one of the following ways:
    ...
    — the sign bit has the value -(2^M) (two’s complement);

    Earlier in 6.2.6.2/2 it has been mentioned that M is the number of value bits in the signed type.
    Thus, if we are talking about 8-bit signed value with 7 value bits and 1 sign bit, then the sign bit, if set to 1, adds the value of -(2^M) == -(2^7) == -128.

    Earlier I considered an example where the two least significant value bits sum up to the overall absolute value of 3. Together with the sign bit set to 1 for the 8-bit signed value with 7 value bits, the overall signed value will be -128 + 3 == -125.
    As an example, that value can have the bit pattern 0x83: the sign bit is set to 1 (0x80), the two least significant value bits are set to 1 (0x03), and each value bit contributes to the overall value when set to 1 rather than cleared to 0, as in the two's complement representation.

    This observation makes me think that, very likely, there is a one-to-one correspondence between the range of values and the number of value bits in an object - every value has a unique pattern of value bits and every pattern of value bits uniquely maps to a single value.
    (I realize that this intermediate conclusion may still not be strict enough, may be wrong, or may not cover certain cases.)

    Minimum Number of Value Bits and Bytes

    5.2.4.2.1/1 Sizes of integer types <limits.h>:
    Important sentence:

    Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.

    Then:

    SHRT_MIN -32767 // -(2^15 - 1)
    SHRT_MAX +32767 // 2^15 - 1
    USHRT_MAX 65535 // 2^16 - 1

    This tells me that
    short int has at least 15 value bits (see SHRT_MIN, SHRT_MAX above) plus a sign bit, i.e. at least 2 bytes (if a byte is 8 bits, see "The Number of Bits in a Byte" above).
    unsigned short int has at least 16 value bits (USHRT_MAX above), i.e. at least 2 bytes.

    Continuing that logic (see 5.2.4.2.1/1):
    int has at least 15 value bits (see INT_MIN, INT_MAX), i.e. at least 2 bytes.
    unsigned int has at least 16 value bits (see UINT_MAX), i.e. at least 2 bytes.
    long int has at least 31 value bits (see LONG_MIN, LONG_MAX), i.e. at least 4 bytes.
    unsigned long int has at least 32 value bits (see ULONG_MAX), i.e. at least 4 bytes.
    long long int has at least 63 value bits (see LLONG_MIN, LLONG_MAX), i.e. at least 8 bytes.
    unsigned long long int has at least 64 value bits (see ULLONG_MAX), i.e. at least 8 bytes.

    This proves to me that:
    1 == sizeof(char) < any of { sizeof(short), sizeof(int), sizeof(long), sizeof(long long) }.

    The sizeof(int)

    6.2.5/5 Types

    A "plain" int object has the natural size suggested by the architecture of the execution environment (large enough to contain any value in the range INT_MIN to INT_MAX as defined in the header <limits.h>).

    This suggests to me that:
    sizeof(int) == 4 on a 32-bit architecture (if a byte is 8 bits).
    Note, however, that "the natural size suggested by the architecture" is only a suggestion: on most 64-bit architectures the common data models (LP64 on Unix-like systems, LLP64 on Windows) still have sizeof(int) == 4, not 8.

    The sizeof(unsigned T)

    6.2.5/6 Types

    For each of the signed integer types, there is a corresponding (but different) unsigned integer type (designated with the keyword unsigned) that uses the same amount of storage (including sign information) and has the same alignment requirements.

    This proves to me that:
    sizeof(unsigned T) == sizeof(signed T).

    The Ranges of Values

    6.2.5/8 Types

    For any two integer types with the same signedness and different integer conversion rank (see 6.3.1.1), the range of values of the type with smaller integer conversion rank is a subrange of the values of the other type.

    (See the discussion of 6.3.1.1 below)
    I assume that a subrange of values can contain the same or smaller number of values than the range. I.e. the type with the smaller conversion rank can have the same or smaller number of values than the type with the greater conversion rank.

    6.3.1.1/1 Boolean, characters, and integers

    — The rank of long long int shall be greater than the rank of long int, which shall be greater than the rank of int, which shall be greater than the rank of short int, which shall be greater than the rank of signed char.
    — The rank of any unsigned integer type shall equal the rank of the corresponding signed integer type, if any.
    — The rank of _Bool shall be less than the rank of all other standard integer types.
    — The rank of any enumerated type shall equal the rank of the compatible integer type (see 6.7.2.2).

    This tells me that:
    range_of_values(bool) <= range_of_values(signed char) <= range_of_values(short int) <= range_of_values(int) <= range_of_values(long int) <= range_of_values(long long int).
    For the unsigned types the relation between the ranges of values is the same.

    This establishes the same relation for the number of value bits in the types.

    But still does not prove the same relation between the sizes in bytes of objects of those types.
    I.e. C (as of [C17_N2176]) does not guarantee the following statement (as opposed to C++):
    sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)