c character-encoding language-lawyer

Do 'a' and '0' always have positive values even if char is signed?


Depending on the environment and compiler settings, plain char can be signed or unsigned by default. On systems with 8-bit bytes and two's complement representation, the values of single-character constants can therefore range over either -128..127 or 0..255.
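To see which of the two ranges applies on a given implementation, a minimal sketch along these lines may help. It relies only on CHAR_MIN and CHAR_MAX from <limits.h>; the function names plain_char_is_signed and print_char_range are hypothetical, chosen for illustration.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if plain char is signed here, 0 otherwise.
   CHAR_MIN is 0 when plain char is unsigned and negative when it is signed. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Hypothetical helper: prints the signedness and range of plain char.
   The values are cast to long so the format specifier is always correct. */
void print_char_range(void)
{
    printf("plain char is %s, range %ld..%ld\n",
           plain_char_is_signed() ? "signed" : "unsigned",
           (long)CHAR_MIN, (long)CHAR_MAX);
}
```

Calling print_char_range() from main reports the range the compiler chose; with GCC or Clang, the result can typically be switched with -fsigned-char or -funsigned-char.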

In the ubiquitous ASCII character set, its ISO-8859-X extensions and the UTF-8 encoding, upper- and lowercase letters as well as digits all have values below 128.

But such is not the case with the EBCDIC character set:

'A' is 0xC1, 'a' is 0x81 and '1' is 0xF1.

Since these values are above 127, does it mean the type char must be unsigned on 8-bit EBCDIC systems? Or can 'a', 'A' and '1' have negative values?

What about other character sets? Can the letters or digits ever have negative values?


Solution

  • C99 states that:

    6.2.5 Types

    An object declared as type char is large enough to store any member of the basic execution character set.

    If a member of the basic execution character set is stored in a char object, its value is guaranteed to be nonnegative.

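    The basic execution character set includes the 26 uppercase and 26 lowercase Latin letters and the 10 decimal digits, so 'a', 'A' and '1' must all be nonnegative when stored in a char. Thus, if the machine in question uses EBCDIC encoding and an 8-bit char, where these characters have values above 127, a C99-compliant compiler designed for this machine must make plain char unsigned.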
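
The guarantee quoted above can be checked at run time with a short sketch like the following. It stores every letter and digit in plain char objects and verifies that none is negative; the function name letters_and_digits_nonnegative is hypothetical, and the check covers only letters and digits rather than the whole basic execution character set.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if every letter and digit, stored in a
   plain char object, has a nonnegative value, and 0 otherwise.
   On a conforming implementation this should always return 1,
   whether the execution character set is ASCII or EBCDIC. */
int letters_and_digits_nonnegative(void)
{
    static const char members[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789";
    size_t i;

    /* Walk the array up to, but not including, the terminating null. */
    for (i = 0; members[i] != '\0'; i++) {
        if (members[i] < 0)
            return 0;
    }
    return 1;
}
```

If plain char is unsigned, the comparison inside the loop can never be true and some compilers may warn about it; that is expected and does not affect the result.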