Depending on the environment and compiler settings, the type char can be signed or unsigned by default, which means that on 8-bit two's complement systems the range of values for single-character constants can be either -128..127 or 0..255.
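Which of the two ranges applies on a given implementation can be read from <limits.h>. The following is a minimal sketch; the function names plain_char_is_signed and print_plain_char_range are made up for illustration and are not part of any library.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Returns 1 if plain char is signed in this implementation, 0 otherwise. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Prints the range of plain char, e.g. -128..127 or 0..255 when CHAR_BIT is 8. */
void print_plain_char_range(void)
{
    /* Plain char must have the same range as signed char or unsigned char. */
    assert((CHAR_MIN == SCHAR_MIN && CHAR_MAX == SCHAR_MAX) ||
           (CHAR_MIN == 0 && CHAR_MAX == UCHAR_MAX));
    printf("plain char is %s: %d..%d\n",
           plain_char_is_signed() ? "signed" : "unsigned",
           CHAR_MIN, CHAR_MAX);
}
```

Calling print_plain_char_range() from main reports the choice made by the compiler and its current settings.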
In the ubiquitous ASCII character set, its ISO-8859-X extensions or the UTF-8 encoding, upper- and lowercase letters as well as digits have values below 127.
But such is not the case with the EBCDIC character set: 'A' is 0xC1, 'a' is 0x81 and '1' is 0xF1.
Since these values are above 127, does it mean the type char must be unsigned on 8-bit EBCDIC systems? Or can 'a', 'A' and '1' have negative values?
What about other character sets? Can the letters or digits ever have negative values?
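To make the question concrete, here is a small sketch of what the EBCDIC codes quoted above would become if an 8-bit two's complement signed char held them. The helper as_signed_8bit is a hypothetical name, and it does the arithmetic by hand rather than relying on an implementation-defined conversion.

```c
#include <assert.h>

/* Interprets an 8-bit pattern (0..255) as an 8-bit two's complement value. */
int as_signed_8bit(unsigned int bits)
{
    assert(bits <= 0xFFu);  /* precondition: only 8-bit patterns make sense */
    return bits > 127u ? (int)bits - 256 : (int)bits;
}
```

With this helper, 0xC1 ('A') maps to -63, 0x81 ('a') to -127 and 0xF1 ('1') to -15, so all three would be negative under a signed 8-bit char.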
C99 states that:
6.2.5 Types
An object declared as type char is large enough to store any member of the basic execution character set.
If a member of the basic execution character set is stored in a char object, its value is guaranteed to be nonnegative.
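Thus, if the machine in question uses EBCDIC encoding and an 8-bit char, then a C99-compliant compiler designed for this machine must make plain char unsigned: letters and digits belong to the basic execution character set, and their EBCDIC values above 127 could not be nonnegative in a signed 8-bit char.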
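The guarantee can be spot-checked on any implementation with a sketch like the one below. The names all_nonnegative and basic_sample are hypothetical, and the sample string covers only the letters and digits, not the whole basic execution character set.

```c
#include <assert.h>

/* Letters and digits: a subset of the basic execution character set. */
static const char basic_sample[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789";

/* Returns 1 if every character of s is nonnegative as a plain char, else 0. */
int all_nonnegative(const char *s)
{
    for (; *s != '\0'; ++s) {
        if (*s < 0) {
            return 0;
        }
    }
    return 1;
}
```

On a conforming implementation all_nonnegative(basic_sample) should return 1 whatever the character set: on ASCII systems because the values are at most 127, and on 8-bit EBCDIC systems because plain char has to be unsigned.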