I am not sure whether I have to cast a character to unsigned char before comparing it to the return value of a getc family function.
The functions I consider part of the getc family are getc, fgetc, and getchar.
I am only talking about single-byte characters.
Here is an example without the cast:
#include <stdio.h>

int main(void) {
    int c;
    while ((c = getchar()) != '\n' && c != EOF) // loop until newline or EOF
        putchar(c);
    return 0;
}
Here is an example with the cast:
#include <stdio.h>

int main(void) {
    int c;
    while ((c = getchar()) != (unsigned char)'\n' && c != EOF) // loop until newline or EOF
        putchar(c);
    return 0;
}
On the implementation I use, both work.
Is the cast required for portable programs?
I believe so, because of C11/N1570 7.21.7.1p2 (emphasis mine):
If the end-of-file indicator for the input stream pointed to by stream is not set and a next character is present, the fgetc function obtains that character as an unsigned char converted to an int and advances the associated file position indicator for the stream (if defined).
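To see what that paragraph means in practice, here is a minimal sketch that writes one byte to a temporary file and reads it back with fgetc. The helper name read_back_byte is made up for illustration and is not part of any library.

```c
#include <stdio.h>

/* Hypothetical helper (not from the standard library): writes one byte to a
   temporary file, reads it back with fgetc, and returns what fgetc returned.
   Returns EOF if the temporary file cannot be created or written. */
int read_back_byte(unsigned char byte) {
    FILE *fp = tmpfile();
    if (fp == NULL)
        return EOF;

    int result = EOF;
    if (fputc(byte, fp) != EOF) {
        rewind(fp);          /* reposition between writing and reading */
        result = fgetc(fp);  /* the byte as unsigned char, converted to int */
    }
    fclose(fp);
    return result;
}
```

On a typical implementation with 8-bit bytes, read_back_byte(0xFF) should return 255 rather than a negative number, so it stays distinguishable from EOF. A plain char holding the same byte may be negative where char is signed, which is where the question about the cast comes from.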
The C standard guarantees that character constants for these characters have nonnegative values:¹
A to Z and a to z,
0 to 9,
!, ", #, %, &, ', (, ), *, +, ,, -, ., /, :, ;, <, =, >, ?, [, \, ], ^, _, {, |, }, and ~.

This follows from several sections of the C standard:
These characters are members of the basic execution character set, and the value of a member of the basic execution character set stored in a char is nonnegative.
The value of a character constant 'x', containing a single character (including a single character resulting from an escape sequence, like '\n'), is its value as a char converted to int.

The nonnegative char values are always a subset of the unsigned char values, so each character constant of one of these characters will have the same value as the value returned by getc when reading the same character.
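As a quick sanity check of that claim, the following sketch walks over the characters listed above and verifies that each one is nonnegative and unchanged by conversion to unsigned char. The function name basic_chars_match_unsigned is invented for this example.

```c
#include <stddef.h>

/* Returns 1 if every character in the list has a nonnegative value that is
   unchanged by conversion to unsigned char, and 0 otherwise. */
int basic_chars_match_unsigned(void) {
    static const char basic[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789"
        "!\"#%&'()*+,-./:;<=>?[\\]^_{|}~";

    /* sizeof includes the terminating null character, so stop one short. */
    for (size_t i = 0; i < sizeof basic - 1; ++i) {
        int as_char = basic[i];                  /* value of a plain char */
        int as_uchar = (unsigned char)basic[i];  /* value getc would return */
        if (as_char < 0 || as_char != as_uchar)
            return 0;
    }
    return 1;
}
```

On a conforming implementation this should return 1, which is why the cast makes no difference for these particular characters.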
If you need to handle other characters and cannot ensure those characters have nonnegative values in your target platforms, then you should convert the character constants to unsigned char.
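For characters outside that guaranteed set, the conversion could look roughly like the sketch below. Both function names are hypothetical, and the temporary-file driver exists only so the example can be run without an input file.

```c
#include <stdio.h>

/* Hypothetical helper: counts the remaining bytes in `fp` that equal `target`.
   Converting `target` to unsigned char mirrors what getc does with each byte. */
long count_byte(FILE *fp, char target) {
    const int wanted = (unsigned char)target;
    long count = 0;
    int c;
    while ((c = getc(fp)) != EOF) {
        if (c == wanted)
            ++count;
    }
    return count;
}

/* Hypothetical driver for the sketch: stores `len` bytes in a temporary file
   and counts occurrences of `target`. Returns -1 on I/O failure. */
long count_byte_in_temp_file(const char *bytes, size_t len, char target) {
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;

    long count = -1;
    if (fwrite(bytes, 1, len, fp) == len) {
        rewind(fp);  /* reposition between writing and reading */
        count = count_byte(fp, target);
    }
    fclose(fp);
    return count;
}
```

For instance, count_byte_in_temp_file("caf\xE9 \xE9t\xE9", 8, '\xE9') is expected to find three matches. Without the conversion, the comparison would likely never succeed on an implementation where char is signed, because '\xE9' would then be negative while getc returns 233.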
¹ There is one pedantic exception to this which does not occur in practice. In a C implementation in which char and int are the same width and char is unsigned, char may have values not representable in int. In this case, the conversion is implementation-defined, so it may produce negative values. This conversion would be the same for converting the unsigned char value to int for the character constant and for converting the unsigned char getc return value to int, so they would produce the same value for the same characters. Conceivably, the conversion might be defined to clamp instead of wrap, which would make multiple characters map to the same value and be impossible to distinguish. This would be a defect in the C implementation, and there would not be a way to work around it using only the features fully specified by the C standard.
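If you want to know whether a given target falls under that exception, one possible check is to compare the limits from <limits.h>. This is only a sketch, and the function name uchar_fits_in_int is made up here.

```c
#include <limits.h>

/* Returns 1 when every unsigned char value is representable in int.
   The exception described in the footnote applies only when this is 0. */
int uchar_fits_in_int(void) {
    return UCHAR_MAX <= INT_MAX;
}
```

On mainstream desktop and server platforms this should return 1. The same comparison could also be placed in a preprocessor #if so that the build fails early on an unusual target.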