c language-lawyer getchar fgetc getc

Does a character have to be cast to unsigned char before being compared to getc family return values?


I am not sure whether I have to cast a character to unsigned char before comparing it to the return value of a getc family function.

By getc family I mean getc, fgetc, and getchar.

I am only talking about single-byte characters.

Here is an example without the cast:

#include <stdio.h>

int main(void) {
  int c;

  while ((c = getchar()) != '\n' && c != EOF) // loop until newline or EOF
    putchar(c);

  return 0;
}

Here is an example with the cast:

#include <stdio.h>

int main(void) {
  int c;

  while ((c = getchar()) != (unsigned char)'\n' && c != EOF) // loop until newline or EOF
    putchar(c);

  return 0;
}

On the implementation I use, both work.

Is the cast required for portable programs?

I believe so, because of C11/N1570 7.21.7.1p2 (emphasis mine):

If the end-of-file indicator for the input stream pointed to by stream is not set and a next character is present, the fgetc function obtains that character as an unsigned char converted to an int and advances the associated file position indicator for the stream (if defined).
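
To make the quoted wording concrete, here is a minimal sketch that writes one byte to a temporary file and reads it back with fgetc. The helper name roundtrip_byte is hypothetical, and the sketch assumes tmpfile() succeeds and that UCHAR_MAX fits in int, as on common platforms.

```c
#include <stdio.h>

/* Hypothetical helper: write one byte to a temporary binary file and read
   it back with fgetc. Returns the int that fgetc produced, or EOF if the
   temporary file could not be created or written. */
int roundtrip_byte(unsigned char byte) {
  FILE *fp = tmpfile();
  if (fp == NULL)
    return EOF;

  int c = EOF;
  if (fputc(byte, fp) != EOF) {
    rewind(fp);
    c = fgetc(fp); /* the byte as an unsigned char converted to int */
  }
  fclose(fp);
  return c;
}
```

Under those assumptions, a byte such as 0xFF comes back as 255 rather than as a negative value, so it stays distinct from EOF.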


Solution

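  • For members of the basic execution character set, which includes the newline character '\n', the cast is not needed. The C standard guarantees that character constants for these characters have nonnegative values (see footnote 1).

    This follows from several sections of the C standard: C11 5.2.1p3 lists the members of the basic execution character set, 6.2.5p3 guarantees that such a member stored in a char object has a nonnegative value, and 6.4.4.4p10 defines the value of a single-character constant as the value of a char object holding that character, converted to int.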

    The nonnegative char values are always a subset of the unsigned char values, so each character constant of one of these characters will have the same value as the value returned by getc when reading the same character.

    If you need to handle other characters and cannot ensure that they have nonnegative values on your target platforms, then you should convert the character constants to unsigned char before comparing.
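
As a sketch of that advice, the comparison can be wrapped in a small helper that always converts the target character to unsigned char. The names matches_char and count_until are hypothetical and only serve the illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: true when c, a value returned by getc, fgetc, or
   getchar, is the byte that the char value target represents. The
   conversion to unsigned char mirrors what the getc family does. */
bool matches_char(int c, char target) {
  return c == (unsigned char)target;
}

/* Hypothetical helper: count the bytes read from fp before target or
   end-of-file is reached. */
long count_until(FILE *fp, char target) {
  long n = 0;
  int c;

  while ((c = fgetc(fp)) != EOF && !matches_char(c, target))
    n++;
  return n;
}
```

For a character such as '\n' the conversion changes nothing, so matches_char(c, '\n') and c == '\n' agree. For a character such as '\xE9', which may be negative where char is signed, only the converted comparison is reliable.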

    Footnote

    1 There is one pedantic exception to this which does not occur in practice. In a C implementation in which char and int are the same width and char is unsigned, char may have values not representable in int. In this case, the conversion is implementation-defined, so it may produce negative values. This conversion would be the same for converting the unsigned char value to int for the character constant and for converting the unsigned char getc return value to int, so they would produce the same value for the same characters. Conceivably, the conversion might be defined to clamp instead of wrap, which would make multiple characters map to the same value and be impossible to distinguish. This would be a defect in the C implementation, and there would not be a way to work around it using only the features fully specified by the C standard.
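
A program that wants to rule out the case described in this footnote could check the limits from <limits.h>. This is only a sketch: the function name uchar_fits_in_int is hypothetical, and _Static_assert assumes a C11 compiler.

```c
#include <limits.h>
#include <stdbool.h>

/* Refuse to compile on an implementation where some unsigned char value
   is not representable in int (the case the footnote describes). */
_Static_assert(UCHAR_MAX <= INT_MAX,
               "unsigned char values must be representable in int");

/* Hypothetical run-time form of the same check. */
bool uchar_fits_in_int(void) {
  return UCHAR_MAX <= INT_MAX;
}
```

When the check holds, every value returned by getc for a successfully read character lies in the range 0 to UCHAR_MAX and cannot be negative.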