c++ io language-lawyer istream char-traits

Is it guaranteed that std::char_traits<char>::to_int_type(c) == static_cast<int>(c)?


The question "How to use correctly the return value from std::cin.get() and std::cin.peek()?" made me wonder whether it is guaranteed that

std::char_traits<char>::to_int_type(c) == static_cast<int>(c)

for all valid char values c.


This comes up in a lot of places. For example, istream::peek calls streambuf::sgetc, which uses to_int_type to convert the char value into int_type. Now, does std::cin.peek() == '\n' really mean that the next character is \n?


Here's my analysis. Let's collect the pieces from [char.traits.require] and [char.traits.specializations.char]:

  1. For every int value e, to_char_type(e) returns

    • c, if eq_int_type(e, to_int_type(c)) for some c;

    • some unspecified value otherwise.

  2. For every pair of int values e and f, eq_int_type(e, f) returns

    • eq(c, d), if e == to_int_type(c) and f == to_int_type(d) for some c and d;

    • true, if e == eof() and f == eof();

    • false, if e == eof() xor f == eof();

    • unspecified otherwise.

  3. eof() returns a value e such that !eq_int_type(e, to_int_type(c)) for all c.

  4. eq(c, d) iff (unsigned char) c == (unsigned char) d.
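As a sanity check, the four properties above can be tested by brute force against whatever std::char_traits<char> a given toolchain ships. The following is only a sketch: the function names are hypothetical (they are not part of the standard or of the question), and a passing run says something about one implementation, not about what the standard guarantees. The second function simply counts how many char values make to_int_type(c) differ from static_cast<int>(c); the result is implementation-specific, so no expected value is stated.

```cpp
#include <climits>
#include <string>   // std::char_traits

// Hypothetical helper: checks properties 1-4 from the list above for
// every char value, using the library's own std::char_traits<char>.
bool std_traits_satisfy_listed_properties() {
    using traits = std::char_traits<char>;
    for (int i = CHAR_MIN; i <= CHAR_MAX; ++i) {
        const char c = static_cast<char>(i);
        const traits::int_type e = traits::to_int_type(c);

        // Property 1: to_char_type undoes to_int_type.
        if (traits::to_char_type(e) != c) return false;
        // Property 3: eof() differs from every converted character.
        if (traits::eq_int_type(traits::eof(), e)) return false;

        for (int j = CHAR_MIN; j <= CHAR_MAX; ++j) {
            const char d = static_cast<char>(j);
            const bool same_as_unsigned =
                static_cast<unsigned char>(c) == static_cast<unsigned char>(d);
            // Property 4: eq compares as if by unsigned char.
            if (traits::eq(c, d) != same_as_unsigned) return false;
            // Property 2 (first bullet): eq_int_type agrees with eq.
            if (traits::eq_int_type(e, traits::to_int_type(d)) != traits::eq(c, d))
                return false;
        }
    }
    return true;
}

// Hypothetical helper: number of char values c for which
// to_int_type(c) != static_cast<int>(c) on this implementation.
int count_to_int_type_mismatches() {
    using traits = std::char_traits<char>;
    int mismatches = 0;
    for (int i = CHAR_MIN; i <= CHAR_MAX; ++i) {
        const char c = static_cast<char>(i);
        if (traits::to_int_type(c) != static_cast<int>(c)) ++mismatches;
    }
    return mismatches;
}
```

Calling both functions from main and printing the results shows whether the equality in the title happens to hold for every char value on the platform at hand.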

Now, consider this hypothetical implementation (syntactically simplified):

//          char: [-128, 127]
// unsigned char: [0, 255]
//           int: [-2^31, 2^31-1]

#define EOF INT_MIN

char to_char_type(int e) {
    return char(e - 1);
}

int to_int_type(char c) {
    return int(c) + 1;
}

bool eq(char c, char d) {
    return c == d;
}

bool eq_int_type(int c, int d) {
    return c == d;
}

int eof() {
    return EOF;
}

Now let's verify the requirements:

  1. For every int value e, if eq_int_type(e, to_int_type(c)) for some c, then e == int(c) + 1. Therefore, to_char_type(e) == char(int(c)) == c.

  2. For every pair of int values e and f, if e == to_int_type(c) and f == to_int_type(d) for some c and d, then eq_int_type(e, f) iff int(c) + 1 == int(d) + 1 iff c == d (by property 1). The EOF cases are also trivially verifiable.

  3. For every char value c, int(c) >= -128, so int(c) + 1 != EOF. Therefore, !eq_int_type(eof(), to_int_type(c)).

  4. For every pair of char values c and d, eq(c, d) iff (unsigned char) c == (unsigned char) d (by property 2).
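The hand verification above can also be run mechanically. Below is a minimal sketch that wraps the hypothetical implementation in a struct and checks the four requirements for every char value. The names shifted_traits, shifted_traits_meet_requirements, and raw_newline_comparison_fails are made up for illustration. The sketch assumes int is strictly wider than char, as in the sizes given with the implementation, and it only calls to_char_type on values produced by to_int_type, so e - 1 never overflows.

```cpp
#include <climits>

// Assumption carried over from the hypothetical sizes: int is wider than
// char, so int(c) + 1 cannot overflow.
static_assert(CHAR_MAX < INT_MAX, "sketch assumes int is wider than char");

// Hypothetical name; mirrors the functions of the implementation above.
struct shifted_traits {
    static int  eof() { return INT_MIN; }
    static int  to_int_type(char c) { return static_cast<int>(c) + 1; }
    // Only called below on results of to_int_type, so e - 1 is in char range.
    static char to_char_type(int e) { return static_cast<char>(e - 1); }
    static bool eq(char c, char d) { return c == d; }
    static bool eq_int_type(int e, int f) { return e == f; }
};

// Brute-force version of the four verification steps.
bool shifted_traits_meet_requirements() {
    using T = shifted_traits;
    for (int i = CHAR_MIN; i <= CHAR_MAX; ++i) {
        const char c = static_cast<char>(i);
        const int e = T::to_int_type(c);
        if (T::to_char_type(e) != c) return false;        // requirement 1
        if (T::eq_int_type(T::eof(), e)) return false;    // requirement 3
        for (int j = CHAR_MIN; j <= CHAR_MAX; ++j) {
            const char d = static_cast<char>(j);
            const bool same_as_unsigned =
                static_cast<unsigned char>(c) == static_cast<unsigned char>(d);
            // requirement 2, first bullet
            if (T::eq_int_type(e, T::to_int_type(d)) != T::eq(c, d)) return false;
            // requirement 4
            if (T::eq(c, d) != same_as_unsigned) return false;
        }
    }
    // requirement 2, eof() compared with itself
    return T::eq_int_type(T::eof(), T::eof());
}

// With these traits, comparing the int_type value directly against '\n'
// gives the wrong answer: to_int_type('\n') is int('\n') + 1.
bool raw_newline_comparison_fails() {
    return shifted_traits::to_int_type('\n') != '\n';
}
```

Both functions return true under the stated assumption: the listed requirements hold, yet the raw comparison against '\n' does not match.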

Does that mean this implementation is conforming, and therefore std::cin.peek() == '\n' does not do what it is supposed to do? Did I miss anything in my analysis?


Solution

  • Does that mean this implementation is conforming, and therefore std::cin.peek() == '\n' does not do what it is supposed to do?

    I agree with your analysis. This isn't guaranteed.

    It appears that you would have to use eq_int_type(std::cin.peek(), to_int_type('\n')) to guarantee a correct result.
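    A minimal sketch of that traits-based comparison, wrapped in a helper so it works on any std::istream. The names next_char_is and demo_next_char_is are hypothetical, and the demo uses std::istringstream instead of std::cin so that it is reproducible.

```cpp
#include <istream>
#include <sstream>

// Hypothetical helper: true if the next character available from `in`
// is `expected`, compared entirely through the stream's traits type.
bool next_char_is(std::istream& in, char expected) {
    using traits = std::istream::traits_type;  // std::char_traits<char>
    const traits::int_type next = in.peek();
    // At end of input, peek() returns traits::eof(), which never compares
    // equal to to_int_type(c) for any character c.
    return traits::eq_int_type(next, traits::to_int_type(expected));
}

// Small demonstration on in-memory streams.
bool demo_next_char_is() {
    std::istringstream in("\nabc");
    std::istringstream empty("");
    return next_char_is(in, '\n')       // next character is a newline
        && !next_char_is(in, 'a')       // peek() does not consume it
        && !next_char_is(empty, '\n');  // eof() matches no character
}
```

    The helper never assumes anything about how to_int_type maps characters to int_type values, which is the point of going through eq_int_type.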


    P.S. Your to_char_type(EOF) has undefined behaviour due to signed overflow in INT_MIN - 1. Sure, the value is unspecified in this case, but you still cannot have UB. This would be valid:

    char to_char_type(int e) {
        return e == EOF
             ? 0 // doesn't matter
             : char(e - 1);
    }
    

    to_int_type would have UB on systems where int and char are the same size, in the case c == INT_MAX, but you've excluded those systems with the hypothetical sizes.