c++ gcc trigraphs

Why are string literals parsed for trigraph sequences in GNU gcc/g++?


Consider this innocuous C++ program:

#include <iostream>
int main() {
  std::cout << "(Is this a trigraph??)" << std::endl;
  return 0;
}

When I compile it using g++ version 5.4.0, I get the following diagnostic:

me@my-laptop:~/code/C++$ g++ -c test_trigraph.cpp
test_trigraph.cpp:4:36: warning: trigraph ??) ignored, use -trigraphs to enable [-Wtrigraphs]
   std::cout << "(Is this a trigraph??)" << std::endl;
                                     ^

The program runs, and its output is as expected:

(Is this a trigraph??)

Why are string literals parsed for trigraphs at all?

Do other compilers do this, too?


Solution

  • Trigraphs are handled in translation phase 1 (they were removed in C++17, however). String literals are only processed in later phases, so by the time the compiler sees the literal, any trigraph replacement has already taken place. As the C++14 standard (N4140) specifies in [lex.phases]/1.1:

    The precedence among the syntax rules of translation is specified by the following phases.

    1. Physical source file characters are mapped, in an implementation-defined manner, to the basic source character set (introducing new-line characters for end-of-line indicators) if necessary. The set of physical source file characters accepted is implementation-defined. Trigraph sequences ([lex.trigraph]) are replaced by corresponding single-character internal representations. Any source file character not in the basic source character set ([lex.charset]) is replaced by the universal-character-name that designates that character. (An implementation may use any internal encoding, so long as an actual extended character encountered in the source file, and the same extended character expressed in the source file as a universal-character-name (i.e., using the \uXXXX notation), are handled equivalently except where this replacement is reverted in a raw string literal.)

    Trigraph replacement happens first because, as you were told in the comments, the characters that trigraphs stand for need to be available inside string and character literals as well, so that they can be printed.
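A minimal sketch of how the phase ordering can be observed from inside a program, assuming nothing beyond the standard library. The names `probe_size`, `trigraphs_replaced` and `safe_message` are made up for this illustration. The probe literal triggers the same -Wtrigraphs warning as the question's program when replacement is disabled.

```cpp
#include <cstddef>
#include <string>

// If trigraph replacement is enabled (for example with g++ -trigraphs),
// phase 1 turns the three characters into ']' before the literal is
// tokenized, so the array holds "]" plus the terminator: 2 bytes.
// If replacement is disabled, the array holds the three characters
// plus the terminator: 4 bytes.
constexpr std::size_t probe_size = sizeof("??)");
constexpr bool trigraphs_replaced = (probe_size == 2);

// Escaping the second question mark breaks up the sequence, so this
// literal has the same contents whether or not trigraphs are replaced.
inline std::string safe_message() {
    return "(Is this a trigraph?\?)";
}
```

Streaming `safe_message()` to `std::cout` prints the same text as the program in the question, without the diagnostic for that line.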