c++ c++11 unicode c++20 unicode-string

Why does printing Unicode-encoded strings with cout lead to a compilation error in newer C++ standards?


I tried to print Unicode characters with the following program, using Visual C++ 2022 version 17.4.4 with the C++ standard set to the latest.

#include <iostream>

using namespace std;

int main()
{
  cout << u8"The official vowels in Danish are: a, e, i, o, u, \u00E6, \u00F8, \u00E5 and y.\n";
  return 0;
}

I get the following compilation errors:

1>C:\projects\cpp\test\test.cpp(7,8): error C2280: 'std::basic_ostream<char,std::char_traits<char>> &std::operator <<<std::char_traits<char>>(std::basic_ostream<char,std::char_traits<char>> &,const char8_t *)': attempting to reference a deleted function
1>C:\projects\cpp\test\test.cpp(7,8): error C2088: '<<': illegal for class

The same behavior is observed with u (UTF-16) and U (UTF-32) string literals.

Setting the standard to C++17 or C++14 makes the program compile.

What is the rationale for disallowing this code in C++20 and later standards and what is the correct way to print Unicode string literals in those standards?


Solution

  • Until C++20, u8"..." was const char[N]. Since C++20, it is const char8_t[N].

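    std::cout is a std::basic_ostream<char>, and since C++20 the operator<< overloads that would accept char8_t, char16_t, or char32_t characters or strings on such a stream are explicitly deleted. That is why the compiler reports an attempt to reference a deleted function. Without the deleted overloads, a const char8_t* would silently bind to the const void* overload and print a pointer value instead of the text, which is what u and U literals did before C++20.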

    A possible workaround:

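    // Forward the UTF-8 code units to the char stream unchanged. The text
    // displays correctly only if the terminal interprets the bytes as UTF-8.
    std::basic_ostream<char>& operator<<(std::basic_ostream<char>& os, const char8_t* s) {
      os << reinterpret_cast<const char*>(s);  // reading through char* is allowed by the aliasing rules
      return os;
    }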
    
    // Output: The official vowels in Danish are: a, e, i, o, u, æ, ø, å and y.