c++ visual-studio-2008 utf-8 windows-xp console

UTF-8 output on Windows console


The following code shows unexpected behaviour on my machine (tested with Visual C++ 2008 SP1 on Windows XP and VS 2012 on Windows 7):

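#include <cstdio>
#include <iostream>
#include <windows.h>

int main() {
    SetConsoleOutputCP( CP_UTF8 );            // switch the console to UTF-8
    std::cout << "\xc3\xbc";                  // "ü" (UTF-8: C3 BC) via C++ iostreams
    int fail = std::cout.fail() ? '1': '0';   // record the stream state as a digit
    fputc( fail, stdout );
    fputs( "\xc3\xbc", stdout );              // the same bytes via C stdio
}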

I simply compiled with cl /EHsc test.cpp.

Windows XP: Output in a console window is ü0ü (translated to codepage 1252; in the default codepage, perhaps 437, it originally shows some line-drawing characters).

When I change the settings of the console window to use the "Lucida Console" character set and run my test.exe again, the output is changed to , which means:

Windows 7: Output using Consolas is ��0ü, which is even more interesting. The correct bytes are probably written (at least they are when the output is redirected to a file) and the stream state is OK, but the two bytes are written as separate characters.
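To check which bytes actually arrive when the output is redirected to a file, a small hex dump helps. The following is only a minimal sketch; `to_hex` is a hypothetical helper name, not part of the original test program, and it assumes the redirected output has already been read back into a `std::string`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original post): renders each byte of
// `bytes` as two uppercase hex digits, separated by single spaces.
std::string to_hex(const std::string& bytes) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        if (i != 0) out += ' ';
        out += digits[c >> 4];    // high nibble
        out += digits[c & 0x0F];  // low nibble
    }
    // Every byte yields two digits; all but the first also get a separator.
    assert(bytes.empty() || out.size() == bytes.size() * 3 - 1);
    return out;
}
```

For the string used in the test program, `to_hex("\xc3\xbc")` gives `C3 BC`, the UTF-8 encoding of ü. Seeing those bytes in the redirected file would indicate that the problem lies in how the console receives them, not in the encoding.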

I tried to raise this issue on "Microsoft Connect" (see here), but MS has not been very helpful. You might also look here, as something similar has been asked before.

Can you reproduce this problem?

What am I doing wrong? Shouldn't the std::cout and the fputs have the same effect?


Solution

  • It's time to close this now. Stephan T. Lavavej says the behaviour is "by design", although I cannot follow this explanation.

    My current knowledge is: the Windows XP console in the UTF-8 codepage does not work with C++ iostreams.

    Windows XP is going out of fashion now, and so is VS 2008. I'd be interested to hear whether the problem still exists on newer Windows systems.

    On Windows 7 the effect is probably due to the way the C++ streams output characters. As seen in an answer to "Properly print utf8 characters in windows console", UTF-8 output also fails with C stdio when printing one byte after another, as in putc('\xc3'); putc('\xbc');. Perhaps this is what C++ streams do here.
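The difference between the two output patterns can be sketched as follows. This is an illustration under assumptions, not a confirmed explanation: `use_utf8_console`, `write_bytewise` and `write_whole` are hypothetical helper names, and whether the whole sequence really reaches the console in a single write depends on the C runtime and its buffering.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: switch the console to UTF-8 (does nothing elsewhere).
void use_utf8_console() {
#ifdef _WIN32
    SetConsoleOutputCP(CP_UTF8);
#endif
}

// Writes the string one byte per stdio call, the pattern reported to fail
// on the console. Returns the number of bytes written.
std::size_t write_bytewise(std::FILE* out, const char* utf8) {
    assert(out != NULL && utf8 != NULL);
    std::size_t n = 0;
    for (; utf8[n] != '\0'; ++n) {
        if (std::fputc(static_cast<unsigned char>(utf8[n]), out) == EOF) break;
    }
    return n;
}

// Hands the complete multi-byte sequence to stdio in a single call, like
// the fputs line in the question. Returns the number of bytes written.
std::size_t write_whole(std::FILE* out, const char* utf8) {
    assert(out != NULL && utf8 != NULL);
    return std::fwrite(utf8, 1, std::strlen(utf8), out);
}
```

Calling `use_utf8_console()` and then `write_bytewise(stdout, "\xc3\xbc")` followed by `write_whole(stdout, "\xc3\xbc")` should make it possible to compare the two patterns on a given Windows version. When the output goes to a file, both produce the same bytes.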