I'm using Windows 11. I have a program "Hello.exe"
#include <iostream>
int main(int argc, char* argv[])
{
    for (int i = 0; i < argc; i++)
    {
        std::cout << argv[i] << std::endl;
    }
}
If I pass in a Japanese UTF-8 character to this program
Hello.exe う
Then nothing is printed. Strangely, the content of this character, as recorded in argv, is 3f. But the actual encoding of this character should be e3 81 86.
What I've tried
(1) However, if I directly print this character in my code, the encoding would be correct in memory, and the character can be printed to stdout.
SetConsoleOutputCP(CP_UTF8);
printf("う");
(2) I also tried using wmain instead of main, but the character can't be printed either. The value stored in argv is 46 30.
#include <iostream>
int wmain(int argc, wchar_t** argv)
{
    for (int i = 0; i < argc; i++)
    {
        std::wcout << argv[i] << std::endl;
    }
}
(3) I also wrote a Python program, which does the same thing, and the character can be printed.
What am I missing?
Windows uses UTF-16 encoded text everywhere it expects strings. This makes writing cross-platform programs more difficult, since other operating systems typically use UTF-8 as their preferred Unicode encoding. The good news is that it is now possible to use UTF-8 in Windows applications as well.
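This is also the likely source of the 3f seen in the question: with a narrow main, the UTF-16 command line is converted to the active code page, and a character that code page cannot represent is typically replaced by ? (byte 3f). The sketch below only imitates that lossy step in portable C++. It assumes Latin-1 as a simplified stand-in for a legacy code page, and narrow_lossy is a hypothetical helper, not a Windows API.

```cpp
#include <string>

// Hypothetical helper: imitates a lossy UTF-16 -> single-byte conversion.
// Assumption: Latin-1 stands in for a legacy ANSI code page.
std::string narrow_lossy(const std::u16string& wide)
{
    std::string out;
    for (char16_t unit : wide)
    {
        if (unit <= 0xFF)
            out += static_cast<char>(unit);  // representable: keep the byte
        else
            out += '?';                      // not representable: byte 0x3F
    }
    return out;
}
```

Calling narrow_lossy(u"\u3046") returns "?", which is the single byte 3f.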
Windows 10 since May 2019 (version 1903), and Windows 11 of course, support the UTF-8 code page. With the help of a manifest embedded in the .exe file, the developer can tell Windows to set the UTF-8 code page when running the application.
The manifest typically looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1">
<assemblyIdentity type="win32" name="..." version="6.0.0.0"/>
<application>
<windowsSettings>
<activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
</windowsSettings>
</application>
</assembly>
You can use mt.exe to add the manifest to the executable, or add the file as a manifest in the .vcxproj project in Visual Studio.
The Microsoft compiler (MSVC) needs the /utf-8 flag to let it know that the source files are encoded in UTF-8 and that string literals in the executable should be UTF-8 as well. Don't forget that flag in your projects.
For Windows console applications, call SetConsoleOutputCP(CP_UTF8); for output and SetConsoleCP(CP_UTF8); for input at the start of the main function. Curiously, this is required even with the manifest, as the console defaults to the Windows OEM code page and not UTF-8.
BUG: from my experiments, it seems that on Windows 10, reading UTF-8 input from the console does not work whatever you try, unless you call ReadConsoleW manually and convert the result. On Windows 11, however, it works.
Windows API functions exist in two flavors. There are functions ending in A (for ANSI) that expect const char* zero-terminated strings, and there are those ending in W (for wide) that expect const wchar_t* zero-terminated strings. The type wchar_t is 16 bits wide on Windows, and wide strings are expected to be UTF-16LE encoded.
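To make the difference between the two encodings concrete, here is a small portable sketch that encodes one code point both ways. It is an illustration only: it handles the Basic Multilingual Plane (no surrogate pairs), and the helper names to_utf8 and to_utf16le are made up for this example. For う (U+3046) it yields the two byte sequences from the question, e3 81 86 in UTF-8 and 46 30 in UTF-16LE.

```cpp
#include <cstdint>
#include <vector>

// Encode a BMP code point (U+0000..U+FFFF, surrogates excluded) as UTF-8.
std::vector<std::uint8_t> to_utf8(char16_t cp)
{
    if (cp < 0x80)
        return { static_cast<std::uint8_t>(cp) };
    if (cp < 0x800)
        return { static_cast<std::uint8_t>(0xC0 | (cp >> 6)),
                 static_cast<std::uint8_t>(0x80 | (cp & 0x3F)) };
    return { static_cast<std::uint8_t>(0xE0 | (cp >> 12)),
             static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)),
             static_cast<std::uint8_t>(0x80 | (cp & 0x3F)) };
}

// Encode the same code point as UTF-16LE: low byte first, then high byte.
std::vector<std::uint8_t> to_utf16le(char16_t cp)
{
    return { static_cast<std::uint8_t>(cp & 0xFF),
             static_cast<std::uint8_t>(cp >> 8) };
}
```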
Since you enabled UTF-8 as the application code page, you don't want to use the W wide API, but the A ANSI functions. So, although you actually want to support Unicode, define neither the _UNICODE nor the UNICODE macro, as those would select the W variant of the API. Alternatively, in Visual Studio, select Use Multi-Byte Character Set for the Character Set parameter (in the Advanced configuration properties).
Then you can also use the Unicode-agnostic macros like MessageBox, which will properly select MessageBoxA.
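The selection mechanism is plain preprocessor logic. The sketch below reproduces the pattern with made-up names (ShowTextA, ShowTextW and ShowText are hypothetical, not Windows functions) so that it compiles anywhere; the Windows headers define MessageBox along the same lines.

```cpp
#include <string>

// Hypothetical narrow and wide variants, standing in for an A/W API pair.
std::string ShowTextA(const char*)    { return "narrow (A) variant"; }
std::string ShowTextW(const wchar_t*) { return "wide (W) variant"; }

// Same pattern as the Windows headers: the generic name maps to the W
// variant only when UNICODE is defined, otherwise to the A variant.
#ifdef UNICODE
#define ShowText ShowTextW
#else
#define ShowText ShowTextA
#endif
```

With UNICODE left undefined, ShowText("...") resolves to ShowTextA and takes a UTF-8 const char* directly.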
Unfortunately, there are a few rare Windows APIs that only exist in a UTF-16 (wchar_t*) version. For those, you will need to manually convert your UTF-8 string into UTF-16, for example with std::codecvt or MultiByteToWideChar.
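As a minimal sketch of the std::codecvt route, the helper below (utf8_to_wide is a hypothetical name) converts a UTF-8 string into UTF-16 code units stored in a std::wstring. Be aware that std::wstring_convert and the <codecvt> header are deprecated since C++17, so your compiler may warn; MultiByteToWideChar with CP_UTF8 is the native alternative.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: UTF-8 in, UTF-16 code units in wchar_t out, as the
// W functions expect on Windows. from_bytes throws std::range_error on
// malformed input. Deprecated facilities since C++17, used here for brevity.
std::wstring utf8_to_wide(const std::string& utf8)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    return converter.from_bytes(utf8);
}
```

The result can then be passed to a W-only function through its c_str().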
Here is a Hello World demonstration.
Hello-UTF-8.cpp: must be stored with UTF-8 encoding. A BOM is permitted, but not recommended.
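#define _CRT_SECURE_NO_WARNINGS
#include <Windows.h>
#include <iostream>
#include <string>
#include <cstdio>
int main(int argc, char* argv[])
{
    // Switch the console to UTF-8 for both output and input.
    SetConsoleOutputCP(CP_UTF8);
    SetConsoleCP(CP_UTF8);
    std::string str = "議論\n";
    // With the UTF-8 manifest embedded, argv arrives UTF-8 encoded.
    for (int i = 0; i < argc; i++)
    {
        str += argv[i];
        str += "\n";
    }
    std::cout << str;
    // The file name is passed as UTF-8, matching the active code page.
    FILE* file = fopen("Деякий файл.txt", "wt");
    if (file != nullptr)
    {
        fputs(str.c_str(), file);
        fclose(file);
    }
    // MessageBox resolves to MessageBoxA because UNICODE is not defined.
    MessageBox(nullptr, str.c_str(), "Γεια σου κόσμε", MB_OK);
}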
utf8.manifest: exactly as above (I don't care about the dummy name):
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1">
<assemblyIdentity type="win32" name="..." version="6.0.0.0"/>
<application>
<windowsSettings>
<activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
</windowsSettings>
</application>
</assembly>
Compiled and run in PowerShell (for proper Unicode handling):
PS E:\Привет> cl Hello-UTF-8.cpp /utf-8 /nologo User32.lib /EHsc
Hello-UTF-8.cpp
PS E:\Привет> mt -nologo -manifest utf8.manifest -outputresource:Hello-UTF-8.exe;#1
PS E:\Привет> .\Hello-UTF-8.exe こんにちは κόσμος
議論
E:\Привет\Hello-UTF-8.exe
こんにちは
κόσμος
PS E:\Привет> dir *.txt
Répertoire : E:\Привет
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a---- 20.06.2025 11:11 72 Деякий файл.txt