I am trying to use and display French accented characters in my C++20 program.
However, reading a line from a file with std::getline() seems to garble the accented characters, like so:
#include <locale>
#include <iostream>
#include <fstream>

int main(void)
{
    setlocale(LC_ALL,"");
    std::wifstream file("test.txt");
    std::wstring s;
    std::getline(file, s);
    std::wcout << s << std::endl;
    return 0;
}
Content of test.txt (encoded in UTF-8):
Salut ! Comment ça va ? éèêëâàäáôûöüùîï
Result:
$>./test
Salut ! Comment ça va ? éèêëâà äáôûöüùîï
However, when I display the same text from a std::wstring initialized directly in the source code, the output is correct:
#include <locale>
#include <iostream>

int main(void)
{
    setlocale(LC_ALL,"");
    std::wstring s = L"Salut ! Comment ça va ? éèêëâàäáôûöüùîï";
    std::wcout << s << std::endl;
    return 0;
}
Result:
$>./test
Salut ! Comment ça va ? éèêëâàäáôûöüùîï
Calling setlocale(LC_ALL, "") improved things: before that, even the second example did not work. But there still seems to be a problem with std::getline() that I don't understand.
I read that I might need to imbue a locale into the std::wifstream, but I could not figure out how to make that work.
I'm fairly new to C++, so I'm not sure whether there are better tools for this kind of problem; at least I couldn't find any.
I'm using zsh on MinGW, integrated into VSCode.
I compile with the following command:
c++ -Wall -Wextra -Werror -std=c++20 test.cpp -o test
I was able to solve this problem thanks to this post!
Imbuing was the solution. Here is the code that solved my problem:
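#include <clocale>   // setlocale
#include <codecvt>   // std::codecvt_utf8 (deprecated since C++17)
#include <fstream>
#include <iostream>
#include <locale>
#include <string>

int main()
{
    // Use the environment's locale for the C runtime.
    setlocale(LC_ALL,"");

    std::wifstream file("test.txt");
    // Imbue before the first read: decode the file as UTF-8 and skip a
    // leading byte order mark if there is one (std::consume_header).
    file.imbue(std::locale(std::locale(), new std::codecvt_utf8<wchar_t,0x10ffff, std::consume_header>));

    std::wstring s;
    std::getline(file, s);
    std::wcout << s << std::endl;
    return 0;
}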
This line:
file.imbue(std::locale(std::locale(), new std::codecvt_utf8<wchar_t,0x10ffff, std::consume_header>));
was originally:
file.imbue(std::locale(std::locale::empty(), new std::codecvt_utf8<wchar_t,0x10ffff, std::consume_header>));
However, std::locale::empty() is platform-specific, as seen in this SO question, so I replaced it with std::locale() and it worked fine.
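For readers wondering what the imbued facet actually does: as the stream reads, it converts the file's UTF-8 byte sequences into one code point per character, and std::consume_header makes it skip a leading byte order mark. The sketch below does the same conversion by hand on a std::string. It is only an illustration under some assumptions: decode_utf8 is a hypothetical helper, not a standard function; it returns std::u32string rather than std::wstring; and it does not reject overlong encodings or surrogate code points the way a strict decoder would.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of the standard library): decodes a UTF-8
// byte string into UTF-32 code points. Simplified on purpose: overlong
// encodings and surrogate code points are not rejected.
std::u32string decode_utf8(const std::string& bytes)
{
    std::u32string out;
    std::size_t i = 0;

    // Skip a leading UTF-8 byte order mark (EF BB BF), if present.
    if (bytes.compare(0, 3, "\xEF\xBB\xBF") == 0)
        i = 3;

    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected

        // The lead byte tells how many bytes the sequence has.
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (extra > bytes.size() - i - 1)
            throw std::runtime_error("truncated UTF-8 sequence");

        // Each continuation byte (10xxxxxx) contributes six more bits.
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }

        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

With something like this, a file can be read as raw bytes with std::ifstream and std::getline into a std::string and decoded afterwards, which avoids std::codecvt_utf8 (deprecated since C++17). Printing the result to a terminal is a separate problem that this sketch does not address.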