I'm having trouble collecting all file names from a directory: the program fails when it reaches certain file names. Here is the code:
#include <filesystem>
#include <string>
#include <vector>
int main()
{
    const char* dir = "D:\\Music";
    std::vector<std::string> musicList;
    for (const auto& entry : std::filesystem::recursive_directory_iterator(dir))
    {
        if (entry.is_regular_file())
        {
            musicList.emplace_back(entry.path().string());
        }
    }
}
The issue occurs at entry.path().string() when processing strings like L"D:\\Music\\suki\\Angel Note - 月明かりは優しく・・・.mp3". The program terminates with an error pointing to:
_STD_BEGIN
// We would really love to use the proper way of building error_code by specializing
// is_error_code_enum and make_error_code for __std_win_error, but because:
// 1. We would like to keep the definition of __std_win_error in xfilesystem_abi.h
// 2. and xfilesystem_abi.h cannot include <system_error>
// 3. and specialization of is_error_code_enum and overload of make_error_code
// need to be kept together with the enum (see limerick in N4950 [temp.expl.spec]/8)
// we resort to using this _Make_ec helper.
_NODISCARD inline error_code _Make_ec(__std_win_error _Errno) noexcept { // make an error_code
return { static_cast<int>(_Errno), _STD system_category() };
}
[[noreturn]] inline void _Throw_system_error_from_std_win_error(const __std_win_error _Errno) {
_THROW(system_error{ _Make_ec(_Errno) }); // The error occurs here
}
_STD_END
I compiled the code in Visual Studio 2022, and the C++ standard is C++17.
Upon investigation, I simplified the issue with:
#include <filesystem>
int main()
{
    std::filesystem::path path = L"・";
    auto str = path.string();
}
Similar issues arose at path.string(). Upon further simplification using L"\u30FB", I discovered the character ・ is represented as "\u30FB".
While path.wstring(), path.u8string(), and other string conversions work well, I need a char* for APIs such as ImGui::Text(str) or FMOD's API. Attempts to convert wstring to string using codecvt, Win32 API, or ICU resulted in garbled text like "・":
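#include <filesystem>
#include <string>
#include <Windows.h>
std::string ws2s(const std::wstring& wstr)
{
    if (wstr.empty()) return {};
    // Pass the explicit length so the result carries no embedded null terminator.
    const int wlen = static_cast<int>(wstr.size());
    const int len = WideCharToMultiByte(CP_UTF8, 0, wstr.data(), wlen, nullptr, 0, nullptr, nullptr);
    // Size the string up front: reserve() would leave size() at 0,
    // so writing through data() would be undefined behavior.
    std::string str(len, '\0');
    WideCharToMultiByte(CP_UTF8, 0, wstr.data(), wlen, str.data(), len, nullptr, nullptr);
    return str;
}
int main()
{
    std::filesystem::path path = L"\u30FB";
    auto str = ws2s(path.wstring());
}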
The resulting str was "・" instead of "\u30FB".
Is there a reliable method to handle this situation effectively?
Okay, I found the issue. It's the encoding used by the VS debugger display, which doesn't show UTF-8 properly by default. For example, the contents of my vector<string> are in UTF-8, but the debugger shows garbled text for elements such as vec[0]. All I need to do is append ,s8 to vec[0] in the Watch window. This forces the debugger to display the content as UTF-8.
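Since path.u8string() already works, one way to get UTF-8 bytes into a plain std::string is to copy them out of its result. The sketch below assumes the consuming API expects UTF-8 encoded char* strings; to_utf8 is a hypothetical helper name, not part of any library. It is written to compile under both C++17, where u8string() returns std::string, and C++20, where it returns std::u8string.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper (not a library function): returns the path as UTF-8
// bytes stored in a std::string, without going through the narrow code page.
std::string to_utf8(const std::filesystem::path& p)
{
    // std::string in C++17, std::u8string in C++20; the iterator-range
    // constructor copies the code units in either case.
    const auto u8 = p.u8string();
    return std::string(u8.begin(), u8.end());
}
```

With this, the loop body could become musicList.emplace_back(to_utf8(entry.path())), and a call such as ImGui::Text("%s", musicList[0].c_str()) would receive UTF-8 bytes.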
Oh Microsoft, why do you insist on UTF-16? Isn't UTF-8 good enough?
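The program terminates because std::filesystem::path::string throws an exception (a std::system_error, as the snippet you were pointed to shows) when the path cannot be converted to the narrow encoding, and your code does not catch it.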
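To illustrate the point about the uncaught exception, here is a minimal sketch that catches std::system_error around the conversion and reports failure instead of terminating. The name try_narrow is hypothetical, and returning std::optional is just one possible way to signal the failure.

```cpp
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>

// Hypothetical helper: attempts the narrow-string conversion and returns
// std::nullopt if the implementation throws std::system_error.
std::optional<std::string> try_narrow(const std::filesystem::path& p)
{
    try {
        return p.string();
    } catch (const std::system_error&) {
        // The path has no representation in the current narrow encoding.
        return std::nullopt;
    }
}
```

This only keeps the program alive; files whose names cannot be converted are still skipped unless the encoding itself is changed.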
This is a problem with encoding. Add this at the beginning of your program and the issue should be resolved:
static constexpr char localeName[] = "ja_JP.utf-8";
// Instruct the C standard library that Japanese will be used with UTF-8 encoding
std::setlocale(LC_ALL, localeName);
// Instruct the C++ standard library that Japanese will be used with UTF-8 encoding, for example in std::string, std::ostream
std::locale::global(std::locale(localeName));
// Use the system locale (language and encoding) when printing data to std::cout
// Note that if your system is using a different encoding than UTF-8, like CP932, the C++ standard library will implicitly do a conversion.
std::cout.imbue(std::locale{""});
I had a similar problem with boost::filesystem::path and this resolved it.
Note that the encoding part is the most important. On MSVC, this should address the issue too:
static constexpr char localeName[] = ".utf-8";
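Whether a given locale name is accepted depends on the platform and runtime, and the std::locale constructor throws std::runtime_error for a name it does not recognize. A small sketch of a guarded setup follows; try_set_global_locale is a hypothetical helper name used only for illustration.

```cpp
#include <locale>
#include <stdexcept>

// Hypothetical helper: installs the named locale as the global C++ locale.
// Returns false instead of throwing when the name is not supported.
bool try_set_global_locale(const char* name)
{
    try {
        std::locale::global(std::locale(name));
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

A program could then try ".utf-8" first and fall back to another name if the call returns false.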
Here is a full demo:
#include <iostream>
#include <filesystem>
#include <locale>
#define LOG(x) std::cout << #x " = " << x << '\n'
int main()
{
    std::locale::global(std::locale{".utf-8"});
    // use system encoding - language neutral
    std::locale sysLoc{std::locale{"C"}, "", std::locale::ctype};
    std::cout.imbue(sysLoc);
    std::cerr.imbue(sysLoc);
    for (const auto& dir_en : std::filesystem::directory_iterator{"."})
    {
        LOG(dir_en);
        LOG(dir_en.path());
        LOG(dir_en.path().string());
        std::cout << "---------------\n";
    }
}
I got these results:
C:\Users\marekR22\Downloads\MyDir>dir
Volume in drive C has no label.
Volume Serial Number is 5608-EF1A
Directory of C:\Users\marekR22\Downloads\MyDir
07/16/2024 03:58 PM <DIR> .
07/16/2024 03:58 PM <DIR> ..
07/16/2024 03:56 PM 47 Angel Note - 月明かりは優しく・・・.txt
07/16/2024 03:55 PM 526 main.cpp
2 File(s) 573 bytes
2 Dir(s) 7,074,545,664 bytes free
C:\Users\marekR22\Downloads\MyDir>cl /std:c++20 /EHcs /O2 /D NDEBUG /utf-8 main.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 19.39.33523 for x64
Copyright (C) Microsoft Corporation. All rights reserved.
main.cpp
Microsoft (R) Incremental Linker Version 14.39.33523.0
Copyright (C) Microsoft Corporation. All rights reserved.
/out:main.exe
main.obj
C:\Users\marekR22\Downloads\MyDir>chcp
Active code page: 852
C:\Users\marekR22\Downloads\MyDir>main.exe
dir_en = ".\\Angel Note - ????????.txt"
dir_en.path() = ".\\Angel Note - ????????.txt"
dir_en.path().string() = .\Angel Note - ????????.txt
---------------
dir_en = ".\\main.cpp"
dir_en.path() = ".\\main.cpp"
dir_en.path().string() = .\main.cpp
---------------
dir_en = ".\\main.exe"
dir_en.path() = ".\\main.exe"
dir_en.path().string() = .\main.exe
---------------
dir_en = ".\\main.obj"
dir_en.path() = ".\\main.obj"
dir_en.path().string() = .\main.obj
---------------
C:\Users\marekR22\Downloads\MyDir>chcp 65001
Active code page: 65001
C:\Users\marekR22\Downloads\MyDir>main.exe
dir_en = ".\\Angel Note - 月明かりは優しく・・・.txt"
dir_en.path() = ".\\Angel Note - 月明かりは優しく・・・.txt"
dir_en.path().string() = .\Angel Note - 月明かりは優しく・・・.txt
---------------
dir_en = ".\\main.cpp"
dir_en.path() = ".\\main.cpp"
dir_en.path().string() = .\main.cpp
---------------
dir_en = ".\\main.exe"
dir_en.path() = ".\\main.exe"
dir_en.path().string() = .\main.exe
---------------
dir_en = ".\\main.obj"
dir_en.path() = ".\\main.obj"
dir_en.path().string() = .\main.obj
---------------
C:\Users\marekR22\Downloads\MyDir>chcp 932
Active code page: 932
C:\Users\marekR22\Downloads\MyDir>main.exe
dir_en = ".\\Angel Note - 月明かりは優しく・・・.txt"
dir_en.path() = ".\\Angel Note - 月明かりは優しく・・・.txt"
dir_en.path().string() = .\Angel Note - 月明かりは優しく・・・.txt
---------------
dir_en = ".\\main.cpp"
dir_en.path() = ".\\main.cpp"
dir_en.path().string() = .\main.cpp
---------------
dir_en = ".\\main.exe"
dir_en.path() = ".\\main.exe"
dir_en.path().string() = .\main.exe
---------------
dir_en = ".\\main.obj"
dir_en.path() = ".\\main.obj"
dir_en.path().string() = .\main.obj
---------------
Note that when my code page does not support Japanese characters, ? is printed (no crash). After I changed the code page to 65001 (which represents UTF-8), the Japanese characters are printed properly. It also works perfectly when the Japanese code page 932 is used.
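In the session above the code page was switched by hand with chcp. A program can also request UTF-8 for its own console output through the Win32 function SetConsoleOutputCP. The sketch below is an assumption about how one might wire that in; use_utf8_console_output is a hypothetical name, and whether it interacts with the imbued stream locale exactly as chcp 65001 does has not been verified here.

```cpp
#ifdef _WIN32
#include <Windows.h>
#endif

// Hypothetical helper: asks the Windows console to interpret program output
// as UTF-8 (code page 65001). On other platforms it does nothing.
bool use_utf8_console_output()
{
#ifdef _WIN32
    // SetConsoleOutputCP returns zero on failure.
    return SetConsoleOutputCP(CP_UTF8) != 0;
#else
    return true;
#endif
}
```

Calling it once at the start of main, before anything is printed, would be the natural place.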