I'm upgrading my compiler from g++ 8.2.0 to 12.1.0. I have some legacy code using std::filesystem that starts failing with this upgrade.
I isolated the problem in an MCVE below. The code is supposed to work under both Windows and Linux. It creates a file with UTF-8 characters in its name and tries to copy it.
On Linux, the program prints:
COPY RAISED filesystem error: Cannot convert character sequence: Invalid or incomplete multibyte or wide character
COPY FAILED
Here's the program:
#include <iostream>
#include <filesystem>
#include <fstream>
#include <assert.h>
#include <codecvt>
#include <locale>

bool DoCopyFile( const std::wstring& sSource, const std::wstring& sDest )
{
    try
    {
        std::error_code ec;
        auto options = std::filesystem::copy_options::none;
        std::filesystem::copy( sSource.c_str(), sDest.c_str(), options, ec );
        if ( ec.value() == 0 )
        {
            return true;
        }
        std::cout << "COPY FAILED" << std::endl;
    }
    catch ( std::exception& e )
    {
        std::cout << "COPY RAISED " << e.what() << std::endl;
    }
    catch (...)
    {
        std::cout << "COPY RAISED" << std::endl;
    }
    return false;
}

static std::wstring Utf8string2wstring(const std::string& str)
{
    try
    {
        std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
        return converter.from_bytes(str);
    }
    catch(...)
    {
        assert( false );
        return std::wstring( str.begin(), str.end() );
    }
}

#ifdef WIN32
static std::wstring Utf8string2wstringForStd(const std::string& str)
{
    // Using Visual Studio, even with the /utf-8 option set, calling std::fstream::open with a UTF-8 string
    // ends up creating a file named "Ã©Ã©Ã©.txt" instead of "ééé.txt"
    return Utf8string2wstring(str);
}
#else
static std::string Utf8string2wstringForStd(const std::string& str)
{
    // Linux, OK to use UTF-8 for std::fstream:
    return str;
}
#endif

int main()
{
    std::string dir = std::filesystem::current_path().string();
    std::string fileName = dir + "/ééé.txt";
    std::fstream file;
    file.open(Utf8string2wstringForStd(fileName), std::ios::out );
    if ( file.is_open() )
    {
        file << "Hello" << std::endl;
        file.close();
        std::string copy = fileName + ".copy";
        if ( !DoCopyFile( Utf8string2wstring( fileName ), Utf8string2wstring( copy ) ) )
            std::cout << "COPY FAILED" << std::endl;
        else
            std::cout << "COPY SUCCEEDED" << std::endl;
    }
    return 0;
}
Note: The C++ file is encoded in UTF-8.
Is this code doing something wrong? I suspect Utf8string2wstring: under Windows I get compiler warnings saying it uses deprecated functions, but it works fine.
I found topics reporting the error "Invalid or incomplete multibyte or wide character", but mostly at compile time, not at runtime.
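One way to check whether the conversion helper is at fault is to test it in isolation, with no std::filesystem call involved. The following is only a minimal sketch: Utf8ToWide and ConverterHandlesEAcute are hypothetical names introduced here, and the input is spelled as raw UTF-8 bytes so the result does not depend on the source file encoding.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Same conversion as Utf8string2wstring in the MCVE, minus the catch-all fallback.
// Hypothetical name, used only for this check.
std::wstring Utf8ToWide(const std::string& utf8)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    return converter.from_bytes(utf8);
}

// Returns true when "ééé" (given as UTF-8 bytes) converts to exactly
// three wchar_t units, each holding U+00E9.
bool ConverterHandlesEAcute()
{
    const std::string utf8 = "\xC3\xA9\xC3\xA9\xC3\xA9";
    const std::wstring expected(3, static_cast<wchar_t>(0x00E9));
    return Utf8ToWide(utf8) == expected;
}
```

If ConverterHandlesEAcute() returns true, the wide string handed to std::filesystem is well formed, which would point at the wide-to-narrow conversion inside the path constructor rather than at the helper.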
I shortened the reproducer to just fs::exists(fs::path(L"abcé.txt")); and started checking versions in Compiler Explorer. GCC 12.3 seems to be the first release where this is no longer an error. I could not find a bug report about it in the GCC release notes, however.
For Clang it is fixed as of 17.0.1.
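For anyone stuck on an affected release, a possible workaround is to avoid the std::wstring intermediate and build the path directly from the UTF-8 bytes. This is a sketch under assumptions, not a tested fix for the original project: PathFromUtf8, TryCopy and DemoCopy are hypothetical names, and the demo writes to the system temp directory rather than the current directory.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: treat `utf8` as UTF-8 and build a path from it
// without converting through std::wstring.
fs::path PathFromUtf8(const std::string& utf8)
{
#if defined(__cpp_lib_char8_t)
    // C++20: a char8_t source is always interpreted as UTF-8.
    return fs::path(std::u8string(utf8.begin(), utf8.end()));
#else
    // C++17: u8path does the same job (it is deprecated from C++20 on).
    return fs::u8path(utf8);
#endif
}

// Copies `source` to `dest`; returns true when no error was reported.
bool TryCopy(const fs::path& source, const fs::path& dest)
{
    std::error_code ec;
    fs::copy(source, dest, fs::copy_options::none, ec);
    return !ec;
}

// Creates "ééé.txt" in the temp directory, copies it, then cleans up.
bool DemoCopy()
{
    const fs::path dir = fs::temp_directory_path();
    const fs::path source = dir / PathFromUtf8("\xC3\xA9\xC3\xA9\xC3\xA9.txt");
    const fs::path dest = dir / PathFromUtf8("\xC3\xA9\xC3\xA9\xC3\xA9.txt.copy");

    std::error_code ignored;
    fs::remove(dest, ignored); // copy_options::none fails if dest already exists

    {
        std::ofstream file(source);
        if (!file)
            return false;
        file << "Hello\n";
    } // file is closed here

    const bool ok = TryCopy(source, dest) && fs::exists(dest, ignored);
    fs::remove(source, ignored);
    fs::remove(dest, ignored);
    return ok;
}
```

On Linux this keeps the file name as narrow bytes end to end, so the wide-character conversion that raises the error is never exercised. On Windows the path constructor still converts UTF-8 to the native wide encoding.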