
Using std::filesystem with g++ 12.1.0 raises "Cannot convert character sequence: Invalid or incomplete multibyte or wide character"


I'm upgrading my compiler from g++ 8.2.0 to 12.1.0.

I have some legacy code using std::filesystem that started failing after this upgrade.

I isolated the problem in the MCVE below. The code is supposed to work on both Windows and Linux. It creates a file with UTF-8 characters in its name and tries to copy it.

On Linux, the output is:

COPY RAISED filesystem error: Cannot convert character sequence: Invalid or incomplete multibyte or wide character
COPY FAILED

Here's the program:

#include <iostream>
#include <filesystem>
#include <fstream>
#include <assert.h>
#include <codecvt>
#include <locale>

bool DoCopyFile( const std::wstring& sSource, const std::wstring& sDest )
{
    try
    {
        std::error_code ec;
        auto options = std::filesystem::copy_options::none;
        std::filesystem::copy( sSource.c_str(), sDest.c_str(), options, ec );
        if ( ec.value() == 0 )
        {
            return true;
        }
        std::cout << "COPY FAILED" << std::endl;
    }
    catch ( std::exception& e )
    {
        std::cout << "COPY RAISED " << e.what() << std::endl;
    }
    catch (...)
    {       
        std::cout << "COPY RAISED" << std::endl;
    }    

    return false;
}

static std::wstring Utf8string2wstring(const std::string& str)
{
    try
    {
        std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
        return converter.from_bytes(str);
    }
    catch(...)
    {
        assert( false );
        return std::wstring( str.begin(), str.end() );
    }
}

#ifdef WIN32
static std::wstring Utf8string2wstringForStd(const std::string& str)
{
    // Using Visual Studio, even with the /utf-8 option set, calling std::fstream::open with a UTF-8 string
    // creates a file with a mis-encoded name (the UTF-8 bytes read in the ANSI code page) instead of "ééé.txt"
    return Utf8string2wstring(str);
}
#else
static std::string Utf8string2wstringForStd(const std::string& str)
{
    // Linux, OK to use UTF8 for std::fstream:
    return str;
}
#endif

int main()
{
    std::string dir = std::filesystem::current_path().string();

    std::string fileName = dir + "/ééé.txt";

    std::fstream file;
    file.open(Utf8string2wstringForStd(fileName), std::ios::out );
    if ( file.is_open() )
    {
        file << "Hello" << std::endl;
        file.close();

        std::string copy = fileName + ".copy";

        if ( !DoCopyFile( Utf8string2wstring( fileName ), Utf8string2wstring( copy ) ) )
            std::cout << "COPY FAILED" << std::endl;
        else
            std::cout << "COPY SUCCEEDED" << std::endl;
    }

    return 0;
}

Note: the C++ source file is encoded in UTF-8.

Is this code doing something wrong? I suspect Utf8string2wstring: under Windows I get compiler warnings saying it uses deprecated functions (std::wstring_convert and std::codecvt_utf8_utf16 have been deprecated since C++17), but it works fine.
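Whatever the root cause turns out to be, the Linux branch never actually needs wide strings. Below is a minimal sketch of one way to avoid the UTF-8 to wchar_t round trip, assuming every path is kept as a UTF-8 std::string and the Linux file system uses UTF-8 names. PathFromUtf8 and DoCopyFileUtf8 are hypothetical names, not part of the original code. On POSIX systems std::filesystem::path::value_type is char, so a path built from a std::string stores the bytes unchanged and no wchar_t-to-char conversion is attempted; the Windows branch reuses the same converter as Utf8string2wstring.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>
#ifdef _WIN32
#include <codecvt>
#include <locale>
#endif

// Hypothetical helper: turn a UTF-8 encoded std::string into a path.
std::filesystem::path PathFromUtf8(const std::string& utf8)
{
#ifdef _WIN32
    // Windows: path::value_type is wchar_t (UTF-16), so convert explicitly.
    // Same converter as Utf8string2wstring (deprecated since C++17).
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    return std::filesystem::path(converter.from_bytes(utf8));
#else
    // POSIX: path::value_type is char, so the bytes are stored unchanged
    // and no wchar_t-to-char conversion takes place.
    return std::filesystem::path(utf8);
#endif
}

// Hypothetical variant of DoCopyFile taking UTF-8 strings. Filesystem
// failures (e.g. a missing source file) are reported through ec.
bool DoCopyFileUtf8(const std::string& source, const std::string& dest,
                    std::error_code& ec)
{
    std::filesystem::copy(PathFromUtf8(source), PathFromUtf8(dest),
                          std::filesystem::copy_options::none, ec);
    return !ec;
}
```

In main, the copy would then become DoCopyFileUtf8(fileName, copy, ec) with a local std::error_code ec. In C++17, std::filesystem::u8path serves a similar purpose; it is deprecated in C++20 in favor of constructing a path from a char8_t string.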

I found other topics reporting the error "Invalid or incomplete multibyte or wide character", but those were about compile-time errors, not runtime ones.
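The shared wording likely has a mundane cause: on glibc, "Invalid or incomplete multibyte or wide character" is the generic description of the errno value EILSEQ ("illegal byte sequence"), so any conversion failure reported with that error code, at compile time or at run time, prints the same text. The sketch below retrieves that description through std::error_code; IllegalByteSequenceMessage is a hypothetical name, and the exact text depends on the C library in use.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <system_error>

// Hypothetical helper: the C library's description of EILSEQ, obtained
// through the generic error category. glibc words it
// "Invalid or incomplete multibyte or wide character"; other C libraries
// may use different text.
std::string IllegalByteSequenceMessage()
{
    std::error_code ec = std::make_error_code(std::errc::illegal_byte_sequence);
    return ec.message();
}
```

On a glibc-based Linux system, printing IllegalByteSequenceMessage() should show the same text as the second half of the filesystem error quoted above.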


Solution

  • I shortened the reproducer to just fs::exists(fs::path(L"abcé.txt")); and tested it against different compiler versions in Compiler Explorer. GCC 12.3 seems to be the first release where this no longer raises an error. However, I could not find a corresponding bug report in the GCC release notes.

    With Clang, it is fixed as of 17.0.1.
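The shortened reproducer can be wrapped in a small self-contained check for comparing toolchains locally. This is a sketch: WidePathThrows is a hypothetical name, and the assumption that the exception comes from converting the wide name to a char string when the path is constructed is inferred from the error text, not confirmed against the library sources. Which versions throw is taken from the Compiler Explorer results above.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical wrapper around the shortened reproducer
// fs::exists(fs::path(L"abcé.txt")). Returns true if it throws a
// filesystem_error, and copies the error text into message.
bool WidePathThrows(const wchar_t* name, std::string& message)
{
    try
    {
        // On POSIX the wide name has to be converted to a char string when
        // the path is constructed; affected versions appear to throw here.
        (void)fs::exists(fs::path(name));
        return false;
    }
    catch (const fs::filesystem_error& e)
    {
        message = e.what();
        return true;
    }
}
```

Per the results reported here, WidePathThrows(L"abcé.txt", msg) would be expected to return true under g++ 12.1.0 on Linux, with msg containing the error quoted in the question, and false from GCC 12.3 onward. An ASCII-only name such as L"abc.txt" should not throw on any version.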