c++ stl std crt

How to convert from UTF-8 to ANSI using standard C++


I have some strings read from the database, stored in a char* and in UTF-8 format (you know, "á" is encoded as 0xC3 0xA1). But in order to write them to a file, I first need to convert them to ANSI (I can't make the file UTF-8; it's only ever read as ANSI), so that my "á" doesn't become "Ã¡". Yes, I know some data will be lost (Chinese characters, and in general anything not in the ANSI code page), but that's exactly what I need.

But the thing is, I need the code to compile on various platforms, so it has to be standard C++ (i.e. no WinAPI, only stdlib, STL, CRT or any custom library with available source).

Does anyone have any suggestions?


Solution

  • A few days ago, somebody answered that if I had a C++11 compiler, I could try this:

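    #include <codecvt>
    #include <locale>
    #include <string>
    #include <vector>

    std::string utf8_to_string(const char *utf8str, const std::locale& loc)
    {
        // UTF-8 to wstring (from_bytes throws std::range_error on malformed input)
        std::wstring_convert<std::codecvt_utf8<wchar_t>> wconv;
        std::wstring wstr = wconv.from_bytes(utf8str);
        // wstring to string; characters the locale cannot represent become '?'
        std::vector<char> buf(wstr.size());
        std::use_facet<std::ctype<wchar_t>>(loc).narrow(wstr.data(), wstr.data() + wstr.size(), '?', buf.data());
        return std::string(buf.data(), buf.size());
    }

    int main(int argc, char* argv[])
    {
        // "á" in UTF-8; a string literal avoids the narrowing error that
        // {0xc3, 0xa1, 0} triggers when char is signed
        const char utf8txt[] = "\xc3\xa1";

        // I guess you want to use Windows-1252 encoding...
        // (locale names are platform-specific; ".1252" is the form MSVC accepts)
        std::string ansi = utf8_to_string(utf8txt, std::locale(".1252"));
        // Now do something with the string
        return 0;
    }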
    

    I don't know what happened to the response; apparently someone deleted it. But it turns out to be the perfect solution. To whoever posted it: thanks a lot, you deserve the accept and the upvote!
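Since locale names such as ".1252" differ between platforms, a hand-rolled decoder may be worth having as a fallback. The following is only a minimal sketch, not the deleted answer's method: `utf8_to_latin1` is a hypothetical helper, and it assumes ISO-8859-1 is an acceptable target. ISO-8859-1 matches Windows-1252 for U+00A0 to U+00FF (so "á" comes out right), but characters that Windows-1252 stores in 0x80 to 0x9F, such as "€", are replaced with '?' here.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes UTF-8 and keeps only code points that fit in
// ISO-8859-1 (U+0000..U+00FF). Anything else, including malformed or
// overlong sequences, becomes `replacement`.
std::string utf8_to_latin1(const std::string& utf8, char replacement = '?')
{
    // Smallest code point that legitimately needs 0, 1, 2 or 3 continuation bytes.
    static const unsigned long min_cp[] = {0x0, 0x80, 0x800, 0x10000};

    std::string out;
    out.reserve(utf8.size());

    for (std::size_t i = 0; i < utf8.size();) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        unsigned long cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else {
            // Stray continuation byte or invalid lead byte.
            out += replacement;
            ++i;
            continue;
        }

        // Consume continuation bytes (10xxxxxx); stop early if malformed.
        std::size_t j = 1;
        for (; j <= extra && i + j < utf8.size(); ++j) {
            const unsigned char c = static_cast<unsigned char>(utf8[i + j]);
            if ((c & 0xC0) != 0x80) break;
            cp = (cp << 6) | (c & 0x3F);
        }
        const bool complete = (j == extra + 1);
        i += j;  // skip the lead byte and the continuation bytes consumed

        if (complete && cp >= min_cp[extra] && cp <= 0xFF)
            out += static_cast<char>(static_cast<unsigned char>(cp));
        else
            out += replacement;
    }
    return out;
}
```

Calling `utf8_to_latin1("\xC3\xA1")` yields the single byte 0xE1, which is "á" in both ISO-8859-1 and Windows-1252. The result can then be written to the file through a stream opened in binary mode.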