Tags: c++, unicode, utf-8, iostream, filestreams

UTF-8-compliant IOstreams


Does GCC's standard library, Boost, or any other library implement iostream-compliant versions of ifstream or ofstream that support conversion between UTF-8-encoded (file) streams and a std::vector<wchar_t> or std::wstring?


Solution

  • Your question doesn't quite work. UTF-8 is a specific encoding, while wchar_t is a data type. Moreover, wchar_t is intended by the standard to represent the system's character set, but which encoding that is remains entirely up to the platform; the standard makes no requirements.

    Therefore, the first thing to ask for is conversion between the system's narrow, multibyte encoding (a char string) and the system's fixed-width wide encoding (a wchar_t string). This functionality is provided by std::mbstowcs and std::wcstombs. There is also a locale facet that wraps this, std::codecvt<wchar_t, char, std::mbstate_t>, but that's a bit of a niche area of the library.
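
    As a rough illustration of the std::mbstowcs / std::wcstombs round trip, here is a minimal sketch. The helper names to_wide and to_narrow are made up for this example, and the result depends on whatever locale is currently active (call std::setlocale first if you need something other than the default "C" locale). Strings with embedded null characters are cut off at the first null.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: narrow multibyte string (current C locale) -> wide string.
std::wstring to_wide(const std::string& narrow) {
    // A multibyte string never yields more wide characters than it has bytes,
    // so size() + 1 (for the terminator) is always enough room.
    std::vector<wchar_t> buf(narrow.size() + 1);
    std::size_t n = std::mbstowcs(buf.data(), narrow.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence for this locale");
    return std::wstring(buf.data(), n);
}

// Hypothetical helper: wide string -> narrow multibyte string (current C locale).
std::string to_narrow(const std::wstring& wide) {
    // MB_CUR_MAX is the maximum number of bytes per character in the
    // current locale; the extra byte is for the terminator.
    std::vector<char> buf(wide.size() * MB_CUR_MAX + 1);
    std::size_t n = std::wcstombs(buf.data(), wide.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("wide character not representable in this locale");
    return std::string(buf.data(), n);
}
```

    Note that neither function says anything about UTF-8: the narrow side is only UTF-8 if the active locale happens to use it.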

    If you want to convert between the opaque "system's encoding" prescribed by the standard and a definite encoding prescribed by your serialized data source/sink, you need an extra library. I'd recommend POSIX's iconv(), which is widely available. (The Windows API takes a different approach and offers dedicated conversion functions such as MultiByteToWideChar and WideCharToMultiByte.)
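
    For the iconv() route, a sketch along the following lines may help. The function name utf8_to_wide is invented for this example, and the "WCHAR_T" encoding name is an assumption: glibc and GNU libiconv accept it, but POSIX does not mandate any particular encoding names. On some platforms you also need to link with -liconv, and the second parameter of iconv() is declared const char** rather than char**.

```cpp
#include <iconv.h>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: decode UTF-8 bytes into the platform's wchar_t encoding.
std::wstring utf8_to_wide(const std::string& utf8) {
    // Assumption: the iconv implementation knows the name "WCHAR_T".
    iconv_t cd = iconv_open("WCHAR_T", "UTF-8");
    if (cd == (iconv_t)(-1))
        throw std::runtime_error("iconv_open failed");

    // Each wide character comes from at least one input byte, so one
    // wchar_t per input byte (plus one spare) is always enough.
    std::vector<wchar_t> out(utf8.size() + 1);

    char* in_ptr = const_cast<char*>(utf8.data());  // iconv does not modify the input
    std::size_t in_left = utf8.size();
    char* out_ptr = reinterpret_cast<char*>(out.data());
    std::size_t out_left = out.size() * sizeof(wchar_t);

    std::size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid or incomplete UTF-8 input");

    // out_left now holds the unused bytes of the output buffer.
    std::size_t produced = out.size() - out_left / sizeof(wchar_t);
    return std::wstring(out.data(), produced);
}
```

    To get stream-like behaviour, you would read the file's bytes with an ordinary std::ifstream and pass them through a function like this one; the reverse direction just swaps the two encoding names.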

    C++11 alleviates the issue slightly by adding an explicit family of UTF-encoded string types and literals (char16_t, char32_t, and the u8, u, and U literal prefixes), and presumably also transcoding facilities among those (though I've never seen them implemented by anyone).
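
    To make the C++11 types concrete, here is a small sketch showing how the same code points occupy a different number of code units in each UTF form. The variable names are made up for this example. It deliberately sticks to types and literals: the standard's own transcoding helpers in <codecvt> (std::wstring_convert, std::codecvt_utf8) were deprecated in C++17, so they are left out here.

```cpp
#include <string>

// U+20AC EURO SIGN fits in a single UTF-16 and a single UTF-32 code unit.
const std::u16string euro16 = u"\u20AC";
const std::u32string euro32 = U"\u20AC";

// U+1F600 lies outside the Basic Multilingual Plane: UTF-16 needs a
// surrogate pair (two code units), UTF-32 still needs only one.
const std::u16string grin16 = u"\U0001F600";
const std::u32string grin32 = U"\U0001F600";

// In UTF-8 the euro sign takes three bytes; sizeof also counts the terminator.
static_assert(sizeof(u8"\u20AC") == 4, "3 UTF-8 bytes plus the terminator");
```

    Unlike wchar_t, the encoding of these types is fixed by the standard, so the code-unit counts above do not depend on the platform.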

    Here's my standard response from past posts on the subject: Q1, Q2, Q3. C++11 will be a joy once it's fully available :-)