c++ std fstream codecvt char-traits

Why doesn't std::basic_fstream<unsigned char> work?


When trying to compile this code:

std::fstream file("file.name", std::ios::out | std::ios::binary);
uint8_t buf[BUFSIZE];
//Fill the buffer, etc...
file.write(buf, BUFSIZE);

the compiler gives me a warning about a not-so-healthy conversion from unsigned char to char in the call to write(). Since std::fstream is just a typedef for std::basic_fstream<char>, one might think that using std::basic_fstream<uint8_t> instead would let the code above compile without the warning, because write() expects a pointer to the stream's character type.

This works, of course, but another problem pops up. Even though this code compiles perfectly fine:

std::basic_fstream<uint8_t> file("file.name", std::ios::out | std::ios::binary);
uint8_t buf[BUFSIZE];
//Fill the buffer, etc...
file.write(buf, BUFSIZE);

it now fails at the call to write(), even though the previous version worked (compiler warnings aside). It took me a while to pinpoint where the exception is thrown in the standard C++ library code, but I still don't really understand what is going on. It looks like std::basic_fstream uses some character-encoding mechanism, and since one is defined for char but none for unsigned char, the file stream fails silently when used with the "wrong" character type... That's how I see it, at least.

But that's also what I don't understand. There is no need for any character encoding. I don't even open the file in text mode; I want to deal with binary data. That's why I use arrays of type uint8_t rather than char: it feels more natural for this kind of data than plain old char. But before I either give up on uint8_t and accept working with char buffers, or start using arrays of a custom byte type defined as char, I'd like to ask two questions:

  1. What exactly is the mechanism that stops me from using an unsigned character type? Is it really something related to character encoding, or does it serve some other purpose? Why does the file stream work fine with signed character types but not with unsigned ones?
  2. Assuming that I still wanted to use std::basic_fstream<uint8_t>, regardless of how (un)reasonable it is, is there any way to achieve that?

Solution

  • std::basic_fstream<unsigned char> doesn't work because it uses std::char_traits<unsigned char>, and the standard library doesn't provide such a specialisation; see std::char_traits for full details.

    If you'd like to read/write binary data, you need to use std::basic_fstream<char>, open it with the std::ios_base::binary flag, and use the std::basic_ostream<CharT,Traits>::write function to write the data.

    That's a bit of legacy since all char types can be used to represent binary data. The standard library uses char probably because that's the shortest one to type and read that does the job.


    What exactly is that mechanism that stops me from using unsigned character datatype?

    No std::char_traits<unsigned char> specialization.

    Is it really something related to character encoding, or does it serve some other purpose?

    std::char_traits has a few purposes, defined exactly by its interface, but they don't include decoding/encoding. The latter is done by std::codecvt; see the usage example there.
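
    To make "defined by its interface" a little more concrete, here is a minimal sketch that calls a few members of std::char_traits<char>. The function name char_traits_demo is made up for illustration; only the std::char_traits members themselves come from the standard library. Note that nothing in it deals with encodings.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: exercises a few std::char_traits<char> members and
// returns true when they behave as the comments describe.
bool char_traits_demo() {
    using traits = std::char_traits<char>;

    // Length of a null-terminated character sequence.
    const std::size_t len = traits::length("abc");

    // Lexicographic comparison of the first 3 characters: "abc" sorts before "abd".
    const bool ordered = traits::compare("abc", "abd", 3) < 0;

    // A character survives the round trip through int_type,
    // and its int_type value is distinct from the eof() marker.
    const char c = 'a';
    const traits::int_type i = traits::to_int_type(c);
    const bool round_trip = traits::eq(traits::to_char_type(i), c);
    const bool not_eof = !traits::eq_int_type(i, traits::eof());

    return len == 3 && ordered && round_trip && not_eof;
}
```

    These are the kinds of operations (comparison, copying, length, end-of-file marker) that streams and strings delegate to the traits class.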

    Why does the file stream work fine with signed character types but not with unsigned ones?

    Because std::basic_ostream<CharT,Traits>::write accepts a pointer to CharT, the first template parameter you specify for the stream. The stream writes the same character type it reads, and it uses that codecvt to convert from CharT to bytes.

    Assuming that I still wanted to use std::basic_fstream<uint8_t>, regardless of how (un)reasonable it is, is there any way to achieve that?

    The standard class and function templates cannot be specialized for built-in types, if I am not mistaken. You'd need to create another class with the std::char_traits interface and pass it as the second template argument to the standard streams. I guess you would need a pretty strong (philosophical) reason to roll up your sleeves and do that.
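
    As a rough sketch of what "another class with the std::char_traits interface" could look like, the hypothetical byte_traits below provides the traits members for unsigned char. The names byte_traits, byte_string and demo_byte_string are invented for this example. It is exercised only with std::basic_string here; a std::basic_fstream built on it would additionally look up a std::codecvt<unsigned char, char, std::mbstate_t> facet in its locale, which the standard library is not required to provide, so that part is not shown.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <ios>
#include <string>

// Hypothetical traits class for unsigned char (not part of the standard library).
struct byte_traits {
    using char_type  = unsigned char;
    using int_type   = int;
    using off_type   = std::streamoff;
    using pos_type   = std::streampos;
    using state_type = std::mbstate_t;

    static void assign(char_type& dst, const char_type& src) noexcept { dst = src; }
    static bool eq(char_type a, char_type b) noexcept { return a == b; }
    static bool lt(char_type a, char_type b) noexcept { return a < b; }

    static int compare(const char_type* a, const char_type* b, std::size_t n) {
        return n == 0 ? 0 : std::memcmp(a, b, n);
    }
    static std::size_t length(const char_type* s) {
        std::size_t n = 0;
        while (s[n] != 0) ++n;  // counts up to the first zero byte
        return n;
    }
    static const char_type* find(const char_type* s, std::size_t n, const char_type& c) {
        return n == 0 ? nullptr : static_cast<const char_type*>(std::memchr(s, c, n));
    }
    static char_type* move(char_type* dst, const char_type* src, std::size_t n) {
        return n == 0 ? dst : static_cast<char_type*>(std::memmove(dst, src, n));
    }
    static char_type* copy(char_type* dst, const char_type* src, std::size_t n) {
        return n == 0 ? dst : static_cast<char_type*>(std::memcpy(dst, src, n));
    }
    static char_type* assign(char_type* dst, std::size_t n, char_type c) {
        return n == 0 ? dst : static_cast<char_type*>(std::memset(dst, c, n));
    }

    // eof() is -1, which no unsigned char value (0..255) maps to.
    static int_type eof() noexcept { return -1; }
    static int_type to_int_type(char_type c) noexcept { return c; }
    static char_type to_char_type(int_type i) noexcept { return static_cast<char_type>(i); }
    static bool eq_int_type(int_type a, int_type b) noexcept { return a == b; }
    static int_type not_eof(int_type i) noexcept { return i == eof() ? 0 : i; }
};

// The traits class is passed as the second template argument.
using byte_string = std::basic_string<unsigned char, byte_traits>;

// Builds a 3-byte string and returns the index of the 0xFF byte.
std::size_t demo_byte_string() {
    const unsigned char raw[] = {0x01, 0xFF, 0x7F};
    const byte_string s(raw, sizeof raw);
    return s.find(static_cast<unsigned char>(0xFF));
}
```

    Even this small sketch shows how much boilerplate is involved before any file I/O happens.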

    If you don't, you may prefer to keep using std::fstream (that is, std::basic_fstream<char>) and do stream.write(reinterpret_cast<char const*>(buf), sizeof buf);.
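
    A minimal sketch of that approach, assuming the data lives in a uint8_t buffer: the helper names write_bytes and read_bytes are made up for this example, and the read-back function is only there to show that the bytes survive unchanged.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Hypothetical helper: writes `size` bytes to `path` through an ordinary char stream.
// Returns false if the file could not be opened or the write failed.
bool write_bytes(const std::string& path, const std::uint8_t* data, std::size_t size) {
    std::ofstream file(path, std::ios::out | std::ios::binary | std::ios::trunc);
    if (!file) return false;
    // Viewing the buffer through a char pointer is allowed; no bytes are changed.
    file.write(reinterpret_cast<const char*>(data), static_cast<std::streamsize>(size));
    return static_cast<bool>(file);
}

// Hypothetical helper: reads the whole file back as unsigned bytes.
std::vector<std::uint8_t> read_bytes(const std::string& path) {
    std::ifstream file(path, std::ios::in | std::ios::binary);
    std::vector<std::uint8_t> out;
    char c;
    while (file.get(c)) out.push_back(static_cast<std::uint8_t>(c));
    return out;
}
```

    The cast is confined to one line inside the helper, so the rest of the code can keep working with uint8_t buffers.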