
How to read a binary file into a vector of unsigned chars


Recently I was asked to write a function that reads a binary file into a std::vector<BYTE>, where BYTE is an unsigned char. Quite quickly I came up with something like this:

#include <fstream>
#include <vector>
typedef unsigned char BYTE;

std::vector<BYTE> readFile(const char* filename)
{
    // open the file:
    std::streampos fileSize;
    std::ifstream file(filename, std::ios::binary);

    // get its size:
    file.seekg(0, std::ios::end);
    fileSize = file.tellg();
    file.seekg(0, std::ios::beg);

    // read the data:
    std::vector<BYTE> fileData(fileSize);
    file.read((char*) fileData.data(), fileSize);
    return fileData;
}

This seems unnecessarily complicated, and the explicit cast to char* that I was forced to use when calling file.read doesn't make me feel any better about it.
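For comparison, here is a minimal sketch of the same seekg/tellg/read approach with the checks the version above leaves out: a failed open, tellg() reporting -1, an empty file, and a short read. The name readFileChecked and the choice to throw std::runtime_error are assumptions for illustration only:

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

typedef unsigned char BYTE;

// Hypothetical helper: the seekg/tellg/read approach with basic error checks.
std::vector<BYTE> readFileChecked(const char* filename)
{
    std::ifstream file(filename, std::ios::binary);
    if (!file.is_open())
        throw std::runtime_error(std::string("cannot open ") + filename);

    // tellg() returns -1 if the position cannot be determined.
    file.seekg(0, std::ios::end);
    const std::streamoff size = file.tellg();
    if (size < 0)
        throw std::runtime_error("cannot determine file size");
    file.seekg(0, std::ios::beg);

    std::vector<BYTE> data(static_cast<std::size_t>(size));
    // data() is valid even for an empty vector, unlike &data[0].
    // read() sets failbit if fewer than `size` bytes were available.
    if (size > 0 &&
        !file.read(reinterpret_cast<char*>(data.data()),
                   static_cast<std::streamsize>(size)))
        throw std::runtime_error("short read");
    return data;
}
```

The cast does not go away; reinterpret_cast only makes it more visible. It is safe here because the aliasing rules allow any object to be accessed through a char glvalue.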


Another option is to use std::istreambuf_iterator:

std::vector<BYTE> readFile(const char* filename)
{
    // open the file:
    std::ifstream file(filename, std::ios::binary);

    // read the data:
    return std::vector<BYTE>((std::istreambuf_iterator<char>(file)),
                              std::istreambuf_iterator<char>());
}

This is simple and short, but I still have to use std::istreambuf_iterator<char> even though I'm reading into a std::vector<unsigned char>.
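Using std::istreambuf_iterator<char> with a std::vector<unsigned char> loses nothing. Each char read from the buffer is converted to unsigned char, and conversion to an unsigned type is well defined (values are reduced modulo 256 when CHAR_BIT is 8), so a byte such as 0xFF comes back as 255. A small sketch that checks this with an in-memory stream; the function name bytesFrom is hypothetical:

```cpp
#include <istream>
#include <iterator>
#include <sstream>  // std::istringstream, for trying it out
#include <string>
#include <vector>

// Hypothetical helper: copies every char in the stream into unsigned chars.
std::vector<unsigned char> bytesFrom(std::istream& in)
{
    return std::vector<unsigned char>((std::istreambuf_iterator<char>(in)),
                                      std::istreambuf_iterator<char>());
}
```

Because this version never seeks, it also works on streams that cannot report their size, where tellg() returns -1.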


The last option, which seems perfectly straightforward, is to use std::basic_ifstream<BYTE>, which says explicitly "I want an input file stream and I want to use it to read BYTEs":

std::vector<BYTE> readFile(const char* filename)
{
    // open the file:
    std::basic_ifstream<BYTE> file(filename, std::ios::binary);

    // read the data:
    return std::vector<BYTE>((std::istreambuf_iterator<BYTE>(file)),
                              std::istreambuf_iterator<BYTE>());
}

but I'm not sure whether basic_ifstream is an appropriate choice in this case.
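There are reasons to doubt basic_ifstream<BYTE>: the standard only requires std::char_traits specializations for char, wchar_t, char8_t, char16_t and char32_t, so std::char_traits<unsigned char> is not guaranteed to exist (libc++, for one, has deprecated and then removed its generic fallback). A file stream of a non-char type also needs a std::codecvt<unsigned char, char, std::mbstate_t> facet, which the standard does not guarantee either. If the aim is only to make the element type explicit, a sketch that keeps the standard char-based stream could look like this; readFileAs is a hypothetical name:

```cpp
#include <fstream>
#include <iterator>
#include <type_traits>
#include <vector>

// Hypothetical helper: a standard char-based stream, Byte-typed result.
template <class Byte>
std::vector<Byte> readFileAs(const char* filename)
{
    // Only one-byte integer types (not bool) make sense as "bytes" here.
    static_assert(std::is_integral<Byte>::value && sizeof(Byte) == 1 &&
                      !std::is_same<Byte, bool>::value,
                  "Byte must be a one-byte integer type");

    std::ifstream file(filename, std::ios::binary);
    return std::vector<Byte>((std::istreambuf_iterator<char>(file)),
                             std::istreambuf_iterator<char>());
}
```

A call such as readFileAs<unsigned char>("data.bin") then states the BYTE intent at the call site without relying on non-standard traits.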

What is the best way to read a binary file into a vector? I'd also like to know what's happening "behind the scenes" and what problems I might encounter (apart from the stream not being opened properly, which can be caught with a simple is_open check).

Is there any good reason to prefer std::istreambuf_iterator here?
(The only advantage I can see is simplicity.)


Solution

  • When testing for performance, I would include a test case for:

    #include <fstream>
    #include <iterator>
    #include <vector>
    typedef unsigned char BYTE;

    std::vector<BYTE> readFile(const char* filename)
    {
        // open the file:
        std::ifstream file(filename, std::ios::binary);

        // Stop operator>> (used by istream_iterator) from skipping
        // whitespace bytes such as ' ', '\t' and '\n'; ios::binary
        // does not prevent this:
        file.unsetf(std::ios::skipws);

        // get its size:
        std::streampos fileSize;

        file.seekg(0, std::ios::end);
        fileSize = file.tellg();
        file.seekg(0, std::ios::beg);

        // reserve capacity
        std::vector<BYTE> vec;
        vec.reserve(fileSize);

        // read the data:
        vec.insert(vec.begin(),
                   std::istream_iterator<BYTE>(file),
                   std::istream_iterator<BYTE>());

        return vec;
    }
    

    My thinking is that the vector constructor in Method 1 value-initializes (zeroes) every element, and then the read writes each element again.

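    Method 2 and Method 3 look most promising, but could suffer one or more reallocations: istreambuf_iterator is an input iterator, so the vector cannot know the final size up front and has to grow as it reads. Hence the reason to reserve before reading or inserting.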

    I would also test with std::copy:

    // ...
    std::vector<BYTE> vec;
    vec.reserve(fileSize);
    
    std::copy(std::istream_iterator<BYTE>(file),
              std::istream_iterator<BYTE>(),
              std::back_inserter(vec));
    

    In the end, I think the best solution will avoid the operator>> that istream_iterator uses (and all the overhead and "goodness" of operator>> trying to interpret binary data as formatted input). But I don't know what to use that lets you copy the data directly into the vector.

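    Finally, my testing with binary data shows that ios::binary alone does not stop whitespace bytes from being skipped. That is expected: ios::binary only turns off newline translation, while the skipping is done by the formatted operator>>. Hence the need for noskipws (declared in <ios>; the code above uses the equivalent file.unsetf(std::ios::skipws)).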
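One candidate for the direct copy asked about above is std::istreambuf_iterator. It pulls characters straight from the stream buffer and never calls operator>>, so skipws does not affect it. Here is a sketch that combines it with the reserve idea; readFileRaw is a hypothetical name, and the size hint is optional:

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <vector>

typedef unsigned char BYTE;

// Sketch: reserve once, then copy raw chars straight from the stream buffer.
// istreambuf_iterator bypasses operator>>, so whitespace bytes are kept.
std::vector<BYTE> readFileRaw(const char* filename)
{
    std::ifstream file(filename, std::ios::binary);
    std::vector<BYTE> vec;

    // Optional size hint; tellg() yields -1 if the size cannot be reported.
    file.seekg(0, std::ios::end);
    const std::streamoff size = file.tellg();
    file.seekg(0, std::ios::beg);
    if (size > 0)
        vec.reserve(static_cast<std::size_t>(size));

    // Each char is converted to BYTE as it is appended.
    vec.insert(vec.end(),
               std::istreambuf_iterator<char>(file),
               std::istreambuf_iterator<char>());
    return vec;
}
```

Whether this beats a single read() into a pre-sized buffer depends on the standard library, so it belongs in the same performance comparison rather than being assumed faster.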