I want to parse different kinds of chunks of varying length from a file, so I created a function that reads out one chunk from the ifstream passed in, like this:
void parse_next(std::ifstream& input_file, std::vector<uint8_t>& data, size_t count)
{
    std::copy_n(
        std::istreambuf_iterator<char>(input_file),
        count,
        std::back_inserter(data)
    );
}
I expected the file position to advance by count, i.e.,
// some init code
size_t const pos_before{static_cast<size_t>(input_file.tellg())};
parse_next(input_file, data, count);
size_t const pos_after{static_cast<size_t>(input_file.tellg())};
// this assumption is _not_ correct!
assert(count == (pos_after - pos_before));
// but this one is!
assert((count - 1) == (pos_after - pos_before));
However, using input_file.read() with count instead of std::copy_n gives the expected position.
So what's going on here? I can't find this mentioned anywhere in the documentation of istreambuf_iterator. Or is it std::copy_n that is messing with me?
Note that in the example above, we can assume that there is plenty of data left to read, so this is not caused by hitting the end of the file. Also, the file is opened in binary mode.
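For reference, here is a read()-based variant of the function, which does advance the position by exactly count (the name parse_next_read is mine, just to keep the two versions apart):

```cpp
#include <cstdint>
#include <fstream>
#include <vector>

// Variant of parse_next using ifstream::read(), which consumes
// exactly `count` bytes and advances the stream position by `count`.
void parse_next_read(std::ifstream& input_file, std::vector<uint8_t>& data, size_t count)
{
    size_t const old_size = data.size();
    data.resize(old_size + count);
    input_file.read(reinterpret_cast<char*>(data.data() + old_size),
                    static_cast<std::streamsize>(count));
}
```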
You're using istreambuf_iterator. It is an input-only iterator. Imagine that you have a file with 5 bytes and you read count=2:

1. copy_n calls sgetc to read the first byte. This does not advance the stream position.
2. Since count=2, copy_n needs one more byte, so it increments the iterator. The increment advances the stream position, and the second byte is then read with sgetc.
3. Since count=2 bytes have now been copied, no more bytes are required. copy_n returns without incrementing the iterator again.

Note that only step 2 increments the stream position, and it only needs to happen once when reading two characters.
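To see this concretely, here is a small self-contained sketch (the helper name copy_n_advance is made up for the demonstration) that measures how far the position actually moves:

```cpp
#include <algorithm>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <vector>

// Demo helper: copy `count` bytes via istreambuf_iterator and
// report how far the stream position actually advanced.
std::streamoff copy_n_advance(std::ifstream& in, std::vector<uint8_t>& data, std::size_t count)
{
    auto const before = in.tellg();
    std::copy_n(std::istreambuf_iterator<char>(in), count, std::back_inserter(data));
    return in.tellg() - before;
}
```

With a five-byte file and count=2, both bytes land in data, but the reported advance is 1, not 2: the last sgetc is never followed by an increment.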
Yes, this is strange. But most people would just use input_file.read(). I've almost never seen istreambuf_iterator used in production code, not least because it is inefficient for your type of use case.
We could say: hey, let's change copy_n to increment the iterator before returning. That would fix this 0.1% use case, at the cost of slowing down other use cases.
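If you want to keep the iterator-based version anyway, one possible workaround (a sketch, relying on the increment behavior described above; a read()-based version is the more robust choice) is to consume the last, already-copied byte yourself so the position ends up at count:

```cpp
#include <algorithm>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <vector>

void parse_next(std::ifstream& input_file, std::vector<uint8_t>& data, size_t count)
{
    std::copy_n(
        std::istreambuf_iterator<char>(input_file),
        count,
        std::back_inserter(data)
    );
    // copy_n read the last byte via sgetc() without consuming it, so the
    // stream is still positioned on that byte; extract and discard it.
    if (count > 0)
        input_file.ignore(1);
}
```

With this version, two successive calls parse adjacent chunks as expected.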