I wrote a text file, then read it into a string buffer larger than the file.
I expected no text after the position given by ifstream::gcount(), because the buffer was initialized with \0s.
But there was text. How is this possible?
example code:
#include <iostream>
#include <string>
#include <fstream>

int main() {
    std::string path = "test.txt";

    // write to file
    std::ofstream out(path);
    for (int i = 1; i <= 10'000; ++i) {
        std::string lineNum = std::to_string(i);
        out << lineNum + "xxxxxxxxxxxxxxx" + lineNum + "\n";
    }
    out.close(); // (Just FYI: without close(), the result changes, although ofstream and ifstream are being used separately and they support RAII.)

    // read from file
    std::ifstream in(path);
    std::string buffer;
    int bufferSize = 1'000'000;
    buffer.resize(bufferSize);
    in.read(buffer.data(), buffer.size());
    auto gc = in.gcount();
    auto found = buffer.find('\n', gc);
    std::string substr = buffer.substr(gc - 10, 100);

    std::cout << "gcount: " << gc << '\n';
    std::cout << "found: " << found << '\n';
    std::cout << "npos?: " << std::boolalpha << (found == std::string::npos) << '\n';
    std::cout << "substr:\n" << substr << std::endl;
}
result:
gcount: 237788
found: 237810
npos?: false // I thought `found` should be the same as `string::npos`.
substr:
xxxx10000
01xxxxxxxxxxxxxxx9601 // I thought there should be no text after `gcount()`.
9602xxxxxxxxxxxxxxx9602
9603xxxxxxxxxxxxxxx9603
9604xxxxxxxxxxxxx
Executed with MSVC for 32-bit, on Windows (x64).
P.S. I also tried building for 64-bit, with the same result (using in.read(const_cast<char*>(buffer.data()), buffer.size()); instead of in.read(buffer.data(), buffer.size());).
@john's comment saved me.
The culprit was that different OSs represent the newline character differently.
If we open the file with a hex editor, we can see the difference.
If you run the code in the question on Windows, each newline in the file it writes shows up as 0x0D 0x0A, which is simply \r\n. On Unix or a Unix-like OS such as Linux, it would be just 0x0A, which means \n.
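As an alternative to a hex editor, the raw bytes can be counted from code. The following is only a sketch: the helper name countRawByte is hypothetical, and the file is opened with std::ios_base::binary so that the bytes on disk are seen without any newline translation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: counts how often `byte` occurs in the raw contents of
// the file at `path`. Binary mode means no \r\n <-> \n translation happens.
std::size_t countRawByte(const std::string& path, char byte) {
    std::ifstream in(path, std::ios_base::binary);
    assert(in.is_open() && "sketch only: the file is expected to exist");

    // Iterate over every byte of the stream buffer until end of file.
    std::istreambuf_iterator<char> first(in), last;
    return static_cast<std::size_t>(std::count(first, last, byte));
}
```

For the file written by the code in the question, countRawByte("test.txt", '\r') should be nonzero on Windows when the file was written in text mode, and 0 when it was written on Linux or in binary mode.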
But if we pass the std::ios_base::binary flag when opening a std::fstream, the stream does not translate the newline character; it is written and read "as-is".
So, with that flag, a hex editor would show only 0x0A regardless of OS.
So using the ios_base::binary flag with ofstream, ifstream, or both gets rid of the problem described in the question.
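A minimal sketch of the reading side with that flag follows. The helper name readRaw is hypothetical, and trimming the buffer to gcount() is an extra precaution added here, not something the comment suggested: only the first gcount() characters of the buffer are meaningful after read(), so shrinking the string to that length avoids looking at anything beyond them.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: reads at most maxBytes from the file at `path`
// in binary mode, so the bytes arrive exactly as they are stored on disk.
std::string readRaw(const std::string& path, std::size_t maxBytes) {
    std::ifstream in(path, std::ios_base::binary);
    assert(in.is_open() && "sketch only: the file is expected to exist");

    std::string buffer(maxBytes, '\0');
    in.read(&buffer[0], static_cast<std::streamsize>(buffer.size()));

    // Keep only the characters that read() actually extracted.
    buffer.resize(static_cast<std::size_t>(in.gcount()));
    return buffer;
}
```

With this helper, searching for '\n' starting at the returned string's size() yields std::string::npos, because nothing is stored past the data that was read.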