
Text still exists after the position of ifstream::gcount()


I wrote a text file, then read the file into a string buffer larger than the file.

I expected no text after position ifstream::gcount() in the buffer, because the buffer was initialized with '\0' characters.

But there was text. How is this possible?

Example code:

#include <iostream>
#include <string>
#include <fstream>

int main() {
    std::string path = "test.txt";

    // write to file
    std::ofstream out(path);    
    for (int i = 1; i <= 10'000; ++i) {
        std::string lineNum = std::to_string(i);
        out << lineNum + "xxxxxxxxxxxxxxx" + lineNum + "\n"; 
    } 
    out.close(); // (Just FYI: without close(), the result changes, even though ofstream and ifstream are separate objects and both support RAII.)

    // read from file
    std::ifstream in(path); 
    std::string buffer;
    int bufferSize = 1'000'000; 
    buffer.resize(bufferSize); 
    in.read(buffer.data(), buffer.size()); 

    auto gc = in.gcount(); 
    auto found = buffer.find('\n', gc); 
    std::string substr = buffer.substr(gc - 10, 100); 

    std::cout << "gcount: " << gc << '\n'; 
    std::cout << "found: " << found << '\n'; 
    std::cout << "npos?: " << std::boolalpha << (found == std::string::npos) << '\n'; 
    std::cout << "substr:\n" << substr << std::endl;    
}

Result:

gcount: 237788
found: 237810
npos?: false    // I thought `found` should be the same as `string::npos`.
substr:         
xxxx10000
01xxxxxxxxxxxxxxx9601     // I thought there should be no text after `gcount()`.
9602xxxxxxxxxxxxxxx9602
9603xxxxxxxxxxxxxxx9603
9604xxxxxxxxxxxxx

Compiled with MSVC as a 32-bit program and run on 64-bit Windows.

P.S. I also tried a 64-bit build and got the same result.
(There I used in.read(const_cast<char*>(buffer.data()), buffer.size()); instead of in.read(buffer.data(), buffer.size());)


Solution

  • @john's comment saved me.

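    The culprit was that different OSs represent the newline character differently in text files.

    Opening the file in a hex editor shows the difference.

    If you run the code from the question on Windows, every line of the file it writes ends with 0x0D 0x0A, which is simply \r\n. On Unix or Unix-like OSes such as Linux, it would be just 0x0A, which is \n.

    When the file is read back in text mode on Windows, each \r\n is converted to a single \n, so read() delivers 10,000 fewer characters than the file holds: gcount() is 237,788, while the file is 237,788 + 10,000 = 247,788 bytes. The text after gcount() matches the raw file at the same offsets (01xxxxxxxxxxxxxxx9601 is exactly what the file holds starting at byte 237,788). This suggests the runtime copied the raw bytes into the buffer and then removed the \r characters in place, leaving stale bytes behind the converted text. That is also why find('\n', gc) does not return npos. Only the first gcount() characters of the buffer are meaningful.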

    But if you open the std::fstream with the std::ios_base::binary flag, no newline translation takes place: the stream writes and reads the newline character "as-is". So, with that flag, a hex editor shows only 0x0A regardless of OS.

    So using the ios_base::binary option with ofstream, ifstream, or both gets rid of the problem described in the question.