c++ file-io null-character

Why on earth is my file reading function placing null-terminators where excess CR LF carriages should be?


Today I tried to put together a simple OpenGL shader class. It loads text from a file, does a little bit of parsing to build a pair of vertex and fragment shaders according to some (pretty sweet) custom syntax, compiles an OpenGL shader program from the two, and marks the shader class as 'ready' if and only if the shader code compiled correctly. (As an example of the syntax, writing ".varying [type] [name];" lets you define a varying variable in both shaders while only writing it once; same with ".version".)

Now, I did all this, but then encountered the most bizarre (and frankly kinda scary) problems. I set everything up, declared a new 'tt::Shader' with some file containing valid shader code, only to have it tell me that the shader was invalid but then give me an empty string when I asked what the error was (which means OpenGL gave me an empty string as that's where it gets it from.)

I tried again, this time with obviously invalid shader code. It still identified that the shader was invalid, and it still gave me nothing but an empty string for the error message, so I assumed the same underlying problem was at work as before.

Confused, I re-wrote both shaders, the valid and invalid one, by hand as a string, compiling the classes again with the string directly, with no file access. Doing this, the error vanished, the first one compiled correctly, and the second one failed but correctly identified what the error was.

Even more confused, I started comparing the strings from the files to those I wrote myself. Turns out the former were a tad longer than the latter, despite printing the same. After doing a bit of counting, I realised that these characters must be carriage returns from Windows CR LF line endings that got cut off in the importing process.

To test this, I took the hand-written strings, inserted carriage returns where they would be cut off, and ran my string comparison tests again. This time, it evaluated their lengths to be the same, but also told me that the two were still not equal, which was quite puzzling.

So, I wrote a simple for-loop to iterate through the characters of the two strings and print them next to one another, cast to integers so I could see their character codes. I ran the program, looked through the (quite lengthy) list, and came to a very insightful though even less clarifying answer: the hidden characters were in the right places, but they weren't carriage returns ... they were null-terminators!
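
For anyone wanting to repeat that comparison, here is a minimal sketch of such a dump. The helper name dumpCharCodes is made up for illustration and is not part of the original code; it returns the listing as a string rather than printing it, so it can be sent to whatever output is convenient.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper (not from the original code): lists the numeric code of
// every character of two strings side by side, one index per line, so that
// invisible characters such as '\r' (13) or '\0' (0) become visible.
std::string dumpCharCodes(const std::string& a, const std::string& b) {
    std::ostringstream out;
    const std::size_t n = std::max(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        out << i << ": ";
        // Cast through unsigned char so codes above 127 do not print as negative.
        if (i < a.size()) out << static_cast<int>(static_cast<unsigned char>(a[i]));
        else out << '-';  // string a has already ended
        out << ' ';
        if (i < b.size()) out << static_cast<int>(static_cast<unsigned char>(b[i]));
        else out << '-';  // string b has already ended
        out << '\n';
    }
    return out.str();
}
```

Calling it on a string read from a file and on its hand-written counterpart puts a 0 next to a 13 wherever a null character sits in place of a carriage return.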

Here's the code for the file reading function I'm using. It's nothing fancy, just standard library stuff.

// Attempts to read the file with the given path, returning a string of its contents.
// If the file could not be found and read, an empty string will be returned.
// File strings are built by reading the file line-by-line and assembling a single string with new lines placed between them.
// Given this line-by-line method, take note that it will copy no more than 4096 bytes from a single line before moving on.
inline std::string fileRead(const std::string& path) {


    if (!tt::fileExists(path))
        return "";
    std::ifstream a;
    a.open(path);
    std::string r;
    const tt::uint32 _LIMIT = 4096;
    char r0[_LIMIT];

    tt::uint32 i = 0;
    while (a.good()) {
        a.getline(r0, _LIMIT);
        if (i > 0)
            r += "\n";
        i++;
        r += std::string(r0, static_cast<tt::uint32>(a.gcount()));
    }

    // TODO: Ask StackOverflow why on earth our file reading function is placing null characters where excess carriages should go.

    for (tt::uint32 i = 0; i < r.length(); i++)
        if (r[i] == '\0')
            r[i] = '\r';
    a.close();
    tt::printL("Reading file '" + path + "' ...");
    return r;
}

If y'all could take a read and tell me what the hell is going on with it, that'd be awesome, as I'm at a total loss for what it's doing to my string to cause this.

Lastly, I do get why the null-terminators didn't show up for me but did for OpenGL: the latter was using C-strings, while I was doing everything with std::string objects, which store their contents based on length, given that they're pretty much just fancy std::vector objects.
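
That difference is easy to see in isolation. The two function names below are invented for this sketch: one reports the length as std::string tracks it, the other the length a C-style API sees when it walks the same buffer up to the first '\0'.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Length as std::string sees it: the stored size, embedded '\0' included.
std::size_t lengthAsStdString(const std::string& s) {
    return s.size();
}

// Length as a C-string consumer sees it: counting stops at the first '\0'.
std::size_t lengthAsCString(const std::string& s) {
    return std::strlen(s.c_str());
}
```

For a string built as std::string("ab\0cd", 5), the first function gives 5 and the second gives 2, so anything that reads it as a C-string only ever sees "ab".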


Solution

  • Read the documentation for the std::string constructors. The constructor std::string(const char*, size_t n) creates a string of size n regardless of what the input contains, so the result may hold one or more null characters in the middle. Note that the size of a std::string doesn't include the terminating null character (so that str[str.size()] == '\0' always).

    So clearly the code simply copies the null character from the output buffer of the getline function.

    Why would it do that? Go to the documentation for gcount(): it returns the number of characters extracted by the last unformatted input operation. That count includes the '\n' delimiter, which getline extracts but does not store; the buffer holds the terminating \0 in its place. So gcount() is exactly one more than the number of characters you want the constructor to copy.

    So to fix it, simply replace that line with:

    r += std::string(r0, static_cast<tt::uint32>(a.gcount()-1));

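    Be aware that the subtraction is only right when getline actually consumed a newline: for a last line without a trailing newline, or for a read that extracts nothing, gcount() does not include a delimiter. Since getline null-terminates the buffer, r += r0; sidesteps the counting altogether.

    Or you could've simply used std::getline() with a std::string as the destination instead of a char buffer, and none of this would've happened.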
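
A minimal sketch of both points follows. The names readAllLines and gcountAfterGetline are made up for illustration; tt::fileExists and the other tt:: helpers from the question are left out, and the reader takes any std::istream so it can be tried without a file.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Sketch of the std::getline approach: joins the lines of an input stream
// with '\n'. std::getline on a std::string never stores the delimiter, so no
// stray '\0' can end up in the result, and there is no per-line size limit.
std::string readAllLines(std::istream& in) {
    std::string result;
    std::string line;
    bool first = true;
    while (std::getline(in, line)) {
        if (!first) result += '\n';
        first = false;
        result += line;
    }
    return result;
}

// Demonstration only: reports gcount() after one buffer-based
// istream::getline call on the given text (first line assumed < 64 chars).
std::streamsize gcountAfterGetline(const std::string& text) {
    std::istringstream in(text);
    char buffer[64];
    in.getline(buffer, sizeof buffer);
    return in.gcount();
}
```

For "abc\ndef" the second function reports 4 (three characters plus the consumed newline), while for "abc" it reports 3, which is why a blanket gcount() - 1 is fragile. To read a file, pass an opened std::ifstream to readAllLines. Unlike the loop in the question, this version does not append a final '\n' when the input ends with a newline.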