c++loggingmemorybinaryfstream

How can I simultaneously read and write to a binary file?


How can I simultaneously read and write to binary files? I cannot seem to figure out why my program is erroring out and every bit of debugging I can think of does not yield me any good results. The program takes a string and generates the file information to be written to a binary file. If the log binary exists, then the program compares the information from the file to the information in the log. If the information is different, then the program updates the log. The function I made is below:

#define SHADER_LOG "bin\\shader_log.bin"

void checkLog(const std::string& filename) {
    std::string _path = std::filesystem::absolute(".\\shaders\\" + filename).string();
    std::filesystem::file_time_type last_update = std::filesystem::last_write_time(_path);

    std::pair<std::string, std::filesystem::file_time_type> fileInfo{filename, last_update};

    std::cout << "File Info: { " << fileInfo.first << ", " << fileInfo.second << " }\n";

    std::pair<std::string, std::filesystem::file_time_type> log_entry;
    memset(&log_entry, 0, sizeof(log_entry));

    if (!std::filesystem::exists(SHADER_LOG)) {
        std::ofstream log(SHADER_LOG, std::ios_base::out | std::ios::binary);
        log.write(reinterpret_cast<const char*>(&fileInfo), sizeof(fileInfo));
        log.close();
        return;
    }

    std::fstream log(SHADER_LOG, std::ios_base::in | std::ios_base::out | std::ios::binary);
    if (!log.is_open()) {
        throw std::runtime_error(std::format("Failed to open {}!", SHADER_LOG));
    }

    int i = 0;
    while (!log.eof()) {
        log.read(reinterpret_cast<char*>(&log_entry), sizeof(log_entry));
        //log.getline(reinterpret_cast<char*>(&log_entry), sizeof(log_entry));
        i++;
        if ((log_entry.first == fileInfo.first) && (log_entry.second != fileInfo.second))
        {// Update log entry
            log.write(reinterpret_cast<const char*>(&fileInfo), sizeof(fileInfo));
            std::cout << "Log Entry: { " << log_entry.first << ", " << log_entry.second << " }\n";
            std::cout << "Log successfully updated!\n";
        }
        else
        {// Print to manually confirm update status
            std::cout << "Log Entry: { " << log_entry.first << ", " << log_entry.second << " }\n";
            std::cout << std::format("{} is up to date.\n", log_entry.first);
            std::cout << i << std::endl; // debugging number of times the loop is run
        }
    }

    log.close();
    if (!log.is_open()) {
        std::cout << std::format("{} file closed successfully!\n", SHADER_LOG);
    }
        
    return; // trying to debug exiting the function
}

On first run, the function seems to work as expected. However, on the second run, the program exits unexpectedly and gives the error code:

(process 14128) exited with code -1073741819.

After updating the file to be logged, the function exits without the above code, but the informtion printed to the console reveals more information about the problem:

Log Entry: { , 1601-01-01 00:00:00.0000000 }
 is up to date.

The above is printed some 86 times before the function exits without returning any errors.

I tried condensing the function down to the function below to try and reveal something more about the problem:

void checkLog(const std::string& filename) {
    std::string _path = std::filesystem::absolute(".\\shaders\\" + filename).string();
    std::filesystem::file_time_type last_update = std::filesystem::last_write_time(_path);

    std::pair<std::string, std::filesystem::file_time_type> fileInfo{filename, last_update};

    std::cout << "File Info: { " << fileInfo.first << ", " << fileInfo.second << " }\n";

    std::pair<std::string, std::filesystem::file_time_type> log_entry;
    memset(&log_entry, 0, sizeof(log_entry));

    if (!std::filesystem::exists(SHADER_LOG)) {
        std::fstream log(SHADER_LOG, std::ios_base::out | std::ios_base::in | std::ios::binary);
        log.write(reinterpret_cast<const char*>(&fileInfo), sizeof(fileInfo));
        log.seekp(0); // tried .seekp and .seekg
        /* Debugging */
        std::pair<std::string, std::filesystem::file_time_type> log_entry {};
        memset(&log_entry, 0, sizeof(log_entry)); // Read in another stack thread to clear memory to avoid alignment errors
        log.getline(reinterpret_cast<char*>(&log_entry), sizeof(log_entry));
        std::cout << "Log Entry: { " << log_entry.first << ", " << log_entry.second << " }\n";
        log.close();
        return;
    }
}

This did give me some more information about what could be causing the problem, but I am still at a loss for the solution. The output of the condensed version indicates that my program is unable to write to the log file:

File Info: { shader.comp, 2023-08-15 00:43:48.7875194 }
Log Entry: { , 1601-01-01 00:00:00.0000000 }
 is up to date.

Please help me. I cannot figure out what I am doing wrong and I'm starting to think that it's not possible to read and write to a binary file simultaneously.

Edit:

int main() {
    try {
        checkLog("shader.comp");
        std::cout << "Exited Compiler successfully!\n";
        /* When it gives the error code, it does not print the above line to the console */
    }
    catch (const std::exception& e) {
        std::cerr << e.what() << std::endl;
        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;
}

Solution

  • There are multiple, fundamental, problems in the shown code. The first one already results in undefined behavior, that pretty much voids the warranty for the rest of the code that follows:

    memset(&log_entry, 0, sizeof(log_entry));
    

    log_entry is a std::pair that contains a std::string.

    memset can only be used on POD objects. Using it in this kind of a brute-force manner, to nuke an object that's a C++ class, like std::string, is undefined behavior.

    This kind of memset usage is typically seen in C code, to clear out uninitialized memory that holds a struct. However this is C++, and not C, and, fortunately, in C++ we don't need to resort to this kind of primitive initialization. This is what constructors and destructors are for, and in C++ we don't do that (unless we really want to do this with POD objects). However, technically this is now undefined behavior, and the observed results from the rest of the code is null and void. But there are some other obvious defects that will also need to be addressed.

    while (!log.eof()) {
    

    This is a very common bug that's fully explained by the linked answer.

    log.read(reinterpret_cast<char*>(&log_entry), sizeof(log_entry));
    

    Our friend log_entry, with a std::string is back.

    sizeof() is a compile time constant. It is calculated at compile time. It can't possibly know whether that std::string is empty, or it contains the complete contents of the Harry Potter books. The expectation here is that this saves the entire log_entry to a file. Clearly, the amount of data that gets written here will vary, depending on the size of the std::string.

    The shown code appears to assume that the size of each log record is fixed, but if you stop and think, for a moment, then it should become very obvious that this assumption is clearly flawed. The size of each record varies. This, overall, read()/write() approach is also fundamentally flawed.

        if ((log_entry.first == fileInfo.first) && (log_entry.second != fileInfo.second))
            {// Update log entry
                log.write(reinterpret_cast<const char*>(&fileInfo), sizeof(fileInfo));
    

    Ignoring the sizeof issue and the fact that only POD objects (once more) can be read or written to, in this manner: the apparent intent of the above code is that it replaces the matched entry in the log file with the updated contents.

    Unfortunately the existing entry in the log file has been read already. The read/write position, in the file, is now the record that comes after it. And this will, instead, clobber the next record.

    However this is a moot point anyway, since this read() and write() is fundamentally flawed on its own merits. This kind of a read()/write() pattern is also typical of C code, like memset().

    In conclusion, multiple issues must be resolved:

    1. Remove and replace all C code that pretends to be C++. Good-bye memset(). Reimplement sizeof()-based read() and write() with proper serialization and deserialization. A slightly closer look at the shown code suggests that simple formatted output based operation, using >>, should be sufficient. It's already being used for logging to std::cout. The same code can be used to format each log entry. Write the filename and the modification time, on their own lines, to the log file, and use std::getline to read each one, on the rebound.

    2. Fix the record update bug that clobbers the wrong record. Track the file position in the log file, and seek back to the right position before replacing the existing record in the log file.

    3. Fix the common logical flaw with loop-based eof() checking.

    Additionally: it appears that the only reason to go through the trouble of creating all those std::pair objects is to make the whole thing fit within the boundaries of the C-style patterns for reading and writing records to a file. Since the C-style pattern is going down the drain anyway, there's no reason to have all these std::pair objects creating unneeded complexities. They can be removed as well, and the remaining code simplified.