c++ parsing ifstream seekg

Fast access to a saved file position with ifstream in C++


Info: What is the best way to store a position in a text file, close the file, and later reopen it at the same position using C++?

I have a large text file that I need to parse in chunks and feed into some system. As of now, I open the file with an ifstream and call getline until I find the data I need (let's say the data is at position {x}). After this I close the file, process the data, and then need to continue feeding data from the big file. So I open the file again and call getline until I get to position {x+d} this time (d is the size of the chunk I just read)...

Instead of going through the file once, it is easy to see that over N chunks I read (1d + 2d + ... + (N-1)d + Nd) ~ d*N^2/2 lines in total, which is quadratic in the number of chunks. Now I want to save the position in the file after reading d, close the file, and later reopen the file directly at that position. What can be used for this?


Solution

  • You can't do this with newline translation enabled (what the Standard calls "text mode"), because seeking back to the position requires the standard library to scan the file from the beginning to find N characters-not-double-counting-newlines. Translations of variable-length encodings (e.g. between UTF-8 and UCS) cause a similar problem.

    The solution is to turn off newline translation (what the Standard calls "binary mode") and any other translations that involve variable-length encodings, and handle these yourself. With all translations turned off, the "file position" is the number directly used by the OS to perform file I/O, and therefore has the potential to be very efficient (whether it actually is efficient depends on the standard library implementation details).
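
As a rough illustration of the binary-mode approach described above, here is a minimal sketch. The names `Chunk` and `read_chunk` are hypothetical, not part of any library; it assumes the file uses "\n" or "\r\n" line endings and that a plain byte offset is enough to identify the position. Each call opens the file with `std::ios::binary`, seeks to the saved offset with `seekg`, reads up to `max_lines` lines, strips a trailing '\r' by hand (since newline translation is off), and reports the offset from `tellg` so the next call can resume there.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical result type for this sketch.
struct Chunk {
    std::vector<std::string> lines;  // lines read in this call
    std::streamoff next_offset;      // byte offset at which to resume
    bool at_end;                     // true if end of file was hit during this read
};

// Hypothetical helper: open `path` in binary mode, jump to `offset`,
// read at most `max_lines` lines, then close the file (on scope exit).
Chunk read_chunk(const std::string& path, std::streamoff offset,
                 std::size_t max_lines) {
    std::ifstream in(path, std::ios::binary);  // binary: no newline translation
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    in.seekg(offset, std::ios::beg);  // jump straight to the saved byte offset

    Chunk chunk{{}, offset, false};
    std::string line;
    while (chunk.lines.size() < max_lines && std::getline(in, line)) {
        // With translation off, a Windows "\r\n" ending leaves '\r' in the line.
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();
        }
        chunk.lines.push_back(line);
    }

    chunk.at_end = in.eof();
    in.clear();                      // tellg() returns -1 while eofbit/failbit is set
    chunk.next_offset = in.tellg();  // byte offset just past the last line read
    return chunk;
}
```

A caller would start with offset 0, process `chunk.lines`, store `chunk.next_offset` somewhere, and pass it to the next call, stopping once `at_end` is true or no lines come back. Whether the seek is actually fast still depends on the standard library and the OS, as noted above.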