In c++, how expensive is it to use the istream::seekg operation?
EDIT: How much can I get away with seeking around a file and reading bytes? What about frequency versus magnitude of offset?
I have a large file (4GB) that I am parsing, and I want to know if it's necessary to try to consolidate some of my seekg calls. I would assume that the magnitude of differences in file location play a role--like if you seek more than a page in memory away, it will impact performance--but small seeking is of no consequence. Is this correct?
This question is heavily dependent on your operating system and disk subsystem.
Obviously, the seek itself will take essentially zero time, since it just updates an offset. Actually reading will pull some data off of disk...
...but how much data depends on many things. Your disk has a cache which may have its own block size and may do some sort of read-ahead. Your RAID controller (if any) will have its own cache, possibly with its own block size and read-ahead.
Your kernel has a page cache -- all of free RAM, essentially -- and it will also probably do some sort of read-ahead. On Linux this is configurable, and the kernel will adapt it based on how sequential your access patterns appear to be, whether you have called posix_fadvise
, etc.
All of these caches mean if you access some data, then access nearby data later, there is a chance the second access will not actually touch the disk at all.
If you have the option of coding so that you access the file sequentially, that is certainly going to be faster than random reads, especially small random reads. Seeking on a single mechanical disk takes something like 10ms, so you can do the math here. (Although seeking on a solid state drive is around 100 times faster.)
Large reads are generally better than small reads... Although processing data a few kilobytes at a time can be faster than larger blocks if it allows the processing to stay in cache.
In short, you will need to provide a lot more details about your system and your application to get a proper answer, and even then the most likely answer is "benchmark it".