Tags: file, perl, append, processing-efficiency

What is the most efficient file modification method: adding many lines to an existing file in place, or reading the file in, processing it, and writing it back out?


I have a Perl script that must create a file, write a few hundred lines of content to it, then read back the lines it has written and insert additional lines wherever it finds a match based on a separate configuration file. This is less ideal than simply writing everything correctly the first time, but I have a few different use cases that each require different lines to be added.

My question is this: would it be better to write the initial file, then read through all of its lines and insert new lines at several different locations? Or should I write the initial file, then read its lines and write them out to a new file, adding the new lines as I go?
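For concreteness, a rough sketch of the second option, with placeholder file names and a placeholder match test standing in for the real configuration lookup:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $orig = 'initial.txt';    # placeholder names
my $new  = "$orig.new";

open my $in,  '<', $orig or die "Can't read $orig: $!";
open my $out, '>', $new  or die "Can't write $new: $!";

while (my $line = <$in>) {
    print {$out} $line;
    # Placeholder for the real match against the configuration file.
    print {$out} "extra line for this case\n" if $line =~ /^trigger$/;
}

close $in;
close $out or die "Can't close $new: $!";

# Replace the original with the augmented copy.
rename $new, $orig or die "Can't rename $new: $!";
```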

I have a rough understanding of how file management works from an operating systems class I took, and my intuition is that repeatedly moving the file offset to insert lines could be more expensive, but I'm not sure whether that cost would outweigh the cost of creating a second file.

I'm aware that the difference is likely trivial for a small text file of only a few hundred lines; I'm mostly just curious about which is faster.

Other context that may or may not be relevant: the OS is Linux, and it is a multi-user system, although concurrent access to this file should be rare or nonexistent. The file is created by this script, read by a single user afterwards, and then effectively discarded.


Solution

  • If it's only a few hundred lines, keep it in memory: build the whole file as a data structure, do all the insertions there, and write the result to disk once at the end. That does away with the concurrency concerns too, and your data structure won't hit the disk unless you're short of RAM and start swapping.
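A minimal sketch of that in-memory approach; the file name, the generated content, and the match rules are placeholders standing in for whatever the real script and configuration file contain:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $out_path = 'output.txt';    # placeholder name

# Build the initial few hundred lines in memory instead of on disk.
my @lines = map { "line $_\n" } 1 .. 300;

# Hypothetical rules derived from the configuration file:
# [ pattern, line to insert after any line matching that pattern ].
my @rules = (
    [ qr/^line 10$/,  "extra line for case A\n" ],
    [ qr/^line 200$/, "extra line for case B\n" ],
);

# One pass over the in-memory lines, splicing in additions on match.
my @result;
for my $line (@lines) {
    push @result, $line;
    for my $rule (@rules) {
        my ($pattern, $addition) = @$rule;
        push @result, $addition if $line =~ $pattern;
    }
}

# Hit the disk exactly once, when the content is final.
open my $fh, '>', $out_path or die "Can't open $out_path: $!";
print {$fh} @result;
close $fh or die "Can't close $out_path: $!";
```

Because the file only ever appears on disk in its finished form, another user can never observe a half-written intermediate state.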