Tags: performance, file, file-update

Is it OK to overwrite a file on every update?


When I update small binary or text files, I usually overwrite the whole file with the new content, even when only a small fraction of it has changed. I do this because it is easier to rewrite the entire file than to keep track of the position of every little piece of data inside it.
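
To make it concrete, here is roughly the pattern I use today (a minimal Python sketch; the settings file name and its JSON payload are just placeholders):

    import json

    def overwrite_whole_file(path, data):
        # Truncate the file and rewrite all of its content,
        # even if only a single value changed since the last save.
        with open(path, "w") as f:
            json.dump(data, f)

    settings = {"volume": 10, "theme": "dark"}
    overwrite_whole_file("settings.json", settings)

    settings["volume"] = 11                           # one small change...
    overwrite_whole_file("settings.json", settings)   # ...but the whole file is rewritten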

I think this is not a problem while the file size and update frequency are modest, but I suspect I should pick another update technique when dealing with big files and short update intervals.

I'd like to know when I should start worrying about the way I'm updating my files. What criteria should I use to decide between this whole-file overwrite and a more elaborate, more efficient technique?


Solution

  • The main criterion when deciding which technique to use for saving files is the cost associated with it.

    Are your changes made locally, or must they be transferred over a network? For local setups, the time required to write the file to disk is the main cost, and it is the main thing to benchmark (see the benchmark sketch below).

    For remote changes, you should also consider the time required for the transfer and the bandwidth consumption. Delta encoding can be used, but it comes with a computational cost for both the sender and the receiver, and in some cases the delta can be bigger than the original file (see the delta sketch below).

    There is no magical recipe for this problem. The best approach is to benchmark different solutions with realistic test scenarios.

    One more thing: do you have control over the files that are edited? If they are used to store the internal model of an application and they change often, a better approach would be to change the data model (use a database, or split the responsibility between multiple files); see the SQLite sketch below.
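
    As a starting point for the local case, a benchmark could look something like the sketch below. The 10 MiB file size and 4 KiB patch are made-up values; replace them with numbers that match your real workload.

        import os
        import time

        def full_rewrite(path, data):
            # Overwrite the whole file with the new content.
            with open(path, "wb") as f:
                f.write(data)

        def patch_in_place(path, offset, chunk):
            # Rewrite only the region that actually changed.
            with open(path, "r+b") as f:
                f.seek(offset)
                f.write(chunk)

        def bench(label, func, *args, repeats=50):
            start = time.perf_counter()
            for _ in range(repeats):
                func(*args)
            elapsed = time.perf_counter() - start
            print(f"{label}: {elapsed / repeats * 1000:.2f} ms per update")

        data = os.urandom(10 * 1024 * 1024)        # 10 MiB test file
        with open("test.bin", "wb") as f:
            f.write(data)

        bench("full rewrite  ", full_rewrite, "test.bin", data)
        bench("in-place patch", patch_in_place, "test.bin", len(data) // 2, os.urandom(4096))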
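
    To illustrate the delta trade-off on text data, the sketch below uses Python's standard difflib and only compares payload sizes; it ignores the CPU cost of computing and applying the diff, and the 10,000-line file is invented for the example.

        import difflib

        old_lines = [f"record {i}: value {i}\n" for i in range(10_000)]

        # One line edited: the delta is tiny compared with the full file.
        new_lines = list(old_lines)
        new_lines[5000] = "record 5000: value changed\n"
        delta = "".join(difflib.unified_diff(old_lines, new_lines))
        print(f"full: {len(''.join(new_lines))} chars, delta: {len(delta)} chars")

        # Most lines edited: the delta grows past the size of the full file.
        rewritten = [f"record {i}: rewritten {i}\n" for i in range(10_000)]
        delta = "".join(difflib.unified_diff(old_lines, rewritten))
        print(f"full: {len(''.join(rewritten))} chars, delta: {len(delta)} chars")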
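
    If the files are really an application's internal state, a database such as SQLite lets you update individual records without rewriting everything. A minimal sketch (the table and key names are invented for the example):

        import sqlite3

        # One database file instead of a monolithic data file that is
        # rewritten on every change.
        conn = sqlite3.connect("app_state.db")
        conn.execute("CREATE TABLE IF NOT EXISTS settings (key TEXT PRIMARY KEY, value TEXT)")

        # Only the affected pages are written, not the whole database.
        with conn:
            conn.execute(
                "INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)",
                ("volume", "11"),
            )

        print(conn.execute("SELECT value FROM settings WHERE key = ?", ("volume",)).fetchone())
        conn.close()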