My question is similar to How to avoid hard disc fragmentation?, but I will be generating several hundred files per day that can range in size from 2 MB to 100+ MB (that questioner implied his files were smaller, as he was more worried about cluttering his disk; my problem is the performance of reading these files back). These files are written a little bit at a time (logging data), which is just about the best way to create fragmentation. (A database is not an option.) I have code to defragment the files after they are completely written, but performance suffers for files being read back the same day.
It appears that the way to do it is suggested by How can I limit file fragmentation while working with .NET?, though it is short on details (and I'm working in C++). I'd use SetFilePointerEx() and SetEndOfFile() to size the file to 2 MB to start with, and then, when the file reaches its allocated size, resize it based on observed growth rates. When writing is complete, I'd resize it down to the actual data size.
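Roughly what I have in mind is below; a minimal sketch, where the 2 MB initial allocation comes from above, and the file name, doubling growth policy, and the trim at the end are just placeholders:

    #include <windows.h>
    #include <cstdint>

    // Extend (or shrink) the file's logical size without writing data.
    bool SetFileSize(HANDLE hFile, std::int64_t size)
    {
        LARGE_INTEGER li;
        li.QuadPart = size;
        if (!SetFilePointerEx(hFile, li, nullptr, FILE_BEGIN))
            return false;
        return SetEndOfFile(hFile) != 0;
    }

    int main()
    {
        const std::int64_t initialSize = 2 * 1024 * 1024; // 2 MB up front

        HANDLE hFile = CreateFileW(L"log_20240101.dat", GENERIC_READ | GENERIC_WRITE,
                                   FILE_SHARE_READ, nullptr, CREATE_ALWAYS,
                                   FILE_ATTRIBUTE_NORMAL, nullptr);
        if (hFile == INVALID_HANDLE_VALUE)
            return 1;

        // Reserve the initial allocation in one piece.
        SetFileSize(hFile, initialSize);

        // ... append log records; when the written amount approaches the
        // allocated size, grow again, e.g. SetFileSize(hFile, allocated * 2) ...

        // When writing is finished, trim back to the bytes actually written.
        std::int64_t bytesWritten = /* actual data size */ 0;
        SetFileSize(hFile, bytesWritten);

        CloseHandle(hFile);
        return 0;
    }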
One pitfall I see (actually http://www.cplusplus.com/forum/windows/22114/ pointed it out) is what happens if my app crashes or the computer shuts down. Now I've got indeterminate data in my file, and no way through Windows to detect where the valid data ends. This suggests I keep a file that tracks how much data has been written, either one tracking file per data file or a single shared one. Is there a better strategy? Perhaps writing a run of zeros after each write so the end of valid data can be detected later (and then seeking back over them before the next write)?
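For example, the tracking-file idea might look something like this (the ".len" sidecar naming and the write-through flag are just my assumptions, not anything from the linked posts):

    #include <windows.h>
    #include <cstdint>

    // Persist the number of valid bytes written so far to a small sidecar file.
    // After a crash, the reader trusts this value instead of the file size.
    bool WriteValidLength(const wchar_t* lenPath, std::int64_t validBytes)
    {
        HANDLE h = CreateFileW(lenPath, GENERIC_WRITE, 0, nullptr,
                               CREATE_ALWAYS, FILE_FLAG_WRITE_THROUGH, nullptr);
        if (h == INVALID_HANDLE_VALUE)
            return false;

        DWORD written = 0;
        BOOL ok = WriteFile(h, &validBytes, sizeof(validBytes), &written, nullptr);
        CloseHandle(h);
        return ok && written == sizeof(validBytes);
    }

    std::int64_t ReadValidLength(const wchar_t* lenPath)
    {
        HANDLE h = CreateFileW(lenPath, GENERIC_READ, FILE_SHARE_READ, nullptr,
                               OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
        if (h == INVALID_HANDLE_VALUE)
            return 0; // no record yet; treat the data file as empty

        std::int64_t validBytes = 0;
        DWORD read = 0;
        ReadFile(h, &validBytes, sizeof(validBytes), &read, nullptr);
        CloseHandle(h);
        return (read == sizeof(validBytes)) ? validBytes : 0;
    }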
Do you see any other gotchas I missed?
We use the preallocation method to increase the file size in chunks of 500 MB. As it is video data, we also store a separate index file that we can read and validate to find where the last (believed-valid) data is.
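Roughly, the idea looks like this; the record layout and marker value here are only illustrative, not our actual format:

    #include <windows.h>
    #include <cstdint>

    // Data file is grown in 500 MB steps with the same
    // SetFilePointerEx()/SetEndOfFile() calls shown in the question.
    const std::int64_t kChunkSize = 500LL * 1024 * 1024;

    #pragma pack(push, 1)
    struct IndexRecord
    {
        std::int64_t  offset;  // where this frame starts in the data file
        std::int32_t  length;  // bytes in this frame
        std::uint32_t marker;  // constant tag used to validate the record
    };
    #pragma pack(pop)

    const std::uint32_t kMarker = 0xA5A5A5A5;

    // After a crash, scan the index file and return the end of the last
    // record whose marker checks out; everything past that is ignored.
    std::int64_t FindLastValidOffset(HANDLE hIndex)
    {
        std::int64_t lastValidEnd = 0;
        IndexRecord rec;
        DWORD read = 0;
        while (ReadFile(hIndex, &rec, sizeof(rec), &read, nullptr) && read == sizeof(rec))
        {
            if (rec.marker != kMarker)
                break;                  // torn or unwritten record: stop here
            lastValidEnd = rec.offset + rec.length;
        }
        return lastValidEnd;
    }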
If your data is textual, this may be a bit more of a pain, but you could just append to the end, ignoring the null padding, or maybe jump forward to the next 2 MB boundary?
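For example, a recovery scan for text data might look like this (assuming genuine log data never contains 0x00 bytes; the 64 KB block size is arbitrary):

    #include <vector>
    #include <cstdint>
    #include <fstream>

    // Return the offset just past the last non-zero byte in the file.
    std::int64_t FindEndOfTextData(const char* path)
    {
        std::ifstream in(path, std::ios::binary);
        if (!in)
            return 0;

        in.seekg(0, std::ios::end);
        std::int64_t size = in.tellg();

        const std::int64_t kBlock = 64 * 1024;
        std::vector<char> buf(static_cast<size_t>(kBlock));

        // Walk backwards a block at a time until a non-zero byte is found.
        for (std::int64_t pos = size; pos > 0; )
        {
            std::int64_t start = (pos > kBlock) ? pos - kBlock : 0;
            in.seekg(start);
            in.read(buf.data(), static_cast<std::streamsize>(pos - start));
            for (std::int64_t i = pos - start - 1; i >= 0; --i)
                if (buf[static_cast<size_t>(i)] != '\0')
                    return start + i + 1;
            pos = start;
        }
        return 0; // file is all padding (or empty)
    }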