In my code, I have a situation where I need to copy data from one file to another. The solution I came up with looks like this:
const int BUF_SIZE = 1024;
char buf[BUF_SIZE];
int left_to_copy = toCopy;
while (left_to_copy > BUF_SIZE)
{
    fread(buf, BUF_SIZE, 1, fin);
    fwrite(buf, BUF_SIZE, 1, fout);
    left_to_copy -= BUF_SIZE;
}
fread(buf, left_to_copy, 1, fin);
fwrite(buf, left_to_copy, 1, fout);
My main thought was that there might be something like memcpy, but for data in files: something I could just hand two file streams and a total number of bytes. I searched a bit but couldn't find any such thing.
But if something like that isn't available, what buffer size should I use to make the transfer fastest? Bigger would mean fewer system calls, but I figured it could interfere with other buffering or caching on the system. Should I dynamically allocate the buffer so the copy takes only one pair of read/write calls? Typical transfer sizes in this particular case run from a few KB to a dozen or so MB.
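For concreteness, the single-allocation variant I have in mind would look something like this (just a sketch; copy_once is a made-up name, error handling is minimal, and it assumes the whole transfer fits in memory):

#include <stdio.h>
#include <stdlib.h>

/* Sketch: copy toCopy bytes with a single fread/fwrite pair by
 * buffering the whole transfer in memory. copy_once is a made-up name. */
static int copy_once(FILE *fin, FILE *fout, size_t toCopy)
{
    char *buf = malloc(toCopy);                /* one buffer for the whole copy */
    if (buf == NULL)
        return -1;
    size_t got = fread(buf, 1, toCopy, fin);   /* may be short at EOF */
    size_t put = fwrite(buf, 1, got, fout);
    free(buf);
    return (got == toCopy && put == got) ? 0 : -1;
}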
EDIT: For OS specific information, we're using Linux.
EDIT2:
I tried using sendfile, but it didn't work. It seemed to write the right amount of data, but it was garbage.
I replaced my example above with something that looks like this:
fflush(fin);   /* line up fin's fd offset with its stdio position */
fflush(fout);  /* drain anything buffered in fout before writing to its fd */
off_t offset = ftello64(fin);
sendfile(fileno(fout), fileno(fin), &offset, toCopy);
fseeko64(fin, offset, SEEK_SET);  /* resync fin's stdio position afterwards */
I added the flush, offset, and seeking one at a time since it didn't appear to be working.
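In case it's useful, the fuller shape of what I'm attempting is below (a sketch only; copy_range is a name I made up, the error checks are the ones I've been leaving out, and file-to-file sendfile needs Linux 2.6.33 or newer):

#include <stdio.h>
#include <sys/types.h>
#include <sys/sendfile.h>

/* Sketch of the sendfile-based copy. Assumes fin/fout are stdio
 * streams over regular files on a sufficiently recent kernel. */
static int copy_range(FILE *fin, FILE *fout, size_t toCopy)
{
    fflush(fin);    /* line up fin's fd offset with its stdio position */
    fflush(fout);   /* drain fout's stdio buffer before writing to its fd */

    off_t offset = ftello64(fin);
    size_t left = toCopy;
    while (left > 0)
    {
        /* sendfile may transfer fewer bytes than requested */
        ssize_t n = sendfile(fileno(fout), fileno(fin), &offset, left);
        if (n < 0)
            return -1;              /* error */
        if (n == 0)
            break;                  /* unexpected EOF */
        left -= (size_t)n;
    }
    return fseeko64(fin, offset, SEEK_SET);  /* resync fin's stdio position */
}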
One thing you could do is increase the size of your buffer. That could help if you have large files.
Another thing is to call directly into the OS, whatever that may be in your case; there is some overhead in fread() and fwrite().
If you use unbuffered routines and provide your own larger buffer, you may see a noticeable performance improvement.
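On Linux that might look something like the sketch below (raw_copy is a made-up name, the 64 KB buffer is just a starting point to tune, and the files are assumed to be open as plain file descriptors):

#include <unistd.h>

/* Sketch: copy count bytes with the unbuffered POSIX calls.
 * raw_copy is a made-up name; the buffer size is a tuning knob. */
static ssize_t raw_copy(int in_fd, int out_fd, size_t count)
{
    static char buf[64 * 1024];
    size_t done = 0;
    while (done < count)
    {
        size_t want = count - done;
        if (want > sizeof buf)
            want = sizeof buf;
        ssize_t n = read(in_fd, buf, want);
        if (n < 0)
            return -1;              /* read error */
        if (n == 0)
            break;                  /* unexpected EOF */
        /* write() can be short too; push out the whole chunk */
        for (ssize_t off = 0; off < n; )
        {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0)
                return -1;
            off += w;
        }
        done += (size_t)n;
    }
    return (ssize_t)done;           /* bytes actually copied */
}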
I'd also recommend using the return value of fread() to track how many bytes were actually read, so you know when you're done.
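Applied to your original loop, that could look like the following sketch (passing a size of 1 so fread()'s return value counts bytes; stdio_copy is a made-up name):

#include <stdio.h>

/* Sketch: the copy loop reworked to trust fread's return value
 * instead of a precomputed byte count. stdio_copy is a made-up name. */
static size_t stdio_copy(FILE *fin, FILE *fout, size_t toCopy)
{
    char buf[1024];
    size_t done = 0;
    while (done < toCopy)
    {
        size_t want = toCopy - done;
        if (want > sizeof buf)
            want = sizeof buf;
        size_t got = fread(buf, 1, want, fin);  /* size 1 => byte count */
        if (got == 0)
            break;                  /* EOF or read error */
        if (fwrite(buf, 1, got, fout) != got)
            break;                  /* write error */
        done += got;
    }
    return done;                    /* bytes actually copied */
}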