pythonconcurrency

A safe, atomic file-copy operation


I need to copy a file from one location to another, and I need to throw an exception (or at least somehow recognise) if the file already exists at the destination (no overwriting).

I can check first with os.path.exists() but it's extremely important that the file cannot be created in the small amount of time between checking and copying.

Is there a built-in way of doing this, or is there a way to define an action as atomic?


Solution

  • There is in fact a way to do this, atomically and safely, provided all actors do it the same way. It's an adaptation of the lock-free whack-a-mole algorithm, and not entirely trivial, so feel free to go with "no" as the general answer ;)

    What to do

    1. Check whether the file already exists. Stop if it does.
    2. Generate a unique ID
    3. Copy the source file to the target folder with a temporary name, say, <target>.<UUID>.tmp.
    4. Rename the copy <target>-<UUID>.mole.tmp.
    5. Look for any other files matching the pattern <target>-*.mole.tmp.
    1. Check again to see if the destination file already exists. If so, attempt to delete your temporary file. (Don't worry if it's gone. Remember your UUID may have changed in step 5.)
    2. If you didn't already attempt to delete it in step 6, attempt to rename your temporary file to its final name, <target>. (Don't worry if it's gone, just jump back to step 5.)

    You're done!

    How it works

    Imagine each candidate source file is a mole coming out of its hole. Half-way out, it pauses and whacks any competing moles back into the ground, before checking no other mole has fully emerged. If you run this through in your head, you should see that only one mole will ever make it all the way out. To prevent this system from livelocking, we add a total ordering on which mole can whack which. Bam! A  PhD thesis  lock-free algorithm.

    Step 4 may look unnecessary—why not just use that name in the first place? However, another process may "adopt" your  mole  file in step 5, and make it the winner in step 7, so it's very important that you're not still writing out the contents! Renames on the same file system are atomic, so step 4 is safe.