I am dealing with code that:
enumerates source and destination directories and generates {src, dst} pairs
... each pair is sent to a pool of worker threads
... which performs the work, for example "copy src to dst"
(all this is simplified quite a bit).
Problem:
When a file is created it also gets a shortname, which can be the same as the name of another file in the source directory (a name collision). This leads to a variety of effects, depending on the order of operations. For example, copying two files, my file and MYFILE~1, can produce either 2 files or 1 file in the destination (depending on your luck), and in the latter case the content is probably corrupted.
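To make the order dependence concrete, here is a small in-memory model of a destination directory. It is only a sketch: `ToyDir` and its alias scheme (first six alphanumerics, upper-cased, plus `~N`) are made up for illustration and are not how any particular file system generates shortnames.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Toy model of a directory whose entries have a long name plus an
// auto-generated alias. The alias scheme below is an assumption made
// for illustration only; real file systems use their own schemes.
struct ToyDir {
    std::map<std::string, std::string> alias_to_long;  // alias -> long name
    std::map<std::string, std::string> content;        // long name -> data

    static std::string upper(std::string s) {
        for (char& c : s)
            c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        return s;
    }

    // Resolve a name (long name or alias, case-insensitive) to an
    // existing long name; returns "" when nothing matches.
    std::string resolve(const std::string& name) const {
        for (const auto& kv : content)
            if (upper(kv.first) == upper(name)) return kv.first;
        const auto it = alias_to_long.find(upper(name));
        return it == alias_to_long.end() ? std::string() : it->second;
    }

    // Pick the first free "BASE~N" alias for a new long name.
    std::string make_alias(const std::string& long_name) const {
        std::string base;
        for (char c : upper(long_name))
            if (std::isalnum(static_cast<unsigned char>(c)) && base.size() < 6)
                base += c;
        for (int n = 1;; ++n) {
            const std::string candidate = base + "~" + std::to_string(n);
            if (resolve(candidate).empty()) return candidate;
        }
    }

    // "Open if it exists, otherwise create", then overwrite the data.
    // This is the write side of "copy src to dst".
    void write(const std::string& name, const std::string& data) {
        std::string target = resolve(name);
        if (target.empty()) {
            target = name;
            alias_to_long[make_alias(name)] = name;
        }
        content[target] = data;
    }
};
```

Writing "my file" and then "MYFILE~1" into one `ToyDir` leaves a single entry holding the second file's data, while the reverse order leaves two entries. With a pool of worker threads the order is not under your control, which is where the "depending on your luck" part comes from.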
Question:
How can I avoid the problems that arise from such collisions? It would be nice to have a function that creates/opens a file while ignoring shortnames...
Notes:
I can't assume anything about the way a shortname is generated; various systems employ different schemes (see this)
even if these jobs run sequentially (one by one), they need to be executed in an order that depends on the shortname generation logic (which is unknown). This also implies loading and sorting the entire directory in memory before running any jobs
both source and destination can be very big (potentially millions of files); if possible, I'd like to avoid loading an entire directory into memory or enumerating it multiple times
I can't switch off shortname generation on the destination volume, and making that a requirement is not an option (besides, switching it off doesn't remove existing shortnames anyway)
the application is limited to the Win32 API and the NT API
Edit: it occurred to me that in the general case you can't do it even if everything happens on one thread, simply because, regardless of the order you choose, there will be a shortname generation scheme and a set of filenames that is guaranteed to produce a collision during processing.
If this is correct, how do system utilities copy files? Do they assume something about shortnames, or do they perform a "validate and fix discrepancies" pass after the copy is complete?
The problem basically boils down to the following: dst should never be opened via a shortname.
For example, with the NT API this can be achieved like this:
when opening dst (NtCreateFile(), no truncate), use FILE_OPEN_IF as the CreateDisposition
on success, check IoStatusBlock::Information to see whether the file was created (FILE_CREATED)
if yes -- nothing needs to be done (it is impossible to create a file via a shortname)
if no -- check the current file name via NtQueryInformationFile(FileNameInformation); if it differs from the name passed to NtCreateFile() -- error
It is possible for a parallel process to rename the file right after you opened it, but I'd treat that as an error.
An error can lead to a couple of retries followed by a hard failure (one that needs human attention). It shouldn't be too hard to fix the issue manually (or even programmatically), since the name of the conflicting file is returned by NtQueryInformationFile().
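The steps above can be sketched roughly as follows. This is a Win32-flavoured approximation rather than the exact NT API sequence: it assumes CreateFileW with OPEN_ALWAYS in place of NtCreateFile() with FILE_OPEN_IF, the ERROR_ALREADY_EXISTS last-error value in place of IoStatusBlock::Information, and GetFinalPathNameByHandleW with FILE_NAME_NORMALIZED in place of NtQueryInformationFile(). The names `last_component`, `same_name`, `opened_via_alias`, `OpenResult` and `open_dst_checked` are hypothetical helpers, and the case folding is a simplification of what the file system actually does.

```cpp
#include <cstddef>
#include <cwctype>
#include <string>

// Final path component: the text after the last '\\' or '/'.
std::wstring last_component(const std::wstring& path) {
    const auto pos = path.find_last_of(L"\\/");
    return pos == std::wstring::npos ? path : path.substr(pos + 1);
}

// Case-insensitive comparison using per-character towupper. This only
// approximates the file system's own case folding (assumption).
bool same_name(const std::wstring& a, const std::wstring& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (std::towupper(a[i]) != std::towupper(b[i])) return false;
    return true;
}

// True when the name reported for the opened file differs from the name
// that was requested, i.e. the open went through an alias (shortname).
bool opened_via_alias(const std::wstring& requested, const std::wstring& actual) {
    return !same_name(last_component(requested), last_component(actual));
}

#ifdef _WIN32
#include <windows.h>

enum class OpenResult { Created, OpenedByLongName, OpenedByAlias, Failed };

// Open-or-create dst without truncation, then verify that an existing
// file was not reached through its shortname.
OpenResult open_dst_checked(const std::wstring& dst, HANDLE& out) {
    out = CreateFileW(dst.c_str(), GENERIC_WRITE, 0, nullptr,
                      OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (out == INVALID_HANDLE_VALUE) return OpenResult::Failed;

    // With OPEN_ALWAYS, a successful call sets ERROR_ALREADY_EXISTS when
    // the file existed; otherwise the file was just created.
    if (GetLastError() != ERROR_ALREADY_EXISTS) return OpenResult::Created;

    std::wstring actual(32768, L'\0');
    const DWORD n = GetFinalPathNameByHandleW(
        out, &actual[0], static_cast<DWORD>(actual.size()), FILE_NAME_NORMALIZED);
    const bool query_ok = (n != 0 && n < actual.size());
    if (query_ok) actual.resize(n);

    if (!query_ok || opened_via_alias(dst, actual)) {
        CloseHandle(out);
        out = INVALID_HANDLE_VALUE;
        return query_ok ? OpenResult::OpenedByAlias : OpenResult::Failed;
    }
    return OpenResult::OpenedByLongName;
}
#endif
```

A caller would treat `OpenResult::OpenedByAlias` as the error case described above: retry a couple of times, then fail hard and report the conflicting name. Only the last path component is compared here; checking the intermediate directories for shortnames would need the same comparison per component.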
P.S. Additional steps can be taken to prevent or reduce collisions. For example, if the shortname generation logic is known, dst objects can be processed in a certain order.