I'm trying to monitor the contents of a directory tree, which may contain a huge amount of files (e.g. many directories with 9000 files each).
Synchronous mode:
I first tried using ReadDirectoryChangesW in blocking (synchronous) mode, but when I delete the watched directory I end up in a deadlock that I can neither detect nor exit.
```python
import os
import time

import pywintypes
import win32con
import win32file

FILE_LIST_DIRECTORY = 0x0001


#
# Monitors a directory for changes and passes the changes to the queue
#
def MonitorDirectory(self, out_queue):
    print("Monitoring instance '{0}' is watching directory: {1}".format(self.name, self.path))
    # Monitor directory for changes
    while not self._shutdown.is_set():
        # (Re)open a handle to the directory on every pass
        self.fh.write("ReOpen Exists {0}\n".format(os.path.isdir(self.path)))
        self.fh.flush()
        try:
            hDir = win32file.CreateFile(self.path,
                                        FILE_LIST_DIRECTORY,
                                        win32con.FILE_SHARE_READ | win32con.FILE_SHARE_WRITE | win32con.FILE_SHARE_DELETE,
                                        None,
                                        win32con.OPEN_EXISTING,
                                        win32con.FILE_FLAG_BACKUP_SEMANTICS,
                                        None)
        except pywintypes.error as e:
            self.fh.write("Handle is dead: {0}\n".format(e))
            self.fh.flush()
            time.sleep(1)  # Wait before retrying
            continue
        try:
            self.fh.write("Check Changes\n")
            self.fh.flush()
            # Blocks until a change is reported (this is the call that hangs
            # once the watched directory is deleted)
            results = win32file.ReadDirectoryChangesW(hDir,
                                                      1024 * 64,
                                                      True,
                                                      win32con.FILE_NOTIFY_CHANGE_FILE_NAME |
                                                      win32con.FILE_NOTIFY_CHANGE_DIR_NAME |
                                                      win32con.FILE_NOTIFY_CHANGE_ATTRIBUTES |
                                                      win32con.FILE_NOTIFY_CHANGE_SIZE |
                                                      win32con.FILE_NOTIFY_CHANGE_LAST_WRITE |
                                                      win32con.FILE_NOTIFY_CHANGE_SECURITY,
                                                      None,
                                                      None)
            # Add all changes to queue
            for action, file in results:
                self.fh.write("Action: {0} on {1}\n".format(action, file))
                out_queue.put((action, time.time(), os.path.join(self.path, file)))
            self.fh.flush()
        finally:
            hDir.Close()  # Don't leak one directory handle per pass
    # Done main loop
    print("Monitoring instance '{0}' has finished watching directory: {1}".format(self.name, self.path))
```
Is there really no way to keep the call from blocking when the watched directory is removed?
Also, as the function is running in a thread, I cannot kill it once it is deadlocked, not even from a "supervisor" thread that would monitor the parent directory for DELETE actions on the watched directory. And I don't really consider that a good solution anyway, as it involves much more code.
Asynchronous mode:
I then tried the overlapped (asynchronous) mode, which does not deadlock, but I can't detect when the directory handle becomes invalid because the directory was deleted. The WaitForSingleObject call just times out, and checking whether the directory is present with os.path.isdir does not help: if the directory is recreated in the meantime, it returns True, yet the old directory handle is still invalid and will not detect the changes in the newly created directory with the same name.
After days of trying various approaches, I finally ended up with the code below, which however does not work flawlessly: it still does not detect the removal of the watched directory, and it also misses a few files when mass-deleting files rapidly, something the sync mode did not do.
```python
import os
import time

import pywintypes
import win32con
import win32event
import win32file

FILE_LIST_DIRECTORY = 0x0001
NOTIFY_FILTER = (win32con.FILE_NOTIFY_CHANGE_FILE_NAME |
                 win32con.FILE_NOTIFY_CHANGE_DIR_NAME |
                 win32con.FILE_NOTIFY_CHANGE_ATTRIBUTES |
                 win32con.FILE_NOTIFY_CHANGE_SIZE |
                 win32con.FILE_NOTIFY_CHANGE_LAST_WRITE |
                 win32con.FILE_NOTIFY_CHANGE_SECURITY)


#
# Monitors a directory for changes and passes the changes to the queue
#
def MonitorDirectory(self, out_queue):
    print("Monitoring instance '{0}' is watching directory: {1}".format(self.name, self.path))
    overlapped = pywintypes.OVERLAPPED()
    overlapped.hEvent = win32event.CreateEvent(None, False, 0, None)  # Auto-reset event
    buffer = win32file.AllocateReadBuffer(1024 * 64)
    # Main loop to keep watching active
    while not self._shutdown.is_set():
        # Open directory
        try:
            hDir = win32file.CreateFile(self.path,
                                        FILE_LIST_DIRECTORY,
                                        win32con.FILE_SHARE_READ | win32con.FILE_SHARE_WRITE | win32con.FILE_SHARE_DELETE,
                                        None,
                                        win32con.OPEN_EXISTING,
                                        win32con.FILE_FLAG_BACKUP_SEMANTICS | win32con.FILE_FLAG_OVERLAPPED,
                                        None)
        except pywintypes.error:
            time.sleep(1)  # Wait before retry
            continue
        try:
            pending = False
            # Monitor directory for changes
            while not self._shutdown.is_set():
                # Issue a new read only after the previous one has completed; re-issuing it
                # after a timeout would start a second request on the same OVERLAPPED and
                # buffer while the first one is still pending
                if not pending:
                    win32file.ReadDirectoryChangesW(hDir, buffer, True, NOTIFY_FILTER, overlapped, None)
                    pending = True
                # Wait for the changes
                rc = win32event.WaitForSingleObject(overlapped.hEvent, 10000)
                if rc == win32event.WAIT_OBJECT_0:
                    pending = False
                    try:
                        bytes_returned = win32file.GetOverlappedResult(hDir, overlapped, True)
                    except pywintypes.error as e:
                        raise Exception("Error: handle invalid?") from e
                    # A zero bytes_returned means the buffer overflowed and those events were discarded
                    for action, file in win32file.FILE_NOTIFY_INFORMATION(buffer, bytes_returned):
                        out_queue.put((action, time.time(), os.path.join(self.path, file)))
                elif rc == win32event.WAIT_TIMEOUT:
                    print("Monitoring instance '{0}': Timeout, no actions".format(self.name))
                else:
                    raise Exception("Error?! RC = {0}".format(rc))
        finally:
            hDir.Close()  # Release the directory handle before reopening / leaving
    # Done main loop
    print("Monitoring instance '{0}' has finished watching directory: {1}".format(self.name, self.path))
```
Is there a way to detect the removal of the watched directory, other than just dropping the win32con.FILE_SHARE_DELETE flag?
Now, a few words on FILE_SHARE_DELETE (some documentation about it can be found at [MS.Docs]: CreateFileW function):
The golden rule (or immutable law, if you will) is that a user can't really delete a file (or dir) that has open handles.
Attempting to delete or rename (this seems irrelevant for the current problem, but it isn't) a dir with open handles might have different results (depending on the way that the handles were created and the API used to rename / delete the dir):
I tested the above scenarios, trying to delete / rename the dir in various ways, e.g.: rmdir /q /s, move /y.
I started investigating ways to fix your problem and came across [MS.Docs]: GetFinalPathNameByHandleW function (win32file.GetFinalPathNameByHandle). I played with it:
```python
>>> import sys
>>> import os
>>> import win32api
>>> import win32file
>>> import win32con
>>>
>>> print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
Python 3.5.4 (v3.5.4:3f56838, Aug 8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32

>>> os.listdir()
['code00.py', 'test']
>>> test_dir = ".\\test"
>>> os.path.abspath(test_dir)
'e:\\Work\\Dev\\StackOverflow\\q049652110\\test'
>>> h = win32file.CreateFile(test_dir, win32con.GENERIC_READ, win32con.FILE_SHARE_READ | win32con.FILE_SHARE_WRITE | win32con.FILE_SHARE_DELETE, None, win32con.OPEN_EXISTING, win32con.FILE_FLAG_BACKUP_SEMANTICS, None)
>>> h
<PyHANDLE:620>
>>> win32file.GetFinalPathNameByHandle(h, win32con.FILE_NAME_NORMALIZED)
'\\\\?\\E:\\Work\\Dev\\StackOverflow\\q049652110\\test'
>>> test_dir1 = test_dir + "1"
>>> os.rename(test_dir, test_dir1)
>>> os.listdir()
['code00.py', 'test1']
>>> win32file.GetFinalPathNameByHandle(h, win32con.FILE_NAME_NORMALIZED)
'\\\\?\\E:\\Work\\Dev\\StackOverflow\\q049652110\\test1'
>>> os.rename(test_dir1, test_dir)
>>> os.listdir()
['code00.py', 'test']
>>> win32file.GetFinalPathNameByHandle(h, win32con.FILE_NAME_NORMALIZED)
'\\\\?\\E:\\Work\\Dev\\StackOverflow\\q049652110\\test'
>>> os.unlink(test_dir)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
PermissionError: [WinError 5] Access is denied: '.\\test'
>>> # Delete the dir from elsewhere (don't use os.rmdir since that will only schedule the dir for deletion)
...
>>> os.listdir()
['code00.py']
>>> win32file.GetFinalPathNameByHandle(h, win32con.FILE_NAME_NORMALIZED)
'\\\\?\\E:\\$RECYCLE.BIN\\S-1-5-21-1906798797-2830956273-3148971768-1002\\$RY7SH8D'
>>> os.mkdir(test_dir)
>>> os.listdir()
['code00.py', 'test']
>>> win32file.GetFinalPathNameByHandle(h, win32con.FILE_NAME_NORMALIZED)
'\\\\?\\E:\\$RECYCLE.BIN\\S-1-5-21-1906798797-2830956273-3148971768-1002\\$RY7SH8D'
>>> os.rmdir(test_dir)  # Since the new "test" dir wasn't open, operation successful
>>> os.listdir()
['code00.py']
>>> win32file.GetFinalPathNameByHandle(h, win32con.FILE_NAME_NORMALIZED)
'\\\\?\\E:\\$RECYCLE.BIN\\S-1-5-21-1906798797-2830956273-3148971768-1002\\$RY7SH8D'
>>> # Restore the dir from RECYCLE.BIN
...
>>> os.listdir()
['code00.py', 'test']
>>> win32file.GetFinalPathNameByHandle(h, win32con.FILE_NAME_NORMALIZED)
'\\\\?\\E:\\Work\\Dev\\StackOverflow\\q049652110\\test'
>>> os.rmdir(test_dir)  # Still an open handle, scheduled to be deleted
>>> os.listdir()
['code00.py', 'test']
>>> win32file.GetFinalPathNameByHandle(h, win32con.FILE_NAME_NORMALIZED)
'\\\\?\\E:\\Work\\Dev\\StackOverflow\\q049652110\\test'
>>> win32api.CloseHandle(h)
>>> os.listdir()
['code00.py']  # After closing the handle the dir was deleted
>>> h
<PyHANDLE:0>
>>> win32file.GetFinalPathNameByHandle(h, win32con.FILE_NAME_NORMALIZED)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
pywintypes.error: (6, 'GetFinalPathNameByHandle', 'The handle is invalid.')
```
Note: I also tried [MS.Docs]: GetFileInformationByHandle function (win32file.GetFileInformationByHandle), but I couldn't reproduce the behavior, not even with the 3 pywintypes.datetime fields (creation, last access and last write time): when renaming / deleting the dir, none of the info changed. I didn't spend time investigating, but I could think of 2 possible reasons:
1. That data is somehow stored "inside" the HANDLE, and the function doesn't actually query the FS when invoked (as opposed to GetFinalPathNameByHandle)
2. When the dir is renamed / deleted, those date fields change for its parent dir(s)
So, we seem to have a winner. I'm only going to post the algorithm (the code should be fairly simple):
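The algorithm itself isn't reproduced here. As a rough sketch of how the GetFinalPathNameByHandle observation could be turned into a check, here is a hypothetical helper `watched_dir_moved` (the name and structure are assumptions, not the original code). It compares the path the open handle currently resolves to against the path originally watched: a different path (e.g. one under $RECYCLE.BIN) or an invalid handle means the watched dir is no longer at its original location. The Windows call is injected through `get_final_path` so the comparison logic can be exercised without pywin32.

```python
import ntpath

_LONG_PREFIX = "\\\\?\\"          # \\?\
_UNC_PREFIX = "\\\\?\\UNC\\"      # \\?\UNC\


def _normalize(path):
    """Drop the long-path prefix, then normalize separators and case."""
    if path.startswith(_UNC_PREFIX):          # \\?\UNC\server\share -> \\server\share
        path = "\\\\" + path[len(_UNC_PREFIX):]
    elif path.startswith(_LONG_PREFIX):       # \\?\E:\dir -> E:\dir
        path = path[len(_LONG_PREFIX):]
    return ntpath.normcase(ntpath.normpath(path))


def watched_dir_moved(handle, watched_path, get_final_path=None):
    """Return True if `handle` no longer resolves to `watched_path`.

    Hypothetical helper: covers a rename / move (including a move to the
    Recycle Bin) and a handle that has become invalid.
    """
    if get_final_path is None:
        # Real Windows call (requires pywin32), imported lazily
        import win32con
        import win32file

        def get_final_path(h):
            return win32file.GetFinalPathNameByHandle(h, win32con.FILE_NAME_NORMALIZED)

    try:
        current = get_final_path(handle)
    except Exception:  # e.g. pywintypes.error (6, ..., 'The handle is invalid.')
        return True
    return _normalize(current) != _normalize(ntpath.abspath(watched_path))
```

In the overlapped loop from the question, one option would be to call it on WAIT_TIMEOUT and, when it returns True, close the handle and reopen self.path, which would also pick up a recreated dir with the same name. Judging by the session above, it would not catch the "scheduled for deletion" case (os.rmdir while the handle is still open), since the final path stays unchanged there.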
Other possible approaches (although undesirable):
From [MS.Docs]: ReadDirectoryChangesW function: "Retrieves information that describes the changes within the specified directory. The function does not report changes to the specified directory itself."
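Given that limitation, one way to notice the watched dir itself disappearing is to also watch its parent (non-recursively) and look for entries carrying the watched dir's name. Below is a minimal sketch of just the filtering step. The function name is hypothetical; it assumes the (action, filename) tuples returned by win32file.ReadDirectoryChangesW, with the documented FILE_ACTION_REMOVED (2) and FILE_ACTION_RENAMED_OLD_NAME (4) values hard-coded so it runs without pywin32. Whether and when such a notification arrives while the watcher still holds its own handle may depend on how the dir is deleted.

```python
import ntpath

# Documented FILE_NOTIFY_INFORMATION action values (hard-coded here)
FILE_ACTION_REMOVED = 2
FILE_ACTION_RENAMED_OLD_NAME = 4


def watched_dir_vanished(parent_results, watched_name):
    """Return True if a batch of notifications from the *parent* directory
    shows the watched directory being removed or renamed away.

    parent_results: iterable of (action, name) tuples, names relative to the parent.
    watched_name: the watched directory's own name, e.g. "test".
    """
    target = ntpath.normcase(watched_name)  # Windows names compare case-insensitively by default
    return any(
        action in (FILE_ACTION_REMOVED, FILE_ACTION_RENAMED_OLD_NAME)
        and ntpath.normcase(name) == target
        for action, name in parent_results
    )
```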
Regarding "event loss": as I mentioned in the other answer, there's no way to be sure that all events will be processed; there are only ways to minimize the number of lost ones.
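One common way to reduce losses is to keep the gap between two ReadDirectoryChangesW calls as short as possible: copy the completed buffer, re-arm the read right away, and do the parsing / processing on another thread. When the buffer does overflow, the docs say the call reports 0 bytes returned, which can be turned into a "rescan" signal. Here is a minimal sketch of that hand-off, assuming a hypothetical `read_once` callable that wraps one overlapped read and returns a copy of the returned bytes (or None on timeout), so the plumbing runs without pywin32. In real code, `parse` would be something like win32file.FILE_NOTIFY_INFORMATION(buf, len(buf)).

```python
import queue
import threading

RESCAN = object()  # Sentinel: the buffer overflowed, events were dropped


def reader(read_once, raw_q, stop):
    """Re-arm the read as fast as possible; no parsing on this thread."""
    while not stop.is_set():
        data = read_once()
        if data is None:  # Timeout, nothing happened
            continue
        # 0 bytes returned => overflow, the consumer should rescan the tree
        raw_q.put(bytes(data) if data else RESCAN)
    raw_q.put(None)  # Tell the consumer to finish


def consumer(raw_q, parse, out_q):
    """Parse raw buffers off the hot path and forward the events."""
    while True:
        item = raw_q.get()
        if item is None:
            break
        if item is RESCAN:
            out_q.put(RESCAN)
        else:
            for event in parse(item):
                out_q.put(event)
```

Usage: start `consumer` in its own `threading.Thread`, then run `reader` in the monitoring thread; setting `stop` ends both.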