python multithreading python-3.x pywin32 readdirectorychangesw

ReadDirectoryChangesW unable to detect and handle deletion of watched directory


I am trying to monitor the contents of a directory tree, which may contain a huge number of files (for example, many directories with 9000 files each).

Synchronous mode:

I first tried using ReadDirectoryChangesW in blocking (synchronous) mode, but when I delete the watched directory I end up in a deadlock that I can neither detect nor exit.

#
# Monitors a directory for changes and pass the changes to the queue
#
def MonitorDirectory(self, out_queue):

    print("Monitoring instance \'{0}\' is watching directory: {1}".format(self.name, self.path))

    # File monitor
    FILE_LIST_DIRECTORY = 0x0001

    buffer = win32file.AllocateReadBuffer(1024 * 64)

    hDir = win32file.CreateFile(self.path,
                                FILE_LIST_DIRECTORY,
                                win32con.FILE_SHARE_READ | win32con.FILE_SHARE_WRITE | win32con.FILE_SHARE_DELETE,
                                None,
                                win32con.OPEN_EXISTING,
                                win32con.FILE_FLAG_BACKUP_SEMANTICS,
                                None)

    # Monitor directory for changes
    while not self._shutdown.is_set():

        # Create handle to directory if missing
        #if os.path.isdir(self.path):

        self.fh.write("ReOpen Exists {0}\n".format(os.path.isdir(self.path)))
        self.fh.flush()
        try:
            hDir = win32file.CreateFile(self.path,
                                FILE_LIST_DIRECTORY,
                                win32con.FILE_SHARE_READ | win32con.FILE_SHARE_WRITE | win32con.FILE_SHARE_DELETE,
                                None,
                                win32con.OPEN_EXISTING,
                                win32con.FILE_FLAG_BACKUP_SEMANTICS,
                                None)
        except:
            self.fh.write("Handle is dead\n")
            self.fh.flush()

        try:
            self.fh.write("{0}\n".format(hDir))
            self.fh.flush()
        except:
            self.fh.write("Write failed\n")
            self.fh.flush()

        self.fh.write("Check Changes\n")
        self.fh.flush()

        results = win32file.ReadDirectoryChangesW(hDir,
                                                    1024 * 64,
                                                    True,
                                                    win32con.FILE_NOTIFY_CHANGE_FILE_NAME |
                                                    win32con.FILE_NOTIFY_CHANGE_DIR_NAME |
                                                    win32con.FILE_NOTIFY_CHANGE_ATTRIBUTES |
                                                    win32con.FILE_NOTIFY_CHANGE_SIZE |
                                                    win32con.FILE_NOTIFY_CHANGE_LAST_WRITE |
                                                    win32con.FILE_NOTIFY_CHANGE_SECURITY,
                                                    None,
                                                    None)

        # Add all changes to queue
        for action, file in results:

            self.fh.write("Action: {0} on {1}\n".format(action, file))

            out_queue.put((action, time.time(), os.path.join(self.path, file)))

        self.fh.flush()


        #else:


    # Done main loop
    print("Monitoring instance \'{0}\' has finished watching directory: {1}".format(self.name, self.path))

Is there really no way to keep the call from blocking when the watched directory is removed?

Also, because the function runs in a thread, I cannot kill it once it is deadlocked from a "supervisor" thread that would monitor the parent directory for DELETE actions on the watched directory. I don't really like that as a solution anyway, since it involves much more code.

Asynchronous mode:

I then tried overlapped (asynchronous) mode, which does not deadlock, but I can't detect when the directory handle becomes invalid because the directory was deleted. The WaitForSingleObject call just times out. Checking whether the directory is present with os.path.isdir does not help: if the directory has been recreated in the meantime, it returns True, yet the old directory handle is still invalid and will not report changes in the newly created directory of the same name.

After days of trying various approaches, I finally arrived at this code. It still does not work flawlessly, because it does not detect the removal of the watched directory, and it also misses a few files when many files are deleted in rapid succession, which the synchronous mode did not.

#
# Monitors a directory for changes and pass the changes to the queue
#
def MonitorDirectory(self, out_queue):

    print("Monitoring instance \'{0}\' is watching directory: {1}".format(self.name, self.path))

    # File monitor
    FILE_LIST_DIRECTORY = 0x0001

    overlapped          = pywintypes.OVERLAPPED()
    overlapped.hEvent   = win32event.CreateEvent(None, False, 0, None)

    buffer  = win32file.AllocateReadBuffer(1024 * 64)

    # Main loop to keep watching active
    while not self._shutdown.is_set():

        # Open directory
        try:
            hDir = win32file.CreateFile(self.path,
                                        FILE_LIST_DIRECTORY,
                                        win32con.FILE_SHARE_READ | win32con.FILE_SHARE_WRITE | win32con.FILE_SHARE_DELETE,
                                        None,
                                        win32con.OPEN_EXISTING,
                                        win32con.FILE_FLAG_BACKUP_SEMANTICS | win32con.FILE_FLAG_OVERLAPPED,
                                        None)

        except pywintypes.error:

            # Wait before retry
            time.sleep(1)

        else:

            # Monitor directory for changes
            while not self._shutdown.is_set():

                win32file.ReadDirectoryChangesW(hDir,
                                                buffer,
                                                True,
                                                win32con.FILE_NOTIFY_CHANGE_FILE_NAME |
                                                win32con.FILE_NOTIFY_CHANGE_DIR_NAME |
                                                win32con.FILE_NOTIFY_CHANGE_ATTRIBUTES |
                                                win32con.FILE_NOTIFY_CHANGE_SIZE |
                                                win32con.FILE_NOTIFY_CHANGE_LAST_WRITE |
                                                win32con.FILE_NOTIFY_CHANGE_SECURITY,
                                                overlapped,
                                                None)

                # Wait for the changes
                rc = win32event.WaitForSingleObject(overlapped.hEvent, 10000)

                if rc == win32event.WAIT_OBJECT_0:

                    try:
                        bytes_returned = win32file.GetOverlappedResult(hDir, overlapped, True)

                    except:
                        raise Exception("Error: handle invalid?")

                    else:

                        # Get the changes
                        for action, file in win32file.FILE_NOTIFY_INFORMATION(buffer, bytes_returned):                        
                            out_queue.put((action, time.time(), os.path.join(self.path, file)))

                elif rc == win32event.WAIT_TIMEOUT:
                    print("Monitoring instance \'{0}\': Timeout, no actions".format(self.name))

                else:
                    raise Exception("Error?! RC = {0}".format(rc))

    # Done main loop
    print("Monitoring instance \'{0}\' has finished watching directory: {1}".format(self.name, self.path))

Is there a way to detect and handle the removal of the watched directory, other than simply removing the win32con.FILE_SHARE_DELETE flag?


Solution

  • Considerations

    Now, a few words on FILE_SHARE_DELETE (some documentation on it can be found at [MS.Docs]: CreateFileW function):

    The golden rule (or immutable law, if you will) is that a user can't really delete a file (or dir) that has open handles.

    Attempting to delete or rename (this seems irrelevant for the current problem, but it isn't) a dir with open handles might have different results (depending on the way that the handles were created and the API used to rename / delete the dir):

    1. Error (ERROR_ACCESS_DENIED) - happens when FILE_SHARE_DELETE wasn't specified (and some other cases)
    2. No error, but the dir is still there - generally means that it was scheduled to be deleted, and will automatically disappear once its last open handle is closed
    3. Success and the dir is deleted. Actually, that's not true, the dir is just moved (renamed) to "RECYCLE.BIN" (attempting to remove it from there will result in #1.; so would attempting to really delete it in the 1st place (Shift + Del from Explorer))

    I tested the above scenarios trying to delete / rename the dir in various ways:

    I started investigating ways to fix your problem and came across [MS.Docs]: GetFinalPathNameByHandleW function (win32file.GetFinalPathNameByHandle). I played with it:

    >>> import sys
    >>> import os
    >>> import win32api
    >>> import win32file
    >>> import win32con
    >>>
    >>> print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
    Python 3.5.4 (v3.5.4:3f56838, Aug  8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32
    
    >>> os.listdir()
    ['code00.py', 'test']
    >>> test_dir = ".\\test"
    >>> os.path.abspath(test_dir)
    'e:\\Work\\Dev\\StackOverflow\\q049652110\\test'
    >>> h = win32file.CreateFile(test_dir, win32con.GENERIC_READ, win32con.FILE_SHARE_READ | win32con.FILE_SHARE_WRITE | win32con.FILE_SHARE_DELETE, None, win32con.OPEN_EXISTING, win32con.FILE_FLAG_BACKUP_SEMANTICS, None)
    >>> h
    <PyHANDLE:620>
    >>> win32file.GetFinalPathNameByHandle(h, win32con.FILE_NAME_NORMALIZED)
    '\\\\?\\E:\\Work\\Dev\\StackOverflow\\q049652110\\test'
    >>> test_dir1 = test_dir + "1"
    >>> os.rename(test_dir, test_dir1)
    >>> os.listdir()
    ['code00.py', 'test1']
    >>> win32file.GetFinalPathNameByHandle(h, win32con.FILE_NAME_NORMALIZED)
    '\\\\?\\E:\\Work\\Dev\\StackOverflow\\q049652110\\test1'
    >>> os.rename(test_dir1, test_dir)
    >>> os.listdir()
    ['code00.py', 'test']
    >>> win32file.GetFinalPathNameByHandle(h, win32con.FILE_NAME_NORMALIZED)
    '\\\\?\\E:\\Work\\Dev\\StackOverflow\\q049652110\\test'
    >>> os.unlink(test_dir)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    PermissionError: [WinError 5] Access is denied: '.\\test'
    >>> # Delete the dir from elsewhere (don't use os.rmdir since that will only schedule the dir for deletion)
    ...
    >>> os.listdir()
    ['code00.py']
    >>> win32file.GetFinalPathNameByHandle(h, win32con.FILE_NAME_NORMALIZED)
    '\\\\?\\E:\\$RECYCLE.BIN\\S-1-5-21-1906798797-2830956273-3148971768-1002\\$RY7SH8D'
    >>> os.mkdir(test_dir)
    >>> os.listdir()
    ['code00.py', 'test']
    >>> win32file.GetFinalPathNameByHandle(h, win32con.FILE_NAME_NORMALIZED)
    '\\\\?\\E:\\$RECYCLE.BIN\\S-1-5-21-1906798797-2830956273-3148971768-1002\\$RY7SH8D'
    >>> os.rmdir(test_dir) # Since the new "test" dir wasn't open, operation successful
    >>> os.listdir()
    ['code00.py']
    >>> win32file.GetFinalPathNameByHandle(h, win32con.FILE_NAME_NORMALIZED)
    '\\\\?\\E:\\$RECYCLE.BIN\\S-1-5-21-1906798797-2830956273-3148971768-1002\\$RY7SH8D'
    >>> # Restore the dir from RECYCLE.BIN
    ...
    >>> os.listdir()
    ['code00.py', 'test']
    >>> win32file.GetFinalPathNameByHandle(h, win32con.FILE_NAME_NORMALIZED)
    '\\\\?\\E:\\Work\\Dev\\StackOverflow\\q049652110\\test'
    >>> os.rmdir(test_dir) # Still an open handle, scheduled to be deleted
    >>> os.listdir()
    ['code00.py', 'test']
    >>> win32file.GetFinalPathNameByHandle(h, win32con.FILE_NAME_NORMALIZED)
    '\\\\?\\E:\\Work\\Dev\\StackOverflow\\q049652110\\test'
    >>> win32api.CloseHandle(h)
    >>> os.listdir()
    ['code00.py'] # After closing the handle the dir was deleted
    >>> h
    <PyHANDLE:0>
    >>> win32file.GetFinalPathNameByHandle(h, win32con.FILE_NAME_NORMALIZED)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    pywintypes.error: (6, 'GetFinalPathNameByHandle', 'The handle is invalid.')
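
The session above suggests a simple check: remember the path the handle resolved to when it was opened, and compare it against what the handle resolves to later. The following is only a minimal sketch of that idea, not the author's code. The names `normalize_final_path`, `handle_still_at` and the `get_final_path` parameter are hypothetical, introduced here so the comparison logic can be read (and run) without pywin32.

```python
EXTENDED_PREFIX = "\\\\?\\"  # the prefix seen on the paths returned in the session above


def normalize_final_path(path):
    """Strip the extended-length prefix and compare case-insensitively."""
    if path.startswith(EXTENDED_PREFIX):
        path = path[len(EXTENDED_PREFIX):]
    return path.rstrip("\\").lower()


def handle_still_at(handle, expected_path, get_final_path):
    """Return True while the handle still resolves to expected_path.

    get_final_path is an assumed callable taking the handle and returning
    its current final path; any exception it raises (for example because
    the handle became invalid) is treated as "no longer there".
    """
    try:
        current = get_final_path(handle)
    except Exception:
        return False
    return normalize_final_path(current) == normalize_final_path(expected_path)
```

On Windows, `get_final_path` could be `lambda h: win32file.GetFinalPathNameByHandle(h, win32con.FILE_NAME_NORMALIZED)`, and `expected_path` should be an absolute path, for instance the value that same call returned right after `CreateFile`. Note that in the session above a dir that was only scheduled for deletion (via `os.rmdir`) still reported its original path, so this check would not catch that case.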
    

    Note: I also tried [MS.Docs]: GetFileInformationByHandle function (win32file.GetFileInformationByHandle), but I couldn't reproduce the behavior, not even with one of the 3 pywintypes.datetime fields (which should be the Last access / modify time); when renaming / deleting the dir, none of the info changed. I didn't spend time investigating; I thought of 2 possible reasons:

    So, we seem to have a winner. I'm only going to post the algorithm (the code should be fairly simple):

    Other possible approaches (although undesirable):

    Regarding "event loss": as I specified in the other answer, there's no way to be sure that all events will be processed; there are only ways to minimize the number that are lost.
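
One common way to reduce the number of lost events is to keep the thread that reads notifications doing nothing but reading, and to hand the slow per-event work to another thread through a queue. This rests on an assumption, namely that events are mostly lost while the reader is busy between two ReadDirectoryChangesW calls. The sketch below illustrates the pattern only. `read_batch`, `handle_event` and `run_pipeline` are hypothetical names, and `read_batch` stands in for whatever call returns the next list of changes (or `None` when watching stops).

```python
import queue
import threading

_SENTINEL = object()  # marks the end of the event stream


def reader(read_batch, events):
    """Fetch batches as fast as possible; no slow work happens here."""
    while True:
        batch = read_batch()  # hypothetical: list of events, or None to stop
        if batch is None:
            break
        for event in batch:
            events.put(event)
    events.put(_SENTINEL)


def worker(events, handle_event):
    """Do the slow per-event processing off the reader thread."""
    while True:
        event = events.get()
        if event is _SENTINEL:
            break
        handle_event(event)


def run_pipeline(read_batch, handle_event):
    """Run one reader and one worker until read_batch returns None."""
    events = queue.Queue()  # unbounded, so the reader never blocks on put()
    threads = [
        threading.Thread(target=reader, args=(read_batch, events)),
        threading.Thread(target=worker, args=(events, handle_event)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

This does not guarantee that nothing is lost; it only shortens the time during which no read is pending.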