Tags: python, samba

Can os.listdir hang with network drives? What system call does it use?


What system call does os.listdir perform internally, and is there a possibility of the Python process hanging in a scenario where os.listdir is called on a mounted network drive?

We suspect a problem in our app server caused by os.listdir trying to list a Samba share mounted onto a Linux machine. Apparently the DNS of the Samba share changed around the time we had this issue. We are still trying to replicate the scenario, but can anyone tell me how this would play out? And would commands like ls also hang like this?

Are there any ways we could handle this in user space?


Solution

  • CPython's implementation of os.listdir uses platform-specific C library calls to read the contents of a directory. On Unix-like platforms those are opendir(3) and readdir(3), and on Windows it uses FindFirstFile and FindNextFile.

    How these calls behave in the presence of an unreachable network file system depends on the operating system. On Linux and Windows they will typically hang in the same situations in which system commands such as ls hang. To prevent arbitrarily long pauses, one can use specialized frameworks such as asyncio or Twisted, which rely on non-blocking IO. Adopting these frameworks can be daunting, though, and typically requires using them throughout the application and converting the whole program to an event-driven model.
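    As a rough illustration of the asyncio approach, one can offload the blocking call to a worker thread and bound the wait with a timeout. This is a minimal sketch, assuming Python 3.9+ for asyncio.to_thread; note that the timeout only stops waiting, the underlying thread may still be stuck in the system call:

```python
import asyncio
import os

async def listdir_with_timeout(directory, timeout):
    # Run the blocking os.listdir in a worker thread; give up
    # waiting for the result after `timeout` seconds.
    try:
        return await asyncio.wait_for(
            asyncio.to_thread(os.listdir, directory), timeout)
    except asyncio.TimeoutError:
        return None  # timed out

# Listing a local directory completes well within the timeout:
print(asyncio.run(listdir_with_timeout(".", 5.0)))
```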

    A simpler and somewhat more beginner-friendly way to make sure IO system calls don't block in the presence of network file systems is to use threads. As an example, here is a safe_listdir function that returns the directory contents, or None if the call took longer than a specified timeout:

    import os, threading
    
    def safe_listdir(directory, timeout):
        contents = []
        t = threading.Thread(target=lambda: contents.extend(os.listdir(directory)))
        t.daemon = True  # don't delay program's exit
        t.start()
        t.join(timeout)
        if t.is_alive():
            return None  # timeout
        return contents
    

    In Python 3 one could use the excellent concurrent.futures module. It not only simplifies the implementation, it also automatically limits the number of created threads if safe_listdir is called many times, and ensures that exceptions raised in os.listdir are correctly propagated to the caller:

    import os, concurrent.futures
    pool = concurrent.futures.ThreadPoolExecutor()
    
    def safe_listdir(directory, timeout):
        future = pool.submit(os.listdir, directory)
        try:
            return future.result(timeout)
        except concurrent.futures.TimeoutError:
            return None  # timeout
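
    To illustrate the exception propagation mentioned above, here is a small self-contained sketch (it re-declares safe_listdir so the snippet stands alone, and uses a hypothetical nonexistent path):

```python
import os
import concurrent.futures

pool = concurrent.futures.ThreadPoolExecutor()

def safe_listdir(directory, timeout):
    future = pool.submit(os.listdir, directory)
    try:
        return future.result(timeout)
    except concurrent.futures.TimeoutError:
        return None  # timeout

# A successful call returns the directory contents...
print(safe_listdir(".", timeout=5.0))

# ...while errors raised inside os.listdir surface in the caller:
try:
    safe_listdir("/nonexistent-directory", timeout=5.0)
except FileNotFoundError as e:
    print("propagated:", e)
```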