Tags: python, file-io, directory, performance

A faster way of directory walking than os.listdir?


I am trying to improve the performance of elFinder, an AJAX-based file manager (elRTE.ru).

It uses os.listdir recursively to walk through all directories, and this takes a performance hit (listing a directory with 3000+ files takes 7 seconds).

Here is its walking function:

        # Recurse into each accepted, non-symlink subdirectory of `path`
        for d in os.listdir(path):
            pd = os.path.join(path, d)
            if os.path.isdir(pd) and not os.path.islink(pd) and self.__isAccepted(d):
                tree['dirs'].append(self.__tree(pd))

My questions are:

  1. If I switch from os.listdir to os.walk, would it improve performance? (See the sketch after this list.)
  2. How about using dircache.listdir()? Cache the WHOLE directory/subdirectory contents on the initial request and return the cached results if no new files have been uploaded and nothing has changed.
  3. Is there any other method of directory walking that is faster?
  4. Is there any other fast server-side file browser written in Python? (Though I would prefer to make this one fast.)
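
Regarding question 1, here is a minimal sketch of what the walking function could look like rebuilt on os.walk. In Python 2, os.walk is itself implemented on top of os.listdir, so it is not inherently faster, but pruning its dirs list in place lets you skip whole rejected subtrees. The build_tree name, the is_accepted callback, and the {'name', 'dirs'} node shape are my assumptions based on the snippet above, not elFinder's actual code:

    import os

    def build_tree(path, is_accepted):
        # Hypothetical os.walk-based rewrite of the recursive __tree
        # method above; `is_accepted` stands in for self.__isAccepted.
        nodes = {path: {'name': os.path.basename(path), 'dirs': []}}
        for root, dirs, files in os.walk(path):
            # Prune `dirs` in place: os.walk honours this and will not
            # descend into rejected or symlinked directories.
            dirs[:] = [d for d in dirs
                       if is_accepted(d)
                       and not os.path.islink(os.path.join(root, d))]
            for d in dirs:
                pd = os.path.join(root, d)
                nodes[pd] = {'name': d, 'dirs': []}
                nodes[root]['dirs'].append(nodes[pd])
        return nodes[path]

Functionally this matches the original (rejected and symlinked directories are skipped), but each entry still costs the isdir/islink stat calls, which is likely where most of the 7 seconds goes on a 3000-file directory.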

Solution

  • I was just trying to figure out how to speed up os.walk on a largish file system (350,000 files spread across around 50,000 directories). I'm on a Linux box using an ext3 file system. I discovered a way to speed this up for MY case.

    Specifically, using a top-down walk, any time os.walk returns a list of more than one directory, I use os.stat to get the inode number of each directory and sort the directory list by inode number. This makes the walk visit the subdirectories mostly in inode order, which reduces disk seeks.

    For my use case, it sped up my complete directory walk from 18 minutes down to 13 minutes. A sketch of the idea follows.
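
    Here is a minimal sketch of that trick, assuming a top-down os.walk (the inode_sorted_walk name is mine; the reordering works because a top-down os.walk recurses into the dirs list in whatever order it is left in):

        import os

        def inode_sorted_walk(top):
            # Wrap a top-down os.walk and reorder sibling directories by
            # inode number, so the walk visits them roughly in on-disk
            # order and the disk seeks less (helps on ext3-style disks).
            for root, dirs, files in os.walk(top):
                if len(dirs) > 1:
                    # Sorting `dirs` in place is honoured by os.walk: it
                    # will recurse into the subdirectories in this order.
                    dirs.sort(key=lambda d: os.stat(os.path.join(root, d)).st_ino)
                yield root, dirs, files

    It drops in wherever os.walk was used. The extra os.stat per directory is cheap next to the seeks it avoids on a cold spinning disk; on SSDs or a warm cache it likely buys nothing.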