python path filesystems glob fnmatch

How to use glob() to find files recursively?


I would like to list all files recursively in a directory. I currently have a directory structure like this:

I've tried to do the following:

import os
from glob import glob

glob(os.path.join('src','*.c'))

But this only gets me the files directly in the src folder: I get main.c, but not file1.c, file2.c, etc.

I could add one pattern per directory level:

import os
from glob import glob

glob(os.path.join('src','*.c'))
glob(os.path.join('src','*','*.c'))
glob(os.path.join('src','*','*','*.c'))
glob(os.path.join('src','*','*','*','*.c'))

But this is obviously limited and clunky. How can I do this properly?


Solution

  • There are a couple of ways:

    pathlib.Path().rglob()

    Use pathlib.Path().rglob() from the pathlib module, which was introduced in Python 3.4.

    from pathlib import Path
    
    for path in Path('src').rglob('*.c'):
        print(path.name)
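
If you need the full paths rather than just the file names, a minimal sketch along these lines collects them into a sorted list. The helper name find_files and the is_file() filter are illustrative assumptions, not part of the answer above.

```python
from pathlib import Path


def find_files(root, pattern="*.c"):
    """Return a sorted list of files under root matching pattern (hypothetical helper)."""
    # rglob() searches root and all of its subdirectories;
    # is_file() drops any directory whose name happens to match the pattern.
    return sorted(p for p in Path(root).rglob(pattern) if p.is_file())
```

Calling find_files('src') would then return Path objects that still include the src/... prefix, which path.name discards.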
    

    glob.glob()

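    If you don't want to use pathlib, use glob.glob() with recursive=True (supported since Python 3.5), so that ** matches any number of nested directories: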

    from glob import glob
    
    for filename in glob('src/**/*.c', recursive=True):
        print(filename)
    

    Note that glob() does not match files whose names begin with a dot (.), such as hidden files on Unix-based systems, unless the pattern itself begins with a dot. To match those files as well, use the os.walk() solution below.
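
The difference in how dot-files are handled can be checked with a small sketch like the following. It builds a temporary directory with one visible and one hidden file; the function name compare_hidden_matching and the file names are made up for illustration.

```python
import fnmatch
import os
import tempfile
from glob import glob


def compare_hidden_matching(root):
    """Return (glob_names, walk_names): '*.c' file names found by each approach."""
    # glob() skips names starting with a dot unless the pattern starts with one.
    via_glob = sorted(
        os.path.basename(p)
        for p in glob(os.path.join(root, "**", "*.c"), recursive=True)
    )
    # os.walk() lists every entry, and fnmatch.filter() gives a leading dot
    # no special treatment.
    via_walk = sorted(
        name
        for _, _, filenames in os.walk(root)
        for name in fnmatch.filter(filenames, "*.c")
    )
    return via_glob, via_walk


# Throwaway tree with one visible and one hidden file (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("main.c", ".hidden.c"):
        open(os.path.join(tmp, name), "w").close()
    glob_names, walk_names = compare_hidden_matching(tmp)
    print(glob_names)  # ['main.c']
    print(walk_names)  # ['.hidden.c', 'main.c']
```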

    os.walk()

    For older Python versions, use os.walk() to recursively walk a directory and fnmatch.filter() to match against a simple expression:

    import fnmatch
    import os
    
    matches = []
    for root, dirnames, filenames in os.walk('src'):
        for filename in fnmatch.filter(filenames, '*.c'):
            matches.append(os.path.join(root, filename))
    

    This version may also be faster, depending on how many files you have, because the pathlib module adds some overhead compared with os.walk().
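
For large trees, the same os.walk() loop can be wrapped in a generator so that paths are produced one at a time instead of being accumulated in a list first. This is only a sketch; iter_matching_files is a hypothetical name, not a standard library function.

```python
import fnmatch
import os


def iter_matching_files(root, pattern="*.c"):
    """Lazily yield paths under root whose file name matches pattern."""
    for dirpath, _dirnames, filenames in os.walk(root):
        # fnmatch.filter() keeps only the names in this directory that match.
        for filename in fnmatch.filter(filenames, pattern):
            yield os.path.join(dirpath, filename)
```

Something like matches = list(iter_matching_files('src')) would rebuild the list from the example above, while a plain for loop over the generator avoids holding every path in memory at once.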