
Find leaf folders that aren't hidden folders


I have a folder structure with some epubs and json files in the deepest folders (not counting the .ts folders). I'm exporting tags from the json files to TagSpaces by creating a .ts folder containing some other json files. I've already processed some of the files, and now I want to find the leaf folders that don't have a .ts folder in their path, so I can find the remaining files without processing the others twice.

So for this example I only want to do something for the folder t5:

test
├── t1
│   ├── t2
│   │   └── t5
│   └── t3
│       └── .ts
└── .ts
    └── t4

This is what I've tried:

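import os
import shutil


def get_files_in_leaf_subdirectories(directory: str) -> list[str]:
    dirs = []
    for root, subdirs, filenames in os.walk(directory):
        # skip folders that have subfolders, and anything inside a .ts folder
        if subdirs or '.ts' in os.path.normpath(root).split(os.sep):
            continue
        dirs.append(root)
    return dirs


def test_process_files_in_leaf_subdirectories():
    os.makedirs('tmp/t1/t2/t5', exist_ok=True)
    os.makedirs('tmp/t1/t3/.ts', exist_ok=True)
    os.makedirs('tmp/.ts/t4', exist_ok=True)
    try:
        expected = [os.path.join('tmp', 't1', 't2', 't5')]
        assert get_files_in_leaf_subdirectories('tmp') == expected
    finally:
        shutil.rmtree('tmp')  # clean up even if the assertion fails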
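Checking for `.ts` as a whole path component, rather than as a substring of the path string, matters here: a substring test would also skip a folder whose name merely contains `.ts` (for example a hypothetical `notes.ts` folder). A minimal sketch of a component-wise check, assuming the marker folder is always named exactly `.ts` (`has_ts_component` is a hypothetical helper name):

```python
import os
from pathlib import PurePath


def has_ts_component(root: str) -> bool:
    """Return True if any component of the path is exactly '.ts'."""
    return '.ts' in PurePath(root).parts


# Substring test vs. component test on a hypothetical path
path = os.path.join('tmp', 'notes.ts', 't7')
print('.ts' in path)            # True: substring match inside 'notes.ts'
print(has_ts_component(path))   # False: no component is exactly '.ts'
print(has_ts_component(os.path.join('tmp', '.ts', 't4')))  # True
```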



Solution

  • Since you want to find leaf directories without counting .ts directories, it's enough to recursively visit the non-hidden paths and yield the directories that have no subdirectories.

    For path operations like this in Python, I'd recommend using pathlib.Path instead.

    Here's a generator that yields the leaf directories, i.e. those without any subdirectory:

    import pathlib
    from typing import Generator
    
    
    def find_leaf_dir_gen(root_path: pathlib.Path) -> Generator[pathlib.Path, None, None]:
    
        # collect the immediate subdirectories (hidden ones included)
        child_dirs = [path for path in root_path.iterdir() if path.is_dir()]
    
        # no subdirectory at all: this is a leaf, so yield it and stop
        if not child_dirs:
            yield root_path
            return
    
        # otherwise iterate through the subdirectories
        for path in child_dirs:
            # skip hidden directories such as .ts
            if path.name.startswith("."):
                continue
    
            # descend and yield the leaves found further down
            yield from find_leaf_dir_gen(path)
    

    Sample usage:

    >>> ROOT = pathlib.Path("X:/test")
    >>> leaves = list(find_leaf_dir_gen(ROOT))
    >>> leaves
    [WindowsPath('X:/test/t1/t2/t5'), WindowsPath('X:/test/t1/t3/t6')]
    
    >>> for path in leaves:
    ...     ts_path = path.joinpath(".ts")
    ...     ts_path.mkdir()
    ...
    

    Test directory structure - Before:

    X:\TEST
    ├─.ts
    │  └─t4
    └─t1
        ├─t2
        │  └─t5
        └─t3
            ├─.ts
            └─t6
    

    After:

    X:\TEST
    ├─.ts
    │  └─t4
    └─t1
        ├─t2
        │  └─t5
        │      └─.ts
        └─t3
            ├─.ts
            └─t6
                └─.ts
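Since the question's own attempt uses `os.walk`, the same traversal can also be sketched with `os.walk` in its default top-down mode: a directory is reported when it has no subdirectories at all, and hidden subdirectories are then removed from `subdirs` in place so the walk never descends into them. This is an alternative sketch, not the answer's method; like the generator, it does not report a folder whose only subdirectory is `.ts` (such as `t3` in the question's example). The function name `find_leaf_dirs_walk` is hypothetical.

```python
import os
import tempfile
from typing import Iterator


def find_leaf_dirs_walk(root_dir: str) -> Iterator[str]:
    for root, subdirs, _filenames in os.walk(root_dir):
        # A leaf has no subdirectories at all (hidden ones included),
        # so a folder that already contains .ts is not reported.
        if not subdirs:
            yield root
        # Prune hidden subdirectories in place; top-down os.walk
        # then never descends into them.
        subdirs[:] = [d for d in subdirs if not d.startswith('.')]


if __name__ == '__main__':
    # Rebuild the question's example tree in a temporary directory
    with tempfile.TemporaryDirectory() as tmp:
        for rel in ('t1/t2/t5', 't1/t3/.ts', '.ts/t4'):
            os.makedirs(os.path.join(tmp, *rel.split('/')))
        leaves = [os.path.relpath(p, tmp) for p in find_leaf_dirs_walk(tmp)]
        print(leaves)  # e.g. ['t1/t2/t5'] on POSIX
```

The in-place slice assignment `subdirs[:] = ...` is what makes the pruning work; rebinding `subdirs` to a new list would have no effect on the walk.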