Tags: python, for-loop, os.walk, file-comparison

How to create an os.walk() function that compares the folders and subfolders of two directories?


Here's my problem: say I want to create a file-syncing function that walks through all the folders and subfolders of two similar directories and detects the folders/subfolders they have in common. I gave it a try by combining os.walk with the filecmp module. Here's what my code looks like so far:

import filecmp
import os

src=r"C:\Users\j2the\Documents\Test3"
dst=r"C:\Users\j2the\Documents\Test4"


comparison = filecmp.dircmp(dst, src)

for dirpath,dirnames,filenames in os.walk(src):
    for folders in dirnames:
        if folders in comparison.common_dirs:
            print(folders)
            src_folder=os.path.abspath(os.path.join(dirpath,folders))
            dst_folder=os.path.abspath(os.path.join(dst,folders))
            folder_comparison = filecmp.dircmp(dst_folder, src_folder)

            for dirpath1,dirnames1,filenames1 in os.walk(src_folder):

                for subfolders in dirnames1:
                    if subfolders in folder_comparison.common_dirs:
                        print(subfolders)
                        src_subfolder=os.path.abspath(os.path.join(dirpath1,subfolders))
                        dst_subfolder=os.path.abspath(os.path.join(dst_folder,subfolders))
                        subfolder_comparison=filecmp.dircmp(dst_subfolder,src_subfolder)

It's very simple code. However, it only works for directories with at most two levels of subfolders. If I wanted to analyze deeper directory trees, I would have to add tons of nested loops. Surely there is another way to do that, right? I was thinking about a while loop that keeps going through every subfolder and comparing them until there's no subfolder left, but I simply couldn't figure out how to do it. Any help/input would be greatly appreciated!


Solution

  • You don't need filecmp.dircmp. Instead, make two calls to os.walk, one for each directory you want to compare, zip the output of the two generators, and use set intersection on the two lists of sub-directory names to find the common sub-directories.

    Note that the key to making the recursive traversal work is to replace, in place, the sub-directory lists returned by both generators, so that only sub-directories common to both current directories are retained for deeper traversal and further comparison. Because both lists receive the same names in the same order, the two walks visit matching directories in step:

    import os

    for (root1, dirs1, _), (root2, dirs2, _) in zip(os.walk('dir1'), os.walk('dir2')):
        # Keep only the names present in both trees, in the same order on both sides.
        dirs1[:] = dirs2[:] = sorted(set(dirs1).intersection(dirs2))
        for common_dir in dirs1:
            print('Common sub-directory of {} and {}: {}'.format(root1, root2, common_dir))
    

    From the documentation of os.walk:

    When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search...
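
As a rough sketch of how the approach above could be packaged for reuse, the loop can be wrapped in a generator. The name common_subdirs and the temporary src/dst trees are made up for illustration and are not part of the original answer; sorted() is added only to make the visiting order deterministic. The sketch assumes both trees are readable and contain no symlinked directories, since otherwise the two walks could fall out of step.

```python
import os
import tempfile


def common_subdirs(dir1, dir2):
    """Yield (path_in_dir1, path_in_dir2) for every sub-directory that
    exists at the same relative location in both trees.

    Hypothetical helper: assumes readable trees without directory symlinks.
    """
    for (root1, dirs1, _), (root2, dirs2, _) in zip(os.walk(dir1), os.walk(dir2)):
        common = sorted(set(dirs1) & set(dirs2))
        # Slice assignment mutates the lists that os.walk holds, which prunes
        # the traversal. A plain `dirs1 = common` would only rebind the local
        # name and os.walk would still descend into every sub-directory.
        dirs1[:] = common
        dirs2[:] = common
        for name in common:
            yield os.path.join(root1, name), os.path.join(root2, name)


# Build two small throwaway trees to try the helper on.
with tempfile.TemporaryDirectory() as base:
    src = os.path.join(base, "src")
    dst = os.path.join(base, "dst")
    for rel in ("a/b/c", "a/only_src", "x"):
        os.makedirs(os.path.join(src, *rel.split("/")))
    for rel in ("a/b/c", "a/only_dst", "y"):
        os.makedirs(os.path.join(dst, *rel.split("/")))

    # Relative paths of the directories found in both trees.
    found = [os.path.relpath(p1, src) for p1, _ in common_subdirs(src, dst)]
    print(found)
```

Because the pruning happens at every level, the depth of the trees does not matter, and no nested loops are needed. Each yielded pair could then be handed to something like filecmp.dircmp if the files inside the common folders also need comparing.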