Here's my problem: let's say I want to create a file-syncing function that goes through all the folders and subfolders of two similar directories and detects all the folders/subfolders they have in common. I gave it a try by combining os.walk with the filecmp module. Here's what my code looks like so far:
import filecmp
import os

src = r"C:\Users\j2the\Documents\Test3"
dst = r"C:\Users\j2the\Documents\Test4"
comparison = filecmp.dircmp(dst, src)
for dirpath, dirnames, filenames in os.walk(src):
    for folders in dirnames:
        if folders in comparison.common_dirs:
            print(folders)
            src_folder = os.path.abspath(os.path.join(dirpath, folders))
            dst_folder = os.path.abspath(os.path.join(dst, folders))
            folder_comparison = filecmp.dircmp(dst_folder, src_folder)
            for dirpath1, dirnames1, filenames1 in os.walk(src_folder):
                for subfolders in dirnames1:
                    if subfolders in folder_comparison.common_dirs:
                        print(subfolders)
                        src_subfolder = os.path.abspath(os.path.join(dirpath1, subfolders))
                        dst_subfodler = os.path.abspath(os.path.join(dst_folder, subfolders))
                        subfolder_comparison = filecmp.dircmp(dst_subfodler, src_subfolder)
It's very simple code. However, it only works for directories that are at most two levels of subfolders deep. If I wanted to analyze directories with more levels, I would have to add tons of nested loops to my code. Surely there is another way to do that, right? I was thinking about creating a while loop that keeps going through every subfolder and comparing them until there are no subfolders left, but I simply couldn't figure out how to do it. Any help/input would be greatly appreciated!
You don't need filecmp.dircmp. Instead, make two calls to os.walk with the two directories you want to compare, zip the output of the two generators, and use set intersection on the two lists of sub-directories from the output to find the common sub-directories.
Note that the key to making the recursive traversal work is to replace, in place, the sub-directory lists yielded by both generators, so that only sub-directories common to both of the current directories are retained for deeper traversal and further comparisons:
import os

for (root1, dirs1, _), (root2, dirs2, _) in zip(os.walk('dir1'), os.walk('dir2')):
    dirs1[:] = dirs2[:] = set(dirs1).intersection(dirs2)
    for common_dir in dirs1:
        print('Common sub-directory of {} and {}: {}'.format(root1, root2, common_dir))
From the documentation of os.walk:
When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search...
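As a rough, self-contained sketch of the same pruning idea, the loop can be wrapped in a generator that yields each common sub-directory as a path relative to the first tree. The function name common_subdirs and the throwaway directory layout in the demo are hypothetical, made up for illustration. Sorting the intersection is also an addition here: it is not required, but it keeps the traversal order deterministic.

```python
import os
import tempfile


def common_subdirs(dir1, dir2):
    """Yield relative paths of sub-directories present in both trees."""
    for (root1, dirs1, _), (_, dirs2, _) in zip(os.walk(dir1), os.walk(dir2)):
        # Keep only names found on both sides, in one fixed order, so the
        # two generators descend into the same directories in lockstep.
        common = sorted(set(dirs1) & set(dirs2))
        dirs1[:] = common
        dirs2[:] = common
        for name in common:
            yield os.path.relpath(os.path.join(root1, name), dir1)


# Demo on two temporary trees (hypothetical layout, three levels deep).
with tempfile.TemporaryDirectory() as tmp:
    a, b = os.path.join(tmp, "a"), os.path.join(tmp, "b")
    for path in ("x/y/z", "x/only_a", "w"):
        os.makedirs(os.path.join(a, *path.split("/")))
    for path in ("x/y/z", "x/only_b"):
        os.makedirs(os.path.join(b, *path.split("/")))
    result = list(common_subdirs(a, b))
    print(result)
```

In this demo only x, x/y and x/y/z exist in both trees, so those are the paths reported, no matter how deep the nesting goes. One caveat: this assumes a common name refers to a real directory on both sides. If a name is a symlink in only one tree, the two walks could fall out of step.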