Suppose I have the following simplified file structure:
main_folder
|__ foo.json
|
|__ sub_folder
    |__ bar.json
I have two copies of main_folder, e.g. main_folder_v1 and main_folder_v2.
I want to compare both versions and get the names of all files that differ (for example, get "foo.json" if its content was updated in main_folder_v2).
I use the code below:
import filecmp
comparison_result = filecmp.dircmp(main_folder_v1, main_folder_v2)
files_that_differs = comparison_result.diff_files
The problem is that I get ["foo.json"] if it was updated in main_folder_v2, but I never get ["bar.json"]: it seems that the files in sub_folder are not compared at all.
Is there any way to compare folders recursively with filecmp and get the names of the files that differ, or is os.walk() the only solution?
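For reference, here is a minimal, self-contained reproduction of the behaviour described above. It builds the two trees in a temporary directory (the file contents are made up for illustration, and the helper `write` is hypothetical) and shows that `diff_files` only covers the top level of the comparison:

```python
import filecmp
import os
import tempfile


def write(path, text):
    """Create parent directories as needed, then write text to path."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)


with tempfile.TemporaryDirectory() as root:
    v1 = os.path.join(root, "main_folder_v1")
    v2 = os.path.join(root, "main_folder_v2")
    # Both foo.json and sub_folder/bar.json change (and change size) in v2.
    write(os.path.join(v1, "foo.json"), "{}")
    write(os.path.join(v2, "foo.json"), '{"a": 1}')
    write(os.path.join(v1, "sub_folder", "bar.json"), "{}")
    write(os.path.join(v2, "sub_folder", "bar.json"), '{"b": 2}')

    comparison = filecmp.dircmp(v1, v2)
    # diff_files only lists files located directly inside v1 and v2.
    top_level_diffs = comparison.diff_files
    print(top_level_diffs)  # ['foo.json'], bar.json is not listed
```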
[Python]: filecmp - File and Directory Comparisons supports recursive traversal via dircmp.subdirs, a dictionary that maps each common subdirectory name to its own dircmp object. There is no need for os.walk (or any similar function).
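Since every value in `subdirs` is itself a `dircmp` object, the same traversal can also be written without recursion, using an explicit stack. A minimal sketch (`iter_diff_files` is a hypothetical name, not part of `filecmp`):

```python
import filecmp
import os


def iter_diff_files(dircmp_obj):
    """Yield relative paths of differing files, walking subdirs with an explicit stack."""
    stack = [("", dircmp_obj)]  # (relative path of this level, its dircmp object)
    while stack:
        rel_dir, node = stack.pop()
        for name in node.diff_files:
            yield os.path.join(rel_dir, name)
        # subdirs maps each common subdirectory name to its own dircmp object.
        for sub_name, sub_cmp in node.subdirs.items():
            stack.append((os.path.join(rel_dir, sub_name), sub_cmp))
```

Popping from the stack visits subdirectories in a different order than a recursive walk, so sort the result if a stable order matters.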
code.py:
import sys
import filecmp
import os


main_folder_v1 = "dir_v1"
main_folder_v2 = "dir_v2"

ROOT_DIR_MARKER = ""


def traverse_dircmp(dircmp_obj, dir_name=ROOT_DIR_MARKER):
    # Files that differ at the current level, prefixed with their relative path.
    for item in dircmp_obj.diff_files:
        yield os.path.join(dir_name, item)
    # subdirs maps each common subdirectory name to its own dircmp object: recurse into it.
    for subdir_name in dircmp_obj.subdirs:
        yield from traverse_dircmp(dircmp_obj.subdirs[subdir_name], dir_name=os.path.join(dir_name, subdir_name))
        #for item in traverse_dircmp(dircmp_obj.subdirs[subdir_name], dir_name=os.path.join(dir_name, subdir_name)):
        #    yield item


def traverse_dircmp_list(dircmp_obj, dir_name=ROOT_DIR_MARKER):
    # Same traversal, but builds and returns a list instead of yielding.
    ret = [os.path.join(dir_name, item) for item in dircmp_obj.diff_files]
    for subdir_name in dircmp_obj.subdirs:
        ret.extend(traverse_dircmp_list(dircmp_obj.subdirs[subdir_name], dir_name=os.path.join(dir_name, subdir_name)))
    return ret


def main():
    comparison_object = filecmp.dircmp(main_folder_v1, main_folder_v2)
    comparison_result = traverse_dircmp(comparison_object)
    print("{:s}: {:}".format("Different files (gen)", list(comparison_result)))
    comparison_result_list = traverse_dircmp_list(comparison_object)
    print("{:s}: {:}".format("Different files (list)", comparison_result_list))


if __name__ == "__main__":
    print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
    main()
Output (for a dir structure similar to yours):
(py35x64_test) e:\Work\Dev\StackOverflow\q050157870>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" code.py
Python 3.5.4 (v3.5.4:3f56838, Aug 8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32

Different files (gen): ['foo.json', 'subdir00\\bar.json', 'subdir00\\subdir001\\x.json']
Different files (list): ['foo.json', 'subdir00\\bar.json', 'subdir00\\subdir001\\x.json']
@EDIT0: Modified the traverse_dircmp function to return (yield) the file names instead of printing them, as requested in one of the comments.
@EDIT1: The yield from statement (available since Python 3.3) can be replaced by the 2 commented lines below it (the for and yield lines).
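One caveat for both traverse_dircmp and traverse_dircmp_list: dircmp decides whether common files differ using a shallow comparison, as described for filecmp.cmp(). If two files have the same type, size and modification time, they are treated as equal without their contents being read. If that is not acceptable, the sketch below (`traverse_dircmp_deep` is a hypothetical name) reuses dircmp only for the directory structure and re-checks the common files at each level with `filecmp.cmpfiles(..., shallow=False)`. Python 3.13 also added a `shallow` parameter to dircmp itself; this sketch does not rely on it.

```python
import filecmp
import os


def traverse_dircmp_deep(dircmp_obj, dir_name=""):
    """Like traverse_dircmp, but compares the contents of common files.

    dircmp's own diff_files uses shallow comparison (os.stat signatures),
    so the common files at each level are re-checked with
    filecmp.cmpfiles(..., shallow=False).
    """
    _match, mismatch, _errors = filecmp.cmpfiles(
        dircmp_obj.left, dircmp_obj.right, dircmp_obj.common_files, shallow=False
    )
    # Files that could not be compared (_errors) are ignored in this sketch.
    for item in mismatch:
        yield os.path.join(dir_name, item)
    for subdir_name, sub_cmp in dircmp_obj.subdirs.items():
        yield from traverse_dircmp_deep(sub_cmp, os.path.join(dir_name, subdir_name))
```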