pythonpython-3.xfile-comparison

Compare two versions of the same folder and get names of files that differs


Suppose I have following simplified files structure

main_folder
    |__ foo.json
    |
    |__ sub_folder
          |__bar.json

I have two copies of the main_folder, e.g. main_folder_v1 and main_folder_v2

I want to compare both versions and get names of all files that differs (for example, get "foo.json" in case its content was updated in main_folder_v2)

And I use below code

import filecmp

comparison_result = filecmp.dircmp(main_folder_v1, main_folder_v2)
files_that_differs = comparison_result.diff_files

The problem is that I will get ["foo.json"] in case it was updated in main_folder_v2, but I will never get ["bar.json"] as it seem that comparison of files in sub_folder not performed

Is there any possibility to compare folders recursively using filecmp and get names of files that differs or os.walk() is the only solution?


Solution

  • [Python]: filecmp - File and Directory Comparisons supports recursive traversing via dircmp.subdirs. No need for os.walk (or any other similar functions).

    code.py:

    import sys
    import filecmp
    import os
    
    
    main_folder_v1 = "dir_v1"
    main_folder_v2 = "dir_v2"
    
    ROOT_DIR_MARKER = ""
    
    
    def traverse_dircmp(dircmp_obj, dir_name=ROOT_DIR_MARKER):
        for item in dircmp_obj.diff_files:
            yield os.path.join(dir_name, item)
        for subdir_name in dircmp_obj.subdirs:
            yield from traverse_dircmp(dircmp_obj.subdirs[subdir_name], dir_name=os.path.join(dir_name, subdir_name))
            #for item in traverse_dircmp(dircmp_obj.subdirs[subdir_name], dir_name=os.path.join(dir_name, subdir_name)):
            #    yield item
    
    
    def traverse_dircmp_list(dircmp_obj, dir_name=ROOT_DIR_MARKER):
        ret = [os.path.join(dir_name, item) for item in dircmp_obj.diff_files]
        for subdir_name in dircmp_obj.subdirs:
            ret.extend(traverse_dircmp_list(dircmp_obj.subdirs[subdir_name], dir_name=os.path.join(dir_name, subdir_name)))
        return ret
    
    
    def main():
        comparison_object = filecmp.dircmp(main_folder_v1, main_folder_v2)
    
        comparison_result = traverse_dircmp(comparison_object)
        print("{:s}: {:}".format("Different files (gen)", list(comparison_result)))
    
        comparison_result_list = traverse_dircmp_list(comparison_object)
        print("{:s}: {:}".format("Different files (list)", comparison_result_list))
    
    
    if __name__ == "__main__":
        print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
        main()
    

    Output (for a dir structure similar to yours):

    (py35x64_test) e:\Work\Dev\StackOverflow\q050157870>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" code.py
    Python 3.5.4 (v3.5.4:3f56838, Aug  8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32
    
    Different files (gen): ['foo.json', 'subdir00\\bar.json', 'subdir00\\subdir001\\x.json']
    Different files (list): ['foo.json', 'subdir00\\bar.json', 'subdir00\\subdir001\\x.json']
    

    @EDIT0:

    @EDIT1: