
Delete duplicate files based on modification time, keeping the first-created file


My webcam records video files, so I end up with many of them. I want to delete duplicate files based on their modification time, treating files modified within the same minute as duplicates.

For example, I have the following 3 video files (times shown as hour:minute:second):

  1. Ek001.AVI - modified at 08:30:15
  2. Ek002.AVI - modified at 08:30:40
  3. Ek003.AVI - modified at 08:32:55

I want only these files to remain:

  1. Ek001.AVI - modified at 08:30:15 (the first file created in that minute is kept)
  2. Ek003.AVI - modified at 08:32:55

Here is my current code for finding the modification times:

import os
import glob
from datetime import datetime

# print each file's modification time, formatted down to the second
for file in glob.glob('C:\\Users\\xxx\\*.AVI'):
    time_mod = os.path.getmtime(file)
    print(datetime.fromtimestamp(time_mod).strftime('%Y-%m-%d %H:%M:%S'), '-->', file)

How can I adapt this code to delete duplicate files based on modification time, at one-minute granularity?
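A small first step, sketched here as an illustration rather than a full answer: removing `:%S` from the format string in the loop above turns the formatted time into a per-minute key, so files with equal keys were modified in the same clock minute. The helper names `minute_key` and `minute_key_for_file` are hypothetical, and `datetime.fromtimestamp` interprets the timestamp in local time.

```python
import os
from datetime import datetime


def minute_key(timestamp: float) -> str:
    """Format a POSIX timestamp (in local time) down to the minute.

    Two files whose keys are equal were modified in the same clock minute.
    """
    return datetime.fromtimestamp(timestamp).strftime('%Y-%m-%d %H:%M')


def minute_key_for_file(path: str) -> str:
    # Same as the loop above, minus the seconds (%S) in the format string.
    return minute_key(os.path.getmtime(path))
```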


Solution

  • Here is my suggested solution. See the comments in the code for a detailed explanation, but the basic idea is to build a dictionary whose values are lists of 2-element tuples. The keys are the number of whole minutes since the start of Unix time (the epoch), and each 2-tuple contains the filename and the remaining seconds. You then loop over the values of the dictionary (each a list of tuples for files modified within the same calendar minute), sort each list by the seconds, and delete every file except the first.

    Using a defaultdict here is just a convenience: it avoids having to add a new empty list to the dictionary explicitly the first time a given minute is seen, because defaultdict(list) creates that list automatically.

    import os
    import glob
    from collections import defaultdict
    
    files_by_minute = defaultdict(list)
    
    # group together all the files according to the number of minutes since the
    # start of Unix time, storing the filename and the number of remaining seconds
    for filename in glob.glob("C:\\Users\\xxx\\*.AVI"):
        time_mod = os.path.getmtime(filename)
        mins = time_mod // 60
        secs = time_mod % 60
        files_by_minute[mins].append((filename, secs))
    
    # go through each of these lists of files, removing the newer ones if
    # there is more than one
    for fileset in files_by_minute.values():
        if len(fileset) > 1:
            # sort tuples by second element (i.e. the seconds)
            fileset.sort(key=lambda t: t[1])
            # remove all except the first
            for filename, _secs in fileset[1:]:
                print(f"removing {filename}")
                os.remove(filename)
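    A variant that separates deciding from deleting can be easier to test and to run as a dry run first. The sketch below is illustrative, and `plan_deletions` and `plan_deletions_window` are hypothetical names. `plan_deletions` applies the same calendar-minute grouping as the code above (using `divmod` for the minutes and seconds) to `(filename, mtime)` pairs and returns the names to delete instead of deleting them. `plan_deletions_window` shows a different reading of "limited by minutes", which is an assumption: a file counts as a duplicate if it was modified less than 60 seconds after the last file that was kept. Under calendar-minute grouping, files modified at 08:30:59 and 08:31:01 are both kept; under the window rule the second one is deleted.

```python
from collections import defaultdict


def plan_deletions(entries):
    """Calendar-minute grouping, as in the answer above, without deleting anything.

    entries: iterable of (filename, mtime) pairs, mtime in seconds since the
    epoch (e.g. from os.path.getmtime). Returns the filenames to delete.
    """
    files_by_minute = defaultdict(list)
    for filename, mtime in entries:
        # divmod gives (whole minutes since the epoch, leftover seconds)
        mins, secs = divmod(mtime, 60)
        files_by_minute[mins].append((filename, secs))

    to_delete = []
    for fileset in files_by_minute.values():
        # earliest file in each minute first; the rest are duplicates
        fileset.sort(key=lambda t: t[1])
        to_delete.extend(filename for filename, _secs in fileset[1:])
    return to_delete


def plan_deletions_window(entries, window=60.0):
    """Assumed alternative rule: drop files modified less than `window`
    seconds after the most recently kept file."""
    to_delete = []
    last_kept = None
    # walk the files from oldest to newest modification time
    for filename, mtime in sorted(entries, key=lambda e: e[1]):
        if last_kept is not None and mtime - last_kept < window:
            to_delete.append(filename)
        else:
            last_kept = mtime
    return to_delete
```

    For real files, `entries` could be built as `[(f, os.path.getmtime(f)) for f in glob.glob("C:\\Users\\xxx\\*.AVI")]`, and the returned names printed for review before passing each one to `os.remove`. The window version compares each file with the last *kept* file rather than the previous file, so a long run of recordings a few seconds apart does not collapse into a single kept file.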