linux bash cmp

bash / cmp: compare consecutive same-size JPG files in a long list


I'm trying to apply the cmp command to a number of consecutive JPG files that have the same size but different names, in order to make sure they are indeed identical. Since there are almost 4000 files, I would like to loop over them with cmp and produce a final list of the files that really are the same, but so far I haven't been able to.

This is a sample of the file list:

-rw-r--r-- 1 giu_  1094433 dic 30 09:12 IMG_0199.JPG  
-rw-r--r-- 1 giu_  1094433 lug 30  2016 img_0199_28043673584_o.jpg  
-rw-r--r-- 1 giu_  1124837 dic 30 09:12 IMG_0103.JPG  
-rw-r--r-- 1 giu_  1124837 lug 30  2016 img_0103_28045527533_o.jpg  
-rw-r--r-- 1 giu_  1174143 ago 12  2016 img_1520_28906930111_o.jpg  
-rw-r--r-- 1 giu_  1174143 dic 30 12:33 IMG_1520.JPG  
-rw-r--r-- 1 giu_  1227753 dic 30 09:12 IMG_0104.JPG  
-rw-r--r-- 1 giu_  1227753 lug 30  2016 img_0104_28044608674_o.jpg  

Solution

  • Unless this is a coding exercise (in which case my recommendation does not apply), look into fdupes. It does exactly what you want. From its man page:

    FDUPES(1)                         General Commands Manual                        FDUPES(1)
    
    NAME
           fdupes - finds duplicate files in a given set of directories
    
    SYNOPSIS
           fdupes [ options ] DIRECTORY ...
    
    DESCRIPTION
           Searches the given path for duplicate files. Such files are found by comparing file
           sizes and MD5 signatures, followed by a byte-by-byte comparison.
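
If this does turn out to be a coding exercise, the loop the question asks for could be sketched roughly as below. This is only a minimal sketch and not how fdupes works internally: `find_same_pairs` is a hypothetical helper name, and the code assumes GNU find (for `-printf`) and file names without newlines. It compares each file only with the next one of the same size, which matches the consecutive pairs in the listing. With three or more files of equal size, duplicates that are not adjacent after sorting would be missed.

```bash
#!/usr/bin/env bash
# Print pairs of byte-identical files found among same-size neighbours.
# find_same_pairs is a hypothetical helper name (not part of cmp or fdupes).
# Assumes GNU find for -printf and file names that contain no newlines.
find_same_pairs() {
    local dir="${1:-.}" tab prev_size="" prev_file="" size file
    tab=$(printf '\t')

    # Emit "size<TAB>path" for each regular file; the numeric sort then
    # places files of equal size on adjacent lines.
    find "$dir" -maxdepth 1 -type f -printf '%s\t%p\n' | sort -n |
    while IFS="$tab" read -r size file; do
        # cmp -s prints nothing and exits 0 only if both files are identical.
        if [ "$size" = "$prev_size" ] && cmp -s "$prev_file" "$file"; then
            printf '%s == %s\n' "$prev_file" "$file"
        fi
        prev_size=$size
        prev_file=$file
    done
}
```

Calling `find_same_pairs /path/to/photos > same_files.txt` (the path is a placeholder) would write one line per identical pair. Checking the size first keeps cmp from being run on files that cannot match.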