image-processing hash image-comparison fingerprinting

Image Comparison by Fingerprinting


I'm looking for ways to find duplicate images by fingerprinting. I understand that this is done by applying hash functions to images, so that each image gets its own hash value.

I am fairly new to image processing and don't know much about hashing. How exactly am I supposed to apply hash functions and generate hash values?

Thanks in advance


Solution

  • You need to be careful with hashing. Some image formats, such as JPEG and PNG, store dates/times and other metadata inside the file, and that makes two visually identical images look different to general-purpose tools such as md5 and cksum, which hash every byte of the file.

    Here is an example. Make two images, both identical 128x128 red squares, at the command line in Terminal with ImageMagick:

    convert -size 128x128 xc:red a.png
    convert -size 128x128 xc:red b.png
    


    Now check their MD5 sums:

    md5 [ab].png
    MD5 (a.png) = b4b82ba217f0b36e6d3ba1722f883e59
    MD5 (b.png) = 6aa398d3aaf026c597063c5b71b8bd1a
    

    Or their checksums:

    cksum [ab].png
    4158429075 290 a.png
    3657683960 290 b.png
    

    Oops, they are different according to both md5 and cksum. Why? Because the timestamps stored inside the two files are 1 second apart.
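
    To see which part of a PNG actually differs, you can walk its chunks and compare them one by one. This is only a rough sketch using the Python standard library. The helper names `iter_png_chunks` and `differing_chunks` are made up for illustration, and it assumes both files contain the same chunks in the same order.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_png_chunks(data):
    """Yield (chunk_type, payload) for every chunk in raw PNG bytes."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos + 8 <= len(data):
        # Each chunk is: 4-byte big-endian length, 4-byte type, payload, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        yield ctype.decode("ascii"), data[pos + 8:pos + 8 + length]
        pos += 12 + length


def differing_chunks(a, b):
    """Return the types of the chunks that differ between two PNG byte strings.

    Assumption: both files have the same chunk layout, so chunks are
    compared position by position.
    """
    chunks_a = list(iter_png_chunks(a))
    chunks_b = list(iter_png_chunks(b))
    return [ca[0] for ca, cb in zip(chunks_a, chunks_b) if ca != cb]
```

    Called as `differing_chunks(open("a.png", "rb").read(), open("b.png", "rb").read())`, this would be expected to report only metadata chunks (for example text or time chunks) if the pixels really are identical, but which chunks a given tool writes depends on the tool and its version.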

    I would suggest you use ImageMagick to checksum "just the image data" and not the metadata - unless, of course, the date is important to you:

    identify -format %# a.png
    e74164f4bab2dd8f7f612f8d2d77df17106bac77b9566aa888d31499e9cf8564
    
    identify -format %# b.png
    e74164f4bab2dd8f7f612f8d2d77df17106bac77b9566aa888d31499e9cf8564
    

    Now they are both the same, because the image is the same - just the metadata differs.
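
    If you want the same idea without ImageMagick, here is a minimal sketch in plain Python that hashes only the decoded pixel samples of a PNG and ignores metadata chunks. It is not ImageMagick's signature algorithm and will not produce the same hex values as `identify -format %#`. The function names are hypothetical, it handles non-interlaced PNGs only, and it treats two files as equal only if they also share the same bit depth and colour type.

```python
import hashlib
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Samples per pixel for each PNG colour type.
CHANNELS = {0: 1, 2: 3, 3: 1, 4: 2, 6: 4}


def read_chunks(data):
    """Yield (type, payload) for every chunk in raw PNG bytes."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        yield ctype, data[pos + 8:pos + 8 + length]
        pos += 12 + length  # length + type + payload + CRC


def paeth(a, b, c):
    """Paeth predictor as defined by the PNG specification."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    return b if pb <= pc else c


def unfilter(raw, stride, bpp):
    """Undo PNG row filtering and return the plain sample bytes."""
    out = bytearray()
    prev = bytearray(stride)  # the row above the first row is all zeros
    for start in range(0, len(raw), stride + 1):
        ftype = raw[start]  # one filter-type byte precedes every row
        row = bytearray(raw[start + 1:start + 1 + stride])
        for i in range(len(row)):
            a = row[i - bpp] if i >= bpp else 0   # left neighbour
            b = prev[i]                           # neighbour above
            c = prev[i - bpp] if i >= bpp else 0  # neighbour above-left
            pred = (0, a, b, (a + b) // 2, paeth(a, b, c))[ftype]
            row[i] = (row[i] + pred) & 0xFF
        out += row
        prev = row
    return bytes(out)


def pixel_hash(data):
    """SHA-256 over geometry, palette/transparency and decoded samples."""
    ihdr = plte = trns = b""
    idat = []
    for ctype, payload in read_chunks(data):
        if ctype == b"IHDR":
            ihdr = payload
        elif ctype == b"PLTE":
            plte = payload
        elif ctype == b"tRNS":
            trns = payload
        elif ctype == b"IDAT":
            idat.append(payload)
        # Every other chunk (text, timestamps, ...) is deliberately ignored.
    width, height, depth, colour, _, _, interlace = struct.unpack(">IIBBBBB", ihdr)
    if interlace:
        raise NotImplementedError("interlaced PNGs are not handled in this sketch")
    bits = CHANNELS[colour] * depth
    stride = (width * bits + 7) // 8  # bytes per row, excluding the filter byte
    bpp = max(1, bits // 8)           # bytes per pixel, as used by the filters
    samples = unfilter(zlib.decompress(b"".join(idat)), stride, bpp)
    h = hashlib.sha256()
    h.update(struct.pack(">IIBB", width, height, depth, colour))
    for part in (plte, trns, samples):
        h.update(struct.pack(">I", len(part)) + part)
    return h.hexdigest()
```

    To find duplicates in a folder, you could compute `pixel_hash(open(path, "rb").read())` for each file and group the paths by the returned digest in a dictionary. Any group with more than one path is a set of pixel-identical images.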

    Of course, you may be more interested in "Perceptual Hashing", where you just get an idea of whether two images "look similar". If so, look here.

    Or you may be interested in allowing slight differences in brightness, or orientation, or cropping - which is another topic altogether.
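
    To give a feel for the perceptual-hashing idea mentioned above, here is a sketch of one simple scheme, usually called an average hash. It is not necessarily the method the link above describes. It assumes you already have the image as a 2-D list of grayscale values at least `size` pixels in each dimension; loading and converting a real image file would need an image library and is not shown. `average_hash` and `hamming_distance` are illustrative names.

```python
def average_hash(gray, size=8):
    """Return a size*size-bit average hash of a 2-D grid of grayscale values.

    The grid is shrunk to size x size cells by box averaging, and each cell
    contributes one bit: 1 if it is brighter than the mean cell, else 0.
    """
    height, width = len(gray), len(gray[0])
    cells = []
    for cy in range(size):
        y0, y1 = cy * height // size, (cy + 1) * height // size
        for cx in range(size):
            x0, x1 = cx * width // size, (cx + 1) * width // size
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (value > mean)
    return bits


def hamming_distance(h1, h2):
    """Count the bit positions in which two hashes differ."""
    return bin(h1 ^ h2).count("1")
```

    Two images are then treated as "similar" when the Hamming distance between their hashes is small. The threshold is something you would have to tune on your own images. A uniform brightness shift leaves this hash unchanged, but rotation or cropping generally does not.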