python numpy python-imaging-library hashlib

Is it possible to calculate the MD5 of a NumPy image before saving it?


I am trying to save files using their MD5 as the filename. The images are generated as NumPy arrays, and sometimes the same image is generated more than once, so I want to calculate the MD5 first in order to either overwrite the existing image or skip saving.

The problem is that the hash I get from the NumPy array is not the same as the hash of the image that is finally saved. I am using the following code:

hashlib.md5(array.astype("uint8")).hexdigest()

Is it possible to calculate the MD5 hash from the NumPy array, or do I need to save the file under a random name and rename it afterwards?

Thanks


Solution

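  • Following the comment, and assuming that you are saving a NumPy array rather than an image file, you could simply do:

    import hashlib
    import numpy as np
    digest = hashlib.md5(array.tobytes()).hexdigest()  # hex string, usable as a filename
    np.save(digest, array)  # np.save appends ".npy" to the name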
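
Note that hashing only the raw bytes cannot distinguish two arrays that hold the same bytes with a different shape or dtype. A minimal sketch of one way to cover that, and to skip saving when the file already exists, is shown here; `array_md5` and `save_if_new` are hypothetical helper names, not part of NumPy or hashlib.

```python
import hashlib
import os

import numpy as np


def array_md5(array: np.ndarray) -> str:
    """Hex MD5 of an array's dtype, shape and raw bytes (hypothetical helper)."""
    md5 = hashlib.md5()
    # Include dtype and shape so that, for example, a 3x4 and a 2x6 array
    # holding the same bytes do not end up with the same filename.
    md5.update(str(array.dtype).encode())
    md5.update(str(array.shape).encode())
    md5.update(array.tobytes())
    return md5.hexdigest()


def save_if_new(array: np.ndarray, directory: str = ".") -> str:
    """Save the array as <md5>.npy unless that file already exists."""
    path = os.path.join(directory, array_md5(array) + ".npy")
    if not os.path.exists(path):
        np.save(path, array)
    return path
```

Calling `save_if_new(array)` twice with identical arrays writes the file only once and returns the same path both times.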
    

    What follows is highly INADVISABLE!

    If you instead have to save the image, you should, in order:

    1. Save the image (.png, for example)
    2. Digest the file content with hashlib
    3. Delete existing image, if any
    4. Rename your new image

    In code:

    import hashlib
    import os
    from matplotlib.image import imsave

    # 1. Save the image under a temporary name
    imsave('myimage.jpg', image_array)
    # 2. Hash the bytes of the file exactly as written to disk
    with open('myimage.jpg', 'rb') as f:
        ba = f.read()
    _hash = hashlib.md5(ba).hexdigest()
    new_filename = _hash + '.jpg'
    # 3. Delete an existing image with the same name, if any
    if os.path.exists(new_filename):
        os.remove(new_filename)
    # 4. Rename the new image to its hash
    os.rename('myimage.jpg', new_filename)
    

    Please avoid doing so, as @Mark commented below (reproduced here):

    You are calculating the MD5 digest of a JPEG-compressed file, so you will likely not detect that it corresponds to another identical NumPy array if 1) the JPEG is written by a different library, or 2) by a different version of the same library, or 3) with different quality settings, or 4) on a different date, if the date is embedded in the metadata, or 5) to a different image format such as PNG.
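
The effect described in that comment can be illustrated without any image library. The sketch below is only an analogy: it uses `zlib` (the DEFLATE compressor that PNG relies on) as a stand-in for an image writer, and a small synthetic array as assumed data. The same pixels encoded with two different settings give different encoded bytes, and therefore different MD5 digests, even though both decode back to identical pixels.

```python
import hashlib
import zlib

import numpy as np

# A small synthetic 8-bit "image" (assumed data, for illustration only).
pixels = (np.arange(64 * 64) % 256).astype(np.uint8).reshape(64, 64)
raw = pixels.tobytes()

# Two lossless encodings of the same pixels, standing in for two writers
# (or two settings) producing a file from the same array.
encoded_fast = zlib.compress(raw, 1)
encoded_small = zlib.compress(raw, 9)

md5_fast = hashlib.md5(encoded_fast).hexdigest()
md5_small = hashlib.md5(encoded_small).hexdigest()

# The encoded bytes differ, so the file-level digests differ...
print("same file digest:", md5_fast == md5_small)

# ...although both decode back to exactly the same pixel bytes.
same_pixels = zlib.decompress(encoded_fast) == raw == zlib.decompress(encoded_small)
print("same pixels:", same_pixels)
```

This is why a digest computed from the array contents is the more reliable key for detecting duplicate arrays, while a digest of the encoded file only identifies that exact file.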