I am trying to save files using their MD5 hash as the filename. I generate images as NumPy arrays, and some of them can be identical, so I want to compute the MD5 first in order to overwrite an existing image or skip saving it.
The problem is that the hash I get from the NumPy array is not the same as the hash of the image file that is finally saved. I am using the following code:
hashlib.md5(array.astype("uint8")).hexdigest()
Is it possible to calculate the MD5 hash from the NumPy array, or do I need to save the file under a random name and rename it afterwards?
Thanks
Following the comment, and assuming that you are saving a NumPy array rather than an image file, you could just do:
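# hexdigest() gives a str usable as a filename; digest() returns raw bytes
digest = hashlib.md5(array.tobytes()).hexdigest()
np.save(digest, array)  # np.save appends ".npy" to the name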
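One caveat with hashing only `array.tobytes()`: the raw bytes do not record the shape or dtype, so a 3x4 and a 4x3 array holding the same values hash identically. A minimal sketch that mixes both into the digest follows; `array_md5` is a hypothetical helper name, not part of NumPy or hashlib.

```python
import hashlib

import numpy as np


def array_md5(array):
    """Return the MD5 hex digest of an array's dtype, shape and raw bytes.

    Hypothetical helper for illustration only.
    """
    array = np.asarray(array)
    md5 = hashlib.md5()
    # Include dtype and shape so arrays with equal bytes but a
    # different layout do not end up with the same filename.
    md5.update(str(array.dtype).encode())
    md5.update(str(array.shape).encode())
    # tobytes() copies the data out in C order, so non-contiguous
    # views hash the same as their contiguous copies.
    md5.update(array.tobytes())
    return md5.hexdigest()
```

The returned string can be passed directly to `np.save` as the filename.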
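If you instead have to save the image, you should, in order:
1. save the image under a temporary name;
2. read the file back and compute the MD5 of its bytes;
3. rename the file to the hash, replacing any existing file with that name.
In code: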
import binascii
import hashlib
import os
from matplotlib.image import imsave

# Save under a temporary name first
imsave('myimage.jpg', image_array)
# Hash the bytes of the encoded file
with open('myimage.jpg', 'rb') as f:
    ba = f.read()
_hash = hashlib.md5(ba).digest()
new_filename = binascii.hexlify(_hash).decode() + '.jpg'
# Replace any existing file that has the same hash
if os.path.exists(new_filename):
    os.remove(new_filename)
os.rename('myimage.jpg', new_filename)
Please avoid doing so, for the reason @Mark gave in a comment below, replicated here:
You are calculating the MD5 digest of a JPEG-compressed file, so you will likely not detect that it corresponds to another identical NumPy array if the file is written 1) by a different library, or 2) by a different version of the same library, or 3) with different quality settings, or 4) on a different date if the date is embedded in the metadata, or 5) in a different image format such as PNG.
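One way to sidestep those issues is to hash the pixel data before it is encoded, use that digest as the filename, and skip writing when the file already exists. The sketch below is only an illustration under that assumption: `save_if_new`, its `writer` parameter and the `.png` default are hypothetical names chosen here, not part of any library.

```python
import hashlib
import os

import numpy as np


def save_if_new(array, directory, writer, ext=".png"):
    """Write `array` to <directory>/<md5 of pixels><ext> unless it exists.

    `writer` is any callable taking (path, array). Returns a tuple
    (path, written), where `written` is False if the file was already there.
    Hypothetical helper for illustration only.
    """
    pixels = np.asarray(array).astype("uint8")
    md5 = hashlib.md5()
    md5.update(str(pixels.shape).encode())  # distinguish e.g. 2x8 from 4x4
    md5.update(pixels.tobytes())            # hash pixels, not the encoded file
    path = os.path.join(directory, md5.hexdigest() + ext)
    if os.path.exists(path):
        return path, False                  # identical image already saved
    writer(path, pixels)
    return path, True
```

For example, `save_if_new(image_array, "out", imsave)` with `matplotlib.image.imsave` as the writer would name the file after the array contents, so the name does not depend on the encoder, its version, or its settings.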