I have an archive file on an Ubuntu server. I uploaded this file to AWS Glacier using the AWS CLI. When the upload finished, AWS gave me a checksum like this:
{"checksum": "6c126443c882b8b0be912c91617a5765050d7c99dc43b9d30e47c42635ab02d5"}
But when I checked the checksum on my own server like this:
sunny@server:~$ sha256sum backup.zip
it returned this checksum:
5ba29292a350c4a8f194c78dd0ef537ec21ca075f1fe649ae6296c7100b25ba8
Why are the two checksums different?
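While the checksum returned by Glacier uses SHA-256, it is not a plain SHA-256 sum over the entire object, which is what sha256sum computes. Instead, Glacier computes a tree hash: it hashes each 1 MiB chunk of the data, then hashes the concatenation of each pair of adjacent hashes, and repeats the process until one hash remains. If a level has an odd number of hashes, the last one is carried up to the next level unchanged. For more information, see the documentation.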
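To make the pairing concrete, here is a hand-unrolled sketch for a hypothetical 2.5 MiB payload (three chunks, the last one only half full). The payload contents are made up purely for illustration; the point is that the root is a hash of hashes, so it cannot equal a single SHA-256 over the raw bytes.

```python
import hashlib

MIB = 1024 * 1024  # Glacier hashes the data in 1 MiB chunks


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


# Hypothetical 2.5 MiB payload: three chunks, the last one only half full.
data = b"a" * MIB + b"b" * MIB + b"c" * (MIB // 2)

# Level 0: one hash per 1 MiB chunk.
h0 = sha256(data[0:MIB])
h1 = sha256(data[MIB:2 * MIB])
h2 = sha256(data[2 * MIB:])

# Level 1: hash the concatenation of the pair (h0, h1); the odd hash h2 is carried up as-is.
h01 = sha256(h0 + h1)

# Level 2: two hashes remain, so one more pairing yields the root of the tree.
tree_hash = sha256(h01 + h2).hex()

# What sha256sum would report for the same bytes.
plain_hash = hashlib.sha256(data).hexdigest()

print(tree_hash)
print(plain_hash)
print(tree_hash == plain_hash)  # False: the two methods hash different inputs
```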
Here's a simple implementation in Python:
#!/usr/bin/env python3
import hashlib
import sys
import binascii

# Given a file object (opened in binary mode), calculate the SHA-256 tree hash used by Glacier
def calc_hash_tree(fileobj):
    chunk_size = 1048576  # 1 MiB, the chunk size Glacier uses

    # Calculate a list of hashes, one for each 1 MiB chunk in the fileobj
    chunks = []
    while True:
        chunk = fileobj.read(chunk_size)
        if len(chunk) == 0:
            break
        chunks.append(hashlib.sha256(chunk).digest())

    # Now calculate each level of the tree till one digest remains
    while len(chunks) > 1:
        next_chunks = []
        # Hash the concatenation of each pair of adjacent digests
        while len(chunks) > 1:
            next_chunks.append(hashlib.sha256(chunks.pop(0) + chunks.pop(0)).digest())
        # An odd digest left over at this level is carried up unchanged
        if len(chunks) > 0:
            next_chunks.append(chunks.pop(0))
        chunks = next_chunks

    # The final remaining hash is the root of the tree:
    return binascii.hexlify(chunks[0]).decode("utf-8")

if __name__ == "__main__":
    with open(sys.argv[1], "rb") as f:
        print(calc_hash_tree(f))
You can call it on a single file like this:
$ ./glacier_checksum.py backup.zip
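As a sketch of why the two numbers only disagree for larger files, the following in-memory variant compares the tree hash against a plain SHA-256 for a small and a large input. The names `tree_hash_hex` and `matches_glacier_checksum` are hypothetical, and because it reads the whole input into memory it is meant for illustration rather than for multi-gigabyte archives.

```python
import hashlib

MIB = 1024 * 1024  # chunk size Glacier uses for its tree hash


def tree_hash_hex(data: bytes) -> str:
    """Glacier-style SHA-256 tree hash of a non-empty in-memory byte string."""
    if not data:
        raise ValueError("this sketch only handles non-empty input")
    # Level 0: SHA-256 of every 1 MiB chunk.
    level = [hashlib.sha256(data[i:i + MIB]).digest()
             for i in range(0, len(data), MIB)]
    while len(level) > 1:
        # Hash the concatenation of each adjacent pair of digests.
        nxt = [hashlib.sha256(level[i] + level[i + 1]).digest()
               for i in range(0, len(level) - 1, 2)]
        # An odd digest at the end of the level is carried up unchanged.
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
    return level[0].hex()


def matches_glacier_checksum(path: str, expected_hex: str) -> bool:
    """Hypothetical helper: compare a local file with the checksum Glacier returned."""
    with open(path, "rb") as f:
        return tree_hash_hex(f.read()) == expected_hex.lower()


small = b"x" * 1000           # fits in a single 1 MiB chunk
large = b"x" * (3 * MIB + 1)  # spans four chunks

# With a single leaf the root is just the SHA-256 of the data, so both agree.
print(tree_hash_hex(small) == hashlib.sha256(small).hexdigest())  # True
# With several leaves the root is a hash of hashes, so sha256sum output differs.
print(tree_hash_hex(large) == hashlib.sha256(large).hexdigest())  # False
```

In other words, a file of 1 MiB or less would have matched sha256sum, while anything larger will not. To check the upload itself, something like `matches_glacier_checksum("backup.zip", "6c126443c882b8b0be912c91617a5765050d7c99dc43b9d30e47c42635ab02d5")` compares the local file against the value Glacier reported.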