I want to quickly compute hash values of large files on disk to compare them to one another.
I'm using the following:
import hashlib

def sha256sum(filename):
    with open(filename, 'rb', buffering=0) as f:
        return hashlib.file_digest(f, 'sha256').hexdigest()
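For example, I call it like this to compare two files (the paths here are just placeholders):

# Hypothetical usage: compare two files by digest.
if sha256sum("a.bin") == sha256sum("b.bin"):
    print("files match")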
But I'd like to use xxhash
since I hear it's faster. This doesn't work:
import hashlib

def xxhashsum(filename):
    with open(filename, 'rb', buffering=0) as f:
        return hashlib.file_digest(f, 'xxhash').hexdigest()
Is there a version that would?
Try this. hashlib.file_digest accepts not only an algorithm name but also a callable that returns a hash object, so you can pass an xxhash constructor directly:
import xxhash
import hashlib

def xxhashsum(filename, algo="xxh128"):
    # xxhash.algorithms_available is the set of supported algorithm names.
    if algo not in xxhash.algorithms_available:
        raise ValueError(f"unsupported xxhash algorithm: {algo}")
    # Look up the constructor (e.g. xxhash.xxh128) and hand it to
    # hashlib.file_digest, which accepts any hash-object constructor.
    digest = getattr(xxhash, algo)
    with open(filename, 'rb', buffering=0) as f:
        return hashlib.file_digest(f, digest).hexdigest()
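For example (the file name is just a placeholder):

print(xxhashsum("large_file.bin"))            # xxh128 (default)
print(xxhashsum("large_file.bin", "xxh64"))   # 64-bit variant

Note that hashlib.file_digest was added in Python 3.11. On older versions you can read the file in chunks yourself; here's a sketch of that fallback (the function name and chunk size are my own choices, not part of either library):

import xxhash

def xxhashsum_fallback(filename, algo="xxh128", chunk_size=1 << 20):
    # Manual chunked read for Python < 3.11, which lacks hashlib.file_digest.
    if algo not in xxhash.algorithms_available:
        raise ValueError(f"unsupported xxhash algorithm: {algo}")
    h = getattr(xxhash, algo)()
    with open(filename, 'rb') as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()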