performance, hash, sha, murmurhash

Fast hash function with collision probability close to SHA-1


I'm using SHA-1 to detect duplicates in a program that handles files. The hash does not need to be cryptographically strong and may be reversible. I found this list of fast hash functions: https://code.google.com/p/xxhash/ (the list has been moved to https://github.com/Cyan4973/xxHash).

Which one should I choose if I want a faster function whose collision probability on random data is close to that of SHA-1?

Maybe a 128-bit hash is good enough for file deduplication (compared with 160-bit SHA-1)?

In my program the hash is calculated on chunks of 0 to 512 KB.


Solution

  • Maybe this will help you: https://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed

    Collisions are rare with FNV-1, FNV-1a, DJB2, DJB2a, SDBM and MurmurHash.

    I don't know xxHash, but it also looks promising.

    MurmurHash is very fast, and version 3 supports a 128-bit output, so I would choose that one. (It is implemented in Java and Scala.)
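
To get a feeling for whether 128 bits is "good enough" next to 160 bits, the birthday bound gives a rough estimate, and a small sketch shows how chunk deduplication by digest might look. This is only an illustration under assumptions: the function names are hypothetical, the estimate assumes the hash behaves like a uniform random function, and BLAKE2b truncated to 16 bytes from Python's standard library is used as a stand-in for a 128-bit hash. It is not MurmurHash3 or xxHash; a non-cryptographic 128-bit hash could be swapped into `chunk_digest`.

```python
import hashlib
import math


def collision_probability(num_items: int, hash_bits: int) -> float:
    """Birthday-bound approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1)).

    Assumes the hash output is uniformly distributed (an idealisation).
    """
    exponent = -num_items * (num_items - 1) / 2.0 ** (hash_bits + 1)
    # -expm1(x) equals 1 - exp(x) but stays accurate for tiny probabilities.
    return -math.expm1(exponent)


def chunk_digest(chunk: bytes) -> bytes:
    """Return a 128-bit digest of one chunk.

    Stand-in: BLAKE2b truncated to 16 bytes, not MurmurHash3 or xxHash.
    """
    return hashlib.blake2b(chunk, digest_size=16).digest()


def find_duplicates(chunks):
    """Return (index, first_index) pairs for chunks seen earlier."""
    first_seen = {}  # digest -> index of the first chunk with that digest
    duplicates = []
    for index, chunk in enumerate(chunks):
        key = chunk_digest(chunk)
        if key in first_seen:
            # Compare the bytes as well, so a hash collision is never
            # reported as a duplicate.
            if chunks[first_seen[key]] == chunk:
                duplicates.append((index, first_seen[key]))
        else:
            first_seen[key] = index
    return duplicates


if __name__ == "__main__":
    n = 10 ** 12  # hypothetical number of chunks
    print("128-bit:", collision_probability(n, 128))
    print("160-bit:", collision_probability(n, 160))
    print(find_duplicates([b"alpha", b"beta", b"alpha"]))
```

Under the uniformity assumption, even a trillion chunks give a collision probability on the order of 1e-15 for 128 bits and 1e-25 for 160 bits, so the choice is more likely to come down to speed than to collision risk. The byte comparison in `find_duplicates` is optional if the chunks are not kept in memory, at the cost of trusting the digest alone.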