crc, crc64

How likely are two blocks of data to produce the same CRC64 value?


I have a caching application that uses a CRC64 value to ensure data integrity. I'm thinking about adding an extra field, a timestamp, that would be passed along with the data between the various cache servers and compared to see whether the data has changed.

However, this requires protocol changes. While that's not a huge deal, I already have a CRC64 that could be used as an indicator that something has changed.

Does anyone know the statistics for two blocks of data producing the same CRC64? If not, how could I compute or estimate its likelihood?


Solution

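  • If you assume that CRC64 is 'perfect' (that is, its outputs are uniformly distributed over all 2^64 possible values), then the numbers are pretty reasonable:

    For a 1% probability of at least one collision among all entries, you need about 6.1 × 10^8 entries. For a 50% probability, you need about 5.1 × 10^9 entries. Both figures follow from the birthday approximation n ≈ sqrt(2 · 2^64 · ln(1/(1 − p))).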

    Of course, if the data may come from malicious sources, collisions in a checksum as simple as CRC64 are easy to generate deliberately, and they could be rampant. So whether or not you go this route depends on the source of the input data and the potential ramifications of a collision.
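To reproduce or adapt these figures, here is a minimal Python sketch of the birthday approximation. It assumes CRC64 outputs behave like uniformly random 64-bit values, which is the same 'perfect' assumption as above and does not hold for adversarial input. The function names are made up for illustration and are not part of any library.

```python
import math


def entries_for_collision_probability(p, bits=64):
    """Approximate number of entries needed before the chance of at least
    one collision reaches p, assuming uniformly random `bits`-bit values."""
    space = 2.0 ** bits
    # Birthday approximation: n ~= sqrt(2 * N * ln(1 / (1 - p)))
    return math.sqrt(2.0 * space * -math.log1p(-p))


def collision_probability(n, bits=64):
    """Approximate chance of at least one collision among n entries,
    under the same uniformity assumption."""
    space = 2.0 ** bits
    # p ~= 1 - exp(-n * (n - 1) / (2 * N)); expm1 keeps precision for tiny p
    return -math.expm1(-n * (n - 1) / (2.0 * space))


for p in (0.01, 0.5):
    n = entries_for_collision_probability(p)
    print(f"{p:.0%} collision probability at about {n:.2e} entries")
```

Calling `collision_probability` with the number of entries you expect the cache to hold gives a rough estimate for your own workload, again only under the uniformity assumption.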