hashcrc32

Can CRC32 be used as a hash function?


Can CRC32 be used as a hash function? Any drawbacks to this approach? Any tradedeoffs?


Solution

  • CRC32 works very well as a hash algorithm. The whole point of a CRC is to hash a stream of bytes with as few collisions as possible. That said, there are a few points to consider:

    ---- EDIT ----

    Mark Adler provided a link to a useful article for hash evaluation by Bret Mulvey. Using the source code provided in the article, I ran the "bucket test" for both CRC32C and Jenkins96. These tables show the probability that a truly uniform distribution would be worse than the measured result by chance alone. So, higher numbers are better. The author considered 0.05 or lower to be weak and 0.01 or lower to be very weak. I'm entirely trusting the author on all this and am just reporting results.

    I placed an * by all the instances where CRC32C performed better than Jenkins96. By this simple tally, CRC32C was a more uniform hash than Jenkins96 54 of 96 times. Especially if you can use the x86 CRC32 instruction, the speed quality trade-off is excellent.

    CRC32C (0x1EDC6F41)
    
           Uniform keys        Text keys         Sparse keys
    
    Bits  Lower    Upper     Lower    Upper     Lower    Upper
     1    0.671   *0.671    *1.000    0.120    *0.572   *0.572
     2   *0.706   *0.165    *0.729   *0.919     0.277    0.440
     3   *0.878   *0.879    *0.556    0.362    *0.535   *0.542
     4    0.573    0.332     0.433    0.462    *0.855    0.393
     5    0.023   *0.681     0.470    0.907     0.266    0.059
     6   *0.145   *0.523     0.354   *0.172    *0.336    0.588
     7    0.424    0.722     0.172   *0.736     0.184   *0.842
     8   *0.767    0.507    *0.533    0.437     0.337    0.321
     9    0.480    0.725    *0.753   *0.807    *0.618    0.025
    10   *0.719    0.161    *0.970   *0.740    *0.789    0.344
    11   *0.610    0.225    *0.849   *0.814    *0.854   *0.003
    12   *0.979   *0.239    *0.709    0.786     0.171   *0.865
    13   *0.515    0.395     0.192    0.600     0.869   *0.238
    14    0.089   *0.609     0.055   *0.414    *0.286   *0.398
    15   *0.372   *0.719    *0.944    0.100    *0.852   *0.300
    16    0.015   *0.946    *0.467    0.459     0.372   *0.793
    

    And for Jenkins96, which the author of article considered to be an excellent hash:

    Jenkins96
    
          Uniform keys         Text keys         Sparse keys
    
    Bits  Lower    Upper     Lower    Upper     Lower    Upper
     1    0.888    0.572     0.090    0.322     0.090    0.203
     2    0.198    0.027     0.505    0.447     0.729    0.825
     3    0.444    0.510     0.360    0.444     0.467    0.540
     4    0.974    0.783     0.724    0.971     0.439    0.902
     5    0.308    0.383     0.686    0.940     0.424    0.119
     6    0.138    0.505     0.907    0.103     0.300    0.891
     7    0.710    0.956     0.202    0.407     0.792    0.506
     8    0.031    0.552     0.229    0.573     0.407    0.688
     9    0.682    0.990     0.276    0.075     0.269    0.543
    10    0.382    0.933     0.038    0.559     0.746    0.511
    11    0.043    0.918     0.101    0.290     0.584    0.822
    12    0.895    0.036     0.207    0.966     0.486    0.533
    13    0.290    0.872     0.902    0.934     0.877    0.155
    14    0.859    0.568     0.428    0.027     0.136    0.265
    15    0.290    0.420     0.915    0.465     0.532    0.059
    16    0.155    0.922     0.036    0.577     0.545    0.336
    

    ---- EDIT ---- Fixed out-of-date links and minor cleanup.