cryptography hash anonymize

Which algorithm for hashing name, firstName and birth-date of a person


I have to store the combination of last name, first name and birth date of a person as a hash. This hash is later used to search for the same person with exactly the same properties. My question is whether SHA-1 is a meaningful algorithm for this.

As far as I understand SHA-1, there is virtually no possibility that two different persons (with different attributes) will ever get the same hash-value. Is this right?


Solution

  • If you want to search for a person knowing only those attributes, you could store the SHA-1 hash in the database (or MD5 for speed, unless you have something like a quadrillion people to sample).

    The hash is useless on its own, since the person's attributes cannot be read directly back out of it (although anyone who can guess the inputs can recompute it), but it works for searching a database. You only need to check that the three pieces of information match, so it is safe to just concatenate them:

    user.hash = SHA1(user.firstName + user.DOB + user.lastName)
    

    And when you query, you could check if the two match:

    hash = SHA1(query.firstName + query.DOB + query.lastName)
    
    for user in database:
      if user.hash == hash:
        return user
    

    I put the DOB in the middle because concatenating the first and last name directly could make two different people collide, e.g. if "JohnDoe Bob" was born on the same day as "John DoeBob". I'm not aware of names containing digits, so the date acts as a separator and should stop collisions like those ;)
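For what it's worth, here is a minimal runnable sketch of the same idea in Python using the standard `hashlib` module. It is only an illustration under some assumptions that are not in the question: the names `person_hash`, `normalize` and `SEPARATOR` are made up, the birth date is assumed to arrive as an ISO 8601 string, and the trimming and case-folding step is optional (drop it if "exactly the same properties" must mean byte-for-byte equality). An explicit separator is used so the result does not depend on names never containing digits.

```python
import hashlib
import unicodedata

# Hypothetical separator: a control character assumed never to occur in a name or date.
SEPARATOR = "\x1f"  # ASCII "unit separator"


def normalize(value: str) -> str:
    """Unicode-normalize, trim and case-fold one field (an optional assumption)."""
    return unicodedata.normalize("NFKC", value).strip().casefold()


def person_hash(first_name: str, dob: str, last_name: str) -> str:
    """Return the SHA-1 hex digest of the three fields joined by SEPARATOR.

    dob is assumed to be an ISO 8601 date string such as "1980-01-31".
    """
    fields = [normalize(first_name), dob.strip(), normalize(last_name)]
    return hashlib.sha1(SEPARATOR.join(fields).encode("utf-8")).hexdigest()
```

Storing `person_hash(...)` in an indexed column (or using it as a dictionary key) would also let the lookup be a direct equality query rather than a loop over every user.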

    But if this is a big database, I'd try MD5. It's faster, and although an accidental collision is theoretically possible, the chance is so small that in your case you can treat it as impossible.

    To put that into perspective, the chance that two given different inputs end up with the same MD5 hash is 1 / 2^128, which is:

                              1
    ---------------------------------------------------
    340,282,366,920,938,463,463,374,607,431,768,211,456
    

    And that's a little smaller than:

    2.94 × 10^-37 %
    

    I'm pretty sure you're not going to get a collision ;)
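The 1 / 2^128 figure above is for one specific pair of inputs. With many people in the database, the chance that any two of them collide is higher (the birthday problem), though still tiny. A rough way to estimate it, assuming the hash behaves like a uniform random function and only accidental collisions matter, is sketched below; `collision_probability` is a made-up helper name, not a library function.

```python
import math


def collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate chance that any two of n_items distinct inputs share a hash.

    Uses the birthday approximation p ≈ 1 - exp(-n(n-1) / 2^(bits+1)),
    assuming the hash output is uniformly distributed.
    """
    exponent = -n_items * (n_items - 1) / 2 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is extremely close to zero.
    return -math.expm1(exponent)


# Example: every person on Earth (about 8 billion), MD5 (128 bits) vs SHA-1 (160 bits).
p_md5 = collision_probability(8_000_000_000, 128)
p_sha1 = collision_probability(8_000_000_000, 160)
```

For 8 billion people and a 128-bit hash this comes out at roughly 10^-19, so the conclusion stands. Keep in mind this covers accidental collisions only: practical attacks exist for deliberately constructing colliding inputs for both MD5 and SHA-1.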