I have to save the combination of last name, first name and birth date of a person as a hash. This hash is later used to search for the same person with exactly the same properties. My question is whether SHA-1 is a meaningful algorithm for this.
As far as I understand SHA-1, there is virtually no possibility that two different persons (with different attributes) will ever get the same hash-value. Is this right?
If you want to search for a person knowing only those credentials, you could store the SHA-1 in the database (or MD5 for speed, unless you have something like a quadrillion people to sample).
The hash on its own tells you nothing readable about the person, but it works fine for searching a database. You just want to make sure that the three pieces of information match, so it would be safe to just concatenate them:
user.hash = SHA1(user.firstName + user.DOB + user.lastName)
And when you query, you could check if the two match:
hash = SHA1(query.firstName + query.DOB + query.lastName)
for user in database:
    if user.hash == hash:
        return user
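A minimal Python sketch of the store-and-lookup idea above, using the standard hashlib module. The person_hash and find helpers, the sample records and the dict index are made up for illustration, and the date of birth is assumed to be kept as an ISO-style string.

```python
import hashlib

def person_hash(first_name: str, dob: str, last_name: str) -> str:
    """SHA-1 hex digest of first name + date of birth + last name."""
    # Same concatenation order as above: DOB sits between the two names.
    data = (first_name + dob + last_name).encode("utf-8")
    return hashlib.sha1(data).hexdigest()

# Hypothetical records: (first name, date of birth, last name)
people = [
    ("John", "1980-01-31", "Doe"),
    ("Jane", "1975-06-15", "Roe"),
]

# Store the hash next to each record; a dict gives direct lookup
# instead of the linear scan in the pseudocode.
index = {person_hash(*p): p for p in people}

def find(first_name: str, dob: str, last_name: str):
    """Return the matching record, or None if nobody matches."""
    return index.get(person_hash(first_name, dob, last_name))
```

Any difference in spelling, case or date format changes the hash, so the inputs would need to be normalised the same way when storing and when querying.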
I put query.DOB in the middle because the first and last name might collide, like if JohnDoe Bob was born on the same day as John DoeBob. I'm not aware of numeric names, so I think this will stop collisions like those ;)
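To see why the field order matters, here is a small Python check. The two names are the ones from the example; the date value and the variable names are invented for illustration.

```python
import hashlib

def sha1_hex(text: str) -> str:
    """SHA-1 hex digest of a UTF-8 encoded string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

dob = "1980-01-31"  # assumed date format, identical for both people

# Names first, DOB last: both people produce the very same string.
a_names_first = "JohnDoe" + "Bob" + dob
b_names_first = "John" + "DoeBob" + dob
print(sha1_hex(a_names_first) == sha1_hex(b_names_first))  # True

# DOB between the names: the strings differ, so the hashes do not match.
a_dob_middle = "JohnDoe" + dob + "Bob"
b_dob_middle = "John" + dob + "DoeBob"
print(sha1_hex(a_dob_middle) == sha1_hex(b_dob_middle))  # False
```

An explicit separator that cannot appear in a name would probably serve the same purpose as putting the date in the middle.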
But if this is a big database, I'd try MD5. It's faster, and although a collision is theoretically possible, the chance of one happening by accident is so small that it won't be a concern in your case.
To put that into perspective, the chance that two given inputs collide is 1 / 2^128, which is:
1
---------------------------------------------------
340,282,366,920,938,463,463,374,607,431,768,211,456
That is roughly 2.94 × 10^-39, or about:
2.94 × 10^-37 %
I'm pretty sure you're not going to get a collision ;)
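The 1 / 2^128 figure is for one particular pair of inputs. With n people there are n(n-1)/2 pairs, so the chance of any accidental collision is at most about n(n-1)/2 divided by 2^bits. A rough Python sketch of that estimate, assuming an ideal hash function; the function name and the population sizes are illustrative only.

```python
from fractions import Fraction

def collision_chance(n: int, bits: int) -> float:
    """Upper bound on the chance of at least one accidental collision
    among n random hashes of the given width: (n(n-1)/2) / 2^bits."""
    pairs = Fraction(n * (n - 1), 2)
    return float(pairs / 2**bits)

# One single pair with a 128-bit hash (MD5): about 2.94e-39
print(float(Fraction(1, 2**128)))

# Whole populations, for 128-bit (MD5) and 160-bit (SHA-1) hashes
for n in (10**6, 10**9, 10**10):
    print(n, collision_chance(n, 128), collision_chance(n, 160))
```

Even for ten billion people the 128-bit estimate comes out around 1.5 × 10^-19, and the 160-bit one is smaller still.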