sqlsql-serverscramblehashbytes

How to scramble or hash values in SQL server?


I am in the process of creating demo data from data that contains Patient History Information (PHI). There are a few columns where I just want to generate a random value that stays consistent throughout the data. For example, there is a field like SSN that I want to create a random 9 number digit for each unique SSN, but keeping this number the same where the claims are of the same individual. So 1 SSN might have 5 claims and each claim will have the same randomly created SSN.

sample

ssn           date1       procedure
443234432     1/1/2019    needle poke
443234432     1/2/2019    needle poke
676343522     1/3/2019    total knee procedure
443234432     1/4/2019    total hip procedure
676343522     1/5/2019    needle poke

final

ssn           date1       procedure
856345544     1/1/2019    needle poke
856345544     1/2/2019    needle poke
979583338     1/3/2019    total knee procedure
856345544     1/4/2019    total hip procedure
979583338     1/5/2019    needle poke

As you can see, the snn changed, but stays the same for all instances where the ssn was the same.

For numbers like this, I can convert to a numeric and multiply/divide/add/subtract to create a random number that maintains integrity, but how can I handle this for instances where there are non-numerics?

I attempted to use HASHBYTES but am getting lots of strange characters. Is there another method that could generate a random value and maintain consistency throughout the dataset?

RIGHT(convert(NVARCHAR(10), HASHBYTES('MD5', SSN)),10) as SSN

RESULTS:
댛량뇟㻣砖聋蠤

I've read a number of articles about this, but I couldn't find much about maintaining consistency across multiple claims. I appreciate any feedback.


Solution

  • I think you want printable characters. In that case, you can use the CONVERT function to translate the bytes result of a HASHBYTES to a hex representation as string. Just make sure to pass the value 2 as the third parameter.

    DECLARE @SomeValue VARCHAR(100) = CONVERT(VARCHAR(100), NEWID())
    
    SELECT
        @SomeValue AS Original,
        CONVERT(
            VARCHAR(20), 
            HASHBYTES('MD5', @SomeValue), 
            2) AS Scrambled
    

    A few results:

    Original                                Scrambled
    BC9EC2E0-2009-45FA-AA95-64585B815BD9    A33AEBC011E9188EB97E
    6FF7E0FE-E054-49D7-A451-80111BF5B200    94F93C6A5CBD0E56C70B
    C8F8CD77-96B7-4B74-84B7-4EB3412C6CE7    2994341068CE8C4E1EF9
    

    Put the length you want as the varchar target in the first parameter.

    Please be advised that hash functions might generate same result on different inputs, and it will specially if you are truncating the result to the first N characters.