I am in the process of creating demo data from data that contains Patient History Information (PHI). There are a few columns where I just want to generate a random value that stays consistent throughout the data. For example, there is a field like SSN that I want to create a random 9 number digit for each unique SSN, but keeping this number the same where the claims are of the same individual. So 1 SSN might have 5 claims and each claim will have the same randomly created SSN.
sample
ssn date1 procedure
443234432 1/1/2019 needle poke
443234432 1/2/2019 needle poke
676343522 1/3/2019 total knee procedure
443234432 1/4/2019 total hip procedure
676343522 1/5/2019 needle poke
final
ssn date1 procedure
856345544 1/1/2019 needle poke
856345544 1/2/2019 needle poke
979583338 1/3/2019 total knee procedure
856345544 1/4/2019 total hip procedure
979583338 1/5/2019 needle poke
As you can see, the snn changed, but stays the same for all instances where the ssn was the same.
For numbers like this, I can convert to a numeric and multiply/divide/add/subtract to create a random number that maintains integrity, but how can I handle this for instances where there are non-numerics?
I attempted to use HASHBYTES but am getting lots of strange characters. Is there another method that could generate a random value and maintain consistency throughout the dataset?
RIGHT(convert(NVARCHAR(10), HASHBYTES('MD5', SSN)),10) as SSN
RESULTS:
댛량뇟㻣砖聋蠤
I've read a number of articles about this, but I couldn't find much about maintaining consistency across multiple claims. I appreciate any feedback.
I think you want printable characters. In that case, you can use the CONVERT
function to translate the bytes result of a HASHBYTES
to a hex representation as string. Just make sure to pass the value 2
as the third parameter.
DECLARE @SomeValue VARCHAR(100) = CONVERT(VARCHAR(100), NEWID())
SELECT
@SomeValue AS Original,
CONVERT(
VARCHAR(20),
HASHBYTES('MD5', @SomeValue),
2) AS Scrambled
A few results:
Original Scrambled
BC9EC2E0-2009-45FA-AA95-64585B815BD9 A33AEBC011E9188EB97E
6FF7E0FE-E054-49D7-A451-80111BF5B200 94F93C6A5CBD0E56C70B
C8F8CD77-96B7-4B74-84B7-4EB3412C6CE7 2994341068CE8C4E1EF9
Put the length you want as the varchar target in the first parameter.
Please be advised that hash functions might generate same result on different inputs, and it will specially if you are truncating the result to the first N characters.