sql-serverhashmd5sqldatatypeshashbytes

Hashes Are Not Matching


I am hashing the same values, but am getting different results.

Here is a simplified example to explain what I am encountering:

I have a dimension table that looks like this:

gender gender_id
Male 1
Female 0

The datatype for gender is NVARCHAR(6) and gender_id is INT

When I execute any of the following queries I get the same hash:

**Scenario 1:**
SELECT 
       CONVERT(BINARY(20), HASHBYTES('Md5', Concat(Gender, cast(gender_id as int))))
FROM demographic
WHERE gender = 'Male';

OUTPUT: '0x6B216D8BB993AA263265CCF645C282B100000000'

**Scenario 2:**
    SELECT  
           CONVERT(BINARY(20), HASHBYTES('Md5', Concat(Gender, CAST(gender_id AS NVARCHAR(1)))))
    FROM demographic
    WHERE gender = 'Male';

OUTPUT: '0x6B216D8BB993AA263265CCF645C282B100000000'

In scenario 1, I casted gender_id as an INT and in scenario 2 I casted gender_id as NVARCHAR. In both cases the hash was the same.

When I execute queries calling on the specific values in my dimension, instead of the columns, my hashes are different:

 **Scenario 3:**
    SELECT CONVERT(BINARY(20), HASHBYTES('MD5', Concat('Male', CAST(1 as INT))));

OUTPUT: '0x048A5F0EE2D2B4070CFF8A38CB6DAC7100000000'

**Scenario 4:**
    SELECT CONVERT(BINARY(20), HASHBYTES('MD5', Concat('Male', CAST(1 as NVARCHAR(1)))));

OUTPUT: '0x6B216D8BB993AA263265CCF645C282B100000000'

In scenario 3 I casted 1 as an INT just as I did in scenario 1. In scenario 4, I casted 1 as NVARCHAR just as I did in scenario 2. Yet, scenario 3 and 4 have different hashes. In addition to this the hash for scenario 4 is consistent with the hashes in scenarios 1 and 2.

I am having trouble understanding why the hashes are the same for scenario 1, 2, and 4, but different for scenario 3. In my dimension, gender_id is an INT. When I query my dimension, no matter how I cast it, the hash is always the same. In scenarios 3 and 4, when I replace the column names with the actual values the results change. The hash in scenario 3 will not match scenarios 1 and 2 unless I cast it as NVARCHAR. Why is this since gender_id is naturally an INT?

Would appreciate any insight you have, and would be happy to provide more clarification if needed.

Thanks!


Solution

  • 'Male' is an ANSI varchar value, not a Unicode nvarchar. For Unicode you need N'Male'

    So for example, I get the following result for this query:

    select convert(varchar(20), cast('Male' as varbinary(20)), 1)
    

    0x4D616C65

    Whereas this one, note the addition of N before the literal:

    select convert(varchar(20), cast(N'Male' as varbinary(20)), 1)
    

    0x4D0061006C006500