To put it simply, I have a very large database table with hundreds of thousands of rows and hundreds of columns.
Some of those columns need to be hashed, mainly to save space. However, when I try to hash them like this:
select distinct
columnA + hashbytes('sha1', [Column_in_question])
from [dbo].[Tabled_in_question]
I end up with more rows than if I just do this:
select distinct
columnA + [Column_in_question]
from [dbo].[Tabled_in_question]
My best guess is that SELECT DISTINCT is not case sensitive, whereas HASHBYTES is. But I don't really know how to test this or fix it.
Any ideas?
You are right: the difference is case sensitivity.
You can check it using:
select distinct
convert(VARBINARY(10), [Column_in_question]),
columnA + hashbytes('sha1', [Column_in_question])
from [dbo].[Tabled_in_question]
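The collation of the database is most probably CI (case insensitive), so DISTINCT treats values that differ only in case as equal. HASHBYTES, on the other hand, works on the raw bytes, and as you can see when converting the text to varbinary, those bytes are different.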
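If you want to see the effect in isolation first, a minimal sketch along these lines should reproduce it. The table variable @t, its column val, the sample values, and the explicit Latin1_General_CI_AS collation are all assumptions made for illustration; they are not taken from your schema.

```sql
-- Hypothetical sample data: three values that differ only in case,
-- stored in a column with a case-insensitive (CI) collation.
DECLARE @t TABLE (val varchar(20) COLLATE Latin1_General_CI_AS);
INSERT INTO @t (val) VALUES ('abc'), ('ABC'), ('Abc');

-- DISTINCT compares with the CI collation, so the three values collapse into one.
SELECT COUNT(*) AS distinct_text
FROM (SELECT DISTINCT val FROM @t) AS d;

-- HASHBYTES hashes the underlying bytes, which differ, so three hashes remain.
SELECT COUNT(*) AS distinct_hashes
FROM (SELECT DISTINCT HASHBYTES('SHA1', val) AS h FROM @t) AS d;
```

With this sample data the first query should return 1 and the second 3, which is the same kind of mismatch you are seeing on the real table.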
Try this to change the collation, and with it the comparison rules, so that DISTINCT compares byte by byte as well:
select distinct
columnA + [Column_in_question] collate LATIN1_GENERAL_BIN
from [dbo].[Tabled_in_question]
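As a rough sketch of the two directions you could take, here is the same idea on made-up data. Again, @t, val, and the Latin1_General_CI_AS collation are hypothetical, and normalizing with UPPER before hashing is an alternative I am suggesting on the assumption that case is the only difference the collation ignores; it would not cover accent-insensitive collations, for example.

```sql
-- Hypothetical sample data in a case-insensitive column.
DECLARE @t TABLE (val varchar(20) COLLATE Latin1_General_CI_AS);
INSERT INTO @t (val) VALUES ('abc'), ('ABC'), ('Abc');

-- Option 1: force a binary collation so DISTINCT is as strict as HASHBYTES.
-- All three values stay separate.
SELECT DISTINCT val COLLATE Latin1_General_BIN AS val_bin
FROM @t;

-- Option 2 (assumption: only case differs): normalize case before hashing,
-- so HASHBYTES is as lenient as the case-insensitive DISTINCT.
-- All three values produce the same hash.
SELECT DISTINCT HASHBYTES('SHA1', UPPER(val)) AS val_hash
FROM @t;
```

Which one fits depends on whether 'abc' and 'ABC' should count as the same value in your data: option 1 keeps them apart everywhere, option 2 merges them everywhere.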