I'm trying to convert a hash into an integer. The conversion works as expected in python and t-sql (with the same answer). I'm trying to replicate it in duckdb but can't quite.
In python
def get_hash(customer_id):
hash_object = hashlib.md5(customer_id.encode())
return hash_object.hexdigest()[:15]
def get_integer_representation_of_hash(customer_id):
hash_value = get_hash(customer_id)
return int(hash_value, 16)
In tsql
SELECT
SUBSTRING(CONVERT(VARCHAR(900),
HASHBYTES('MD5', CAST('Customer1' AS VARCHAR(36))),
1
),
3, 15
),
CONVERT(bigint, CONVERT(VARBINARY(900), '0' + SUBSTRING(CONVERT(VARCHAR(900),
HASHBYTES('MD5', CAST('Customer1' AS VARCHAR(36))),
1
),
3, 15
), 2))
In duckdb
SELECT
md5('Customer1')[:15] AS hash_value,
HASH(md5('Customer1')[:15]) AS hash_to_int
In all three cases, the hash value is the same.
Hash = BECFB907888C8D4
In python and tsql I get the same integer value. In duckdb I get something completely different.
Python = TSQL = 859338226837014740
DuckDB = 16188616960793580010
I think it's because the int type is incorrect in duckdb. It needs to be 16 instead of 64 but I don't quite know how to get to that.
You can prepend 0x
to the hex value and then cast to uint64
:
D select md5('Customer1')[:15] as hex, ('0x' || hex)::uint64 as uint64;
┌─────────────────┬────────────────────┐
│ hex │ uint64 │
│ varchar │ uint64 │
├─────────────────┼────────────────────┤
│ becfb907888c8d4 │ 859338226837014740 │
└─────────────────┴────────────────────┘