duckdb

Base 16 hexadecimal string to base 10 integer


I'm trying to convert a hash into an integer. The conversion works as expected in python and t-sql (with the same answer). I'm trying to replicate it in duckdb but can't quite.

In python

def get_hash(customer_id):
    hash_object = hashlib.md5(customer_id.encode())
    return hash_object.hexdigest()[:15]

def get_integer_representation_of_hash(customer_id):
    hash_value = get_hash(customer_id)
    return int(hash_value, 16)

In tsql

SELECT 
SUBSTRING(CONVERT(VARCHAR(900),
                        HASHBYTES('MD5', CAST('Customer1' AS VARCHAR(36))), 
                        1
                    ),
                    3, 15
                ),
CONVERT(bigint, CONVERT(VARBINARY(900), '0'  + SUBSTRING(CONVERT(VARCHAR(900),
                        HASHBYTES('MD5', CAST('Customer1' AS VARCHAR(36))), 
                        1
                    ),
                    3, 15
                ), 2))

In duckdb

SELECT 
md5('Customer1')[:15] AS hash_value,
HASH(md5('Customer1')[:15]) AS hash_to_int

In all three cases, the hash value is the same.

Hash = BECFB907888C8D4

In python and tsql I get the same integer value. In duckdb I get something completely different.

Python = TSQL = 859338226837014740

DuckDB =        16188616960793580010

I think it's because the int type is incorrect in duckdb. It needs to be 16 instead of 64 but I don't quite know how to get to that.


Solution

  • You can prepend 0x to the hex value and then cast to int64:

    D select md5('Customer1')[:15] as hex, ('0x' || hex)::int64 as int64;
    ┌─────────────────┬────────────────────┐
    │       hex       │       int64        │
    │     varchar     │       int64        │
    ├─────────────────┼────────────────────┤
    │ becfb907888c8d4 │ 859338226837014740 │
    └─────────────────┴────────────────────┘