python-3.xsql-serverpandasfunctiontype-conversion

Convert bytes in a pandas dataframe column into hexadecimals


There is a problem when pandas reads read_sql bytes column. If you look at the sql request through DBeaver, then the bytes column shows differently, but if you look at it through read_sql, it seems that pandas translates the value into a hex.

For example, pandas shows column value -

b'\x80\xbc\x10`K\xa8\x95\xd8\x11\xe5K\xf9\xe7\xd7\x8cq'

I need -

0x80BC10604BA895D811E54BF9E7D78C71

If I use, in sql,

CONVERT(varchar(max),guid,1)

pandas give correct values. But I need to convert column in python not in sql.


Solution

  • bytes -> string

    It looks like the column (which I call 'col' below) contains bytes. There's the .hex() method that you can map to each item in the column to convert them into hexadecimal strings.

    df['col'] = df['col'].map(lambda e: e.hex())
    

    This produces

    80bc10604ba895d811e54bf9e7d78c71
    

    It seems the specific output you want is to have the "0x" prefix and have upper-case letters for the digits above 9, which you can do by calling .upper() on the hex string and prepend '0x'.

    df['col'] = df['col'].map(lambda e: '0x' + e.hex().upper())
    

    This produces

    0x80BC10604BA895D811E54BF9E7D78C71
    

    bytes -> int -> string

    You can also convert to int first and format that into a string afterwards:

    df['col'] = df['col'].map(int.from_bytes).map('0x{:X}'.format)
    

    If the column has unexpected values (such as None or a string), you can also modify the mapped function to include a condition that handles such cases. For example, if there could be None, you can safeguard against that with the following:

    df['col'] = df['col'].map(lambda e: '0x' + e.hex().upper() if e else e)
    

    If there could be a string, then:

    df['col'] = df['col'].map(lambda x: x.encode().hex() if isinstance(x, str) else x.hex())