[SOLVED] How to convert surrogate pairs into hexadecimal, and vice-versa in Python?

How to convert surrogate pairs into hexadecimal, and vice-versa in Python?

How would I convert characters which are surrogate pairs into hexadecimal?

I've found that using hex() and ord() works for characters with a single code point, such as emojis like "😀". For example:

print(hex(ord("😀")))
# '0x1f600'

Similarly, using chr() and int() works for getting the characters from the hexadecimal:

print(chr(int(0x1f600)))
# '😀'

However, as soon as I use a surrogate pair, such as an emoji like "👩🏻", the code throws an error:

print(hex(ord("👩🏻")))
TypeError: ord() expected a character, but string of length 2 found

How would I fix this, and how would I convert such hexadecimal back into a character?

Solution

Since an exact output format wasn't specified, how about:

def hexify(s):
    return s.encode('utf-32-be').hex(sep=' ', bytes_per_sep=4)

def unhexify(s):
    return bytes.fromhex(s).decode('utf-32-be')

s = hexify('👩🏻')
print(s)
print(unhexify(s))

Output:

0001f469 0001f3fb
👩🏻

Or similar to your original code:

def hexify(s):
    return [hex(ord(c)) for c in s]

def unhexify(L):
    return ''.join([chr(int(n,16)) for n in L])

s = hexify('👩🏻')
print(s)
print(unhexify(s))

Output:

['0x1f469', '0x1f3fb']
👩🏻