How would I convert characters which are surrogate pairs into hexadecimal?
I've found that using hex() and ord() works for characters with a single code point, such as emojis like "😀". For example:
print(hex(ord("😀")))
# 0x1f600
Similarly, using chr() and int() works for getting the characters from the hexadecimal:
print(chr(int("0x1f600", 16)))
# 😀
However, as soon as I use a surrogate pair, such as an emoji like "👩🏻", the code throws an error:
print(hex(ord("👩🏻")))
TypeError: ord() expected a character, but string of length 2 found
How would I fix this, and how would I convert such hexadecimal back into a character?
Since an exact output format wasn't specified, how about:
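def hexify(s):
    # UTF-32 uses exactly 4 bytes per code point, so group the hex by 4 bytes.
    # bytes.hex(sep, bytes_per_sep) needs Python 3.8+.
    return s.encode('utf-32-be').hex(sep=' ', bytes_per_sep=4)

def unhexify(s):
    # bytes.fromhex() skips the spaces between groups.
    return bytes.fromhex(s).decode('utf-32-be')

s = hexify('👩🏻')
print(s)
print(unhexify(s))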
Output:
0001f469 0001f3fb
👩🏻
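If what you actually want are the UTF-16 surrogate code units (rather than the code points), the same idea should work with a different codec. This is only a sketch: hexify_utf16 and unhexify_utf16 are names I made up for illustration, and it assumes Python 3.8+ for the sep and bytes_per_sep arguments.

```python
def hexify_utf16(s):
    # Each UTF-16 code unit is 2 bytes; a code point above U+FFFF
    # becomes a high/low surrogate pair, i.e. two 2-byte groups.
    return s.encode('utf-16-be').hex(sep=' ', bytes_per_sep=2)

def unhexify_utf16(s):
    # bytes.fromhex() skips the spaces between groups.
    return bytes.fromhex(s).decode('utf-16-be')

emoji = '\U0001F469\U0001F3FB'  # woman + skin tone modifier
s = hexify_utf16(emoji)
print(s)
print(unhexify_utf16(s) == emoji)
```

Here each of the two code points is outside the Basic Multilingual Plane, so each one shows up as its own surrogate pair (four 2-byte groups in total).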
Or similar to your original code:
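def hexify(s):
    # One hex string per code point in the string.
    return [hex(ord(c)) for c in s]

def unhexify(L):
    # int(n, 16) accepts the '0x' prefix produced by hex().
    return ''.join(chr(int(n, 16)) for n in L)

s = hexify('👩🏻')
print(s)
print(unhexify(s))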
Output:
['0x1f469', '0x1f3fb']
👩🏻
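As a side note on why ord() complained: in Python 3 a str is a sequence of code points, and this emoji is two of them (a base emoji followed by a skin tone modifier), which is why len() reports 2. A quick way to check what a string is made of, as a rough sketch (describe is just a helper name I picked, not a standard function):

```python
import unicodedata

def describe(s):
    """Return (code point in U+ notation, Unicode name) for each code point."""
    return [(f'U+{ord(c):04X}', unicodedata.name(c, '<unnamed>')) for c in s]

emoji = '\U0001F469\U0001F3FB'
print(len(emoji))
for code, name in describe(emoji):
    print(code, name)
```

Output:

```
2
U+1F469 WOMAN
U+1F3FB EMOJI MODIFIER FITZPATRICK TYPE-1-2
```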