I am using the idna library to encode and decode Unicode domain names, but when I encode a domain name, the output is wrapped in apostrophes and prefixed with the letter b.
For example:
import idna
print(idna.encode('español.com'))
Output: b'xn--espaol-zwa.com'
Expected output: xn--espaol-zwa.com
I feel like I'm missing something really obvious but not sure how to get to the bottom of this.
My expected output is reinforced by the fact that if I decode it:
print(idna.decode('xn--espaol-zwa.com'))
I get the original domain: español.com
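For any newbies like me looking for a simple solution to this: as @Barmer has pointed out, idna.encode returns a byte string (a bytes object) even if you feed in a character string. The b'...' in the output is just how Python prints bytes; the b and the apostrophes are not part of the data.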
If you want a string, you can chain UTF-8 decoding thus:
idna.encode('español.com').decode('utf-8')
This outputs a character string: xn--espaol-zwa.com
idna.decode will correctly decode this back to español.com without any further treatment needed.
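To see the difference between the two types without installing anything, here is a minimal sketch. It assumes the bytes literal below stands in for the value idna.encode returned in the question; the variable names are just for illustration. Because the encoded form is plain ASCII, decoding with 'ascii' gives the same result as 'utf-8'.

```python
# Stand-in for the value returned by idna.encode('español.com') in the question.
raw = b'xn--espaol-zwa.com'

print(type(raw).__name__)   # bytes
print(raw)                  # b'xn--espaol-zwa.com'  (repr of a bytes object)

# Convert the bytes to a str; the encoded domain contains only ASCII characters.
text = raw.decode('ascii')

print(type(text).__name__)  # str
print(text)                 # xn--espaol-zwa.com
```

If the value is going into a file, URL, or log message as text, decode it first; if it is going straight to something that expects bytes, it can be used as it is.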