python punycode

IDNA Encode Adding Apostrophes and letter B?


I am using the IDNA library to encode/decode Unicode domain names, but when I encode a domain name, it adds apostrophes on either side of the string and prepends the letter b.

For example:

import idna
print(idna.encode('español.com'))

Output: b'xn--espaol-zwa.com'

Expected output: xn--espaol-zwa.com

I feel like I'm missing something really obvious but not sure how to get to the bottom of this.

My expected output is reinforced by the fact that if I decode it:

print(idna.decode('xn--espaol-zwa.com'))

I get the original domain: español.com


Solution

  • For any newbies like me looking for a simple solution to this, as @Barmer has pointed out, the IDNA package outputs a byte string even if you feed in a character string. The b'...' wrapper is just how Python prints a bytes object; it is not part of the encoded data.

    If you want a string, you can chain UTF-8 decoding thus:

    idna.encode('español.com').decode('utf-8')
    

    This outputs the character string: xn--espaol-zwa.com

    idna.decode will correctly decode this back to español.com without any further treatment needed.
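To see why the b and the quotes show up at all, here is a minimal sketch that needs no third-party install. It starts from the bytes value quoted in the question rather than calling idna.encode, and for the round trip it assumes the standard library's built-in "idna" codec, which follows the older IDNA 2003 rules and so may differ from the idna package for some names.

```python
# Literal copied from the question: the value idna.encode('español.com') returned.
encoded = b'xn--espaol-zwa.com'

# print() on a bytes object shows its repr, hence the b prefix and the quotes.
print(encoded)

# The encoded form is pure ASCII, so decoding as ASCII or UTF-8 gives the same str.
as_text = encoded.decode('ascii')
print(as_text)

# Round trip with the standard library "idna" codec (assumption for illustration:
# this is the IDNA 2003 codec built into Python, not the idna package itself).
original = encoded.decode('idna')
print(original)
```

The first print shows the value with the b prefix, the second shows the plain string, and the third shows the original domain. Which decoding you chain (ASCII or UTF-8) makes no difference here, since the encoded name only ever contains ASCII characters.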