internationalization, domain-name, idn, punycode

Can I treat all domain names as being IDNs without any ill effects?


From testing, it seems that converting both IDNs and regular domain names 'just works': if the input doesn't need to be changed, punycode simply returns it as-is. For example:

punycode.toASCII('lancôme.com');

returns:

'xn--lancme-lxa.com'

And

punycode.toASCII('apple.com');

returns:

'apple.com'

This looks great, but is it specified anywhere? Can I safely convert everything to punycode?


Solution

  • That is correct. When Unicode domain names are converted to ASCII with Punycode, only labels that contain non-ASCII characters are altered. Regular domain names cannot contain non-ASCII characters, so a correctly implemented converter will leave a pure-ASCII name unchanged. (Implementations that follow UTS #46 may also lowercase ASCII letters, which is harmless because DNS names are case-insensitive.)

    You can read more about how Unicode is converted to Punycode here: https://en.wikipedia.org/wiki/Punycode

    Punycode is specified in RFC 3492: https://www.ietf.org/rfc/rfc3492.txt, which states:

    "Basic code point segregation" is a very simple and efficient encoding for basic code points occurring in the extended string: they are simply copied all at once.

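    Therefore, if a label consists only of basic (ASCII) code points, those characters are copied through unchanged. Strictly speaking, RFC 3492 also appends a "-" delimiter whenever a label contains at least one basic code point, so the raw encoding of "apple" is "apple-". "apple.com" comes back untouched because the IDNA ToASCII operation (RFC 3490) applies Punycode only to labels that contain non-ASCII characters, which is also what punycode.toASCII does.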
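Below is a minimal, dependency-free sketch of the two layers involved (raw Punycode encoding versus the ToASCII wrapper). It assumes a plain JavaScript environment such as Node.js. `encode` follows the RFC 3492 encoding procedure for a single label. `toAscii` is a simplified ToASCII-style wrapper that encodes only labels containing non-ASCII characters. Both names are hypothetical and are not punycode.js's API. The sketch deliberately omits Nameprep/UTS #46 mapping, case folding, label-length limits, overflow checks, and label separators other than ".", so treat it as an illustration rather than a replacement for a real IDNA library.

```javascript
// Sketch only: hypothetical helpers, no Nameprep/UTS #46 mapping,
// case folding, length limits or overflow checks.

// Bootstring parameters for Punycode (RFC 3492, section 5).
const BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700;
const INITIAL_BIAS = 72;
const INITIAL_N = 0x80;
const DELIMITER = '-';

// Bias adaptation function (RFC 3492, section 6.1).
function adapt(delta, numPoints, firstTime) {
  delta = firstTime ? Math.floor(delta / DAMP) : Math.floor(delta / 2);
  delta += Math.floor(delta / numPoints);
  let k = 0;
  while (delta > Math.floor(((BASE - TMIN) * TMAX) / 2)) {
    delta = Math.floor(delta / (BASE - TMIN));
    k += BASE;
  }
  return k + Math.floor(((BASE - TMIN + 1) * delta) / (delta + SKEW));
}

// Map a digit 0..35 to 'a'..'z' (0..25) or '0'..'9' (26..35).
function digitToChar(d) {
  return String.fromCharCode(d < 26 ? 97 + d : 22 + d);
}

// Encode a single label (RFC 3492, section 6.3).
function encode(input) {
  const codePoints = Array.from(input, (ch) => ch.codePointAt(0));
  // Basic code point segregation: copy the ASCII code points first, in order.
  let output = codePoints
    .filter((cp) => cp < 0x80)
    .map((cp) => String.fromCharCode(cp))
    .join('');
  const basicCount = output.length;
  // The delimiter is appended whenever at least one basic code point exists.
  if (basicCount > 0) output += DELIMITER;

  let n = INITIAL_N;
  let delta = 0;
  let bias = INITIAL_BIAS;
  let handled = basicCount;
  while (handled < codePoints.length) {
    // Smallest code point that is >= n (the next one to insert).
    let m = Infinity;
    for (const cp of codePoints) if (cp >= n && cp < m) m = cp;
    delta += (m - n) * (handled + 1);
    n = m;
    for (const cp of codePoints) {
      if (cp < n) delta++;
      if (cp === n) {
        // Emit delta as a generalized variable-length integer.
        let q = delta;
        for (let k = BASE; ; k += BASE) {
          const t = k <= bias ? TMIN : k >= bias + TMAX ? TMAX : k - bias;
          if (q < t) break;
          output += digitToChar(t + ((q - t) % (BASE - t)));
          q = Math.floor((q - t) / (BASE - t));
        }
        output += digitToChar(q);
        bias = adapt(delta, handled + 1, handled === basicCount);
        delta = 0;
        handled++;
      }
    }
    delta++;
    n++;
  }
  return output;
}

// Simplified ToASCII: only labels with non-ASCII characters are encoded
// and prefixed with "xn--"; pure-ASCII labels pass through as-is.
function toAscii(domain) {
  return domain
    .split('.')
    .map((label) => (/[^\x00-\x7F]/.test(label) ? 'xn--' + encode(label) : label))
    .join('.');
}

console.log(encode('apple'));               // apple-
console.log(toAscii('apple.com'));          // apple.com
console.log(toAscii('lanc\u00f4me.com'));   // xn--lancme-lxa.com ("lancôme.com")
```

With these definitions, `encode('apple')` yields "apple-", while `toAscii('apple.com')` returns "apple.com" and `toAscii('lancôme.com')` returns "xn--lancme-lxa.com". These results match the outputs shown in the question.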