email smtp internationalization

What is the current (2023) state of using internationalized email addresses?


Digging through SO and Google, I'm trying to figure out the current state of internationalized email addresses. The best SO answers I found were from 2021. Based on my research, here's what I understood:

Domain Part: Unicode characters can already be used for the domain part of an email address. These characters are automatically encoded into "internationalized domain names" (IDNAs) using a well-defined algorithm.

Mailbox Part: Unicode characters can be used for the mailbox part of an address only when using the SMTPUTF8 protocol, which provides full Unicode support for email addresses. However, the problem is that, as of 2021, adoption of SMTPUTF8 was not yet widespread. If any server in the message delivery chain does not support SMTPUTF8, the sender has to fall back to regular SMTP and convert the mailbox name to its ASCII alias using whatever heuristics the receiving end uses. These heuristic rules are not standardized, and the sender would need to know exactly how the receiving end is doing the aliasing. Therefore, this solution does not generalize well.

So, if I understood correctly, this means that we can already use Unicode characters for domain names, but for the mailbox part, everything becomes a question of "How good is the general support for SMTPUTF8?"

Is this reasoning correct, and how good is the support for SMTPUTF8 currently (in 2023)?


Solution

  • Domain Part: Unicode characters can already be used for the domain part of an email address. These characters are automatically encoded into "internationalized domain names" (IDNAs) using a well-defined algorithm.

    Yes, I believe your reasoning is all basically correct. From a technical perspective, internationalized domains are fully supported in SMTP through the ASCII-compatible encoding of the domain defined by the IDNA spec rules, as you noted. There should be no issue sending e-mails to IDNs.

    Note that "internationalized domain names" are IDNs, not IDNAs. The "A" in "IDNA" comes from "Internationalized Domain Names in Applications" per the spec referenced above, which governs how applications should deal with IDNs.
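To make the domain-part encoding concrete, here is a minimal Python sketch. It assumes the standard library's built-in `idna` codec, which implements the older IDNA 2003 rules (newer IDNA 2008 behavior needs a third-party package). The helper names `domain_to_ascii` and `address_for_plain_smtp` are made up for illustration.

```python
def domain_to_ascii(domain: str) -> str:
    """Return the ASCII-compatible (Punycode) form of a domain name.

    Assumption: uses Python's built-in "idna" codec (IDNA 2003 rules).
    """
    return domain.encode("idna").decode("ascii")


def address_for_plain_smtp(address: str) -> str:
    """Encode only the domain part; the local part is left untouched."""
    # Split on the last "@" so the domain is everything after it.
    local, _, domain = address.rpartition("@")
    return f"{local}@{domain_to_ascii(domain)}"


print(domain_to_ascii("bücher.example"))               # xn--bcher-kva.example
print(address_for_plain_smtp("info@bücher.example"))   # info@xn--bcher-kva.example
```

Only the domain can be converted this way. There is no equivalent standard encoding for the local part, which is why SMTPUTF8 is needed there.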

    If any server in the message delivery chain does not support SMTPUTF8, the sender has to fall back to regular SMTP and convert the mailbox name to its ASCII alias using whatever heuristics the receiving end uses. These heuristic rules are not standardized, and the sender would need to know exactly how the receiving end is doing the aliasing. Therefore, this solution does not generalize well.

    Because of spam abuse, it is now extremely rare to have public e-mail forwarders in a transmission chain. The "chain" consists of one or more internal servers on the sender side, and one or more internal servers on the recipient side, but no public servers in between.

    I would further venture that if someone gives you an e-mail address with an internationalized local part, then it kind of goes without saying that the MX servers for their domain must advertise SMTPUTF8, and will accept the internationalized local part. If that were not true, then the person who gave you the e-mail address would never be able to receive e-mail from anyone, and therefore your correspondent would not be giving out that address!

    So basically, senders just need to make sure their systems support creating an SMTPUTF8 session when dealing with an internationalized local part, and the recipient side should be good. This should be generally supported by default as of this writing in 2024. For example, Postfix 3.0, released in February 2015, added SMTPUTF8 support and enables it by default; it uses SMTPUTF8 if the recipient server advertises it and there is UTF-8 content in the message envelope and headers.
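As a rough sketch of the sender-side check, here is how it might look with Python's standard `smtplib`. The host name, addresses, and the helper names `needs_smtputf8` and `send_internationalized` are placeholders for illustration, not part of any real setup. `SMTP.has_extn` reports whether the server advertised an extension in its EHLO reply, and `send_message` (Python 3.5+) requests SMTPUTF8 on its own when an envelope address is non-ASCII.

```python
import smtplib
from email.message import EmailMessage


def needs_smtputf8(address: str) -> bool:
    """True if the local part (before the last "@") has non-ASCII characters."""
    local, _, _domain = address.rpartition("@")
    return not local.isascii()


def send_internationalized(host: str, sender: str, recipient: str, body: str) -> None:
    """Send one message, refusing early if the server lacks SMTPUTF8.

    Assumption: `host` is an SMTP server you are allowed to relay through.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "SMTPUTF8 test"
    msg.set_content(body)

    with smtplib.SMTP(host) as smtp:
        smtp.ehlo()  # populates the list of advertised extensions
        if needs_smtputf8(recipient) and not smtp.has_extn("smtputf8"):
            raise RuntimeError(f"{host} does not advertise SMTPUTF8")
        # send_message adds the SMTPUTF8 option itself for non-ASCII addresses.
        smtp.send_message(msg)
```

The explicit `has_extn` check duplicates what `send_message` already does internally (it raises `smtplib.SMTPNotSupportedError` in the same situation); it is spelled out here only to make the negotiation visible.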

    That said, mailing lists and various web forms may still reject internationalized local parts because their creators have done silly things like trying to validate e-mail addresses with regexes.
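To illustrate the regex problem, the sketch below compares a typical ASCII-only validation pattern with a deliberately lenient structural check. Both the pattern and the function names are hypothetical examples, not taken from any particular form library.

```python
import re

# A common style of ASCII-only e-mail regex (hypothetical example).
ASCII_ONLY = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")


def strict_ascii_check(address: str) -> bool:
    """Accept only addresses matching the ASCII-only pattern."""
    return ASCII_ONLY.fullmatch(address) is not None


def lenient_check(address: str) -> bool:
    """Require only a non-empty local part and a non-empty domain."""
    local, _, domain = address.rpartition("@")
    return bool(local) and bool(domain)


for candidate in ["user@example.com", "用户@例子.公司", "no-at-sign"]:
    print(candidate, strict_ascii_check(candidate), lenient_check(candidate))
```

The strict pattern rejects `用户@例子.公司` even though it is a well-formed internationalized address, while the lenient check accepts it and leaves real validation to actual delivery (for example, a confirmation e-mail).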

    Therefore, even in 2024, I would hesitate before issuing internationalized local parts to my users, and if I did, I would always provide an ASCII alias for the user to use with systems that do not support the primary internationalized form.