
What characters are allowed in an email address?


I'm not asking about full email validation.

I just want to know which characters are allowed in the user-name and server parts of an email address. This may be oversimplified, and email addresses may take other forms, but I don't care about those. I'm asking only about this simple form: user-name@server (e.g. wild.wezyr@best-server-ever.com) and the characters allowed in both parts.


Solution

  • See RFC 5322: Internet Message Format and, to a lesser extent, RFC 5321: Simple Mail Transfer Protocol.

    RFC 822 also covers email addresses, but it deals mostly with their structure:

     addr-spec   =  local-part "@" domain        ; global address     
     local-part  =  word *("." word)             ; uninterpreted
                                                 ; case-preserved
     
     domain      =  sub-domain *("." sub-domain)     
     sub-domain  =  domain-ref / domain-literal     
     domain-ref  =  atom                         ; symbolic reference
    

    where atom and word are defined as:

                                                 ; (  Octal, Decimal.)
     CHAR        =  <any ASCII character>        ; (  0-177,  0.-127.)
     CTL         =  <any ASCII control           ; (  0- 37,  0.- 31.)
                     character and DEL>          ; (    177,     127.)
     specials    =  "(" / ")" / "<" / ">" / "@"  ; Must be in quoted-
                 /  "," / ";" / ":" / "\" / <">  ;  string, to use
                 /  "." / "[" / "]"              ;  within a word.
     atom        =  1*<any CHAR except specials, SPACE and CTLs>
     word        =  atom / quoted-string
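    The atom rule above is mechanical enough to check character by character. The Python sketch below is a minimal illustration that assumes the simple user-name@server form from the question: it accepts an address only when every dot-separated piece of both the local-part and the domain is an RFC 822 atom. It deliberately does not handle quoted strings, domain literals, or the whitespace and comments RFC 822 permits between tokens. The function names are hypothetical.

```python
# Minimal sketch of the RFC 822 "atom" rule quoted above (ASCII only).
# Assumption: quoted strings, domain literals, and whitespace/comments
# between tokens are NOT handled; every word and sub-domain must be an atom.

SPECIALS = set('()<>@,;:\\".[]')  # ( ) < > @ , ; : \ " . [ ]


def is_atom(s: str) -> bool:
    """atom = 1*<any CHAR except specials, SPACE and CTLs>"""
    if not s:
        return False  # "1*" means at least one character
    for ch in s:
        code = ord(ch)
        if code > 127:  # CHAR is ASCII only (0-127)
            return False
        if code < 32 or code == 127:  # CTLs, including DEL
            return False
        if ch == " " or ch in SPECIALS:
            return False
    return True


def is_dotted_atoms(s: str) -> bool:
    """word *("." word), restricted here to words that are atoms."""
    return all(is_atom(part) for part in s.split("."))


def is_simple_addr_spec(addr: str) -> bool:
    """local-part "@" domain, with every word and sub-domain an atom."""
    local, sep, domain = addr.rpartition("@")
    if not sep:
        return False  # no "@" at all
    return is_dotted_atoms(local) and is_dotted_atoms(domain)


print(is_simple_addr_spec("wild.wezyr@best-server-ever.com"))  # True
print(is_simple_addr_spec("john..doe@example.com"))  # False: empty word
```

    Note that "-" is not in specials, so best-server-ever is a valid atom, while an empty piece between two dots is rejected because an atom needs at least one character.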
    

    And as usual, Wikipedia has a decent article on email addresses:

    The local-part of the email address may use any of these ASCII characters:

    • uppercase and lowercase Latin letters A to Z and a to z;
    • digits 0 to 9;
    • special characters !#$%&'*+-/=?^_`{|}~;
    • dot ., provided that it is not the first or last character unless quoted, and provided also that it does not appear consecutively unless quoted (e.g. John..Doe@example.com is not allowed but "John..Doe"@example.com is allowed);
    • space and "(),:;<>@[\] characters are allowed with restrictions (they are only allowed inside a quoted string, as described in the paragraph below, and in addition, a backslash or double-quote must be preceded by a backslash);
    • comments are allowed with parentheses at either end of the local-part; e.g. john.smith(comment)@example.com and (comment)john.smith@example.com are both equivalent to john.smith@example.com.
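    For the common unquoted case, the list above can be sketched in Python as follows. This is a hedged illustration, not a complete validator: it assumes the local-part has already been separated from the domain, and it rejects quoted strings and comments rather than interpreting them. `is_unquoted_local_part` is a hypothetical name.

```python
import string

# Characters allowed in an unquoted local-part, per the list above:
# ASCII letters, digits, and the special characters !#$%&'*+-/=?^_`{|}~
ATEXT = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


def is_unquoted_local_part(local: str) -> bool:
    # A dot may not be first, last, or doubled. Splitting on "." and
    # rejecting empty pieces enforces all three rules at once.
    # Quoted strings and (comments) are not supported: their characters
    # are outside ATEXT, so such local-parts are rejected.
    pieces = local.split(".")
    return all(piece and set(piece) <= ATEXT for piece in pieces)


print(is_unquoted_local_part("wild.wezyr"))  # True
print(is_unquoted_local_part("John..Doe"))   # False
```

    Splitting on the dot keeps the dot rules in one place instead of checking the first character, the last character, and consecutive pairs separately.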

    In addition to ASCII characters, since 2012 you can use international characters above U+007F, encoded as UTF-8 as described in RFC 6532 and explained on Wikipedia. Note that as of 2019 these standards are still marked as Proposed, but they are being rolled out slowly. The changes in this spec essentially add international characters as valid alphanumeric characters (atext) without affecting the rules on allowed special characters such as ! and # or restricted ones such as @ and :.
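    Under RFC 6532 the unquoted-local-part check only needs a wider notion of atext. The sketch below assumes that any code point above U+007F that can be encoded as UTF-8 counts as atext, while the ASCII special characters keep their old rules. It is an illustration of that single change, not a full RFC 6532 implementation, and the helper names are hypothetical.

```python
import string

# ASCII atext, as in the unquoted local-part rules.
ASCII_ATEXT = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


def is_atext_utf8(ch: str) -> bool:
    # Assumption (per RFC 6532): any character above U+007F is atext;
    # ASCII characters such as "@" and ":" keep their old restrictions.
    return ch in ASCII_ATEXT or ord(ch) > 0x7F


def is_unquoted_local_part_utf8(local: str) -> bool:
    try:
        local.encode("utf-8")  # must be representable as UTF-8
    except UnicodeEncodeError:
        return False  # e.g. a lone surrogate such as "\ud800"
    pieces = local.split(".")  # same dot rules as in the ASCII case
    return all(piece and all(is_atext_utf8(c) for c in piece) for piece in pieces)


print(is_unquoted_local_part_utf8("jos\u00e9"))     # True
print(len("jos\u00e9".encode("utf-8")))             # 5 bytes on the wire
```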

    For validation, see "Using a regular expression to validate an email address".

    The domain part is defined as follows:

    The Internet standards (Request for Comments) for protocols mandate that component hostname labels may contain only the ASCII letters a through z (in a case-insensitive manner), the digits 0 through 9, and the hyphen (-). The original specification of hostnames in RFC 952 mandated that labels could not start with a digit or with a hyphen, and must not end with a hyphen. However, a subsequent specification (RFC 1123) permitted hostname labels to start with digits. No other symbols, punctuation characters, or blank spaces are permitted.
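    The hostname rules above (letters, digits and hyphen; no leading or trailing hyphen; a leading digit allowed per RFC 1123) can be sketched per label as follows. This is a minimal illustration assuming the domain is a plain ASCII hostname: it does not check length limits, does not accept a trailing root dot, and does not handle domain literals. The function names are hypothetical.

```python
import string

# Letters (either case, since labels compare case-insensitively),
# digits, and the hyphen.
LDH = set(string.ascii_letters + string.digits + "-")


def is_hostname_label(label: str) -> bool:
    if not label or not set(label) <= LDH:
        return False
    # RFC 1123 allows a leading digit, but a label still may not
    # start or end with a hyphen.
    return not label.startswith("-") and not label.endswith("-")


def is_hostname(domain: str) -> bool:
    # Empty labels (e.g. "a..com" or a trailing ".") are rejected.
    return all(is_hostname_label(label) for label in domain.split("."))


print(is_hostname("best-server-ever.com"))  # True
print(is_hostname("under_score.com"))       # False: "_" is not allowed
```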