email rfc5322

Email label vs local-part and hyphens


I've been reading through the RFCs, summaries, Wikipedia, etc., and I'm super confused about local-part vs. labels. As far as I can tell, the local-part is everything before the @. That much seems straightforward. A label is any part of the domain separated by a dot. But some places also seem to refer to the local-part as a label, which is very confusing when it comes to where hyphens are allowed. So what, specifically, is a label?

And with that, which of these are valid email addresses (if any)?

-bobross@painting.com
bobross-@painting.com
bobross@-painting.com
bobross@painting-.com

My understanding is that a label can neither begin nor end with a hyphen, and that it must not contain two consecutive hyphens. Am I missing anything with that?

Bonus points - there are a number of special characters allowed in the local-part, but some sources I've seen say that the local-part must end in an alphanumeric character, and I'm not actually seeing that in any standard. Am I missing it, or can it end with one of the allowed characters?


Solution

  • From rfc5321, section 2.3.5:

    A domain name (or often just a "domain") consists of one or more
    components, separated by dots if more than one appears.  In the case
    of a top-level domain used by itself in an email address, a single
    string is used without any dots.  This makes the requirement,
    described in more detail below, that only fully-qualified domain
    names appear in SMTP transactions on the public Internet,
    particularly important where top-level domains are involved.  These
    components ("labels" in DNS terminology, RFC 1035 [2]) are restricted
    for SMTP purposes to consist of a sequence of letters, digits, and
    hyphens drawn from the ASCII character set [6].  Domain names are
    used as names of hosts and of other entities in the domain name
    hierarchy.  For example, a domain may refer to an alias (label of a
    CNAME RR) or the label of Mail eXchanger records to be used to
    deliver mail instead of representing a host name.  See RFC 1035 [2]
    and Section 5 of this specification.
    

    In other words, a domain of abc.def.xyz is made up of three components (aka labels): abc, def, and xyz. Each of these labels may contain only letters, digits, and hyphens. For a more precise definition, we must check the ABNF grammar in the Command Argument Syntax section (section 4.1.2 of rfc5321), because what we really care about is the syntax of the arguments to the MAIL FROM and RCPT TO commands (the "email address" tokens, in layman's terms).

    An "email address" is actually referred to as a Mailbox in the SMTP specification:

    Mailbox        = Local-part "@" ( Domain / address-literal )
    

    Now to look at the definition of a Local-part token:

    Local-part     = Dot-string / Quoted-string
                   ; MAY be case-sensitive
    
    
    Dot-string     = Atom *("."  Atom)
    
    Atom           = 1*atext
    

    To get the definition of atext, we need to look at the Internet Message Format specification (rfc5322). Specifically, we need to look at section 3.2.3:

    atext           =   ALPHA / DIGIT /    ; Printable US-ASCII
                        "!" / "#" /        ;  characters not including
                        "$" / "%" /        ;  specials.  Used for atoms.
                        "&" / "'" /
                        "*" / "+" /
                        "-" / "/" /
                        "=" / "?" /
                        "^" / "_" /
                        "`" / "{" /
                        "|" / "}" /
                        "~"
    

    (Note: I've left out the definition of a Quoted-string because it's irrelevant to your question.)
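
To make the Local-part rules concrete, here is a minimal Python sketch that transcribes the Dot-string, Atom, and atext productions above into a regular expression. The names ATEXT, DOT_STRING, and is_dot_string are made up for illustration and do not come from either RFC. The Quoted-string form is deliberately not handled, and no length limits are checked.

```python
import re

# atext (rfc5322, section 3.2.3): letters, digits, and the listed specials.
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"

# Dot-string = Atom *("." Atom), with Atom = 1*atext
DOT_STRING = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")


def is_dot_string(local_part: str) -> bool:
    """Return True if local_part matches the Dot-string form.

    Illustrative helper only: the Quoted-string alternative is ignored.
    """
    return DOT_STRING.fullmatch(local_part) is not None


for candidate in ["-bobross", "bobross-", "bob.ross", ".bobross", "bob..ross"]:
    print(candidate, is_dot_string(candidate))
```

Under these assumptions the hyphenated forms match, while a leading dot or two consecutive dots do not, because every Atom needs at least one atext character.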

    Now let's check out the definition of a Domain:

    Domain         = sub-domain *("." sub-domain)
    
    sub-domain     = Let-dig [Ldh-str]
    
    Let-dig        = ALPHA / DIGIT
    
    Ldh-str        = *( ALPHA / DIGIT / "-" ) Let-dig
    

    What we see here is that a label is the same thing as a sub-domain token, and that each sub-domain (label) of a Domain must both start and end with a letter or digit, so it cannot start or end with a hyphen.
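
The Domain grammar can be sketched in Python the same way. This is only an illustration of the productions quoted above. SUB_DOMAIN, DOMAIN, and is_domain are hypothetical names. The address-literal alternative is ignored, and DNS length limits (such as the 63-octet limit per label from RFC 1035) are not enforced.

```python
import re

# sub-domain = Let-dig [Ldh-str]
# Ldh-str    = *( ALPHA / DIGIT / "-" ) Let-dig
# The first and last characters must be a letter or digit.
SUB_DOMAIN = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"

# Domain = sub-domain *("." sub-domain)
DOMAIN = re.compile(rf"{SUB_DOMAIN}(?:\.{SUB_DOMAIN})*")


def is_domain(domain: str) -> bool:
    """Return True if domain matches the Domain production (syntax only)."""
    return DOMAIN.fullmatch(domain) is not None


for candidate in ["painting.com", "abc.def.xyz", "-painting.com", "painting-.com"]:
    print(candidate, is_domain(candidate))
```

With this pattern, the first two candidates match and the two with a leading or trailing hyphen in a label do not.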

    We can also see that the characters allowed in a sub-domain token are only a subset of those allowed in an Atom, so the rules for each component of a Local-part are not the same as the rules for each component of a Domain.

    To answer your other questions:

    And with that, which of these are valid email addresses (if any)?

    -bobross@painting.com [VALID]
    bobross-@painting.com [VALID]
    bobross@-painting.com [INVALID]
    bobross@painting-.com [INVALID]
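
As a rough cross-check of the verdicts above, the following Python sketch splits each address at the "@" and tests each side against the grammar, reporting which side fails. The function diagnose and the pattern names are invented for this example. It only covers the Dot-string form of the Local-part and the Domain form on the right-hand side (no Quoted-string, no address-literal, no length limits), so treat it as an approximation of the syntax rather than a complete validator.

```python
import re

# Patterns transcribed from the ABNF in rfc5321 / rfc5322 (hypothetical names).
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"
LOCAL_PART = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")  # Dot-string only
SUB_DOMAIN = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
DOMAIN = re.compile(rf"{SUB_DOMAIN}(?:\.{SUB_DOMAIN})*")


def diagnose(address: str) -> str:
    """Report which side of the "@" (if any) breaks the grammar."""
    local, at, domain = address.rpartition("@")
    if not at:
        return "INVALID (no @)"
    if not LOCAL_PART.fullmatch(local):
        return "INVALID (local-part)"
    if not DOMAIN.fullmatch(domain):
        return "INVALID (domain)"
    return "VALID"


for address in [
    "-bobross@painting.com",
    "bobross-@painting.com",
    "bobross@-painting.com",
    "bobross@painting-.com",
]:
    print(address, diagnose(address))
```

Reporting the failing side makes it easier to see that both invalid addresses are rejected by the Domain rules, not by the Local-part rules.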
    

    My understanding is that a label can neither begin nor end with a hyphen, and that it must not contain two consecutive hyphens. Am I missing anything with that?

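    Let's use the term sub-domain to make things less confusing.

    Partly. A sub-domain can neither begin nor end with a hyphen, but the grammar above does not forbid two consecutive hyphens in the middle: Ldh-str accepts any run of letters, digits, and hyphens as long as it ends with a letter or digit. Internationalized domain names depend on this, since their ASCII-encoded labels start with xn--.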

    Bonus points - there are a number of special characters allowed in the local-part, but some sources I've seen say that the local-part must end in an alphanumeric character, and I'm not actually seeing that in any standard. Am I missing it, or can it end with one of the allowed characters?

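    Well, according to the ABNF grammar (Local-part from rfc5321, atext from rfc5322), a Local-part can both start and end with a hyphen, or with any other atext character. Nothing in the grammar requires it to end in an alphanumeric character. The one character that cannot come first or last in a Dot-string is the dot, because every Atom must contain at least one atext character.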