I've been reading through the RFCs, summaries, Wikipedia, etc. I'm super confused about local-part vs labels. It seems to me that the local-part is before the @. That much seems straightforward. The label is any part of the domain separated by a dot. But it seems to me that some places also refer to the local-part as a label. And that is very confusing within the context of where hyphens are allowed. So specifically what is a label?
And with that, which of these are valid email addresses (if any)?
-bobross@painting.com
bobross-@painting.com
bobross@-painting.com
bobross@painting-.com
My understanding is that a label can neither begin nor end with a hyphen and it must not contain two consecutive hyphens. Am I missing anything with that?
Bonus points - there are a number of special characters allowed in the local-part, but some sources I've seen say that the local-part must end in an alphanumeric character, but I'm not actually seeing that in any standard. Am I missing it or can it end with one of the allowed characters?
From RFC 5321, section 2.3.5:
A domain name (or often just a "domain") consists of one or more
components, separated by dots if more than one appears. In the case
of a top-level domain used by itself in an email address, a single
string is used without any dots. This makes the requirement,
described in more detail below, that only fully-qualified domain
names appear in SMTP transactions on the public Internet,
particularly important where top-level domains are involved. These
components ("labels" in DNS terminology, RFC 1035 [2]) are restricted
for SMTP purposes to consist of a sequence of letters, digits, and
hyphens drawn from the ASCII character set [6]. Domain names are
used as names of hosts and of other entities in the domain name
hierarchy. For example, a domain may refer to an alias (label of a
CNAME RR) or the label of Mail eXchanger records to be used to
deliver mail instead of representing a host name. See RFC 1035 [2]
and Section 5 of this specification.
In other words, a domain of abc.def.xyz is made up of 3 components (aka labels): abc, def, and xyz. Each of these labels is only allowed to contain letters, digits, and hyphens. For a more specific definition, we must check the ABNF grammar in the Command Argument Syntax section because what we really care about is the syntax of the arguments to the MAIL FROM and RCPT TO commands (aka the "email address" tokens in layman's terms).
An "email address" is actually referred to as a Mailbox in the SMTP specification:
Mailbox = Local-part "@" ( Domain / address-literal )
Now to look at the definition of a Local-part token:
Local-part = Dot-string / Quoted-string
; MAY be case-sensitive
Dot-string = Atom *("." Atom)
Atom = 1*atext
To get the definition of atext, we need to look at the Internet Message Format specification (RFC 5322), specifically section 3.2.3:
atext = ALPHA / DIGIT /    ; Printable US-ASCII
        "!" / "#" /        ;  characters not including
        "$" / "%" /        ;  specials.  Used for atoms.
        "&" / "'" /
        "*" / "+" /
        "-" / "/" /
        "=" / "?" /
        "^" / "_" /
        "`" / "{" /
        "|" / "}" /
        "~"
(Note: I've left out the definition of a Quoted-string because it's irrelevant to your question.)
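As a rough sketch, the Dot-string rule above can be written as a regular expression. The names ATEXT, DOT_STRING, and is_dot_string are my own and do not come from either RFC, and the Quoted-string form is deliberately not handled:

```python
import re

# atext from RFC 5322 section 3.2.3: letters, digits, and the listed specials.
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"

# Dot-string = Atom *("." Atom), where Atom = 1*atext
DOT_STRING = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")


def is_dot_string(local_part: str) -> bool:
    """Return True if local_part matches the Dot-string production.

    Hypothetical helper: Quoted-string local parts are not covered.
    """
    return DOT_STRING.fullmatch(local_part) is not None


print(is_dot_string("-bobross"))   # hyphen at the start
print(is_dot_string("bobross-"))   # hyphen at the end
print(is_dot_string(".bobross"))   # dot at the start
```

Because a dot may only appear between two Atoms, the pattern rejects a leading dot, a trailing dot, and two dots in a row, while placing no restriction on where a hyphen appears.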
Now let's check out the definition of a Domain:
Domain = sub-domain *("." sub-domain)
sub-domain = Let-dig [Ldh-str]
Let-dig = ALPHA / DIGIT
Ldh-str = *( ALPHA / DIGIT / "-" ) Let-dig
What we see here is that a label is the same thing as a sub-domain token, and that the labels (aka sub-domains) of a Domain cannot start or end with a hyphen.
We can also see that the characters allowed in a sub-domain are a strict subset of those allowed in an Atom, so the characters allowed in each component of a Local-part are not the same as those allowed in each component of a Domain.
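The Domain grammar can be sketched the same way. LET_DIG, SUB_DOMAIN, and is_domain are names I made up for illustration; the sketch only covers the syntax quoted above, so it ignores address-literals and any length limits:

```python
import re

# Let-dig = ALPHA / DIGIT
LET_DIG = r"[A-Za-z0-9]"

# sub-domain = Let-dig [Ldh-str]
# Ldh-str    = *( ALPHA / DIGIT / "-" ) Let-dig
SUB_DOMAIN = rf"{LET_DIG}(?:[A-Za-z0-9-]*{LET_DIG})?"

# Domain = sub-domain *("." sub-domain)
DOMAIN = re.compile(rf"{SUB_DOMAIN}(?:\.{SUB_DOMAIN})*")


def is_domain(domain: str) -> bool:
    """Return True if domain matches the Domain production (syntax only)."""
    return DOMAIN.fullmatch(domain) is not None


print(is_domain("painting.com"))    # every label starts and ends with Let-dig
print(is_domain("-painting.com"))   # first label starts with a hyphen
print(is_domain("painting-.com"))   # first label ends with a hyphen
```

Each label must begin with a letter or digit and, if it is longer than one character, must also end with one, which is exactly what keeps hyphens off both ends.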
To answer your other questions:
And with that, which of these are valid email addresses (if any)?
-bobross@painting.com [VALID]
bobross-@painting.com [VALID]
bobross@-painting.com [INVALID]
bobross@painting-.com [INVALID]
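If you want to check these four yourself, here is a minimal sketch of the Mailbox rule as a single pattern. The is_mailbox helper and the constant names are hypothetical, and the sketch assumes a Dot-string local part and a plain Domain (no Quoted-string, no address-literal, no length limits):

```python
import re

# Local-part as a Dot-string: atext runs separated by single dots.
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"
LOCAL_PART = rf"{ATEXT}+(?:\.{ATEXT}+)*"

# Domain: dot-separated labels that start and end with a letter or digit.
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
DOMAIN = rf"{LABEL}(?:\.{LABEL})*"

# Mailbox = Local-part "@" Domain (address-literal omitted in this sketch)
MAILBOX = re.compile(rf"{LOCAL_PART}@{DOMAIN}")


def is_mailbox(address: str) -> bool:
    """Return True if address matches the simplified Mailbox grammar."""
    return MAILBOX.fullmatch(address) is not None


for address in (
    "-bobross@painting.com",
    "bobross-@painting.com",
    "bobross@-painting.com",
    "bobross@painting-.com",
):
    print(address, "[VALID]" if is_mailbox(address) else "[INVALID]")
```

This only tests syntax against the grammar; whether a given mail server actually accepts such an address is a separate matter.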
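My understanding is that a label can neither begin nor end with a hyphen and it must not contain two consecutive hyphens. Am I missing anything with that?
Let's use the term sub-domain to make things less confusing.
Partly. A sub-domain cannot begin or end with a hyphen, but nothing in the RFC 5321 grammar forbids two consecutive hyphens: an Ldh-str is simply any run of letters, digits, and hyphens that ends in a letter or digit. Internationalized domain labels in their ASCII form, which begin with xn--, depend on this.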
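Bonus points - there are a number of special characters allowed in the local-part, but some sources I've seen say that the local-part must end in an alphanumeric character, but I'm not actually seeing that in any standard. Am I missing it or can it end with one of the allowed characters?
You are not missing it. According to the ABNF grammar of RFC 5321 (with atext taken from RFC 5322), a Local-part written as a Dot-string can both start and end with any atext character, including a hyphen. The one restriction is on the dot: it may appear only between two Atoms, so a Dot-string cannot start or end with a dot or contain two dots in a row.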