I am writing a regex to match email addresses. So, I am first reading the spec on what an email address looks like.
According to addr-spec in RFC 5322, " me@domain.com " is a valid email address (notice the spaces at the start and end).
Proof:
For spaces on the left side, a resolution path is
addr-spec → local-part → dot-atom → CFWS → FWS → WSP
For spaces on the right side, a resolution path is
addr-spec → domain → dot-atom → CFWS → FWS → WSP
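The two derivations can be checked mechanically. The sketch below is only an illustration: the REFERENCES table is a hand-copied subset of the RFC 5322 grammar (obsolete forms and the inner structure of comments are left out), and is_derivation is a hypothetical helper name, not something the RFC defines.

```python
# Which rules each rule refers to; a hand-copied subset of RFC 5322
# (obs-* alternatives and the contents of "comment" are omitted).
REFERENCES = {
    "addr-spec": ["local-part", "domain"],
    "local-part": ["dot-atom", "quoted-string"],
    "domain": ["dot-atom", "domain-literal"],
    "dot-atom": ["CFWS", "dot-atom-text"],
    "CFWS": ["FWS", "comment"],
    "FWS": ["WSP", "CRLF"],
}


def is_derivation(path):
    """Return True if every step in path follows a reference in the table."""
    return all(b in REFERENCES.get(a, []) for a, b in zip(path, path[1:]))


left = ["addr-spec", "local-part", "dot-atom", "CFWS", "FWS", "WSP"]
right = ["addr-spec", "domain", "dot-atom", "CFWS", "FWS", "WSP"]
print(is_derivation(left), is_derivation(right))
```

Both paths pass, which matches the claim that whitespace is reachable on either side of the address.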
Why are spaces allowed? Is this intended to be a legacy consideration?
As noted in Section 3.2.3 of the linked RFC,
Semantically, the optional comments and FWS surrounding the rest of the characters are not part of the atom; the atom is only the run of atext characters in an atom, or the atext and "." characters in a dot-atom.
So the string " me@domain.com " (with the surrounding spaces) is supposed to represent the same sender/recipient as me@domain.com.
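For the regex itself, one way to reflect this is to accept whitespace around each dot-atom but capture only the atext runs. The following is a minimal sketch, not a full RFC 5322 matcher: it assumes CFWS can be approximated by spaces and tabs (no comments, no folding), it ignores quoted strings and domain literals, and the names ADDR_SPEC and normalize are made up for illustration.

```python
import re

# atext as listed in RFC 5322, Section 3.2.3.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
DOT_ATOM_TEXT = rf"{ATEXT}+(?:\.{ATEXT}+)*"
# Simplifying assumption: plain spaces/tabs stand in for the full CFWS rule.
WSP = r"[ \t]*"

ADDR_SPEC = re.compile(
    rf"{WSP}(?P<local>{DOT_ATOM_TEXT}){WSP}@{WSP}(?P<domain>{DOT_ATOM_TEXT}){WSP}"
)


def normalize(address):
    """Return the address without the semantically ignored whitespace,
    or None if it does not match this simplified grammar."""
    m = ADDR_SPEC.fullmatch(address)
    if m is None:
        return None
    return f"{m['local']}@{m['domain']}"


print(normalize(" me@domain.com "))
```

With this approach the whitespace is allowed by the pattern but never ends up in the captured local part or domain.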
The reason for this is in the name of FWS: it is folding white space. The lines of text making up an email message, by this RFC, should be no longer than 78 characters each and must not be longer than 998 characters each. If a logical line of an email is longer than these limits, it should/must be folded by inserting CRLF-WSP sequences at positions where FWS is allowed. That is, if you logically want to write
From: my.super.long.username.that.results.in.a.line.longer.than.78.chars@domain.net
You should fold the line. One way to do this is to use the FWS you're afforded between the local part and the @:
From: my.super.long.username.that.results.in.a.line.longer.than.78.chars
 @domain.net
This RFC states that, when processing an email message, one of the first steps is to recover the logical lines by deleting those CRLF sequences that are followed by WSP. Note that the WSP is not deleted! (This would e.g. mangle a plaintext email.) Thus the folded line in this example does not exactly unfold to the original line. It is actually supposed to be interpreted as
From: my.super.long.username.that.results.in.a.line.longer.than.78.chars @domain.net
(with a space before the @). The syntax for addr-spec therefore has to contain provisions for allowing and ignoring whitespace where lines may be folded.
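The unfolding step described above is small enough to sketch directly. This is an illustration under the assumption that the header arrives as a single string with CRLF line endings; unfold is a hypothetical helper name, not a function from any library.

```python
import re

# A fold is a CRLF that is immediately followed by a space or tab.
# The lookahead keeps the WSP in place; only the CRLF is removed.
_FOLD = re.compile(r"\r\n(?=[ \t])")


def unfold(header):
    """Remove each CRLF that precedes WSP, leaving the WSP itself."""
    return _FOLD.sub("", header)


folded = (
    "From: my.super.long.username.that.results.in.a.line.longer.than.78.chars\r\n"
    " @domain.net"
)
unfolded = unfold(folded)
print(unfolded)
```

The result keeps the space before the @, which is why the addr-spec grammar has to tolerate it.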
Note that the RFC actually states that you shouldn't fold an address at the @ like this. The alternative is to use the quoted form for strings, which also lets you deal with local parts that are themselves too long to fit even on a folded line. That is, if you want to write
From: my.even.longer.username.that.results.in.even.folded.lines.being.longer.than.78.characters@domain.net
You should write something like
From:
 "my.even.longer.username.that.results.in.even.folded.lines.being.longer.than
 .78.characters
 "@domain.net
Which will unfold to
From: "my.even.longer.username.that.results.in.even.folded.lines.being.longer.than .78.characters "@domain.net
And will be interpreted the same as the original because FWS inside a quoted string is ignored and a quoted string is semantically equivalent to an atom or dot-atom.
Note that technically, the "courtesy" whitespaces in a line like the following
From: alice@domain.net, bob@domain.net
are also part of the local parts: " alice" and " bob". But if spaces outside the addresses were the only consideration, they could be handled in the syntax without putting them in the local part.