Validation/Format of display-name in from header


I need to know the rules for validating and formatting the From (name-addr) field in an email. The RFC explains the format of name-addr, but does not go into detail about the display-name.

Like this:

From: John Q. Public <JQP@bar.com>

I want to know which characters are allowed and what the maximum length is. How do I know that John Q. Public contains only valid characters? Should I allow only printable US-ASCII characters?

I consulted RFC 2822 and did not find anything on the specific format of a display name.


Solution

  • This is all defined in the RFC you mention in your question (btw, the newer version of this document is RFC 5322):

    display-name    =       phrase
    phrase          =       1*word / obs-phrase
    word            =       atom / quoted-string
    atom            =       [CFWS] 1*atext [CFWS]
    atext           =       ALPHA / DIGIT / ; Any character except controls,
                            "!" / "#" /     ;  SP, and specials.
                            "$" / "%" /     ;  Used for atoms
                            "&" / "'" /
                            "*" / "+" /
                            "-" / "/" /
                            "=" / "?" /
                            "^" / "_" /
                            "`" / "{" /
                            "|" / "}" /
                            "~"
    specials        =       "(" / ")" /     ; Special characters used in
                            "<" / ">" /     ;  other parts of the syntax
                            "[" / "]" /
                            ":" / ";" /
                            "@" / "\" /
                            "," / "." /
                            DQUOTE
    

    You have to jump around in the document a bit to find the definitions of each of these token types, but they are all there.

    Once you have the definitions, all you need to do is scan over your name string and see if it consists only of the valid characters.

    According to the definitions, a display-name is a phrase, and a phrase is one or more word tokens (or an obs-phrase, which I'll ignore for now to keep this explanation simple).

    A word token can be either an atom or a quoted-string.
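The atom rule can be checked with a simple character class. Below is a minimal Python sketch, assuming the optional CFWS (comments and folding whitespace) around the atom is ignored; the names `ATEXT` and `is_atom` are made up for this illustration and do not come from the RFC.

```python
import re

# atext as listed in the grammar above: letters, digits and a fixed
# set of punctuation. Controls, SP and the specials are excluded.
ATEXT = r"A-Za-z0-9!#$%&'*+\-/=?^_`{|}~"
ATOM_RE = re.compile("[" + ATEXT + "]+")


def is_atom(word: str) -> bool:
    """Return True if `word` is one or more atext characters.

    Assumption: surrounding CFWS is not handled here.
    """
    return ATOM_RE.fullmatch(word) is not None


# Check each whitespace-separated piece of the name from the question.
for word in "John Q. Public".split():
    print(word, is_atom(word))
```

Running this reports `John` and `Public` as atoms and `Q.` as not an atom.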

    In your example, John Q. Public contains a special character, ".", which cannot appear within an atom token. What about a quoted-string token? Well, let's see...

    quoted-string   =       [CFWS]
                            DQUOTE *([FWS] qcontent) [FWS] DQUOTE
                            [CFWS]
    qcontent        =       qtext / quoted-pair
    qtext           =       NO-WS-CTL /     ; Non white space controls
                            %d33 /          ; The rest of the US-ASCII
                            %d35-91 /       ;  characters not including "\"
                            %d93-126        ;  or the quote character
    

    Based on this, we can tell that a "." is allowed within a quoted-string, so... the correct formatting for your display-name can be any of the following:

    From: "John Q. Public" <JQP@bar.com>
    

    or

    From: John "Q." Public <JQP@bar.com>
    

    or

    From: "John Q." Public <JQP@bar.com>
    

    or

    From: John "Q. Public" <JQP@bar.com>
    

    Any one of those will work.
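
To make the "scan over your name string" step concrete, here is a rough Python sketch of a display-name check built from the grammar quoted above. It is a simplification, not a full parser: comments, line folding, NO-WS-CTL characters and obs-phrase are all left out, and quoted-pair is taken to be a backslash followed by a printable ASCII character, space or tab. The names `TOKEN_RE` and `is_valid_display_name` are hypothetical. The last lines use the standard library's `email.utils.formataddr`, which adds the quotes for you when the name contains specials.

```python
import re
from email.utils import formataddr

# atom: one or more atext characters.
ATOM = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]+"
# quoted-string: DQUOTE, then qtext / space / tab / quoted-pair, then DQUOTE.
# Assumption: NO-WS-CTL and folded lines are not accepted in this sketch.
QUOTED = r'"(?:[\t\x20\x21\x23-\x5b\x5d-\x7e]|\\[\t\x20-\x7e])*"'
TOKEN_RE = re.compile(
    r"(?P<ws>[ \t]+)|(?P<atom>" + ATOM + ")|(?P<quoted>" + QUOTED + ")"
)


def is_valid_display_name(name: str) -> bool:
    """Return True if `name` is 1*word, where word = atom / quoted-string."""
    pos = 0
    words = 0
    while pos < len(name):
        match = TOKEN_RE.match(name, pos)
        if match is None:
            return False  # hit a character no token may start with, e.g. "."
        if match.lastgroup != "ws":
            words += 1
        pos = match.end()
    return words >= 1


print(is_valid_display_name("John Q. Public"))      # bare "." is a special
print(is_valid_display_name('"John Q. Public"'))    # whole name quoted
print(is_valid_display_name('John "Q." Public'))    # only the word with "." quoted

# Let the standard library decide whether quoting is needed.
print(formataddr(("John Q. Public", "JQP@bar.com")))
```

The first call returns False and the next two return True, and `formataddr` produces `"John Q. Public" <JQP@bar.com>`, the first of the forms listed above.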