uri ip-address email-address iri

Distinguish between email address and IRI


I have a string that contains either an email address or an IRI (internationalized URI). The string has no surrounding whitespace and no HTTP line-folding characters. It also contains no elements marked "obsolete" in the corresponding specifications. I need a simple way to tell which of the two the string contains.

I'm looking at what I believe to be the latest specifications: RFC 5322 § 3.4.1, "Addr-Spec Specification", for email addresses, and RFC 3987 § 2.2, "ABNF for IRI References and IRIs", for IRIs. I've come up with the following algorithm, with explanations in parentheses:

  1. If the string begins with a double-quote " character, it is an email address. (An email address local-part may be a quoted string, but an IRI scheme may not.)
  2. Otherwise, find the first at sign @ or colon : character, whichever comes first.
    • If that character is an at sign @, the string contains an email address.
    • If it is a colon :, the string contains an IRI.
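
The two steps above can be sketched in Python as follows. This is only an illustration under the question's assumption of well-formed input; the function name `classify` and the returned labels are hypothetical, and the code does not validate the string against either grammar.

```python
def classify(value: str) -> str:
    """Classify a well-formed string as 'email' or 'iri' (hypothetical labels)."""
    # Step 1: a leading double quote can only start a quoted-string local-part.
    if value.startswith('"'):
        return "email"
    # Step 2: whichever of '@' or ':' appears first decides the type.
    for ch in value:
        if ch == "@":
            return "email"
        if ch == ":":
            return "iri"
    # Assumption: well-formed input always contains one of the two characters.
    raise ValueError("no '@' or ':' found; not an email address or absolute IRI")
```

For example, `classify("mailto:user@example.com")` reaches the colon first and reports an IRI, while `classify("user@[IPv6:2001:db8::1]")` reaches the at sign first and reports an email address.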

Is that approach correct? Is there a simpler approach? As a bonus, how would I extend this algorithm to also distinguish those two things from an IP address (both IPv4 and IPv6)?


Solution

  • I believe the rules as specified are correct, and they determine the type (email or IRI) quickly. To extend them to IP addresses, add the grammar for IP address text representations: https://datatracker.ietf.org/doc/html/draft-main-ipaddr-text-rep-00.
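
As a rough sketch of that extension, the IP check can run before the email/IRI rules. The version below uses Python's standard `ipaddress` module rather than the grammar from the linked draft, so what it accepts may differ from that grammar in edge cases; the function name `classify_with_ip` and the labels are hypothetical. Checking for an IP address first is an assumption made here: a string such as `fe80::1` also matches the generic IRI syntax (scheme `fe80`), so some tie-break is needed.

```python
import ipaddress


def classify_with_ip(value: str) -> str:
    """Classify well-formed input as 'ipv4', 'ipv6', 'email' or 'iri'."""
    # Try IP first: an IPv6 address contains ':' and would otherwise
    # be reported as an IRI by the first-':'-or-'@' rule.
    try:
        ip = ipaddress.ip_address(value)
    except ValueError:
        pass  # Not an IP address; fall through to the email/IRI rules.
    else:
        return "ipv4" if ip.version == 4 else "ipv6"
    # A leading double quote can only start a quoted-string local-part.
    if value.startswith('"'):
        return "email"
    # Otherwise the first '@' or ':' decides.
    for ch in value:
        if ch == "@":
            return "email"
        if ch == ":":
            return "iri"
    raise ValueError("input is not an IP address, email address or IRI")
```
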

    Your rules could then be extended as follows.

    Rules (assuming well-formed input):

    ipchar := hex / ':'
    hex    := [0-9A-Fa-f]