php regex email

Verify emails in PHP


I tried using a PHP regex to validate emails, such as /^[_a-z0-9-]+(\.[_a-z0-9-])*@[a-z0-9-]+(\.[a-z0-9-])*(\.[a-z]{2,4})$/. I know this is not a correct way to validate emails, because abc@abc.abc also passes my regex.

Do I need to enumerate all the domain name suffixes?

I learned that PHP's filter_var function can validate emails; however, filter_var('abc@abc.abc', FILTER_VALIDATE_EMAIL) also accepts this address.

How does FILTER_VALIDATE_EMAIL work in the PHP source code? Or can someone suggest a better way to validate emails?

Thanks very much!


Solution

  • The function php_filter_validate_email in logical_filters.c performs this check.

    It tests the address against the following regex (backslashes appear doubled because it is a C string literal) /^(?!(?:(?:\\x22?\\x5C[\\x00-\\x7E]\\x22?)|(?:\\x22?[^\\x5C\\x22]\\x22?)){255,})(?!(?:(?:\\x22?\\x5C[\\x00-\\x7E]\\x22?)|(?:\\x22?[^\\x5C\\x22]\\x22?)){65,}@)(?:(?:[\\x21\\x23-\\x27\\x2A\\x2B\\x2D\\x2F-\\x39\\x3D\\x3F\\x5E-\\x7E]+)|(?:\\x22(?:[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F\\x21\\x23-\\x5B\\x5D-\\x7F]|(?:\\x5C[\\x00-\\x7F]))*\\x22))(?:\\.(?:(?:[\\x21\\x23-\\x27\\x2A\\x2B\\x2D\\x2F-\\x39\\x3D\\x3F\\x5E-\\x7E]+)|(?:\\x22(?:[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F\\x21\\x23-\\x5B\\x5D-\\x7F]|(?:\\x5C[\\x00-\\x7F]))*\\x22)))*@(?:(?:(?!.*[^.]{64,})(?:(?:(?:xn--)?[a-z0-9]+(?:-+[a-z0-9]+)*\\.){1,126}){1,}(?:(?:[a-z][a-z0-9]*)|(?:(?:xn--)[a-z0-9]+))(?:-+[a-z0-9]+)*)|(?:\\[(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){7})|(?:(?!(?:.*[a-f0-9][:\\]]){7,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?)))|(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){5}:)|(?:(?!(?:.*[a-f0-9]:){5,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3}:)?)))?(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))(?:\\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))){3}))\\]))$/iD and also enforces a maximum length of 320 characters.

    The source also includes this comment:

    /*
    * The regex below is based on a regex by Michael Rushton.
    * However, it is not identical. I changed it to only consider routeable
    * addresses as valid. Michael's regex considers a@b a valid address
    * which conflicts with section 2.3.5 of RFC 5321 which states that:
    *
    * Only resolvable, fully-qualified domain names (FQDNs) are permitted
    * when domain names are used in SMTP. In other words, names that can
    * be resolved to MX RRs or address (i.e., A or AAAA) RRs (as discussed
    * in Section 5) are permitted, as are CNAME RRs whose targets can be
    * resolved, in turn, to MX or address RRs. Local nicknames or
    * unqualified names MUST NOT be used.
    *
    * This regex does not handle comments and folding whitespace. While
    * this is technically valid in an email address, these parts aren't
    * actually part of the address itself.
    *
    * Michael's regex carries this copyright:
    *
    * Copyright © Michael Rushton 2009-10
    * http://squiloople.com/
    * Feel free to use and redistribute this code. But please keep this copyright notice.
    *
    */
    

    This is good enough for most real-world email addresses. For more details, see the question "Using a regular expression to validate an email address".
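
To make the behaviour described above concrete, here is a minimal sketch. It runs the syntax check with filter_var and, optionally, a DNS lookup on the domain, in the spirit of the "resolvable" requirement quoted from RFC 5321. The helper name is_plausible_email, its $domainResolves parameter, and the $dnsCheck closure are hypothetical names introduced for illustration; they are not part of PHP. The choice of MX or A records is also an assumption, and an address-literal domain such as [127.0.0.1] would not pass this kind of lookup.

```php
<?php
// Hypothetical helper (not part of PHP): syntax check via filter_var,
// plus an optional caller-supplied check that the domain resolves.
function is_plausible_email(string $email, ?callable $domainResolves = null): bool
{
    // Syntax check: the filter discussed in the answer above.
    if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        return false;
    }

    // No resolver supplied: stop after the syntax check.
    if ($domainResolves === null) {
        return true;
    }

    // Everything after the last "@" is the domain part.
    $domain = substr($email, strrpos($email, '@') + 1);

    return (bool) $domainResolves($domain);
}

// Assumed resolver: accept a domain that has an MX or an A record.
// It performs real DNS queries, so it is only defined here, not called.
$dnsCheck = function (string $domain): bool {
    return checkdnsrr($domain, 'MX') || checkdnsrr($domain, 'A');
};

var_dump(is_plausible_email('abc@abc.abc')); // bool(true): syntactically valid
var_dump(is_plausible_email('a@b'));         // bool(false): domain is not fully qualified
```

Calling is_plausible_email('abc@abc.abc', $dnsCheck) would additionally query DNS, so the result depends on the network and on whether the domain exists at that moment. Even a passing lookup only shows that the domain can receive mail, not that the mailbox exists; sending a confirmation message is the usual way to verify that.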