
What do the symbols mean in preg_match?


I have this expression in a code snippet I borrowed offline. It forces new users to have a password that not only contains uppercase letters, lowercase letters, and numbers, but apparently requires them in that order! If I enter lower+upper+numbers, it fails!

if (preg_match("/^.*(?=.{4,})(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z]).*$/", $pw_clean, $matches)) {

I've searched online but can't find a resource that tells me what some of the characters mean. I can see that the pattern is preg_match("/some expression/", yourstring, yourmatch).

What do these mean:

1.  ^          -  ???
2.  .*         -  ???
3.  (?=.{4,})  -  requires 4 characters minimum
4.  (?=.*[0-9])-  requires it to have numbers
5.  (?=.*[a-z])-  requires it to have lowercase
6.  (?=.*[A-Z])-  requires it to have uppercase
7.  .*$        -  ???

Solution

  • Here are the direct answers. I kept them short because they won't make sense without an understanding of regex. That understanding is best gained at regular-expressions.info. I advise you to also try out the regex helper tools listed there; they let you experiment and see live capturing/matching as you edit the pattern, which is very helpful.


    1: The caret ^ is an anchor, it means "the start of the haystack/string/line".

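    2: The dot . and the asterisk * serve two separate purposes:

    The dot . matches any single character except a newline (by default).
    The asterisk * is a quantifier: "zero or more of whatever precedes it".

    When these two are combined as .* it basically reads "zero or more of anything until a newline or another rule comes into effect".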

    7: The dollar $ is also an anchor like the caret, with the opposite function: "the end of the haystack".
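
To make points 1, 2 and 7 concrete, here is a small sketch. The sample strings and variable names are made up for illustration; only preg_match itself comes from your snippet.

```php
<?php
// ^ only matches at the start of the subject, $ only at the end.
$startsWithFoo = preg_match('/^foo/', 'foobar'); // 1: "foo" is at the start
$fooNotAtStart = preg_match('/^foo/', 'barfoo'); // 0: "foo" is not at the start
$endsWithBar   = preg_match('/bar$/', 'foobar'); // 1: "bar" is at the end

// .* grabs as much as it can, but by default the dot stops at a newline.
preg_match('/^.*/', "first line\nsecond line", $m);
$greedyMatch = $m[0]; // "first line"

var_dump($startsWithFoo, $fooNotAtStart, $endsWithBar, $greedyMatch);
```

Note that the dot only crosses newlines if you add the s modifier to the pattern.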


    Edit:

    Simple parentheses ( ) around something make it a group. Here you have (?=) which is an assertion, specifically a positive lookahead assertion. All it does is check whether what's inside actually exists forward from the current cursor position in the haystack, without moving the cursor. Still with me?
    Example: foo(?=bar) matches foo only if followed by bar. bar is never matched, only foo is returned.
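
If you want to try the foo/bar example yourself, a minimal sketch could look like this (the variable names are just for illustration):

```php
<?php
// "foo" followed by "bar": the lookahead succeeds, but only "foo" is returned.
$hit = preg_match('/foo(?=bar)/', 'foobar', $m1);

// "foo" followed by something else: the lookahead fails, so there is no match.
$miss = preg_match('/foo(?=bar)/', 'foobaz', $m2);

// The lookahead consumes nothing: after it, the cursor is still right behind
// "foo", so a literal "bar" can be matched at that same position.
preg_match('/foo(?=bar)bar/', 'foobar', $m3);

var_dump($hit, $m1, $miss, $m2, $m3);
```

The third pattern is the important one for your password regex: because a lookahead never moves the cursor, several lookaheads in a row all inspect the string from the same position.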

    With this in mind, let's dissect your regex:

    /^.*(?=.{4,})(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z]).*$/
    
    Reads as:
            ^.* From Start, capture 0-many of any character
      (?=.{4,}) if there are at least 4 of anything following this
    (?=.*[0-9]) if there is: 0-many of any, ending with a digit following
    (?=.*[a-z]) if there is: 0-many of any, ending with a lowercase letter following
    (?=.*[A-Z]) if there is: 0-many of any, ending with an uppercase letter following
            .*$ 0-many of anything preceding the End
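
One way to convince yourself of this reading is to run each lookahead on its own and compare with the full pattern. This is only a sketch: checkPasswordParts is a hypothetical helper I made up for this answer, and it assumes single-line input (no newlines in the password).

```php
<?php
// Hypothetical helper (not part of the original snippet): runs each lookahead
// separately, anchored at the start of the string.
function checkPasswordParts(string $pw): array
{
    return [
        'at least 4 chars' => preg_match('/^(?=.{4,})/', $pw) === 1,
        'has digit'        => preg_match('/^(?=.*[0-9])/', $pw) === 1,
        'has lowercase'    => preg_match('/^(?=.*[a-z])/', $pw) === 1,
        'has uppercase'    => preg_match('/^(?=.*[A-Z])/', $pw) === 1,
    ];
}

$full = '/^.*(?=.{4,})(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z]).*$/';

foreach (['aaB1', '1Baa', 'aa11', 'aB1'] as $pw) {
    $parts  = checkPasswordParts($pw);
    $failed = array_keys($parts, false, true); // names of the rules that failed
    $whole  = preg_match($full, $pw) === 1;

    echo $pw, ': ', $whole ? 'match' : 'no match';
    echo $failed ? ' (failed: ' . implode(', ', $failed) . ')' : '', "\n";
}
```

For these sample strings the full pattern matches exactly when none of the individual checks fail, whatever order the characters come in. Splitting the checks like this also lets you tell the user which rule was broken.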
    

    You say the order of the password characters matters, but it doesn't in my tests. See the test script below. Hope this cleared up a thing or two. If you are looking for another regex which is a bit more forgiving, see regex password validation.

    <pre>
    <?php
    // Only the last 3 fail, as they should. You claim the first does not work?
    $subjects = array("aaB1", "Baa1", "1Baa", "1aaB", "aa1B", "aa11", "aaBB", "aB1");
    
    foreach($subjects as $s)
    {
        $res = preg_match("/^.*(?=.{4,})(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z]).*$/", $s, $matches);
        echo "result: ";
        print_r($res);
    
        echo "<br>";
        print_r($matches);
        echo "<hr>";
    }
    

    Excellent online tools for Regular Expressions:
    - https://regexr.com/
    - https://regex101.com/
    - http://jex.im/regulex