I have a regular expression meant to validate that a phone number string is either empty or contains 10-14 digits in any format. It correctly requires at least 10 digits, but it still matches strings with more than 14. I've rarely used lookaheads and can't see the problem. Here it is, with the intended interpretation in comments:
/// ^ - Beginning of string
/// (?= - Look ahead from current position
/// (?:\D*\d){10,14} - Match 0 or more non-digits followed by a digit, 10-14 times
/// \D*$ - Ending with 0 or more non-digits
/// .* - Allow any string
/// $ - End of string
^(?=(?:\D*\d){10,14}\D*|\s*$).*$
This is being used in an ASP.NET MVC 5 site with System.ComponentModel.DataAnnotations.RegularExpressionAttribute, so it runs server-side with .NET regexes and client-side in JavaScript with jQuery Validate. How can I get it to stop matching when the string contains more than 14 digits?
The problem with the regular expression

^(?=(?:\D*\d){10,14}\D*|\s*$).*$

is that there is no end-of-string anchor ($) between \D* and |. Consider, for example, the string

12345678901234567890

which contains 20 digits. The lookahead is satisfied because (?:\D*\d){10,14} matches the first 14 digits, 12345678901234, and \D* then matches zero non-digits; nothing requires the string to end there, so .* goes on to consume the remaining six digits. By contrast, the regex

^(?=(?:\D*\d){10,14}\D*$|\s*$).*$

fails on that string, as it should: after at most 14 digits, \D*$ cannot reach the end of the string.
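A quick way to see the difference is to run both patterns against the 20-digit string. This is a small sketch using JavaScript's RegExp, since that is what jQuery Validate uses on the client; the server-side .NET engine is assumed to give the same results for these simple inputs. The variable names are made up for the illustration.

```javascript
// Pattern from the question: nothing forces the end of the string after the digits
const original = /^(?=(?:\D*\d){10,14}\D*|\s*$).*$/;
// Same pattern with $ added after \D*, pinning 10-14 digits to the end of the string
const anchored = /^(?=(?:\D*\d){10,14}\D*$|\s*$).*$/;

const twentyDigits = "12345678901234567890";
const nineDigits = "123456789";

console.log(original.test(twentyDigits)); // true: lookahead is happy after 14 digits
console.log(anchored.test(twentyDigits)); // false: \D*$ cannot reach the end
console.log(original.test(nineDigits));   // false: the minimum of 10 already worked
console.log(anchored.test("555-123-4567")); // true: 10 digits with separators
console.log(anchored.test(""));             // true: handled by the \s*$ alternative
```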
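There is, however, no need for a lookahead. The expression can be simplified to

^(?:(?:\D*\d){10,14}\D*)?$

Making the outer non-capturing group optional lets the regex match the empty string, as required. (One difference from the lookahead version: a string consisting only of whitespace, which the \s*$ alternative accepted, is now rejected.)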
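Here is a sketch that exercises the lookahead-free pattern on a few hand-picked inputs, again using JavaScript's RegExp. The sample strings are only illustrations; the whitespace-only case shows the behavioral difference from the lookahead version noted above.

```javascript
// Whole string is either empty, or 10-14 digits with any non-digits around them
const phone = /^(?:(?:\D*\d){10,14}\D*)?$/;

// [input, expected result]
const samples = [
  ["", true],                  // empty string is allowed
  ["(555) 123-4567", true],    // 10 digits
  ["+1 555 123 4567", true],   // 11 digits
  ["12345678901234", true],    // 14 digits
  ["123456789012345", false],  // 15 digits: too many
  ["555-1234", false],         // 7 digits: too few
  ["   ", false],              // whitespace only: rejected by this version
];

for (const [input, expected] of samples) {
  const actual = phone.test(input);
  console.log(JSON.stringify(input), actual, actual === expected ? "ok" : "MISMATCH");
}
```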
There may be a problem with this last regex, as illustrated at the link. Consider the string

\nabc12\nab12c3456d789efg

where \n denotes a newline. The first match of (?:\D*\d) is \nabc1 (because \D matches newlines), the second is 2, the third is \nab1, and so on, for a total of 11 matches, satisfying the requirement that there be 10-14 digits even though the digits are spread across lines. This is almost certainly not intended. The solution is to change the regex to
^(?:(?:[^\d\n]*\d){10,14}[^\d\n]*)?$

[^\d\n] matches any character other than a digit or a newline.
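The newline issue and its fix can be checked side by side. This sketch splits the example string into the pieces that each repetition of (?:\D*\d) consumes, then compares the simplified pattern with the newline-excluding one. It uses JavaScript's RegExp without the m flag; the server-side .NET engine is assumed to agree on these inputs, and the variable names are hypothetical.

```javascript
// The example string, with real newline characters where the text shows \n
const s = "\nabc12\nab12c3456d789efg";

// One match of \D*\d per repetition of (?:\D*\d); note that \D also consumes "\n"
const pieces = [...s.matchAll(/\D*\d/g)].map((m) => m[0]);
console.log(pieces.length);                      // 11
console.log(JSON.stringify(pieces.slice(0, 3))); // ["\nabc1","2","\nab1"]

const phone = /^(?:(?:\D*\d){10,14}\D*)?$/;                  // simplified version
const phoneOneLine = /^(?:(?:[^\d\n]*\d){10,14}[^\d\n]*)?$/; // newline-excluding version

console.log(phone.test(s));        // true: 11 digits, even though they span lines
console.log(phoneOneLine.test(s)); // false: no repetition may cross a "\n"
console.log(phoneOneLine.test("abc12ab12c3456d789efg")); // true: same digits, one line
console.log(phoneOneLine.test(""));                      // true: empty is still allowed
```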