I have a regular expression:
(<select([^>]*>))(.*?)(</select\s*>)
Since it uses lazy repeat quantifier, for longer strings(having options more than 500) it backtracks for more than 100,000 times and fails. Please help me to find a better regular expression which doesn't use lazy repeat quantifier
<select[^>]*>[^<]*(?:<(?!/select>)[^<]*)*</select>
...or in human-readable form:
<select[^>]*> # start tag
[^<]* # anything except opening bracket
(?: # if you find an open bracket
<(?!/select>) # match it if it's not part of end tag
[^<]* # consume any more non-brackets
)* # repeat as needed
</select> # end tag
This is an example of the "unrolled loop" technique Friedl develops in his book, Mastering Regular Expressions. I did a quick test in RegexBuddy using a pattern based on reluctant quantifiers:
(?s)<select[^>]*>.*?</select>
...and it took about 6,000 steps to find a match. The unrolled-loop pattern took only 500 steps. And when I removed the closing bracket from the end tag (</select
), making a match impossible, it required only 800 steps to report failure.
If your regex flavor supports possessive quantifiers, go ahead and use them, too:
<select[^>]*+>[^<]*+(?:<(?!/select>)[^<]*+)*+</select>
It takes about the same number of steps to achieve a match, but it can use a lot less memory in the process. And if no match is possible, it fails even more quickly; in my tests it took about 500 steps, the same number it took to find a match.