Can you explain to me how this works? Here is an example:
<!-- The quick brown fox
jumps over the lazy dog -->
<!--[if IE 7]>
<link rel="stylesheet" type="text/css" href="/supersheet.css" />
<![endif]-->
<!-- Pack my box with five dozen liquor jugs -->
First, I tried to use the following regular expression to match the content inside conditional comments:
/<!--.*?stylesheet.*?-->/s
It failed, as the regular expression matches everything from the first <!-- through the --> that closes the conditional comment. Then I tried using another pattern with a lookahead assertion:
/<!--(?=.*?stylesheet).*?-->/s
It works and matches exactly what I need. However, the following regular expression works as well:
/<!--(?=.*stylesheet).*?-->/s
The last regular expression does not have a reluctant quantifier in the lookahead assertion, and now I am confused. Can anyone explain to me how it works? Maybe there is a better solution for this example?
Updated:
I tried using the regular expressions with the lookahead assertion in another document, and they failed to match the content between the comments. So this one, /<!--(?=.*?stylesheet).*?-->/s (as well as this one, /<!--(?=.*stylesheet).*?-->/s), is not correct. Do not use it; try the other suggestions.
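To see why the lookahead variants are not enough, here is a minimal sketch in Python run against the sample above. Using Python's re module is my assumption (the patterns in the question are written PCRE-style), and re.DOTALL stands in for the /s modifier. The lookahead only asserts that stylesheet occurs somewhere after <!--, not inside the same comment, so an ordinary comment that precedes the conditional one is matched as well. Since the lookahead is just a yes/no test, the lazy and greedy versions give the same result.

```python
import re

# Sample document from the question.
SAMPLE = """<!-- The quick brown fox
jumps over the lazy dog -->
<!--[if IE 7]>
<link rel="stylesheet" type="text/css" href="/supersheet.css" />
<![endif]-->
<!-- Pack my box with five dozen liquor jugs -->
"""

# re.DOTALL plays the role of the /s modifier.
naive = re.compile(r"<!--.*?stylesheet.*?-->", re.DOTALL)
lazy_lookahead = re.compile(r"<!--(?=.*?stylesheet).*?-->", re.DOTALL)
greedy_lookahead = re.compile(r"<!--(?=.*stylesheet).*?-->", re.DOTALL)

# The naive pattern starts at the very first <!-- and runs on into the
# conditional comment until it has seen "stylesheet" and a closing -->.
naive_match = naive.search(SAMPLE).group(0)

# The lookahead is only an existence test, so lazy vs. greedy inside it
# does not change which comments are matched.
lazy_matches = lazy_lookahead.findall(SAMPLE)
greedy_matches = greedy_lookahead.findall(SAMPLE)

print(repr(naive_match))
for m in lazy_matches:
    print(repr(m))
```

Both lookahead patterns return two matches here, and the first one is the unrelated "quick brown fox" comment.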
Updated:
The solution has been found by Jonny 5 (see the answer). He suggested three options:
/style-sheet.css, it will not work.
\K. It works like a charm. The downsides are the following:
I think the following is a good solution for my example:
/(?s)<!--(?:(?!<!).)+?stylesheet.+?-->/
The same, but with the s modifier at the end:
/<!--(?:(?!<!).)+?stylesheet.+?-->/s
As I said, this is a good solution, but I managed to improve the pattern and found another one that in my case works faster.
So, the final solution is the following:
/<!--(?:(?!-->).)+?stylesheet.+?-->/s
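A quick check of the final pattern against the sample, sketched in Python (my choice of language, not part of the original question; re.DOTALL replaces the /s modifier). The (?:(?!-->).)+? part consumes one character at a time and refuses to step over a closing -->, so the match cannot start in one comment and finish in another.

```python
import re

# Sample document from the question.
SAMPLE = """<!-- The quick brown fox
jumps over the lazy dog -->
<!--[if IE 7]>
<link rel="stylesheet" type="text/css" href="/supersheet.css" />
<![endif]-->
<!-- Pack my box with five dozen liquor jugs -->
"""

# Final pattern from the question; re.DOTALL replaces the /s modifier.
final = re.compile(r"<!--(?:(?!-->).)+?stylesheet.+?-->", re.DOTALL)

# findall returns whole matches because the pattern has no capture groups.
matches = final.findall(SAMPLE)
for m in matches:
    print(m)
```

On this sample only the conditional comment is returned; the two ordinary comments are skipped.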
Thanks to all the participants for the interesting answers.
To match only the part <!--...stylesheet...--> there are many ways:
1.) Use a negated hyphen [^-] to limit the match and stay in between <!-- and stylesheet:
(?s)<!--[^-]+stylesheet.+?-->
[^-] allows only characters that are not a hyphen. See test at regex101.
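A small sketch of the negated hyphen approach in Python (the language and the second input string are my assumptions for illustration). It also shows the catch: any hyphen between <!-- and stylesheet, such as one in an attribute name, stops [^-]+ early and the comment is missed.

```python
import re

# Sample document from the question.
SAMPLE = """<!-- The quick brown fox
jumps over the lazy dog -->
<!--[if IE 7]>
<link rel="stylesheet" type="text/css" href="/supersheet.css" />
<![endif]-->
<!-- Pack my box with five dozen liquor jugs -->
"""

negated_hyphen = re.compile(r"(?s)<!--[^-]+stylesheet.+?-->")

# [^-]+ cannot cross the hyphens of a closing -->, so the first comment
# never reaches "stylesheet" and only the conditional comment matches.
found = negated_hyphen.search(SAMPLE).group(0)
print(found)

# Hypothetical input: the hyphen in data-x sits before "stylesheet".
HYPHENATED = (
    '<!--[if IE 7]>\n'
    '<link data-x="1" rel="stylesheet" href="/a.css" />\n'
    '<![endif]-->'
)
missed = negated_hyphen.search(HYPHENATED)
print(missed)
```

So this option is the cheapest, but only safe when no hyphen can appear before the keyword.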
2.) To get the "last" or closest match without much regex effort, you can also put a greedy dot in front to eat up everything before it. This makes sense when not matching globally, i.e. when there is only one item to match. Use \K to reset the start of the match after the greedy part:
(?s)^.*\K<!--.+?stylesheet.+?-->
See test at regex101. You can also use a capture group and grab $1: (?s)^.*(<!--.+?stylesheet.+?-->)
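Here is the capture-group form sketched in Python (my choice of language). Python's built-in re module does not support \K, so only the $1 variant can be shown this way; in Python the group is read with group(1).

```python
import re

# Sample document from the question.
SAMPLE = """<!-- The quick brown fox
jumps over the lazy dog -->
<!--[if IE 7]>
<link rel="stylesheet" type="text/css" href="/supersheet.css" />
<![endif]-->
<!-- Pack my box with five dozen liquor jugs -->
"""

# ^.* first swallows the whole input, then backtracks until the group
# can match, so the group starts at the last <!-- that still has
# "stylesheet" and a closing --> after it.
closest = re.compile(r"(?s)^.*(<!--.+?stylesheet.+?-->)")

match = closest.search(SAMPLE)
comment = match.group(1)
print(comment)
```

Because of the leading greedy dot, this finds a single item per call, which is why it only makes sense when not matching globally.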
3.) Using a lookahead to narrow it down is usually more costly:
(?s)<!--(?:(?!<!).)+?stylesheet.+?-->
See test at regex101. (?!<!). looks ahead at each character between <!-- and stylesheet and consumes it only if it does not start another <!..., so the match stays inside one element. This is similar to the negated hyphen solution.
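To make the effect of (?!<!). visible, this Python sketch (language is my assumption) anchors both a plain-dot pattern and the lookahead pattern at the start of the sample, where the "quick brown fox" comment begins. The plain dot walks into the next comment; the lookahead version gives up as soon as it would have to step onto the next <!.

```python
import re

# Sample document from the question.
SAMPLE = """<!-- The quick brown fox
jumps over the lazy dog -->
<!--[if IE 7]>
<link rel="stylesheet" type="text/css" href="/supersheet.css" />
<![endif]-->
<!-- Pack my box with five dozen liquor jugs -->
"""

plain = re.compile(r"(?s)<!--.+?stylesheet.+?-->")
tempered = re.compile(r"(?s)<!--(?:(?!<!).)+?stylesheet.+?-->")

# match() only tries position 0, i.e. the first (ordinary) comment.
overrun = plain.match(SAMPLE)      # succeeds by running into the next comment
blocked = tempered.match(SAMPLE)   # fails: the next "<!" may not be consumed

# search() moves on and finds the comment that really contains the keyword.
inside = tempered.search(SAMPLE).group(0)

print(overrun is not None, blocked is None)
print(inside)
```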
Instead of .* I used .+ for one or more; which one to choose depends on what is to be matched. Here + fits better.
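The difference between the two quantifiers only shows up on edge cases. A minimal Python sketch with a made-up input (both the language and the EDGE string are my assumptions): with +, at least one character is required before and after the keyword, while * also accepts none.

```python
import re

# Same lookahead pattern, once with + (one or more) and once with * (zero or more).
plus = re.compile(r"(?s)<!--(?:(?!<!).)+?stylesheet.+?-->")
star = re.compile(r"(?s)<!--(?:(?!<!).)*?stylesheet.*?-->")

# Hypothetical edge case: nothing before or after the keyword.
EDGE = "<!--stylesheet-->"

plus_hit = plus.search(EDGE)   # no match: + needs a character on each side
star_hit = star.search(EDGE)   # matches the whole string

print(plus_hit, star_hit.group(0))
```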
Which solution to use depends on the exact requirements. For this case I would use the first.