I want to write a regular expression that extracts:
<p class="MyClass">
<p> something 1 </p>
<p> something 2 </p>
<span> <span> // or more html tag here
something
</p>
something's here, not in any tag!
from:
<p class="MyClass">
<p> something 1 </p>
<p> something 2 </p>
<span> <span> // or more html tag here
something
</p>
something's here, not in any tag!
<p class="MyClass">
<p> another thing 1</p>
<p> another thing 2</p>
<p> another thing 3</p>
another thing
</p>
...
I think I can use a regex to match everything between <p class="MyClass">
and the next one. The regex /(<p class="MyClass">[\s\S]*)<p class="MyClass">/
works correctly in this case, but it doesn't work when I try to extract a notification from this page: http://daotao.dut.udn.vn/sv/G_Thongbao_LopHP.aspx. The DOM there is quite strange.
Sorry for my bad English.
The regex should be:
(<p class="MyClass">[\s\S]*?)(?=<p class="MyClass">|$)
[\s\S]*?
: *? is a lazy quantifier, so it matches the shortest possible string. The default is greedy (matches the longest possible string).
(?=<p class="MyClass">|$)
: a lookahead, so the next <p class="MyClass"> is not consumed as part of the match, and |$ so that the last block is matched too.
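
To see the difference between the greedy and the lazy version, here is a minimal sketch in Python. The question does not say which language is used, so Python is an assumption, and the `html` string is a made-up sample modelled on the question, not the real page. With three blocks, the greedy pattern swallows the first two, while the lazy pattern with the lookahead returns each block separately.

```python
import re

# Hypothetical input modelled on the question: three MyClass blocks,
# the first followed by loose text that is not inside any tag.
html = (
    '<p class="MyClass">\n<p> something 1 </p>\n</p>\n'
    "something's here, not in any tag!\n"
    '<p class="MyClass">\n<p> another thing 1</p>\n</p>\n'
    '<p class="MyClass">\n<p> third thing 1</p>\n</p>'
)

OPEN_TAG = '<p class="MyClass">'

# Greedy version from the question: [\s\S]* runs on to the LAST opening tag.
greedy = re.search(r'(<p class="MyClass">[\s\S]*)<p class="MyClass">', html)

# Lazy version with a lookahead. re.MULTILINE is deliberately not set,
# so $ does not match at every line end.
lazy = re.compile(r'(<p class="MyClass">[\s\S]*?)(?=<p class="MyClass">|$)')
blocks = lazy.findall(html)

# Number of opening tags swallowed by the single greedy match.
print(greedy.group(1).count(OPEN_TAG))
# Number of separate blocks found by the lazy pattern.
print(len(blocks))
for block in blocks:
    print(repr(block))
```

Note that if a multiline flag is enabled, $ also matches at the end of every line, so the lazy match would stop at the end of the first line. For messy real-world pages, an HTML parser is usually more robust than a regex.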