html regex twig opencart-3 ocmod

Regex match multiple lines before and after word delimited by start and end words


I want to search for {{ upc }} and start the capture not at the <div immediately before the match but at the second <div before it, i.e. <div class="form-group">. The capture should end not at the first </div> after the match but at the second, i.e. the closing </div> of the group, or equivalently just before the start of the next <div class="form-group">.

Here is the sample HTML/Twig template I want to search and replace in.

<div class="form-group">
    <label class="col-sm-2 control-label" for="input-sku"><span data-toggle="tooltip" title="{{ help_sku }}">{{ entry_sku }}</span></label>
    <div class="col-sm-10">
        <input type="text" name="sku" value="{{ sku }}" placeholder="{{ entry_sku }}" id="input-sku" class="form-control"/>
    </div>
</div>
<div class="form-group">
     <label class="col-sm-2 control-label" for="input-upc"><span data-toggle="tooltip" title="{{ help_upc }}">{{ entry_upc }}</span></label>
     <div class="col-sm-10">
         <input type="text" name="upc" value="{{ upc }}" placeholder="{{ entry_upc }}" id="input-upc" class="form-control"/>
     </div>
</div>
<div class="form-group">
     <label class="col-sm-2 control-label" for="input-ean"><span data-toggle="tooltip" title="{{ help_ean }}">{{ entry_ean }}</span></label>
     <div class="col-sm-10">
         <input type="text" name="ean" value="{{ ean }}" placeholder="{{ entry_ean }}" id="input-ean" class="form-control"/>
     </div>
</div>

The expected regex match is as follows:

<div class="form-group">
     <label class="col-sm-2 control-label" for="input-upc"><span data-toggle="tooltip" title="{{ help_upc }}">{{ entry_upc }}</span></label>
     <div class="col-sm-10">
         <input type="text" name="upc" value="{{ upc }}" placeholder="{{ entry_upc }}" id="input-upc" class="form-control"/>
     </div>
</div>

All help appreciated. Thank you.


Solution

  • One approach is to use a negative lookahead to exclude the things you do not want in the match. For instance, a pattern that matches <div, followed by anything, and then another <div will also match across a closing tag, as in <div></div><div>.

    Instead, match <div, followed by anything as long as it is not </div>, and then another <div.

    <div    (?:(?!</div>).)*    <div
    

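    You can then use that same subpattern anywhere in your expression where you would normally write .*. In this case, repeat it to make sure you do not cross a closing </div> before reaching the variable, and then continue with the {{ upc }} portion. The spaces between the parts are only there for readability and should be removed before use. Because the block spans several lines, the dot must also be allowed to match newlines (the s / DOTALL flag).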

    <div(?:(?!</div>).)*<div    (?:(?!</div>).)*    {{ upc }}    .*?</div>\s*</div>
    

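To sanity-check the final pattern, here is a minimal Python sketch run against the sample template from the question. It assumes the readability spaces are removed, the braces are escaped, and DOTALL is set so that . matches newlines. The names SAMPLE, TEMPERED and PATTERN are illustrative only, and this is plain Python rather than OCMOD syntax.

```python
import re

# Sample template from the question (three consecutive form groups).
SAMPLE = """\
<div class="form-group">
    <label class="col-sm-2 control-label" for="input-sku"><span data-toggle="tooltip" title="{{ help_sku }}">{{ entry_sku }}</span></label>
    <div class="col-sm-10">
        <input type="text" name="sku" value="{{ sku }}" placeholder="{{ entry_sku }}" id="input-sku" class="form-control"/>
    </div>
</div>
<div class="form-group">
     <label class="col-sm-2 control-label" for="input-upc"><span data-toggle="tooltip" title="{{ help_upc }}">{{ entry_upc }}</span></label>
     <div class="col-sm-10">
         <input type="text" name="upc" value="{{ upc }}" placeholder="{{ entry_upc }}" id="input-upc" class="form-control"/>
     </div>
</div>
<div class="form-group">
     <label class="col-sm-2 control-label" for="input-ean"><span data-toggle="tooltip" title="{{ help_ean }}">{{ entry_ean }}</span></label>
     <div class="col-sm-10">
         <input type="text" name="ean" value="{{ ean }}" placeholder="{{ entry_ean }}" id="input-ean" class="form-control"/>
     </div>
</div>
"""

# "Any run of characters, as long as none of them starts a closing </div>".
TEMPERED = r"(?:(?!</div>).)*"

PATTERN = re.compile(
    r"<div" + TEMPERED          # outer <div class="form-group">
    + r"<div" + TEMPERED        # inner <div>, reached without crossing </div>
    + r"\{\{ upc \}\}"          # the literal Twig variable
    + r".*?</div>\s*</div>",    # shortest run up to the two closing tags
    re.DOTALL,                  # let . match newlines
)

match = PATTERN.search(SAMPLE)
if match:
    print(match.group(0))
```

On this sample the match is the upc form group only, from its opening <div class="form-group"> to its second closing </div>. The pattern relies on there being exactly one nested <div> before {{ upc }}; more deeply nested markup would likely need a different pattern or an HTML parser.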