regex regex-lookarounds lookbehind alternation

Regex to collect data after one search term and before one of two others (whichever is first)


I need to fashion a regex with the following requirements:

Given sample text:

SEARCH_TERM_#1 find this text SEARCH-TERM_#2_more text_SEARCH-TERM_#3
SEARCH_TERM_#1 find this text SEARCH-TERM_#3

I want to extract the string that appears in the "find this text" position.

The regex should collect data after SEARCH_TERM_#1 up to, but not including, SEARCH_TERM_#2 or SEARCH-TERM_#3, whichever comes first. In other words, the right-hand border should be whichever of #2 and #3 it finds first.

I've tried (?>SEARCH_TERM_#2|SEARCH_TERM_#3), (?=(?>SEARCH_TERM_#2|SEARCH_TERM_#3)) and (?>(?=SEARCH_TERM_#2)|(?=SEARCH_TERM_#3)). All of them include the second search term in the collected data and stop before the third, whereas I want the collected data to stop before #2 or #3, whichever comes first.


Solution

  • Description

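    This regular expression captures the text after SEARCH_TERM_#1 up to, but not including, the first occurrence of either SEARCH-TERM_#2 or SEARCH-TERM_#3. Before consuming each character, the negative lookahead checks that neither terminator starts at that position, so the capture stops at whichever terminator appears first: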

    ^.*?SEARCH_TERM_\#1((?:(?!SEARCH-TERM_\#2|SEARCH-TERM_\#3).)*)
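The same pattern can be written with PHP's x (extended) modifier so each piece carries its own comment. This is an illustrative sketch: the comments are one reading of the pattern, and $pattern, $input and $matches are arbitrary names. In x mode whitespace is ignored and an unescaped # starts a comment, which is one reason the literal # characters are written as \#.

```php
<?php
// Same pattern as above, spelled out with the x (extended) modifier.
// Whitespace in the pattern is ignored and an unescaped # begins a comment,
// so the literal # characters in the search terms stay escaped as \#.
$pattern = '/
    ^.*?                      # skip anything before the first search term (lazily)
    SEARCH_TERM_\#1           # the left-hand border
    (                         # capture group 1: the text we want
      (?:                     # repeat, one character at a time:
        (?!                   #   unless the rest of the string starts with
          SEARCH-TERM_\#2     #     the second search term
          | SEARCH-TERM_\#3   #     or the third search term,
        )
        .                     #   consume one character
      )*
    )
/imsx';

$input = "skip this text SEARCH_TERM_#1 find this text SEARCH-TERM_#2 more text to ignore SEARCH_TERM_#3";
preg_match($pattern, $input, $matches);

// Brackets make the leading and trailing spaces visible: prints "[ find this text ]"
echo '[' . $matches[1] . ']' . PHP_EOL;
```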


    Example

    You didn't specify a language, so I'm including this PHP example only to show how it works.

    Input Text

    skip this text SEARCH_TERM_#1 find this text SEARCH-TERM_#2 more text to ignore SEARCH_TERM_#3
    

    Code

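    <?php
    // Sample input from the "Input Text" section above
    $sourcestring = "skip this text SEARCH_TERM_#1 find this text SEARCH-TERM_#2 more text to ignore SEARCH_TERM_#3";
    // Capture everything after SEARCH_TERM_#1, stopping before the first SEARCH-TERM_#2 or SEARCH-TERM_#3
    preg_match('/^.*?SEARCH_TERM_\#1((?:(?!SEARCH-TERM_\#2|SEARCH-TERM_\#3).)*)/ims', $sourcestring, $matches);
    echo "<pre>" . print_r($matches, true);
    ?>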
    

    Matches

    $matches Array:
    (
        [0] => skip this text SEARCH_TERM_#1 find this text 
        [1] =>  find this text 
    )
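To check the pattern against both sample lines from the question (one where SEARCH-TERM_#2 comes before SEARCH-TERM_#3, one containing only SEARCH-TERM_#3), a short loop is enough. This is a minimal sketch; $lines, $captures and $m are illustrative names, and the two strings are copied from the question.

```php
<?php
// The answer's pattern, unchanged.
$pattern = '/^.*?SEARCH_TERM_\#1((?:(?!SEARCH-TERM_\#2|SEARCH-TERM_\#3).)*)/ims';

// Sample lines from the question: the first has #2 before #3, the second has only #3.
$lines = [
    "SEARCH_TERM_#1 find this text SEARCH-TERM_#2_more text_SEARCH-TERM_#3",
    "SEARCH_TERM_#1 find this text SEARCH-TERM_#3",
];

$captures = [];
foreach ($lines as $line) {
    if (preg_match($pattern, $line, $m)) {
        $captures[] = $m[1]; // group 1 stops at whichever terminator comes first
    }
}

foreach ($captures as $c) {
    echo '[' . $c . ']' . PHP_EOL; // both lines print "[ find this text ]"
}
```

Because the lookahead is tested at every position, the order of the two alternatives inside (?!...) does not matter; the capture ends at the leftmost terminator.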
    

    Real World Example

    Or, to use the real-world example you included in the comments:

    Regex: ^.*?style="background-image: url\(((?:(?!&cfs=1|\)).)*)

    Input text: <a href=http://i.like.kittens.com style="background-image: url(http://I.like.kittens.com?Name=Boots&cfs=1)">

    Matches:

    [0] => <a href=http://i.like.kittens.com style="background-image: url(http://I.like.kittens.com?Name=Boots
    [1] => http://I.like.kittens.com?Name=Boots
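The real-world pattern can be run the same way. In this sketch the second input, $html2 (the same tag with &cfs=1 removed), is a hypothetical variation added only to show the closing parenthesis acting as the border when &cfs=1 is absent.

```php
<?php
// Capture the URL inside url( ... ), stopping at whichever comes first:
// the literal "&cfs=1" or the closing ")".
$pattern = '/^.*?style="background-image: url\(((?:(?!&cfs=1|\)).)*)/ims';

// Input from the comments: "&cfs=1" appears before ")".
$html = '<a href=http://i.like.kittens.com style="background-image: url(http://I.like.kittens.com?Name=Boots&cfs=1)">';
if (preg_match($pattern, $html, $matches)) {
    echo $matches[1] . PHP_EOL; // http://I.like.kittens.com?Name=Boots
}

// Hypothetical variation without "&cfs=1": the ")" becomes the border instead.
$html2 = '<a href=http://i.like.kittens.com style="background-image: url(http://I.like.kittens.com?Name=Boots)">';
if (preg_match($pattern, $html2, $matches2)) {
    echo $matches2[1] . PHP_EOL; // http://I.like.kittens.com?Name=Boots
}
```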
    

    Disclaimer

    This looks like the common problem of parsing HTML with regular expressions. If your input text is HTML, you should consider using an HTML parser rather than a regular expression.
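A minimal sketch of that approach, assuming PHP's DOM extension (DOMDocument) is available: the parser locates the style attribute, and the regex is applied only to that attribute's value rather than to the raw markup. The variable names are illustrative, and the sample tag is the one from the comments; parsers can be lenient in different ways with malformed markup like this, so treat it as a starting point rather than a finished solution.

```php
<?php
$html = '<a href=http://i.like.kittens.com style="background-image: url(http://I.like.kittens.com?Name=Boots&cfs=1)">';

// The sample markup is not well-formed; collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);

$urls = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    $style = $a->getAttribute('style');
    // Same tempered token as above, now applied only to the attribute value:
    // stop at "&cfs=1" or ")", whichever comes first.
    if (preg_match('/url\(((?:(?!&cfs=1|\)).)*)/i', $style, $m)) {
        $urls[] = $m[1];
    }
}

print_r($urls);
```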