regex, match, regex-lookarounds

Find regex match separated by two semicolons


This is the type of text I'm working with:

* went to the building; opened the door; closed the door; picked up some money ($20)
* walked next door; knocked on a window; purchased an apple pie ($6.95)
* skipped down the street; turned to see where I'd come from; grabbed some burritos ($8); the street is wet from the rain
* installed some plumbing ($23)

Using regex on this example, I'm looking to grab the following:

from line 1: "picked up some money ($20)"

from line 2: "purchased an apple pie ($6.95)"

from line 3: "grabbed some burritos ($8)"

from line 4: "installed some plumbing ($23)"

The unifying factor in all of these is that they follow either a "*" or a ";", have a first word ending in "ed" and have a dollar value at the end in parentheses.

This is the regex I've got so far:

(?=[^;]*$)\w+ed (.+) \((\$\d{1,3}(.\d{2})?)\)

This matches everything correctly on lines 1, 2 and 4, but it does not match the section I want on line 3, because that section is followed by a trailing ";" and more text on the same line.
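To reproduce the behaviour, here is a minimal sketch assuming Python's re module (any flavor that supports lookaheads should behave similarly). The names samples and original_match are made up for this illustration; the pattern itself is unchanged from above.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(r"(?=[^;]*$)\w+ed (.+) \((\$\d{1,3}(.\d{2})?)\)")

# The four sample lines from the question.
samples = [
    "* went to the building; opened the door; closed the door; picked up some money ($20)",
    "* walked next door; knocked on a window; purchased an apple pie ($6.95)",
    "* skipped down the street; turned to see where I'd come from; grabbed some burritos ($8); the street is wet from the rain",
    "* installed some plumbing ($23)",
]


def original_match(line):
    """Return the text matched by the original pattern, or None."""
    m = ORIGINAL.search(line)
    return m.group(0) if m else None


results = [original_match(line) for line in samples]
for number, result in enumerate(results, start=1):
    print(number, result)
```

The lookahead (?=[^;]*$) only lets a match start where no ";" remains before the end of the line, so on line 3 the only allowed starting points are inside "the street is wet from the rain", and nothing there fits the rest of the pattern.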

Any advice on what I can adjust in the regex is greatly appreciated!


Solution

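  • You want to avoid going past the next semicolon. Use [^;\r\n]+ between the "ed" word and the dollar amount, so the match can never cross a ";" or a line break. The pattern is shown first in free-spacing form, then on a single line.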

    (?<= [*;] )
    [^\S\r\n]* 
    (                             # (1 start)
       \w+ ed \b [^;\r\n]+ 
       \( \$ \d+ 
       (?: \. \d* )?
       \) 
    )                             # (1 end)
    

    https://regex101.com/r/d6e0ve/1

    (?<=[*;])[^\S\r\n]*(\w+ed\b[^;\r\n]+\(\$\d+(?:\.\d*)?\))
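As a quick check, the single-line pattern can be run against the sample text from the question. This is only a sketch, assuming Python's re module (the lookbehind here is one character wide, so it satisfies Python's fixed-width requirement); PATTERN, text and matches are illustrative names.

```python
import re

# Single-line form of the pattern above; group 1 holds the wanted text.
PATTERN = re.compile(r"(?<=[*;])[^\S\r\n]*(\w+ed\b[^;\r\n]+\(\$\d+(?:\.\d*)?\))")

# The four sample lines from the question, joined into one block of text.
text = "\n".join([
    "* went to the building; opened the door; closed the door; picked up some money ($20)",
    "* walked next door; knocked on a window; purchased an apple pie ($6.95)",
    "* skipped down the street; turned to see where I'd come from; grabbed some burritos ($8); the street is wet from the rain",
    "* installed some plumbing ($23)",
])

# With exactly one capturing group, findall returns the group 1 strings.
matches = PATTERN.findall(text)
for m in matches:
    print(m)
```

Segments such as "opened the door" also start with an "ed" word after a ";", but [^;\r\n]+ cannot reach a "($" before hitting the next ";", so they are rejected and only the segments that carry a dollar amount are returned.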