Tags: regex, regex-lookarounds, positive-lookahead

Understanding regex lookaround to get desired result


I am trying to isolate street address fields that begin with a digit, contain an underscore and end with a comma:

001 ALLAN Witham Ross 13 Every_Street, Welltown Greenkeeper 002 ALLARDYCE Margaret Isabel 49 Bell_Road, Musicville Housewife 003 ALLARDYCE Mervyn George 49 Bell_Road, Musicville Company Mngr

The desired matches are:

13 Every_Street, Welltown
49 Bell_Road, Musicville
49 Bell_Road, Musicville

My regex is

(?ms)([0-9]+\s[A-Z][a-z].+(?=,))

But this matches from 13 all the way to the final 'd' of the last Bell_Road, which is almost everything. See the regex101 example.

Why does the match run past the first two commas and stop only before the third? I want it to match up to the next comma, and to do that three times :)


Solution

  • This produces your desired matches:
    \d+[^,\d]*_[^,]+, \S+
    demo

    Note that these matches don't end with a comma, though.
    If you want them to, remove the trailing space and \S+ from the end of the pattern.
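
To see the difference between the two patterns, here is a minimal Python sketch. It assumes the sample input is the single line shown in the question; the variable names are only illustrative. The pattern from the question overshoots because .+ is greedy: with the s flag it consumes the rest of the input and then backtracks only as far as the last position that is followed by a comma. The answer's pattern avoids this by using [^,] classes, which cannot cross a comma.

```python
import re

# Sample input from the question, assumed here to be one line.
text = (
    "001 ALLAN Witham Ross 13 Every_Street, Welltown Greenkeeper "
    "002 ALLARDYCE Margaret Isabel 49 Bell_Road, Musicville Housewife "
    "003 ALLARDYCE Mervyn George 49 Bell_Road, Musicville Company Mngr"
)

# Pattern from the question: the greedy .+ gives one long match that
# stops just before the LAST comma in the input.
greedy = re.search(r"(?ms)([0-9]+\s[A-Z][a-z].+(?=,))", text)
greedy_match = greedy.group(1)

# Pattern from the answer: digits, then non-comma/non-digit characters up to
# an underscore, then non-comma characters, a comma, a space and one word.
full = re.findall(r"\d+[^,\d]*_[^,]+, \S+", text)

# Variant that ends at the comma (trailing space and \S+ removed).
to_comma = re.findall(r"\d+[^,\d]*_[^,]+,", text)

print(greedy_match)
print(full)
print(to_comma)
```

The leading record numbers (001, 002, 003) are skipped by the answer's pattern because the text that follows them reaches another digit before it reaches an underscore, so the required _ can never be matched from those starting points.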