python regex-lookarounds python-re

Python re identifiers not working with lookahead and lookbehind


I have the following string:

str = '2024-09-23 18:05:08,147 INFO  [WatchDog_191084]  (alloc:0MB, cpu:0%) 10      422'

and I am trying to extract the number between the square brackets, so I tried:

identifier_test = re.search(r'(?<=\[)\d+(?=])',str)
print(identifier_test)

I get None, but if I try

identifier_test = re.search(r'(?<=\[).+(?=])',str)
print(identifier_test.group())

it works as expected and returns WatchDog_191084. How do I get the numbers only?


Solution

  • In your first pattern, nothing matches the WatchDog_ part of the input string. The lookbehind requires a [ character immediately before the digits, but the character before 191084 is _, so the match fails. If your inputs always contain WatchDog_, you can make that part of the lookbehind:

    re.search(r'(?<=\[WatchDog_)\d+(?=])',str)
    

    If you want to accept any text there, things get a little trickier. Python's re module only supports fixed-width lookbehinds, so something like (?<=\[[^\]\d]*) isn't allowed. In that situation, using a pattern like your second one and extracting the digits with some post-processing makes the most sense.
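
The post-processing idea could look roughly like the sketch below. It assumes the log line from the question, stored in a variable called `line` (a name chosen here so the built-in `str` is not shadowed). The second option, a capturing group instead of a lookbehind, is not part of the answer above; it is only one possible alternative when the prefix text varies.

```python
import re

# Log line from the question; `line` is a name picked for this sketch.
line = "2024-09-23 18:05:08,147 INFO  [WatchDog_191084]  (alloc:0MB, cpu:0%) 10      422"

# A variable-width lookbehind is rejected by the re module at compile time.
try:
    re.compile(r"(?<=\[[^\]\d]*)\d+(?=])")
    lookbehind_rejected = False
except re.error:
    lookbehind_rejected = True

# Option 1: keep the broad lookaround pattern from the question,
# then search the bracketed text for its digits.
bracketed = re.search(r"(?<=\[).+(?=])", line)
digits = re.search(r"\d+", bracketed.group()) if bracketed else None
print(digits.group() if digits else None)

# Option 2 (alternative, not from the answer): match the brackets and the
# non-digit prefix normally, and capture only the digits.
captured = re.search(r"\[[^\]\d]*(\d+)\]", line)
print(captured.group(1) if captured else None)
```

Both options guard against a failed match, since `re.search` returns None when nothing is found.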