I am trying to extract every word that comes immediately before a standalone Y (one delimited by word boundaries). I treat each line as a separate record using the (?m) flag and capture \w+ followed by a lookahead for \s+Y, but I can only print the first match (IMP), not the second one (IMP1).
>>> print(foo)
this is IMP Y text
and this is also IMP1 Y text
this is not so IMP2 N text
Y is not important
Current fruitless attempts:
>>> m = re.search('(?m).*?(\w+)(?=\s+Y)',foo)
>>> m.groups()
('IMP',)
>>>
>>> m = re.search('(?m)(?<=\s)(\w+)(?=\s+Y)',foo)
>>> m.groups()
('IMP',)
>>>
Expected result:
('IMP','IMP1')
You can use
\w+(?=[^\S\r\n]+Y\b)
See the regex demo. Details:
\w+ - one or more letters/digits/underscores
(?=[^\S\r\n]+Y\b) - immediately followed by one or more whitespace characters other than CR and LF, and then Y as a whole word (\b is a word boundary).
See a Python demo:
import re
foo = "this is IMP Y text\nand this is also IMP1 Y text\nthis is not so IMP2 N text\nY is not important"
print(re.findall(r'\w+(?=[^\S\r\n]+Y\b)', foo))
# => ['IMP', 'IMP1']
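To make the difference from the attempts in the question concrete, here is a small comparison sketch. It reuses the same foo string; the variable names first, loose and strict are only illustrative. It shows two separate effects: re.search returns just the first match, and a plain \s+ in the lookahead can cross a line break, whereas [^\S\r\n]+ cannot.

```python
import re

foo = (
    "this is IMP Y text\n"
    "and this is also IMP1 Y text\n"
    "this is not so IMP2 N text\n"
    "Y is not important"
)

# re.search stops at the first match, so only 'IMP' comes back.
first = re.search(r'\w+(?=\s+Y)', foo).group()

# \s+ also matches the newline, so the "text" that ends the third line
# is picked up because the next line starts with Y.
loose = re.findall(r'\w+(?=\s+Y)', foo)

# [^\S\r\n]+ matches whitespace except CR and LF, so the lookahead
# stays on the same line; \b makes sure Y is a whole word.
strict = re.findall(r'\w+(?=[^\S\r\n]+Y\b)', foo)

print(first)   # IMP
print(loose)   # ['IMP', 'IMP1', 'text']
print(strict)  # ['IMP', 'IMP1']
```

Note that the (?m) flag is not needed here: it only changes how ^ and $ behave, and the pattern uses neither.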