Given a file containing this string:
IT1*1*EA*VN*ABC@SAC*X*500@REF*ZZ*OK@IT1*1*CS*VN*ABC@SAC*X*500@REF*ZZ*BAR@IT1*1*EA*VN*ABC@SAC*X*500@REF*ZZ*BAR@IT1*1*EA*VN*ABC@SAC*X*500@REF*ZZ*OK@
The goal is to extract the following:
IT1*1*EA*VN*ABC@SAC*X*500@REF*ZZ*BAR@
With the criteria being:
*EA*
BAR
Some notes for consideration:
The goal is to select the "group" of lines meeting the criteria.
I tried the following:
grep -oP "IT1[^@]*EA[^@]*@.*REF[^@]*BAR[^@]*@" file.txt
But it captures characters from the beginning of the example.
Also tried to use lookarounds:
grep -oP "(?<=IT1[^@]*EA[^@]*@).*?(?=REF[^@]*BAR[^@]*@)" file.txt
But my version of grep returns:
grep: lookbehind assertion is not fixed length
Your issue is that .* will match characters from the first IT1 with EA to the last REF with BAR. You need to ensure the match doesn't go past the next IT1, which you can do by replacing .* with a tempered greedy token (?:(?!@IT1).)* :
IT1[^@]*EA[^@]*@(?:(?!@IT1).)*REF[^@]*BAR[^@]*@
This will only match from an IT1 to its corresponding REF.
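For completeness, here is a minimal sketch of running that pattern with grep against the sample string from the question. It assumes GNU grep built with PCRE support (the -P option); the variable names input and result are only for illustration. The pattern is in single quotes because an interactive bash session with history expansion enabled would otherwise try to expand the ! inside double quotes.

```shell
# Sample input from the question, kept in a variable instead of file.txt.
input='IT1*1*EA*VN*ABC@SAC*X*500@REF*ZZ*OK@IT1*1*CS*VN*ABC@SAC*X*500@REF*ZZ*BAR@IT1*1*EA*VN*ABC@SAC*X*500@REF*ZZ*BAR@IT1*1*EA*VN*ABC@SAC*X*500@REF*ZZ*OK@'

# -o prints only the matched text; -P enables PCRE, which the lookahead needs.
# (?:(?!@IT1).)* consumes one character at a time, but stops before any "@IT1",
# so the match cannot run into the next IT1 group.
result=$(printf '%s\n' "$input" | grep -oP 'IT1[^@]*EA[^@]*@(?:(?!@IT1).)*REF[^@]*BAR[^@]*@')

printf '%s\n' "$result"
```

With the sample data this should print only the third group, since the first and fourth groups have no BAR in their REF and the second has CS rather than EA. To read from the file as in the question, pass file.txt as the last argument to grep instead of piping.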