Given a file containing this string:
IT1*1*EA*VN*ABC@SAC*X*500@REF*ZZ*OK@IT1*1*CS*VN*ABC@SAC*X*500@REF*ZZ*BAR@IT1*1*EA*VN*ABC@SAC*X*500@REF*ZZ*BAR@IT1*1*EA*VN*ABC@SAC*X*500@REF*ZZ*OK@
The goal is to extract the following:
IT1*1*EA*VN*ABC@SAC*X*500@REF*ZZ*BAR@
With the criteria being that the IT1 segment contains EA and the REF segment of the same group contains BAR.
Some notes for consideration:
The goal is to select the whole "group" of segments (IT1 through REF, each terminated by @) meeting the criteria.
I tried the following:
grep -oP "IT1[^@]*EA[^@]*@.*REF[^@]*BAR[^@]*@" file.txt
But the match starts at the first IT1 in the example and runs through the third group, instead of isolating the one group I want.
Also tried to use lookarounds:
grep -oP "(?<=IT1[^@]*EA[^@]*@).*?(?=REF[^@]*BAR[^@]*@)" file.txt
But my version of grep returns:
grep: lookbehind assertion is not fixed length
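The issue is that .* is greedy and can match @, so the match runs from the first IT1 containing EA to the last REF containing BAR, crossing group boundaries. You need to ensure the match doesn't go past the next IT1. You can do that by replacing .* with a tempered greedy token, (?:(?!@IT1).)*, which consumes one character at a time and stops as soon as the upcoming text is @IT1: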
IT1[^@]*EA[^@]*@(?:(?!@IT1).)*REF[^@]*BAR[^@]*@
This will only match from an IT1 to its corresponding REF.
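As a quick check, here is a minimal sketch of running that pattern through grep against the sample string from the question. It assumes GNU grep built with PCRE support (the -P flag); the temporary file and the variable names sample and result are only for illustration. The pattern is in single quotes because, in an interactive bash session, a ! inside double quotes can trigger history expansion.

```shell
# Recreate the sample input in a temporary file (illustrative; any file works).
sample=$(mktemp)
printf '%s\n' 'IT1*1*EA*VN*ABC@SAC*X*500@REF*ZZ*OK@IT1*1*CS*VN*ABC@SAC*X*500@REF*ZZ*BAR@IT1*1*EA*VN*ABC@SAC*X*500@REF*ZZ*BAR@IT1*1*EA*VN*ABC@SAC*X*500@REF*ZZ*OK@' > "$sample"

# -o prints only the matched text; -P enables PCRE, which the lookahead needs.
# (?:(?!@IT1).)* advances one character at a time and refuses to step onto
# an "@IT1", so the match cannot spill into the next group.
result=$(grep -oP 'IT1[^@]*EA[^@]*@(?:(?!@IT1).)*REF[^@]*BAR[^@]*@' "$sample")

printf '%s\n' "$result"
# prints: IT1*1*EA*VN*ABC@SAC*X*500@REF*ZZ*BAR@

rm -f "$sample"
```

The first and fourth groups contain EA but end in OK, and the second ends in BAR but has CS instead of EA, so only the third group is printed.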
Regex demo on regex101