I wrote a Regex using pcregrep, and everything behaved as expected until I added a positive lookahead.
Scenario:
I have the following text file:
a
b
c
a
c
Goal:
I want to use a regex with pcregrep to return a line containing a and a line containing c, with a line containing b between them that is matched but not returned. So it would match the first three lines (a, b, c) and return the first (a) and third (c) lines. It would not match the fourth and fifth lines because there is no b line between them. So the output would be:
a
c
What I've tried:
If I run pcregrep -M 'a\nb\nc\n' (command 1), this captures and returns:
a
b
c
as expected. So I now want to modify this to match the b line with a positive lookahead, so that it is checked for but not returned. I tried this: pcregrep -M 'a\n(?=(b\n))c\n' (command 2). However, this returns nothing.
My question:
Why does command 2 not return the expected output, where command 1 does? How can I return the desired result? I know there are ways to do this other than pcregrep, but please note that I want to use pcregrep because I'll be extending the functionality to solve similar problems.
You can use two capture groups with the -o option:
pcregrep -M -o1 -o2 '(a\n)b\n(c)\n' file
a
c
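What the two groups pick out can also be checked outside pcregrep. The following is a minimal sketch using Python's re module, which is an assumption made purely for illustration (pcregrep uses PCRE, but this particular pattern reads the same in both). The sample text mirrors the file from the question, and the helper name selected_groups is hypothetical; joining groups 1 and 2 roughly imitates what -o1 -o2 prints.

```python
import re

# Sample text mirroring the file from the question.
text = "a\nb\nc\na\nc\n"

# Same pattern as in the pcregrep command: group 1 is the "a" line,
# the "b" line is consumed but not grouped, group 2 is the "c".
pattern = re.compile(r"(a\n)b\n(c)\n")

def selected_groups(s):
    """Return groups 1 and 2 of every match, joined together."""
    return [m.group(1) + m.group(2) for m in pattern.finditer(s)]

result = selected_groups(text)
print(result)
```

Only the first three lines produce a match here; the trailing a, c pair has no b line between them, so it contributes nothing.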
Details:
(...): in a regex, parentheses create a capture group
-o1 -o2: prints only capture groups #1 and #2

Note that your regex a\n(?=(b\n))c\n won't work because a lookahead is just an assertion with a zero-width match. Your regex asserts the presence of b\n after a\n, which is fine, but it then attempts to match c\n right after a\n, and this is where matching fails.
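The zero-width behaviour can be made visible with a small experiment. This is a sketch in Python's re module, used only as a stand-in on the assumption that its lookahead semantics agree with PCRE's for this pattern; the variable names are illustrative.

```python
import re

# Sample text mirroring the file from the question.
text = "a\nb\nc\na\nc\n"

# Command 2's pattern: after "a\n" the lookahead checks for "b\n" but
# consumes nothing, so "c\n" must then match at the very same position.
lookahead = re.search(r"a\n(?=(b\n))c\n", text)

# A lookahead on its own succeeds, and the match ends right after "a\n":
# the "b\n" it inspected is not part of the match.
probe = re.search(r"a\n(?=b\n)", text)

print(lookahead)                                 # no match object is returned
print(probe.end())                               # offset just past "a\n"
print(repr(text[probe.end():probe.end() + 2]))   # "b\n" is still waiting there
```

Because the lookahead consumes nothing, it cannot be used to step over the b line. Consuming b\n as an ordinary part of the pattern and then choosing which groups to print is what lets the two-group command return only a and c.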