How to select lines between two identical marker patterns, which may occur multiple times, with awk/perl or any other command-line tool


Using awk, perl, or any other command-line tool, how can I select the lines that occur between two identical marker patterns? There may be multiple sections marked by this pattern. A block starts at the first occurrence of the pattern and ends at the second occurrence. Everything after that is ignored until the next occurrence of the pattern, which is again treated as a first occurrence, and the process repeats.

For example, suppose the file contains:

abc
def1
ghi1
jkl1
abc
1
2
3
abc
def2
ghi2
jkl2
abc
4
5
6
abc
stu
abc

The pattern is abc, so I need this output:

abc
def1
ghi1
jkl1
abc
abc
def2
ghi2
jkl2
abc
abc
stu
abc

I tried various solutions from other related questions, but they all used different start and end patterns:

How to print lines between two patterns, inclusive or exclusive (in sed, AWK or Perl)?

Extract lines between two patterns from a file

Extract text between 2 markers

Extract lines between 2 tokens in a text file using bash

I adapted those solutions to my needs, which gave something like this:

perl -lne 'if(/abc/){$flag=1; print} elsif(/abc/){$flag=0}' file.txt
awk '/abc/,/abc/' file.txt

I only got the lines that contain the pattern, not the text blocks between them.

How can I do this in awk, perl, or any other command-line tool, so that I get the text blocks delimited by the same pattern?


Solution

  • An easy way is to use the range operator in its three-dot variant, which does not test the end pattern against the same line that matched the start pattern:

    perl -wne'/abc/ ... /abc/ and print' data.txt
    

    Another way uses an explicit flag, yet stays concise:

    perl -wnlE' /abc/ and $f ^= 1; $f and say' data.txt
    

    This doesn't print the end marker, though, because the closing abc line turns the flag off before $f is tested. To print both the start and end markers:

    perl -wnlE' ($f or /abc/) and say; /abc/ and $f ^= 1' data.txt
    

    Explanation --

    In Perl, all logical operators short-circuit. Consider A and B: if the first expression (A) evaluates to something "falsey", then B is not evaluated, so that code doesn't run. Thus A and B is mostly equivalent to if (A) { B }.

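    I use this short-circuiting behavior to streamline the code for a one-liner; in normal code it is far clearer to write it out. So the first statement of the last one-liner, ($f or /abc/) and say, amounts to if ($f or /abc/) { say }: print the line if we are inside a block or if it is a marker line. Likewise, /abc/ and $f ^= 1 amounts to if (/abc/) { $f ^= 1 }, which toggles the flag on every marker line.
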
    Of course, since this reads from a file and the line endings are left untouched, one can use print instead of say, and then only the -wne switches are needed.

    I just liked say better here. Also, -lE with say handles the case where this filter is fed strings without trailing newlines.
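Since the question also asks about awk: the attempt awk '/abc/,/abc/' prints only the marker lines because an awk range also checks its end pattern against the line that started it, so with identical patterns every abc line opens and immediately closes its own one-line range. Perl's two-dot .. behaves the same way, which is why the three-dot ... is used above. The sketch below is an awk translation of the explicit-flag idea, not part of the original answer; extract_blocks_awk is a hypothetical name and abc is hard-coded as the marker.

```shell
#!/bin/sh
# Sketch: awk version of the flag approach (a translation, not the original answer).
# f is the in-block flag; unset awk variables start out as 0, i.e. false.
extract_blocks_awk() {
    awk '
        /abc/ { f = !f; print; next }   # marker line: flip the flag, then print it
        f                               # any other line: print only while the flag is set
    ' "$@"
}

# Feed the sample input from the question through the filter.
printf '%s\n' abc def1 ghi1 jkl1 abc 1 2 3 abc def2 ghi2 jkl2 abc 4 5 6 abc stu abc |
    extract_blocks_awk
```

Because the marker line is printed before next skips the rest of the program, both the opening and closing markers appear in the output, matching the third Perl one-liner.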