I have a file that looks like this:
a: 0
a: 0
a: 0
a: 1
b: 1
c: 1
d: 1
e: 1
f: 1
a: 2
b: 2
c: 2
d: 2
e: 2
f: 2
a: 3
b: 3
c: 3
d: 3
e: 3
f: 3
c: 4
c: 4
c: 4
I want to capture and output all of the a and c lines of the form <a line><anything other than an a or c line><c line>, so the output would look like:
a: 1
c: 1
a: 2
c: 2
a: 3
c: 3
Note that neither the a: 0 lines at the beginning nor the c: 4 lines at the end are captured, because they don't follow the pattern I mentioned. Note also that the b lines between the a and c lines are removed.
I've been trying to do this with lookarounds using pcregrep in Bash, but haven't found a solution yet. Any ideas?
Thanks!
Try:
$ awk -F: '$1=="a"{aline=$0} $1=="c"{if(aline)print aline ORS $0 ORS; aline=""}' file
a: 1
c: 1

a: 2
c: 2

a: 3
c: 3
By default, awk reads in one line at a time.
-F:
This tells awk to use : as the field separator.
$1=="a"{aline=$0}
Every time an a line is observed, save the line in the variable aline.
$1=="c"{if(aline)print aline ORS $0 ORS; aline=""}
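Every time a c line is observed, check whether aline is nonempty. If so, print aline and the current line, separated by a newline; the trailing ORS adds one more newline, so each pair is followed by a blank line. In either case, reset aline to an empty string so the same a line cannot be paired with a later c line.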
For those who prefer their commands spread over several lines:
awk -F: '
$1=="a"{
aline=$0
}
$1=="c"{
if(aline)
print aline ORS $0 ORS
aline=""
}' file
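The question's desired output has no blank lines between pairs, whereas the trailing ORS in the command above prints one after each pair. A minimal sketch of a variant that simply drops that trailing ORS, wrapped in a shell function that reads standard input (the function name a_c_pairs is hypothetical, chosen only for illustration):

```shell
# Hypothetical helper: prints each a line together with the c line that
# follows it, with no blank line between pairs. Reads from stdin.
a_c_pairs() {
  # print already terminates its output with ORS, so "aline ORS $0"
  # emits exactly two lines: the saved a line, then the current c line.
  # aline is reset on every c line, whether or not a pair was printed.
  awk -F: '$1=="a"{aline=$0} $1=="c"{if(aline)print aline ORS $0; aline=""}'
}

# Demo on a shortened version of the sample input from the question.
printf 'a: 0\na: 1\nb: 1\nc: 1\nd: 1\na: 2\nc: 2\nc: 4\n' | a_c_pairs
# prints:
# a: 1
# c: 1
# a: 2
# c: 2
```

With the question's file, the call would be a_c_pairs < file.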
Using sed (the \n in the replacement text is a GNU sed extension):
$ sed -n '/^a/h; /^c/{x;/^a/{p;x;s/$/\n/;p};h}' file
a: 1
c: 1

a: 2
c: 2

a: 3
c: 3
-n
This tells sed not to print anything unless we explicitly ask it to.
/^a/h
Any time a line starts with a, we copy it to the hold space, overwriting whatever was there.
/^c/{ x; /^a/{ p; x; s/$/\n/; p}; h}
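Any time a line starts with c, we:
1. Swap (x) the pattern space with the hold space.
2. If the new pattern space starts with a, print (p) it, swap (x) again, add a newline to the end of the new pattern space (s/$/\n/), and print (p) it. Because p appends its own newline, this leaves a blank line after each pair.
3. Lastly, copy (h) the current pattern space to the hold space. This is either the c line just printed or, if nothing was printed, the previous hold contents; neither starts with a, so a c line can only pair with an a line seen since the previous c line.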
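Likewise, the sed command prints a blank line after each pair and relies on GNU sed for the \n in s/$/\n/. If you want exactly the output shown in the question, a sketch of a variant (assumption: no separator between pairs is wanted) just drops the s/$/\n/ step, which should also make it usable with non-GNU seds. The function name a_c_pairs_sed is hypothetical:

```shell
# Hypothetical helper: same hold-space logic as the answer above, minus the
# extra newline, so pairs are printed back to back. Reads from stdin.
a_c_pairs_sed() {
  # /^a/h       : remember the most recent a line in the hold space
  # /^c/{x;...} : on a c line, swap so the remembered line is in the pattern space
  # /^a/{p;x;p;}: if it is an a line, print it, swap back, print the c line
  # h           : overwrite the hold space with something that does not start with a
  sed -n '/^a/h; /^c/{x;/^a/{p;x;p;};h;}'
}

# Demo on a shortened version of the sample input from the question.
printf 'a: 0\na: 1\nb: 1\nc: 1\nd: 1\na: 2\nc: 2\nc: 4\n' | a_c_pairs_sed
# prints:
# a: 1
# c: 1
# a: 2
# c: 2
```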