I am not very good with Unix commands and am struggling to achieve this.
I have a file like the one below:
INPUT
ABCDEF_XY_12345_PQRTS_67367
1,a,b,c1
2,a,b,c2
3,a,b,c3
.....
APRTEYW_XY_23456_GDJHJH_232434
1,a,b,c4
2,a,b,c5
3,a,b,c6
......
GDHGJHG_XY_35237_FHDJFH_738278
1,a,b,c7
2,a,b,c8
3,a,b,c9
......
OUTPUT
12345,1,a,b,c1
12345,2,a,b,c2
12345,3,a,b,c3
23456,1,a,b,c4
23456,2,a,b,c5
23456,3,a,b,c6
35237,1,a,b,c7
35237,2,a,b,c8
35237,3,a,b,c9
Essentially, I want to take the substring between _XY_ and the next _ (i.e. _XY_[<STRING>]_)
and prepend it to each of the following lines, like <STRING>,1,a,b,c1,
until we encounter another line matching the pattern _XY_[<STRING>]_,
and then repeat the same process until EOF.
I am trying to find an easy way to do it, either using awk
or by splitting the master file into multiple smaller files. Can you please point me in the correct direction?
Try awk with multiple delimiters:
awk -F"[_,]" -v OFS=, ' { if(/_/) { k=$3 } else { print k,$0 } } ' file
Thanks @EdMorton, a single delimiter is enough:
awk -F_ -v OFS=, ' { if(/_/) { k=$3 } else { print k,$0 } } ' file
It can be further shortened to:
awk -F_ -v OFS=, ' /_/ {k=$3;next} { print k,$0 } ' file
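The commands above treat any line containing an underscore as a header line, which is fine for the sample data. If data lines might themselves contain underscores, a stricter variant could key on the literal _XY_ marker instead. This is only a sketch under that assumption; prefix_keys is a hypothetical helper name, and match(), RSTART and RLENGTH are standard POSIX awk.

```shell
# prefix_keys FILE  (hypothetical helper, not part of the original answer)
prefix_keys() {
  awk -v OFS=, '
    # Header line: capture the text between "_XY_" and the next "_".
    match($0, /_XY_[^_]+_/) {
      # Skip the 4 chars of "_XY_" and drop the trailing "_".
      k = substr($0, RSTART + 4, RLENGTH - 5)
      next
    }
    # Data line: prepend the most recently seen key.
    { print k, $0 }
  ' "$1"
}
```

Calling prefix_keys file should give the same output as the shorter commands for the sample input, while leaving a data line such as 2,a_x,b,c2 intact as data.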
With your given input, using the if/else form:
$ cat filex.txt
ABCDEF_XY_12345_PQRTS_67367
1,a,b,c1
2,a,b,c2
3,a,b,c3
APRTEYW_XY_23456_GDJHJH_232434
1,a,b,c4
2,a,b,c5
3,a,b,c6
GDHGJHG_XY_35237_FHDJFH_738278
1,a,b,c7
2,a,b,c8
3,a,b,c9
$ awk -F_ -v OFS=, ' { if(/_/) { k=$3 } else { print k,$0 } } ' filex.txt
12345,1,a,b,c1
12345,2,a,b,c2
12345,3,a,b,c3
23456,1,a,b,c4
23456,2,a,b,c5
23456,3,a,b,c6
35237,1,a,b,c7
35237,2,a,b,c8
35237,3,a,b,c9
$
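The question also mentions splitting the master file into smaller files. If that is preferred, the same header test can redirect each block to its own file. This is a rough sketch, not a tested production script: split_by_key is a hypothetical helper name, the <key>.csv output naming is an assumption, and it assumes each key appears only once (a repeated key would overwrite its earlier file).

```shell
# split_by_key FILE  (hypothetical helper; output names <key>.csv are an assumption)
split_by_key() {
  awk -F_ -v OFS=, '
    # Header line: close the previous output file and start a new one.
    /_/ {
      if (out != "") close(out)
      k = $3
      out = k ".csv"
      next
    }
    # Data line: write it, prefixed with the key, to the current file.
    # Lines that appear before the first header are skipped.
    out != "" { print k, $0 > out }
  ' "$1"
}
```

Running split_by_key filex.txt in an empty directory should create 12345.csv, 23456.csv and 35237.csv, each holding the prefixed lines for that block.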