unix awk csplit

Unix awk command to execute a specific logic


I am not very good with Unix commands and am struggling to achieve this.

I have a file like the one below:

INPUT

ABCDEF_XY_12345_PQRTS_67367
1,a,b,c1
2,a,b,c2
3,a,b,c3
.....
APRTEYW_XY_23456_GDJHJH_232434
1,a,b,c4
2,a,b,c5
3,a,b,c6
......
GDHGJHG_XY_35237_FHDJFH_738278
1,a,b,c7
2,a,b,c8
3,a,b,c9
......

OUTPUT

12345,1,a,b,c1
12345,2,a,b,c2
12345,3,a,b,c3
23456,1,a,b,c4
23456,2,a,b,c5
23456,3,a,b,c6
35237,1,a,b,c7
35237,2,a,b,c8
35237,3,a,b,c9

Essentially, I want to take the substring that sits between _XY_ and the next underscore in each header line (the <STRING> in _XY_<STRING>_) and prepend it to the lines that follow, giving <STRING>,1,a,b,c1 and so on. When the next line matching _XY_<STRING>_ is reached, the same process repeats with the new value, until EOF.

I am trying to find an easy way to do this, either using awk or by splitting the master file into multiple smaller files. Can you please point me in the correct direction?


Solution

  • Try awk with multiple delimiters:

    awk -F"[_,]" -v OFS=, ' { if(/_/) { k=$3 } else { print k,$0 } } ' file
    

    Thanks @EdMorton, a single delimiter is enough, because only the header lines need to be split into fields:

    awk -F_ -v OFS=, ' { if(/_/) { k=$3 } else { print k,$0 } } ' file
    

    It can be shortened further to:

    awk -F_ -v OFS=, ' /_/ {k=$3;next} { print k,$0 } ' file
    

    With your given input:

    $ cat filex.txt
    ABCDEF_XY_12345_PQRTS_67367
    1,a,b,c1
    2,a,b,c2
    3,a,b,c3
    APRTEYW_XY_23456_GDJHJH_232434
    1,a,b,c4
    2,a,b,c5
    3,a,b,c6
    GDHGJHG_XY_35237_FHDJFH_738278
    1,a,b,c7
    2,a,b,c8
    3,a,b,c9
    
    $ awk -F_ -v OFS=, ' { if(/_/) { k=$3 } else { print k,$0 } } ' filex.txt
    12345,1,a,b,c1
    12345,2,a,b,c2
    12345,3,a,b,c3
    23456,1,a,b,c4
    23456,2,a,b,c5
    23456,3,a,b,c6
    35237,1,a,b,c7
    35237,2,a,b,c8
    35237,3,a,b,c9
    
    $
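
The commands above treat any line containing an underscore as a header, so a data line that happens to contain `_` would be mistaken for one. If that can occur in the real file, a stricter variant could match the `_XY_<STRING>_` pattern from the question explicitly. This is only a sketch under that assumption: the function name `prepend_xy_id` is made up for illustration, and skipping blank lines and lines before the first header is a guess about the desired behaviour.

```shell
# Hypothetical helper: prepend the ID found between "_XY_" and the next "_"
# to every following data line. Reads the named files, or stdin if none given.
prepend_xy_id() {
  awk -v OFS=, '
    # Header line: remember the ID and print nothing for this line.
    # RSTART points at the leading "_" of "_XY_"; RLENGTH covers "_XY_<ID>_".
    match($0, /_XY_[^_]+_/) {
      id = substr($0, RSTART + 4, RLENGTH - 5)
      next
    }
    # Assumption: ignore blank lines and anything before the first header.
    id == "" || $0 == "" { next }
    # Data line: prefix it with the most recent ID.
    { print id, $0 }
  ' "$@"
}

# Example with a data line that contains an underscore (a_x):
printf '%s\n' \
  'ABCDEF_XY_12345_PQRTS_67367' '1,a,b,c1' '2,a_x,b,c2' \
  'APRTEYW_XY_23456_GDJHJH_232434' '1,a,b,c4' | prepend_xy_id
# 12345,1,a,b,c1
# 12345,2,a_x,b,c2
# 23456,1,a,b,c4
```

It only uses `match()`, `RSTART` and `RLENGTH`, which are part of POSIX awk, so it should not depend on a particular awk implementation. On a real file it would be called as `prepend_xy_id filex.txt`.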