unix sed do-while

Split a line into multiple lines and extract a pattern from each field


I have many .txt files that look like this:

file1.txt

header
1_fff_aaa 1_rrr_aaa 1_ggg_aaa ...

file2.txt

header
1_ttt_aaa 1_iii_aaa 1_lll_aaa ...

I would like to remove the header, split the second line into multiple lines at each white space, and keep only the text between the _ characters:

Output:

file1_v1.txt

fff
rrr
ggg

file2_v1.txt

ttt
iii
lll

I would like to use Unix commands such as sed.


Solution

  • Something like this:

    Program: split.awk

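    NR == 1 {
        # skip the header line
        next
    }
    {
        # walk over every whitespace-separated field on the line
        i = 1
        while (i <= NF) {
            sub(/^[^_]*_/, "", $i)   # drop everything up to the first "_"
            sub(/_[^_]*$/, "", $i)   # drop everything from the last "_"
            print $i
            i++
        }
    }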
    

    Run it like this:

    awk -f split.awk file1.txt > file1_v1.txt
    

    To execute it on many files:

    for f in file*.txt; do echo "$f"; awk -f split.awk "$f" > "${f%.txt}_v1.txt" ; done
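
    As a variant of the loop above, awk can also process all files in one invocation and write each output file itself. This is only a sketch: it uses FNR (the line number within the current file) instead of NR so that the header of every file is skipped, and the variable names out and field plus the throwaway sample directory are assumptions made for the demo.

```shell
#!/bin/sh
# Throwaway sample data; the file names follow the question.
tmpdir=$(mktemp -d)
printf 'header\n1_fff_aaa 1_rrr_aaa 1_ggg_aaa\n' > "$tmpdir/file1.txt"
printf 'header\n1_ttt_aaa 1_iii_aaa 1_lll_aaa\n' > "$tmpdir/file2.txt"

awk '
FNR == 1 {
    # First line of each input file: skip the header and switch output file.
    if (out != "") close(out)
    out = FILENAME
    sub(/\.txt$/, "_v1.txt", out)   # file1.txt -> file1_v1.txt
    printf "" > out                 # create or truncate the output file
    next
}
{
    for (i = 1; i <= NF; i++) {
        field = $i
        sub(/^[^_]*_/, "", field)   # drop everything up to the first "_"
        sub(/_[^_]*$/, "", field)   # drop everything from the last "_"
        print field > out
    }
}
' "$tmpdir"/file*.txt
```

    Note that the pattern file*.txt also matches file1_v1.txt once it exists, so a second run over the same directory would process the generated files as well. Writing the results to a separate directory avoids that.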
    

    UPDATE

    You could also use sed & tr:

    sed -n '2,$p' file1.txt | tr " " "\n" | sed 's/^[^_]*_\(.*\)_[^_]*$/\1/'
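
    To apply the sed & tr pipeline to many files, it can be wrapped in a small function and called in a loop. This is a sketch under a few assumptions: extract_middle is a made-up helper name, the sample files are created in a temporary directory only for the demo, and tr -s is added so that repeated blanks or tabs do not produce empty output lines.

```shell
#!/bin/sh
# Hypothetical helper: print the part between the first and last "_" of
# every field after the header line, one per line.
extract_middle() {
    sed '1d' "$1" |                 # drop the header line
        tr -s ' \t' '\n\n' |        # one field per line, squeezing repeats
        sed -e '/^$/d' -e 's/^[^_]*_\(.*\)_[^_]*$/\1/'
}

# Throwaway sample data; the file names follow the question.
tmpdir=$(mktemp -d)
printf 'header\n1_fff_aaa 1_rrr_aaa 1_ggg_aaa\n' > "$tmpdir/file1.txt"
printf 'header\n1_ttt_aaa 1_iii_aaa 1_lll_aaa\n' > "$tmpdir/file2.txt"

for f in "$tmpdir"/file*.txt; do
    extract_middle "$f" > "${f%.txt}_v1.txt"
done
```

    Fields that do not contain two underscores are passed through unchanged, because the substitution only fires when the whole field matches.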