I have many .txt files that look like this:
file1.txt
header
1_fff_aaa 1_rrr_aaa 1_ggg_aaa ...
file2.txt
header
1_ttt_aaa 1_iii_aaa 1_lll_aaa ...
I would like to remove the header, split the second line on whitespace so that each token ends up on its own line, and keep only the part between the _ characters:
Output:
file1_v1.txt
fff
rrr
ggg
file2_v1.txt
ttt
iii
lll
I would like to use Unix commands such as sed.
Something like this:
Program: split.awk
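# Skip the header (first line of each input file).
FNR == 1 { next }

{
    for (i = 1; i <= NF; i++) {
        sub(/^[^_]*_/, "", $i)   # drop everything up to and including the first "_"
        sub(/_[^_]*$/, "", $i)   # drop the last "_" and everything after it
        print $i
    }
}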
Run it like this:
awk -f split.awk file1.txt > file1_v1.txt
To execute it on many files:
for f in file*.txt; do echo "$f"; awk -f split.awk "$f" > "${f%.txt}_v1.txt" ; done
You could also use sed & tr:
sed -n '2,$p' file1.txt | tr " " "\n" | sed 's/^[^_]*_\(.*\)_[^_]*$/\1/'
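To run the sed & tr pipeline on many files, a loop along these lines might work. This is only a sketch: the function name extract_middle is made up for illustration, tr -s is used (instead of plain tr) so that runs of spaces do not produce empty lines, and files already ending in _v1.txt are skipped so that a second run does not reprocess its own output.

```shell
#!/bin/sh
# extract_middle is a hypothetical helper name, not part of any tool.
# It drops the header line, puts each space-separated token on its own
# line, and keeps only the text between the first and last underscore.
extract_middle() {
    sed -n '2,$p' "$1" | tr -s ' ' '\n' | sed 's/^[^_]*_\(.*\)_[^_]*$/\1/'
}

for f in file*.txt; do
    [ -e "$f" ] || continue                  # the glob matched nothing
    case $f in *_v1.txt) continue ;; esac    # skip output of an earlier run
    extract_middle "$f" > "${f%.txt}_v1.txt"
done
```

Tokens that do not contain two underscores are passed through unchanged by the last sed, so check the output if your real data is less regular than the sample.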