I have several files with lines that follow a pattern such as:
NODE_1_length_59711_cov_84.026979_g0_i0_1 | 12.8 |
NODE_1_length_59711_cov_84.026979_g0_i0_2 | 18.9 |
NODE_2_length_59711_cov_84.026979_g0_i0_1 | 14.3 |
NODE_2_length_59711_cov_84.026979_g0_i0_2 | 16.1 |
NODE_165433_length_59711_cov_84.026979_g0_i0_1 | 29 |
I want to remove the 'NODE_' prefix and everything between the node number and the last '_', so that I get output like this from multiple files:
1_1 | 12.8 |
1_2 | 18.9 |
2_1 | 14.3 |
2_2 | 16.1 |
165433_1 | 29 |
Using GNU awk:
awk -F "\t" '{ fld1=gensub(/(^NODE_)([[:digit:]]+)(.*_)([[:digit:]]+$)/,"\\2_\\4","g",$1); print fld1"\t"$2 }' file
Explanation:
awk -F "\t" '{ # Set the field separator to tab
fld1=gensub(/(^NODE_)([[:digit:]]+)(.*_)([[:digit:]]+$)/,"\\2_\\4","g",$1); # Split the first field into the four sections marked by parentheses, then replace the whole match with the second section, a "_" and the fourth section. The third section ends at the last "_", so the fourth section captures the complete trailing number. Store the result in the variable fld1
print fld1"\t"$2 # Print fld1, followed by a tab and then the second field.
}' file
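If the fields are separated by " | " as in the sample rather than by tabs, a sed substitution that only touches the leading identifier may be simpler, and it can be wrapped in a loop to cover several files. The following is a minimal sketch, not a definitive solution: the function names shorten_ids and shorten_files and the ".short" output suffix are made up for illustration, and it assumes a sed that supports -E (GNU and BSD sed both do).

```shell
#!/bin/sh
# shorten_ids (hypothetical helper): reads lines on stdin and rewrites a leading
# NODE_<n>_..._<m> identifier as <n>_<m>. The rest of the line is left untouched.
shorten_ids() {
    # [^[:space:]]* stays inside the first whitespace-free token, so the
    # separator and the value columns are never modified.
    sed -E 's/^NODE_([0-9]+)_[^[:space:]]*_([0-9]+)/\1_\2/'
}

# shorten_files (hypothetical wrapper): for each file given as an argument,
# writes the converted lines to a new file named <file>.short.
shorten_files() {
    for f in "$@"; do
        shorten_ids < "$f" > "$f.short"
    done
}
```

For example, calling shorten_files with two input files (the names are placeholders) leaves the originals unchanged and creates one ".short" file beside each of them. Writing to a new file rather than editing in place keeps the original data available in case the pattern does not match as expected.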