Tags: shell, awk, bioinformatics, vcf-variant-call-format

AWK Loop Over Multiple Columns


Suppose I have the following input (multiple columns and rows, where TAB stands for a tab character):

1/1:123:121 TAB 0/0:1:21 TAB 1/1:12:14
0/1:12:23 TAB 0/1:12:15 TAB 0/0:123:16
0/0:3:178 TAB 1/1:123:121 TAB 1/1:2:28

What I would like is for awk to loop over each column and write new output under these conditions:

IF the first field of an entry (the fields within an entry are separated by ":") is 1/1 OR 0/0,

then write "NA" TAB "NA"

ELSE

write the two numbers from the following fields, "Number 1" TAB "Number 2". The separator between columns should be TAB.

Thus, the desired output for the example above would be:

NA TAB NA TAB NA TAB NA TAB NA TAB NA
12 TAB 23 TAB 12 TAB 15 TAB NA TAB NA
NA TAB NA TAB NA TAB NA TAB NA TAB NA

Below is my current code, which works for the first column, but I do not know how to make it work for ALL columns in the file.

awk '{split($0,a,":"); print a[1]"\t"a[2]"\t"a[3]}' |
awk -F"\t" '{
    if ($1 == "0/0" || $1 == "1/1")
        print $1="NA", $2="NA"
    else
        print $2"\t"$3
}'

Any ideas of how this could be achieved?

Many thanks in advance, George.


Solution

  • You may use this awk:

    awk -v OFS='\t' -F '[:\t]' '{
       s = ""
       for (i=1; i<=NF; i+=3)
          s = (s == "" ? "" : s OFS) ($i == "0/0" || $i == "1/1" ? "NA" OFS "NA" : $(i+1) OFS $(i+2))
       print s
    }' file
    
    NA  NA  NA  NA  NA  NA
    12  23  12  15  NA  NA
    NA  NA  NA  NA  NA  NA
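
The command above works because `-F '[:\t]'` splits each line on both ":" and TAB, so every entry occupies three consecutive fields and the loop can step through them three at a time. As an alternative, here is a minimal sketch that keeps TAB as the only field separator and uses `split()` on each entry instead. It assumes every entry has exactly the three-part shape shown in the question; the function name `het_counts` and the array name `parts` are made up for illustration.

```shell
# Hypothetical helper: reads tab-separated "GT:N1:N2" entries on stdin
# and prints "N1<TAB>N2" for each entry, or "NA<TAB>NA" when GT is 0/0 or 1/1.
het_counts() {
    awk -F '\t' -v OFS='\t' '{
        for (i = 1; i <= NF; i++) {
            # Split one entry, e.g. "0/1:12:23", into parts[1..3].
            split($i, parts, ":")
            if (parts[1] == "0/0" || parts[1] == "1/1")
                $i = "NA" OFS "NA"
            else
                $i = parts[2] OFS parts[3]
        }
        # Assigning to $i rebuilds $0 with OFS, so a plain print is enough.
        print
    }'
}

# Example run on the first two rows from the question.
printf '1/1:123:121\t0/0:1:21\t1/1:12:14\n0/1:12:23\t0/1:12:15\t0/0:123:16\n' | het_counts
```

For a real file, `het_counts < file` does the same thing. Both approaches produce identical output on the sample data; the `split()` version is just a little easier to extend if an entry gains extra sub-fields later.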