bashunixawksedgwas

How to replace 00 with Na excluding first row & first column using bash in comma separated file


I'm working with GWAS data, My data looks like this:

IID,kgp11004425,rs11274005,kgp183005,rs746410036,kgp7979600
1,00,AG,GT,AK,00
32,AG,GG,AA,00,AT
100,TT,AA,00,AG,AA       
3,GG,AG,00,GT,GG

Desired Output:

IID,kgp11004425,rs11274005,kgp183005,rs746410036,kgp7979600
1,N/A,AG,GT,AK,N/A
32,AG,GG,AA,N/A,AT
100,TT,AA,N/A,AG,AA       
3,GG,AG,N/A,GT,GG

Here I'm trying to replace "00" with "N/A", but since I have 00 in the first row & First Column (IID), the command I used:

sed '1!s~00~N/A~g' allSNIPsFinaldata.csv 

The above command excludes the first row but not the first column as a result I got IID Values 100, 200, and 300 as 1N/A, 2N/A, and 3N/A. Can anyone please help "how to exclude the first row & First Column as well and perform the above operation. please help


Solution

  • Using any awk in any shell on every Unix box:

    $ awk '{$0=$0","; gsub(/,00,/,",N/A,"); sub(/,$/,"")} 1' file
    IID,kgp11004425,rs11274005,kgp183005,rs746410036,kgp7979600
    1,N/A,AG,GT,AK,N/A
    32,AG,GG,AA,N/A,AT
    100,TT,AA,N/A,AG,AA
    3,GG,AG,N/A,GT,GG
    

    The above assumes none of the column names on the first row will be the exact string 00. If they can be then just tweak the above to:

    awk 'NR>1{$0=$0","; gsub(/,00,/,",N/A,"); sub(/,$/,"")} 1' file