
Subsetting GWAS results by matching the SNP column against another file


I have a GWAS summary estimate file with the following columns (file 1):

1   chr1_1726_G_A      0.023  0.160
1   chr1_20184_GAATA_G 0.033  0.180
1   chr1_791101_T_TGG  0.099  0.170

File 2:

chr1_20184_GAATA_G
chr1_791101_T_TGG

I would like to match column 1 of file 2 against column 2 of file 1 and write the matching rows to a file 3, such as:

1   chr1_20184_GAATA_G 0.033  0.180
1   chr1_791101_T_TGG  0.099  0.170

Using the code below, I get an empty file3:

awk 'FNR==NR{arr[$2];next} (($2) in arr)' file2 file1 > file3

Solution

  • With the samples you showed, please try the following awk code.

    awk 'FNR==NR{arr[$0];next} ($2 in arr)' file2 file1
    

    OR

    awk 'FNR==NR{arr[$1];next} ($2 in arr)' file2 file1
    

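    Explanation: file2 has only one column, so $2 is empty on every line and the first block stores just a single empty key. No SNP ID in file1 can match that key, which is why file3 comes out empty. Build the array from $0 (first command) or $1 (second command) instead. The rest of your code is fine: ($2 in arr) prints every file1 record whose second column is a key in the array. $1 is the safer of the two if lines in file2 may carry leading or trailing whitespace.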
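
The behaviour described above can be checked end to end with a small sketch. It recreates the two sample files from the question in a temporary directory, then runs both the original command and the corrected one. The directory, the output name file3_empty and the array name ids are made up for this illustration only.

```shell
# Work in a throwaway directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the two sample files from the question.
cat > file1 <<'EOF'
1   chr1_1726_G_A      0.023  0.160
1   chr1_20184_GAATA_G 0.033  0.180
1   chr1_791101_T_TGG  0.099  0.170
EOF

cat > file2 <<'EOF'
chr1_20184_GAATA_G
chr1_791101_T_TGG
EOF

# Original attempt: file2 has no second field, so the only key stored is the
# empty string and no row of file1 matches it.
awk 'FNR==NR{arr[$2];next} ($2 in arr)' file2 file1 > file3_empty

# Corrected: key the array on $1 of file2, then keep the file1 rows whose
# second field is one of those keys.
awk 'FNR==NR{ids[$1];next} ($2 in ids)' file2 file1 > file3

cat file3
```

With the sample data, file3_empty stays empty and file3 contains the two matching rows. awk prints the matching records unchanged, so the column spacing of file1 is preserved.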