linux awk vcftools bcftools

Match columns 1, 2, and 5 of file1 against columns 1, 2, and 3 of file2, respectively, and output the matching rows from file2. The second file is gzip-compressed (.gz).


file1

3   1234581 A   C   rs123456

file2 (gzip-compressed, .gz)

1   1256781 rs987656    T   C
3   1234581 rs123456    A   C
22  1792471 rs928376    G   T

output

3   1234581 rs123456    A   C

I tried

zcat file2.gz | awk 'NR==FNR{a[$1,$2,$5]++;next} a[$1,$2,$3]' file1.txt  - > output.txt

but it is not working


Solution

  • With your shown samples, please try the following awk code. Use zcat to decompress your .gz file and pipe it to awk as the second input (the - argument), which awk reads after it has finished reading file1.

    zcat your_file.gz | awk 'FNR==NR{arr[$1,$2,$5];next} (($1,$2,$3) in arr)' file1 -
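
    To check the command on the sample data from the question, here is a small self-contained sketch. The file names demo_file1.txt, demo_file2.gz and demo_output.txt are made up for this illustration, and the data is assumed to be tab-separated (awk's default field splitting handles spaces equally well). gzip -dc is used here as an equivalent of zcat.

```shell
# file1 layout: chromosome, position, ref, alt, rsID (hypothetical demo file name).
printf '3\t1234581\tA\tC\trs123456\n' > demo_file1.txt

# file2 layout: chromosome, position, rsID, ref, alt; stored gzip-compressed.
printf '1\t1256781\trs987656\tT\tC\n3\t1234581\trs123456\tA\tC\n22\t1792471\trs928376\tG\tT\n' \
  | gzip > demo_file2.gz

# First input (demo_file1.txt): FNR==NR is true, so store each (col1,col2,col5) key.
# Second input ("-", the decompressed stream): print rows whose (col1,col2,col3)
# key was stored. "in" only tests membership and does not create new array entries.
gzip -dc demo_file2.gz \
  | awk 'FNR==NR{arr[$1,$2,$5];next} (($1,$2,$3) in arr)' demo_file1.txt - \
  > demo_output.txt

cat demo_output.txt
```

    On the sample data this prints only the chromosome 3 row from file2, which is the output asked for in the question.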
    

    Fixes in OP's attempt: