awkgrepperl-hash

Compare 4 columns in two files; and output the line for unique combination (from first file) and line for duplicate combination (from second file)


I have two tab separated values file, say

File1.txt

chr1    894573  rs13303010  GG
chr2    18674   rs10195681  **CC**
chr3    104972  rs990284    AA  <--- Unique Line
chr4    111487  rs17802159  AA
chr5    200868  rs4956994   **GG**
chr5    303686  rs6896163   AA  <--- Unique Line
chrX    331033  rs4606239   TT
chrY    2893277 i4000106    **GG**
chrY    2897433 rs9786543   GG
chrM    57  i3002191    **TT**


File2.txt

chr1    894573  rs13303010  GG
chr2    18674   rs10195681  AT
chr4    111487  rs17802159  AA
chr5    200868  rs4956994   CC
chrX    331033  rs4606239   TT
chrY    2893277 i4000106    GA
chrY    2897433 rs9786543   GG
chrM    57  i3002191    TA

Desired Output:

Output.txt

chr1    894573  rs13303010  GG
chr2    18674   rs10195681  AT
chr3    104972  rs990284    AA  <--Unique Line from File1.txt
chr4    111487  rs17802159  AA
chr5    200868  rs4956994   CC
chr5    303686  rs6896163   AA  <--Unique Line from File1.txt
chrX    331033  rs4606239   TT
chrY    2893277 i4000106    GA
chrY    2897433 rs9786543   GG
chrM    57  i3002191    TA

File1.txt has total 10 entries while File2.txt has 8 entries. I want to compare the both the file using Column 1 and Column 2.

If both the file's first two column values are same, it should print the corresponding line to Output.txt from File2.txt.

When File1.txt has unique combination (Column1:column2, which is not present in File2.txt) it should print the corresponding line from File1.txt to the Output.txt.

I tried various awk and perl combination available at website, but couldn't get correct answer. Any suggestion will be helpful.

Thanks, Amit


Solution

  • next time, show your awk code tryso we can help on error or missing object

    awk 'NR==FNR || (NR>=FNR&&($1","$2 in k)){k[$1,$2]=$0}END{for(K in k)print k[K]}' file1 file2