I have a tab-delimited file, file_1:
NC_025 4569 . KX838946.2
NC_025 16546 . KJ641660.1
NC_025 11996 . KX932454.2
and a second file, file_2:
NC_025.1 RefSeq gene 5690 7513 . + . ID=gene-NZ82_gp4;Dbxref=GeneID:20964334;Name=NZ82_gp4;gbkey=Gene;gene_biotype=protein_coding;locus_tag=NZ82_gp4
NC_025.1 RefSeq gene 4612 10046 . + . ID=gene-NZ82_gp5;Dbxref=GeneID:20964335;Name=NZ82_gp5;gbkey=Gene;gene_biotype=protein_coding;locus_tag=NZ82_gp5
NC_025.1 RefSeq gene 10337 16933 . + . ID=gene-NZ82_gp6;Dbxref=GeneID:20964336;Name=NZ82_gp6;gbkey=Gene;gene_biotype=protein_coding;locus_tag=NZ82_gp6
NC_025.1 RefSeq gene 9000 12000 . + . ID=gene-AL82_gp5;Dbxref=GeneID:109647334;Name=AL82_gp5;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AL82_gp5
I want to compare column 2 of file_1 with columns 4 and 5 of file_2. If column 2 of a file_1 line is >= column 4 and <= column 5 of the same row of file_2, I want to combine the whole line of file_1 with the whole line of file_2. Desired output:
NC_025 16546 . KJ641660.1 NC_025.1 RefSeq gene 10337 16933 . + . ID=gene-NZ82_gp6;Dbxref=GeneID:20964336;Name=NZ82_gp6;gbkey=Gene;gene_biotype=protein_coding;locus_tag=NZ82_gp6
NC_025 11996 . KX932454.2 NC_025.1 RefSeq gene 10337 16933 . + . ID=gene-NZ82_gp6;Dbxref=GeneID:20964336;Name=NZ82_gp6;gbkey=Gene;gene_biotype=protein_coding;locus_tag=NZ82_gp6
NC_025 11996 . KX932454.2 NC_025.1 RefSeq gene 9000 12000 . + . ID=gene-AL82_gp5;Dbxref=GeneID:109647334;Name=AL82_gp5;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AL82_gp5
I have tried:
awk '{
if (NR==FNR) {
l[NR]=$0
a[NR]=$2
}
else if (a[FNR]>=$4 && a[FNR]<=$5) {
print l[FNR],$0
}
}' file_1 file_2 > File_3
But it prints nothing.
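So you basically want to join the lines of the two files on a range criterion. Your attempt only compares line N of file_1 with line N of file_2, so most combinations are never tested. Instead, after storing the first file, you need to iterate over all of its stored lines for each line of the second file.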
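awk '
# First file: store each whole line and its column 2, and count the lines.
NR==FNR {a[NR]=$0; p[NR]=$2; n=NR; next}
# Second file: print every stored line whose column 2 lies within [$4, $5].
{for (i=1; i<=n; i++) if ($4<=p[i] && p[i]<=$5) print a[i] "\t" $0}
' file_1.txt file_2.txt > file_3.txt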
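If you want to check the approach before running it on your real data, here is a minimal sketch that rebuilds your sample rows and applies the same range join. The file names sample_1.txt, sample_2.txt and sample_3.txt are made up for this test, and the last column of the second file is shortened to the gene ID only to keep the lines readable. It walks the stored lines with a counted loop so the output follows the input order.

```shell
#!/bin/sh
# Hypothetical scratch files: sample_1.txt, sample_2.txt, sample_3.txt.

# First file: ID, position, placeholder, accession (tab-delimited).
printf 'NC_025\t4569\t.\tKX838946.2\n'   > sample_1.txt
printf 'NC_025\t16546\t.\tKJ641660.1\n' >> sample_1.txt
printf 'NC_025\t11996\t.\tKX932454.2\n' >> sample_1.txt

# Second file: GFF-style rows; column 9 is shortened (assumption for brevity).
printf 'NC_025.1\tRefSeq\tgene\t5690\t7513\t.\t+\t.\tID=gene-NZ82_gp4\n'    > sample_2.txt
printf 'NC_025.1\tRefSeq\tgene\t4612\t10046\t.\t+\t.\tID=gene-NZ82_gp5\n'  >> sample_2.txt
printf 'NC_025.1\tRefSeq\tgene\t10337\t16933\t.\t+\t.\tID=gene-NZ82_gp6\n' >> sample_2.txt
printf 'NC_025.1\tRefSeq\tgene\t9000\t12000\t.\t+\t.\tID=gene-AL82_gp5\n'  >> sample_2.txt

awk '
# First file: store each whole line and its position, and count the lines.
NR==FNR {a[NR]=$0; p[NR]=$2; n=NR; next}
# Second file: test every stored position against this row range.
{for (i=1; i<=n; i++) if ($4<=p[i] && p[i]<=$5) print a[i] "\t" $0}
' sample_1.txt sample_2.txt > sample_3.txt

cat sample_3.txt
```

On these sample rows this should give three joined lines: 16546 and 11996 paired with the 10337-16933 gene, then 11996 paired with the 9000-12000 gene. Position 4569 falls outside every range, so it does not appear. Note that the sketch does not compare the sequence IDs in column 1 (NC_025 vs NC_025.1); if your real files contain several sequences, you would probably need an extra condition for that.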