I am having trouble parsing a GFF file. I am using the code below as a one-liner. The output is filtered on column 1 ($1), but when I add the additional filter on column 4 (at least 50000 and at most 150000), awk does not filter the file as expected. I am misunderstanding something, but I am not sure what.
awk '{ $1 = "s10";
$4 >= 50000 && $4 <=150000;
print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9}' infile > outfile
Input:
S03 GeneWise mRNA 7000 84000 40.00 - . ID=NA;Source=NA;Function="NA";
S07 GeneWise CDS 80450 96070 . - 0 Parent=NA;
S10 GeneWise mRNA 96000 105032 50.00 - . ID=NA;Source=NA;Function="NA";
S10 GeneWise CDS 43800 76000 . - 0 Parent=NA;
S10 GeneWise mRNA 175032 190540 41.11 + . ID=NA;Source=NA;Function="NA";
S11 GeneWise CDS 3700 15000 . + 0 Parent=NA;
S15 GeneWise mRNA 18055 25000 40.00 - . ID=S15;Source=NA;Function="NA";
Output I am obtaining (incorrect):
S10 GeneWise mRNA 96000 105032 50.00 - . ID=NA;Source=NA;Function="NA";
S10 GeneWise CDS 43800 76000 . - 0 Parent=NA;
S10 GeneWise mRNA 175032 190540 41.11 + . ID=NA;Source=NA;Function="NA";
Expected output:
S10 GeneWise mRNA 96000 105032 50.00 - . ID=NA;Source=NA;Function="NA";
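In awk the condition belongs in the pattern in front of the action block, and == is the comparison operator (a single = assigns). Inside the action, $4 >= 50000 && $4 <=150000; is just an expression whose value is thrown away, so it filters nothing. This is the correct form of the conditional; however, only one record matches it: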
$ awk '
$1 == "S10" && $4 >= 50000 && $4 <=150000 {
print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9
}' file
S10 GeneWise mRNA 96000 105032 50.00 - . ID=NA;Source=NA;Function="NA";
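To see why the original one-liner misbehaves, here is a minimal sketch that runs its action block on two of the sample records, fed in with printf so it runs on its own (the two input lines are the only addition). Because = assigns, the first field of every record is overwritten with "s10", and the range test is evaluated and discarded, so nothing is filtered out:

```shell
#!/bin/sh
# Same action block as the question's one-liner, applied to two sample records.
# "$1 = ..." assigns (it does not compare), so every record gets $1 = "s10".
# The range test is a bare expression statement; its result is discarded.
printf 'S03 GeneWise mRNA 7000\nS11 GeneWise CDS 3700\n' |
  awk '{ $1 = "s10"; $4 >= 50000 && $4 <= 150000; print }'
```

Both records have $4 well below 50000, yet both are printed, each starting with "s10". Moving the test into the pattern and using == fixes both problems.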
Unless you want records matching $1 == "S10" || $4 >= 50000 && $4 <= 150000 (i.e. using logical OR), but that brings in one extra record:
$ awk '
$1 == "S10" || $4 >= 50000 && $4 <=150000 {
print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9
}' file
S07 GeneWise CDS 80450 96070 . - 0 Parent=NA;
S10 GeneWise mRNA 96000 105032 50.00 - . ID=NA;Source=NA;Function="NA";
S10 GeneWise CDS 43800 76000 . - 0 Parent=NA;
S10 GeneWise mRNA 175032 190540 41.11 + . ID=NA;Source=NA;Function="NA";
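In awk, && binds more tightly than ||, so the OR pattern above is read as $1 == "S10" || ($4 >= 50000 && $4 <= 150000). A minimal sketch that writes the parentheses out explicitly, run against the seven sample records from the question embedded in a here-document (embedding the data is only so the snippet runs standalone):

```shell
#!/bin/sh
# Explicit parentheses: same meaning as the unparenthesized OR pattern,
# because && has higher precedence than || in awk.
# A pattern with no action prints the matching record unchanged.
awk '$1 == "S10" || ($4 >= 50000 && $4 <= 150000)' <<'EOF'
S03 GeneWise mRNA 7000 84000 40.00 - . ID=NA;Source=NA;Function="NA";
S07 GeneWise CDS 80450 96070 . - 0 Parent=NA;
S10 GeneWise mRNA 96000 105032 50.00 - . ID=NA;Source=NA;Function="NA";
S10 GeneWise CDS 43800 76000 . - 0 Parent=NA;
S10 GeneWise mRNA 175032 190540 41.11 + . ID=NA;Source=NA;Function="NA";
S11 GeneWise CDS 3700 15000 . + 0 Parent=NA;
S15 GeneWise mRNA18055 25000 40.00 - . ID=S15;Source=NA;Function="NA";
EOF
```

The S07 record passes through the range half of the test, and all three S10 records pass through the $1 half.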
A better form of the first, which sets the output field separator once instead of spelling out every "\t":
$ awk '
BEGIN { OFS="\t" } # set the output field separator to a tab
$1 == "S10" && $4 >= 50000 && $4 <=150000 {
$1=$1 # assigning a field rebuilds $0, joining the fields with OFS
print # print the rebuilt record
}' file
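If the sequence name and range change from run to run, they can be passed in with awk's -v option instead of being hard-coded. A sketch, assuming the variable names chr, lo and hi (hypothetical names chosen for illustration, not from the question), with two sample records embedded so it runs standalone:

```shell
#!/bin/sh
# chr, lo and hi are illustrative variable names, not part of the original.
# Numeric-looking -v values are treated as numeric strings (POSIX), so they
# compare numerically against $4.
awk -v chr=S10 -v lo=50000 -v hi=150000 '
BEGIN { OFS = "\t" }                  # join output fields with tabs
$1 == chr && $4 >= lo && $4 <= hi {
    $1 = $1                           # rebuild $0 using OFS
    print
}' <<'EOF'
S10 GeneWise mRNA 96000 105032 50.00 - . ID=NA;Source=NA;Function="NA";
S10 GeneWise CDS 43800 76000 . - 0 Parent=NA;
EOF
```

Only the 96000 record is printed, tab-separated; changing -v chr=... or the bounds selects a different subset without editing the program.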