linux bioinformatics bed

How to subset bed files based on the fragment length?


I am working with BED files and I want to subset the rows that fall within a specific size range. I'm only interested in rows where "chromEnd - chromStart" is between 140 and 160.

For example, from the following BED file I want to keep the second and fifth rows (10229 - 10082 = 147 and 65133 - 64976 = 157):

chr1    10061   10229   A00327:118:HNV2VDMXX:1:1316:4779:23265  12      +
chr1    10082   10229   A00327:118:HNV2VDMXX:1:2488:28519:18662 30      +
chr1    49486   49880   A00327:118:HNV2VDMXX:1:2412:2564:16517  12      +
chr1    54472   54800   A00327:118:HNV2VDMXX:1:1304:1633:32095  30      +
chr1    64976   65133   A00327:118:HNV2VDMXX:1:1488:3739:12038  30      +
chr1    75240   75547   A00327:118:HNV2VDMXX:1:2370:12102:12524 30      +
chr1    106775  107146  A00327:118:HNV2VDMXX:1:1324:32696:22169 31      +

Is there any possible way to subset these rows?


Solution

  • Many ways, but I really like awk:

    awk '{ s=$3-$2 } s >= 140 && s <= 160 { print }' input.bed > output.bed
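
If the size range changes often, the same filter can take its bounds as variables. This is only a sketch: it assumes the file is tab-separated, and the names `min`, `max`, `sample.bed` and `subset.bed` are made up for illustration. The sample file holds rows 1, 2 and 5 from the question.

```shell
# Build a small tab-separated test file from rows 1, 2 and 5 of the question.
printf 'chr1\t10061\t10229\tA00327:118:HNV2VDMXX:1:1316:4779:23265\t12\t+\n' > sample.bed
printf 'chr1\t10082\t10229\tA00327:118:HNV2VDMXX:1:2488:28519:18662\t30\t+\n' >> sample.bed
printf 'chr1\t64976\t65133\tA00327:118:HNV2VDMXX:1:1488:3739:12038\t30\t+\n' >> sample.bed

# min and max are illustrative variable names set with -v.
# len is chromEnd - chromStart; rows with min <= len <= max are printed unchanged.
awk -F'\t' -v min=140 -v max=160 \
    '{ len = $3 - $2 } len >= min && len <= max' sample.bed > subset.bed

cat subset.bed
```

Both bounds are inclusive, as in the one-liner above. A pattern with no action prints the whole line, so the explicit `{ print }` can be dropped.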