bash bioinformatics samtools

Extracting unmapped reads where both mates are unmapped using samtools?


I'm trying to find the best way to extract only those unmapped reads whose mate is also unmapped, i.e. pairs in which neither mate mapped. My current command seems to extract every unmapped read, regardless of whether its mate mapped. I'm already using the -f option to select unmapped reads, so I'm not sure how to add the mate condition. Would I need a second pass of samtools view?

samtools view -@ 4 -buh -f4 sample${r}_pe.remove.sam > sample${r}_pe.unmapped.bam

Solution

  • To extract only the reads that are unmapped AND whose mate is also unmapped (= both mates are unmapped):

    samtools view -b -f12 input.sam > output.both_mates_unmapped.bam
    

    Here, the options are:

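    -b - output BAM,
    -f12 - keep only reads whose flag has both of these bits set: 4 (read unmapped) + 8 (mate unmapped) = 12.

    By contrast, -f4 tests only the read's own unmapped bit, so it also keeps unmapped reads whose mate did map.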

    SEE ALSO:

    Decoding SAM flags: https://broadinstitute.github.io/picard/explain-flags.html
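
To make the flag arithmetic concrete, here is a minimal bash sketch of the test that -f applies: a read is kept only if every bit of the requested value is set in its FLAG. The function name has_all_bits is hypothetical (it is not part of samtools), and the example flag values are illustrative ones built by hand from the SAM flag bits, not taken from real data.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of samtools): mimics the check behind -f.
# Succeeds only if every bit of the mask is set in the flag.
has_all_bits() {
    local flag=$1 mask=$2
    (( (flag & mask) == mask ))
}

# Illustrative FLAG values, composed from the SAM flag bits:
#   77  = 1 (paired) + 4 (read unmapped) + 8 (mate unmapped) + 64  (first in pair)
#   141 = 1 (paired) + 4 (read unmapped) + 8 (mate unmapped) + 128 (second in pair)
#   69  = 1 (paired) + 4 (read unmapped) + 64  (first in pair); mate is mapped
#   137 = 1 (paired) + 8 (mate unmapped) + 128 (second in pair); read is mapped
for flag in 77 141 69 137; do
    if has_all_bits "$flag" 12; then
        echo "$flag kept by -f 12"
    else
        echo "$flag dropped by -f 12"
    fi
done
```

Only 77 and 141 pass, because they are the only values with both the 4 and the 8 bit set. On real data, samtools view -c -f 12 input.sam prints the number of matching records instead of writing them, which is a quick way to check the filter before producing the BAM.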