
In bioinformatics, what is a singleton?


I've quickly realized that bioinformatics is not a field whose terms are clearly defined or easy to look up, and I have an apparent discrepancy in some of my results.

I used samtools view -b -h -f 8 fileName.bam > mateUnmapped.bam on several BAM files. My understanding is that this command extracts only reads whose mate does not align to the draft genome (the header is included, and the output is in BAM format).

When I run samtools flagstat on the resulting files, I get an interesting result: the number of 'singletons' does not match the total number of reads, which seems odd to me.

The only reconciliation I can find is here:

http://seqanswers.com/forums/showthread.php?t=46711

One person who replied to the question posed in that thread claims that singletons are sometimes defined as sequences that do not have a partner read at all. However, that still doesn't explain my result. Flagstat says about 40% of my reads are singletons, but based on the 'view' command I used, I feel they should ALL be singletons.

Can a seasoned bioinformatician help me out?


Solution

  • In genome assembly generally, a singleton is a read that did not assemble into a contig or map to a reference. In effect, it is a contig of only one read.

    In samtools, a singleton refers to a read that mapped but the mate didn't.

    You wrote: "Flagstat says about 40% of my reads are singletons, but I feel like based on the 'view' command I used, they should ALL be singletons."

    I'm not a samtools expert, but I think -f 8 means "show reads whose mates did not map". That says nothing about the read itself, only about its mate. So you are probably getting both reads from pairs where neither mate mapped (60%) and reads that mapped but whose mate did not (40%). Only the second group would count as singletons in flagstat.

    You might want to try running with -f 8 -F 4 to get only reads that mapped but whose mates did not.
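
To make the difference between -f 8 and -f 8 -F 4 concrete, here is a minimal Python sketch of the flag arithmetic. It does not call samtools. It assumes the standard SAM flag bits (0x1 paired, 0x4 read unmapped, 0x8 mate unmapped) and assumes that flagstat counts a singleton as a paired read that is itself mapped while its mate is not. The function names and the example flag values are hypothetical and chosen only for illustration.

```python
# SAM flag bits, as defined in the SAM specification.
PAIRED = 0x1
READ_UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def passes_filter(flag, require=0, exclude=0):
    """Mimic samtools view -f/-F: keep a read only if every bit in
    `require` is set and no bit in `exclude` is set."""
    return (flag & require) == require and (flag & exclude) == 0


def is_singleton(flag):
    """Assumed flagstat definition: paired, read mapped, mate unmapped."""
    return (
        bool(flag & PAIRED)
        and not (flag & READ_UNMAPPED)
        and bool(flag & MATE_UNMAPPED)
    )


# Illustrative flags (not taken from any real BAM file):
#   73, 89   -> read mapped, mate unmapped
#   77, 141  -> read unmapped, mate unmapped
#   133      -> read unmapped, mate mapped
#   99       -> both mapped, proper pair
flags = [73, 89, 77, 141, 133, 99]

# Equivalent of: samtools view -f 8
mate_unmapped = [f for f in flags if passes_filter(f, require=MATE_UNMAPPED)]

# Equivalent of: samtools view -f 8 -F 4
mapped_with_unmapped_mate = [
    f
    for f in flags
    if passes_filter(f, require=MATE_UNMAPPED, exclude=READ_UNMAPPED)
]

singletons_in_f8 = [f for f in mate_unmapped if is_singleton(f)]

print("-f 8 keeps:      ", mate_unmapped)
print("-f 8 -F 4 keeps: ", mapped_with_unmapped_mate)
print("singletons among -f 8 output:", singletons_in_f8)
```

In this toy set, -f 8 alone keeps reads from pairs where neither mate mapped as well as true singletons, which is why the flagstat singleton count can be well below the total. Adding -F 4 leaves only the reads that this sketch classifies as singletons.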