I want to extract all the unique read IDs in a FASTQ file and output them to a text file. (I have done the same task for BAM files using samtools, but I don't know of any tool that handles FASTQ files.)
for BAM files: samtools view input.bam | cut -f1 | sort | uniq >> unique.reads.txt
for fastq: (need help)
Looking for a one-liner command or a tool that can do that.
Thank you.
using seqkit (no need to sort): here you basically collect the IDs in an awk array and print each unique one at the end:
seqkit fx2tab reads.fq | awk -v OFS='\t' '{array[$1]=1} END {for (readID in array) print readID}' > unique.reads.txt
you can also do this:
seqkit fx2tab reads.fq | cut -f 1 | sort | uniq > unique.reads.txt
but then everything has to be sorted first, which gets slow on large files
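if I remember the flags correctly, seqkit seq can also print just the IDs directly (-n for name only, -i for ID only), so you can skip fx2tab entirely:
seqkit seq -n -i reads.fq | sort | uniq > unique.reads.txt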
or almost the same without seqkit; grepping for "@" is a bit risky because quality lines can also contain (or even start with) "@", so it's safer to take every 4th line:
awk 'NR % 4 == 1' reads.fq | sed 's/^@//' | cut -d ' ' -f 1 | sort | uniq > unique.reads.txt
awk 'NR % 4 == 1 {sub(/^@/, ""); array[$1] = 1} END {for (readID in array) print readID}' reads.fq > unique.reads.txt
but in general I like seqkit and always recommend it