bash, for-loop, alignment, samtools

Restricting a for loop to certain files when running bowtie2


Hello, I want to run a loop over a set of files, but only over certain files in a directory rather than all of them.

Here is the command I use, a bowtie2-based alignment of genomic sequences:

    for i in *_1.fastq.gz
    do
        # Strip the read 1 suffix to get the sample basename
        base=$(basename "$i" _1.fastq.gz)
        bowtie2 -p 8 -x /mnt/path/contigs -1 "${base}_1.fastq.gz" -2 "${base}_2.fastq.gz" | samtools view -b -o "${base}.bam" -
    done

With this command, bowtie2 aligns all of my files. However, some files in this folder have already been analyzed by bowtie2, and I don't want to run them again. Is there anything I can add to this loop to skip certain files?


Solution

  • Create two files, each with one basename per line: (1) your inputs, here the base names of the read 1 fastq files, and (2) your existing outputs, here the base names of the bam files. Sort both files and use comm -23 file1 file2 > file3 to select only the basenames that have not been mapped yet (comm -23 prints the lines that occur only in file1). Then loop over the basenames saved in file3.

    Quick and dirty solution (assuming the filenames do not have whitespace):

    # Dots are escaped and the pattern is anchored so only the real suffix is stripped
    ls -1 *_1.fastq.gz | perl -pe 's/_1\.fastq\.gz$//' | sort > in.basenames.txt
    ls -1 *.bam | perl -pe 's/\.bam$//' | sort > out.basenames.txt
    # Keep only the basenames that have inputs but no bam output yet
    comm -23 in.basenames.txt out.basenames.txt > todo.in.basenames.txt

    while read -r base_name ; do
        bowtie2 -1 "${base_name}_1.fastq.gz" -2 "${base_name}_2.fastq.gz" ...
    done < todo.in.basenames.txt
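
An alternative that stays closer to the original loop is to test, inside the loop, whether the output bam already exists and skip that sample if it does. The following is only a minimal sketch under a few assumptions: the helper name pending_basenames is made up for illustration, the bam files sit in the same directory as the fastq files, and a bam that exists is treated as finished (a bam left behind by an interrupted run would be skipped too, so such files should be deleted first). The bowtie2 and samtools options are copied from the question.

```shell
# Hypothetical helper: print the basename of every *_1.fastq.gz in the
# current directory that has no matching <basename>.bam next to it.
pending_basenames() {
    for fq in *_1.fastq.gz; do
        [ -e "$fq" ] || continue            # glob matched nothing
        base=$(basename "$fq" _1.fastq.gz)
        [ -e "${base}.bam" ] && continue    # bam exists, assume already aligned
        printf '%s\n' "$base"
    done
}

# Align only the samples that are still pending.
pending_basenames | while read -r base; do
    # </dev/null keeps bowtie2 from consuming the list of basenames on stdin
    bowtie2 -p 8 -x /mnt/path/contigs \
        -1 "${base}_1.fastq.gz" -2 "${base}_2.fastq.gz" </dev/null |
        samtools view -b -o "${base}.bam" -
done
```

Running pending_basenames on its own first is a cheap way to check which samples would be aligned before starting the actual run.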