Hello, I want to run a loop over a set of files, but only over certain files in a directory rather than all of them.
Here is the command I use, a bowtie2-based alignment of genomic sequences:
for i in *_1.fastq.gz
do
    base=$(basename "$i" "_1.fastq.gz")
    bowtie2 -p 8 -x /mnt/path/contigs -1 "${base}_1.fastq.gz" -2 "${base}_2.fastq.gz" | samtools view -b -o "${base}.bam" -
done
With this command, bowtie2 aligns all my files. However, some files in this folder have already been processed by bowtie2, and I don't want to align them again. Is there anything I can add to this loop to skip certain files?
Create 2 files, each with 1 basename per line: (1) your inputs, here the read 1 fastq base file names, and (2) your existing outputs, here the bam base file names. Sort the files and use comm -23 file1 file2 > file3 to select only the basenames that have not been mapped yet. Then loop over those, saved in file3.
Quick and dirty solution (assuming the filenames do not have whitespace):
ls -1 *_1.fastq.gz | perl -pe 's/_1\.fastq\.gz$//' | sort > in.basenames.txt
ls -1 *.bam | perl -pe 's/\.bam$//' | sort > out.basenames.txt
comm -23 in.basenames.txt out.basenames.txt > todo.in.basenames.txt
while read -r base_name ; do
    bowtie2 -1 "${base_name}_1.fastq.gz" -2 "${base_name}_2.fastq.gz" ...
done < todo.in.basenames.txt
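An alternative that avoids the intermediate files is to test for the output inside the loop. This is a different technique from the comm approach above, shown only as a minimal sketch. The function name list_todo is made up for illustration, and it assumes that a non-empty .bam next to the fastq files means the sample is done. An interrupted run can leave a truncated but non-empty bam, which this check would not catch.

```shell
#!/usr/bin/env bash
# list_todo (hypothetical helper): print the basename of every
# *_1.fastq.gz file in directory $1 that has no non-empty .bam yet.
list_todo() {
    local dir=${1:-.} fq base
    for fq in "$dir"/*_1.fastq.gz; do
        # If the glob matched nothing, the pattern stays literal; skip it.
        [ -e "$fq" ] || continue
        base=$(basename "$fq" _1.fastq.gz)
        # -s is true when the file exists and is not empty.
        # Assumption: such a bam counts as already mapped.
        [ -s "$dir/$base.bam" ] && continue
        printf '%s\n' "$base"
    done
}

# Possible wiring, reusing the bowtie2 options from the question:
#   list_todo . | while read -r base; do
#       bowtie2 -p 8 -x /mnt/path/contigs -1 "${base}_1.fastq.gz" -2 "${base}_2.fastq.gz" | samtools view -b -o "${base}.bam" -
#   done
```

Running list_todo on its own first gives a dry run of which samples would be aligned, before any bowtie2 job starts.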