
Bulk operation with reformat.sh (BBMap)


I have a folder with paired-end sequencing data in the following format:

Bk1-ITS2_S102_L001_R1_001.fastq.gz
Bk1-ITS2_S102_L001_R2_001.fastq.gz
Bk2-ITS2_S103_L001_R1_001.fastq.gz
Bk2-ITS2_S103_L001_R2_001.fastq.gz
Fl1-ITS2_S201_L001_R1_001.fastq.gz
Fl1-ITS2_S201_L001_R2_001.fastq.gz
Fl2-ITS2_S202_L001_R1_001.fastq.gz
Fl2-ITS2_S202_L001_R2_001.fastq.gz
Mn1-ITS2_S401_L001_R1_001.fastq.gz
Mn1-ITS2_S401_L001_R2_001.fastq.gz
Mn2-ITS2_S402_L001_R1_001.fastq.gz
Mn2-ITS2_S402_L001_R2_001.fastq.gz

I want to run the following command on the entire folder, either with a loop or a wildcard, because running it separately for each pair of reads is cumbersome and takes a lot of time:

reformat.sh in=Bk1-ITS2_S102_L001_R1_001.fastq.gz in2=Bk1-ITS2_S102_L001_R2_001.fastq.gz   out=./reformat/Bk1-ITS2_S102_L001_R1_001_reformatted.fastq.gz out2=./reformat/Bk1-ITS2_S102_L001_R2_001_reformatted.fastq.gz mincalledquality=2 maxcalledquality=41 qin=33

That way I would get the following for all files: modified quality scores and the extra string "reformatted" in each file name.

Bk1-ITS2_S102_L001_R1_001_reformatted.fastq.gz
Bk1-ITS2_S102_L001_R2_001_reformatted.fastq.gz
Bk2-ITS2_S103_L001_R1_001_reformatted.fastq.gz
Bk2-ITS2_S103_L001_R2_001_reformatted.fastq.gz
Fl1-ITS2_S201_L001_R1_001_reformatted.fastq.gz
Fl1-ITS2_S201_L001_R2_001_reformatted.fastq.gz
Fl2-ITS2_S202_L001_R1_001_reformatted.fastq.gz
Fl2-ITS2_S202_L001_R2_001_reformatted.fastq.gz
Mn1-ITS2_S401_L001_R1_001_reformatted.fastq.gz
Mn1-ITS2_S401_L001_R2_001_reformatted.fastq.gz`
Mn2-ITS2_S402_L001_R1_001_reformatted.fastq.gz
Mn2-ITS2_S402_L001_R2_001_reformatted.fastq.gz

At the moment I can only do it for one pair of reads at a time. How can I do this for all pairs at once?


Solution

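  • Here's one way of doing it with bash (I assume the '`' after the Mn1 R2 file name is a typo). The loop goes over every R1 file, derives the matching R2 name and both output names with parameter expansion, and writes the results to ./reformat: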

    mkdir -p reformat   # create the output directory first
    for f in *_R1_001.fastq.gz
    do
        r2=${f/_R1_/_R2_}   # matching R2 file name
        reformat.sh in="$f" in2="$r2" out="./reformat/${f%.fastq.gz}_reformatted.fastq.gz" out2="./reformat/${r2%.fastq.gz}_reformatted.fastq.gz" mincalledquality=2 maxcalledquality=41 qin=33
    done
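To check the derived file names before processing anything, the same substitutions can be wrapped in a dry run that only echoes each command. This is a sketch, assuming bash and the `_R1_001`/`_R2_001` naming shown in the question; `pair_names` is a hypothetical helper, not part of BBMap, and the missing-mate check is an extra safeguard rather than something reformat.sh requires.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BBMap): for one R1 file, print the R2 mate
# name and the two output names, assuming the ..._R1_001.fastq.gz naming.
pair_names() {
    local r1=$1
    local r2=${r1/_R1_/_R2_}                      # swap the read marker
    printf '%s\n' "$r2" \
        "./reformat/${r1%.fastq.gz}_reformatted.fastq.gz" \
        "./reformat/${r2%.fastq.gz}_reformatted.fastq.gz"
}

# Dry run: print each reformat.sh command instead of executing it.
shopt -s nullglob   # if no file matches, the loop body never runs
for f in *_R1_001.fastq.gz
do
    mapfile -t names < <(pair_names "$f")         # names[0]=R2, [1]=out, [2]=out2
    if [ ! -e "${names[0]}" ]; then
        echo "no mate found for $f" >&2           # skip unpaired R1 files
        continue
    fi
    echo reformat.sh in="$f" in2="${names[0]}" out="${names[1]}" out2="${names[2]}" \
        mincalledquality=2 maxcalledquality=41 qin=33
done
```

Once the printed commands look right, remove the `echo` (and create `reformat/` first) to run them for real.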