How do I loop through files and combine them based on a species designation? - Bash


I'm trying to merge fastq.gz files by species, and I'd like to do it without explicitly naming the species so that I can reuse the same bash script for other groups of species later. I am relatively unfamiliar with bash, so this may be a basic issue.

The file names look like this:

GSF3164-Moyle-107-6_L_S75_R1_001.fastq.gz
GSF3164-Moyle-107-6_L_S75_R2_001.fastq.gz
GSF3164-Moyle-107-7_F_S48_R1_001.fastq.gz
GSF3164-Moyle-107-7_F_S48_R2_001.fastq.gz
GSF3164-Moyle-107-7_L_S76_R1_001.fastq.gz
GSF3164-Moyle-107-7_L_S76_R2_001.fastq.gz
GSF3164-Moyle-1322-10_F_S44_R1_001.fastq.gz
GSF3164-Moyle-1322-10_F_S44_R2_001.fastq.gz
GSF3164-Moyle-1322-10_L_S96_R1_001.fastq.gz
GSF3164-Moyle-1322-10_L_S96_R2_001.fastq.gz
GSF3164-Moyle-1322-1_F_S42_R1_001.fastq.gz
GSF3164-Moyle-1322-1_F_S42_R2_001.fastq.gz

The species designations in these files are 107 and 1322. What loop would work for automatically combining files with these names?

I was generally thinking that it should look something like this:

for SPECIES in GSF3164-Moyle-SPECIES*
do
    cat GSF3164-Moyle-SPECIES* > otherFolder/SPECIES.fastq.gz
done

I don't know what I should be putting in the for loop and how to designate each species.

Thank you for your time.


Solution

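  • Making some minor changes to your current code: loop over the files, pull the species ID out of each name, and append each file to the output for that species:

    for fname in GSF3164-Moyle-*.fastq.gz
    do
        # split fname on the "-" delimiter; only the 3rd field (the numeric species ID) is needed
        IFS='-' read -r _ _ species _ <<< "${fname}"
        # append to the single output file for this species
        cat "${fname}" >> otherFolder/"${species}".fastq.gz
    done

    Because >> appends, otherFolder must already exist, and any output left over from an earlier run should be removed first; otherwise the same reads end up in the merged file twice.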
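
The loop above puts every file for a species, R1 and R2 alike, into one output file. If the R1/R2 pairs need to stay separate (an assumption on my part; the question does not say), a variation along these lines might work. It is only a sketch: the function name merge_by_species, its three arguments, and the SPECIES_R1/SPECIES_R2 output names are made up for illustration, and it assumes every file follows the PREFIX-SPECIES-..._R1_001.fastq.gz pattern shown in the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge_by_species INDIR OUTDIR PREFIX
# Writes OUTDIR/<species>_R1.fastq.gz and OUTDIR/<species>_R2.fastq.gz.
merge_by_species() {
    local indir="$1" outdir="$2" prefix="$3"
    local path fname rest species mate
    mkdir -p "$outdir"

    # Pass 1: print the species ID of every input file, then de-duplicate.
    for path in "$indir/$prefix"-*.fastq.gz
    do
        [ -e "$path" ] || continue      # glob matched nothing
        fname=${path##*/}               # strip the directory part
        rest=${fname#"$prefix"-}        # e.g. 107-6_L_S75_R1_001.fastq.gz
        printf '%s\n' "${rest%%-*}"     # e.g. 107
    done | sort -u |
    # Pass 2: one cat per species and read direction.
    while IFS= read -r species
    do
        for mate in R1 R2
        do
            # The "-" after the species ID keeps 107 from also matching 1070.
            # Globs expand in sorted order, so R1 and R2 list samples in the same order.
            cat "$indir/$prefix-$species"-*_"$mate"_001.fastq.gz \
                > "$outdir/${species}_${mate}.fastq.gz"
        done
    done
}

# Example call for the files in the question:
#   merge_by_species . otherFolder GSF3164-Moyle
```

Using > with a glob rather than >> inside a per-file loop means each output is rewritten from scratch, so rerunning the script does not duplicate reads. Concatenating gzip files with cat is fine as such: the result is still a valid gzip stream.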