I'm trying to merge fastq.gz files by species, and I'd like to do it without explicitly naming the species so that I can reuse the same bash script for other groups of species later. I am relatively unfamiliar with bash, so this may be a basic question.
The file names look like this:
GSF3164-Moyle-107-6_L_S75_R1_001.fastq.gz
GSF3164-Moyle-107-6_L_S75_R2_001.fastq.gz
GSF3164-Moyle-107-7_F_S48_R1_001.fastq.gz
GSF3164-Moyle-107-7_F_S48_R2_001.fastq.gz
GSF3164-Moyle-107-7_L_S76_R1_001.fastq.gz
GSF3164-Moyle-107-7_L_S76_R2_001.fastq.gz
GSF3164-Moyle-1322-10_F_S44_R1_001.fastq.gz
GSF3164-Moyle-1322-10_F_S44_R2_001.fastq.gz
GSF3164-Moyle-1322-10_L_S96_R1_001.fastq.gz
GSF3164-Moyle-1322-10_L_S96_R2_001.fastq.gz
GSF3164-Moyle-1322-1_F_S42_R1_001.fastq.gz
GSF3164-Moyle-1322-1_F_S42_R2_001.fastq.gz
The species designations in these files are 107 and 1322. What loop would work for automatically combining files with these names?
I was generally thinking that it should look something like this:
for SPECIES in GSF3164-Moyle-SPECIES*
do
cat GSF3164-Moyle-SPECIES* > otherFolder/SPECIES.fastq.gz
done
I don't know what I should be putting in the for loop and how to designate each species.
Thank you for your time.
Making some minor changes to your current code:
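for fname in GSF3164-Moyle-*
do
    # split fname on the "-" delimiter; only the 3rd field (the numeric species ID) is needed
    IFS='-' read -r _ _ species _ <<< "${fname}"
    # append this file to the single output file for its species
    cat "${fname}" >> otherFolder/"${species}".fastq.gz
done
Concatenated gzip files form a valid gzip stream, so nothing needs to be decompressed first. Because the loop appends with >>, otherFolder must already exist, and output files left over from an earlier run should be deleted before re-running, otherwise the reads get duplicated. Note that this puts the R1 and R2 files for a species into the same output file.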
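If the R1 and R2 reads need to stay in separate files (usually the case for paired-end data), the same field-splitting idea can be extended. The following is only a sketch under some assumptions: every file name ends in _R1_001.fastq.gz or _R2_001.fastq.gz as in the listing above, the function name merge_by_species and the output naming SPECIES_R1.fastq.gz / SPECIES_R2.fastq.gz are made up for illustration, and bash 4 or newer is available (for the associative array). Each output file is emptied the first time it is seen, so re-running does not duplicate reads.

```shell
#!/usr/bin/env bash
# Sketch: merge fastq.gz files per species and per read direction (R1/R2).
# Assumes file names like GSF3164-Moyle-107-6_L_S75_R1_001.fastq.gz, where the
# 3rd "-"-separated field is the species. Function and output names are
# hypothetical, chosen for this example only.

merge_by_species() {
    local indir=$1 outdir=$2
    local path fname species mate out
    local -A seen=()                      # outputs already started in this run (bash 4+)

    mkdir -p "$outdir"

    # Globs expand in sorted order, so R1 and R2 are appended in the same sample order
    for path in "$indir"/*_R[12]_001.fastq.gz; do
        [ -e "$path" ] || continue        # the glob matched nothing

        fname=${path##*/}                 # drop the directory part
        IFS='-' read -r _ _ species _ <<< "$fname"

        mate=${fname%_001.fastq.gz}       # strip the fixed suffix
        mate=${mate##*_}                  # what remains after the last "_": R1 or R2

        out="$outdir/${species}_${mate}.fastq.gz"
        if [[ -z ${seen[$out]:-} ]]; then
            : > "$out"                    # first time this run: start from an empty file
            seen[$out]=1
        fi
        cat "$path" >> "$out"             # gzip members can simply be concatenated
    done
}

# Example call, run from the directory that holds the fastq.gz files:
# merge_by_species . otherFolder
```

With the file names from the question this should produce otherFolder/107_R1.fastq.gz, otherFolder/107_R2.fastq.gz, otherFolder/1322_R1.fastq.gz and otherFolder/1322_R2.fastq.gz, without the species ever being named in the script.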