bashloopsfastqsbatch

How to produce multiple readlength.tsv files at once from multiple fastq files?


I have 16 fastq files in different directories and need to produce a separate readlength.tsv for each of them. This is the script that I should use to produce readlength.tsv:

zcat ~/proje/project/name/fıle_fastq | paste - - - - | cut -f1,2 | while read readID sequ;
do
    len=`echo $sequ | wc -m`
    echo -e "$readID\t$len"
done > ~/project/name/fıle1_readlength.tsv

I can produce the read lengths one file at a time, but that takes a long time. I want to produce them all at once, so I created a list containing these fastq files, but I couldn't write a loop that produces readlength.tsv for all 16 fastq files in one go.

I would appreciate it if you could help me.


Solution

  • Assuming a file list.txt contains the 16 file paths such as:

    ~/proje/project/name/file1_fastq
    ~/proje/project/name/file2_fastq
    ..
    ~/path/to/the/fastq_file16
    

    Then would you please try:

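    #!/bin/bash
    
    while IFS= read -r f; do                        # "f" is assigned to each fastq filename in "list.txt"
        mapfile -t ary < <(zcat "${f/#\~/$HOME}")   # assign "ary" to the array of lines; "read" does not expand a leading "~", so replace it by hand
        for ((i = 0; i < ${#ary[@]}; i += 4)); do   # each fastq record spans 4 lines
            echo -e "${ary[i]}\t${#ary[i+1]}"       # ${ary[i]} is the id and ${#ary[i+1]} is the length of the sequence
        done
    done < list.txt > readlength.tsv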
    

    As each fastq record holds the id in its 1st line and the sequence in its 2nd line, the bash built-in mapfile is a convenient way to read them into an array and pick them out by index.

    As a side note, the letter ı in your code looks like a non-ASCII character.
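
If a separate TSV per input file is wanted, as the question describes, the loop could be arranged roughly as follows. This is only a sketch under a few assumptions: the function names read_lengths and batch_read_lengths are made up for illustration, the output name "<input>_readlength.tsv" is an arbitrary choice, every record is assumed to span exactly 4 lines, and "gzip -dc" is used as a portable equivalent of zcat. Each file is processed in a background job, so the 16 files are handled at the same time rather than one after another.

```shell
#!/bin/bash

# Hypothetical helper: print "id<TAB>length" for every read in one gzipped fastq file.
read_lengths() {
    local line_no=0 id line
    while IFS= read -r line; do
        case $((line_no++ % 4)) in
            0) id=$line ;;                             # 1st line of a record: the id
            1) printf '%s\t%d\n' "$id" "${#line}" ;;   # 2nd line of a record: the sequence
        esac                                           # 3rd and 4th lines are ignored
    done < <(gzip -dc "$1")
}

# Hypothetical driver: "$1" is a list file with one fastq path per line.
batch_read_lengths() {
    local f
    while IFS= read -r f; do
        [ -n "$f" ] || continue                        # skip empty lines
        f=${f/#\~/$HOME}                               # "read" does not expand a leading "~"
        read_lengths "$f" > "${f}_readlength.tsv" &    # one background job per file
    done < "$1"
    wait                                               # return only when every job has finished
}
```

It would be called as "batch_read_lengths list.txt". Reading line by line avoids holding a whole fastq file in memory, which may matter for large files, although a pure bash loop is still slow compared with tools such as awk.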