shell sed bioinformatics gff

Sed function in shell applied to all .gff files in a directory


I am working with .gff3 files and trying to remove the contig sequences at the bottom of many files in a directory. The contig sequences are separated from the rest of the file by a ##FASTA line, and I want to delete that line and everything below it (DNA sequences in FASTA format).

This script works for one file:

sed '/^##FASTA$/,$d' file1.gff > file1_altered.gff

But it fails when I try to apply it to all files in a directory like this:

for F in directory/input/*; do
   N=$(basename $F) sed '/^##FASTA$/,$d' ${F} > directory/output/$N.gff
done

Any help appreciated!


Solution

  • You are missing a semicolon after N=$(basename $F). As written, it is only a one-shot assignment: N is set in the environment of the sed command but not in the shell itself, so N is empty when it is expanded in the redirection.

    You can avoid basename entirely by using the shell's built-in parameter expansion: ${F##*/} removes the longest prefix matching */, leaving only the file name.

     for F in directory/input/*; do
       sed '/^##FASTA$/,$d' "${F}" > "directory/output/${F##*/}.gff"
     done
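
A slightly more defensive variant of the loop above can be sketched as follows. It is only a sketch under assumptions: the function name strip_fasta, the variables in_dir and out_dir, the *.gff glob, and the _altered.gff suffix are illustrative choices, not part of the original answer. It also strips the existing extension with ${name%.gff}, since appending .gff to a name that already ends in .gff would give file1.gff.gff.

```shell
#!/bin/sh
# Sketch: remove the ##FASTA line and everything after it from each .gff file.
# strip_fasta, in_dir and out_dir are hypothetical names used for illustration.
strip_fasta() {
  in_dir=$1
  out_dir=$2
  mkdir -p "$out_dir"
  for f in "$in_dir"/*.gff; do
    [ -e "$f" ] || continue      # no match: the glob stays literal, so skip it
    name=${f##*/}                # drop the directory part (like basename)
    name=${name%.gff}            # drop the trailing .gff extension
    # Delete from the ##FASTA line through the last line of the file.
    sed '/^##FASTA$/,$d' "$f" > "$out_dir/${name}_altered.gff"
  done
}
```

Calling strip_fasta directory/input directory/output would then write directory/output/file1_altered.gff for directory/input/file1.gff. If the files end in .gff3 instead, both the glob and the ${name%.gff} pattern would need to be adjusted accordingly.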