
Edit only the first column of a FASTA header by removing the string after '-'

I have a fasta file with the following header structure:


Each section is separated by a pipe '|', and the first section has the form species_name-accessionID.

I want to remove the accession IDs after the hyphen '-' but keep everything else, like this:


I've tried:

sed -E '/^>/s/(\|[^-]*)-.*$/\1/' input.fasta > output.fasta

But this removes everything after the hyphen '-':


I've used this piece of code before to edit my header and include the taxid= before my 2nd column:

awk 'BEGIN { FS=OFS="|" } /^>/ { print $1, "taxid=", $2, $3; next } { print }' file.fa > edit_file

I was wondering if there is a way to combine these two commands, so that I edit my first column and then reprint the rest, but I don't know how to do it :(

I appreciate any help with this!


  • I suggest using sed to delete the first hyphen and everything after it up to, but not including, the next pipe:

    sed 's/-[^|]*//' file

    Output to stdout:


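As a rough sketch of how the two edits from the question might be combined, the snippet below restricts the sed substitution to header lines and then does the same job in a single awk pass that also adds taxid=. The sample record is hypothetical (the real headers are not shown here), and two things are assumed: the species name itself contains no hyphen, and taxid= should be attached directly to the second column. Note that the original `print $1, "taxid=", $2, $3` puts a pipe between taxid= and the value, because the commas insert OFS.

```shell
# Hypothetical two-line record; the real headers from the question are not shown.
sample='>Homo_sapiens-ABC123|9606|protein-kinase
ACGT-ACGT'

# sed: apply the substitution to header lines only, so gap characters ('-')
# in sequence lines are left untouched.
sed_out=$(printf '%s\n' "$sample" | sed '/^>/ s/-[^|]*//')

# awk: on header lines, strip the hyphen and everything after it from field 1,
# prefix field 2 with "taxid=", then print every line (edited or not).
awk_out=$(printf '%s\n' "$sample" | awk '
  BEGIN { FS = OFS = "|" }
  /^>/  { sub(/-.*/, "", $1); $2 = "taxid=" $2 }
  { print }')

printf '%s\n' "$sed_out" "$awk_out"
```

Assigning to $1 or $2 makes awk rebuild the line with OFS, so the remaining columns are kept without having to list them in the print statement.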
