I have a FASTA file with x reads and I want to rename the ID of each read. My file is as follows:
>e855552f-9484-4674-8fc4-d9b1f1add023 runid=4140cbe7f7cd36a17a00b732f27dd37bd09e4380 sampleid=mariposas read=55861 ch=2 start_time=2022-11-23T17:43:40Z model_version_id=2021-05-17_dna_r9.4.1_minion_768_2f1c8637 barcode=barcode06
GCTTTAACATTTAGCTATTTATGACACAGTGAAATAAAAGTAATATCTTTTTATTTTTAATTGTATTTATTAGTTACATGTTTTCACATGCATTTAACATAAATGTGATAATTTATGGGAATTACACTACTGTCAAAGTAGTT
>c9d90319-ec63-4347-9244-ad080b0815c5 runid=4140cbe7f7cd36a17a00b732f27dd37bd09e4380 sampleid=mariposas read=30196 ch=317 start_time=2022-11-23T14:47:32Z model_version_id=2021-05-17_dna_r9.4.1_minion_768_2f1c8637 barcode=barcode11
GCCTTGACTATATGGTTTACCTGTTCAAATACGACTCTACTCATGGTCGTTTCAAGGGAACAGTTGAGGTTCAAGGATGGTTTCCTCGTAGTAGTCTCAATGGAAACAAATCTCCTGTCTTCTGTGAAAGAGACCCTAAAATC
I want to rename each read using its first ID and the barcode (the last word), joined by "_".
My expected output is:
>e855552f-9484-4674-8fc4-d9b1f1add023_barcode06
GCTTTAACATTTAGCTATTTATGACACAGTGAAATAAAAGTAATATCTTTTTATTTTTAATTGTATTTATTAGTTACATGTTTTCACATGCATTTAACATAAATGTGATAATTTATGGGAATTACACTACTGTCAAAGTAGTT
>c9d90319-ec63-4347-9244-ad080b0815c5_barcode11
GCCTTGACTATATGGTTTACCTGTTCAAATACGACTCTACTCATGGTCGTTTCAAGGGAACAGTTGAGGTTCAAGGATGGTTTCCTCGTAGTAGTCTCAATGGAAACAAATCTCCTGTCTTCTGTGAAAGAGACCCTAAAATC
I'm trying with the sed command, but I don't know if it's the best option. I don't know how to select words by position (first and last) with sed instead of matching a fixed word.
This sed one-liner should do the trick:
sed 's/ .* barcode=/_/' file
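which simply replaces the substring from the first space character through barcode= with a _. Because .* is greedy, everything between the read ID and the barcode value is removed in a single substitution.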
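
As a hedged sketch, here are two variants of the same idea. The first restricts the substitution to header lines (those starting with ">"), which should not matter for sequence lines without spaces but is a bit safer. The second uses awk to pick the first and last whitespace-separated fields by position, which is closer to what the question asks. The sample headers are shortened from the question, and the temporary file and the variable names (sample, with_sed, with_awk) are only illustrative.

```shell
# Build a tiny sample file; headers are shortened versions of the ones in the question.
sample=$(mktemp)
printf '%s\n' \
  '>e855552f-9484-4674-8fc4-d9b1f1add023 runid=4140 sampleid=mariposas barcode=barcode06' \
  'GCTTTAACATTTAGC' \
  '>c9d90319-ec63-4347-9244-ad080b0815c5 runid=4140 sampleid=mariposas barcode=barcode11' \
  'GCCTTGACTATATGG' > "$sample"

# Variant 1: same substitution as above, applied only to lines starting with ">".
with_sed=$(sed '/^>/ s/ .* barcode=/_/' "$sample")

# Variant 2: on header lines, strip the "barcode=" prefix from the last field ($NF)
# and print the first field, "_", and the last field; print other lines unchanged.
with_awk=$(awk '/^>/ { sub(/^barcode=/, "", $NF); print $1 "_" $NF; next } { print }' "$sample")

printf '%s\n' "$with_sed"
rm -f "$sample"
```

The awk variant assumes the barcode is always the last field of the header; the sed variant assumes " barcode=" appears somewhere after the first space, regardless of position.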