I have a circular mtDNA reference. After the alignment I would like to cut first and last 15 bases in my fasta files. How can I do that?
For example, this is my sequence and I need to take out first and last 15 letters. The first 15 characters will not always be the same for all the samples.
>ref|NC_005038.2| mitochondrion_circulized, complete genome
TTTACATGGTAAGTGGTTGATGTAGCTTAAACTTAAAGCAAGGCACTGAAAATGCCTAGATGAGTGTACC
AACTCCATAAACACATAGGTTTGGTCCCAGCCTTCCTGTTAACTCTCAACAGACTTACACATGCAAGCAT
CCACGCCCCGGTGAGTAACGCCCTCCAAATCAATAAGACTAAGAGGAGCAGGTATCAAGCACACATCTTG
TAGCTTATAACGCCTCGCTTAACCACACCCCTACGGGAGACAGCAGTGACAAAAATTAAGCCATAAACGA
The output I expect would be:
>ref|NC_005038.2| mitochondrion_circulized, complete genome
GTTGATGTAGCTTAAACTTAAAGCAAGGCACTGAAAATGCCTAGATGAGTGTACCAACTCCATAAACACA
TAGGTTTGGTCCCAGCCTTCCTGTTAACTCTCAACAGACTTACACATGCAAGCATCCACGCCCCGGTGAG
TAACGCCCTCCAAATCAATAAGACTAAGAGGAGCAGGTATCAAGCACACATCTTGTAGCTTATAACGCCT
CGCTTAACCACACCCCTACGGGAGACAGCAGTGACAAAAA
Here is one way to do it using Python's Biopython module:
from Bio import SeqIO
import textwrap #use it to wrap it to certain column size
input_file = open('input.fasta','r')
output_file = open('output.fasta', "w")
for seq_record in SeqIO.parse(input_file, "fasta"):
sequence = str(seq_record.seq)
trimmed_sequence = sequence[15:-15]
wrapped_sequene = textwrap.fill(trimmed_sequence, width = 60)
fasta_record = f'>{seq_record.id}\n{wrapped_sequene}'
output_file.write(fasta_record)
print(fasta_record) # will print on the screen
input_file.close()
output_file.close()