I am writing a script which can replace all of the instances of an amino acid residue in a column of a FASTA alignment file. Using AlignIO, I just can read an alignment file and extract information from it but I can't modify their sequences. Moreover, MutableSeq module just can be modify string sequences and if I use a seq object input, it can't modify it. I'd like to find a module or a method to modify an alignment file and save it, while it is in the structure of AlignIO as a sequence object for subsequent procedures.
My code using just AlignIO:
alignment = AlignIO.read(input_handle, "fasta")
for record in alignment:
if record.seq[10] == "G":
record.seq[10] = "T"
Output:
record.seq[10] = "T"
TypeError: 'Seq' object does not support item assignment
My code using both AlignIO and MutableSeq:
alignment = AlignIO.read(input_handle, "fasta")
for record in alignment[0:1, : ]:
mut_align = MutableSeq(record.seq)
mut_align.__delitem__(10)
mut_align.insert(10, "T")
print(mut_align)
Output:
del self.data[index]
TypeError: 'Seq' object doesn't support item deletion
This is a good question, I think what you're doing with MutableSeq
should work or fail clearly right away, but instead here is a workaround:
from Bio import AlignIO
alignment = AlignIO.read('test.fasta', 'fasta')
for record in alignment:
record.seq = record.seq.tomutable()
if record.seq[2] == "G":
record.seq[2] = "T"
print(record)
Outputs:
ID: 1
Name: 1
Description: 1
Number of features: 0
MutableSeq('ATTAG')
ID: 2
Name: 2
Description: 2
Number of features: 0
MutableSeq('AATAG')
For input data:
$ cat test.fasta
>1
ATGAG
>2
AAGAG
I consider the fact that a MutableSeq
object is created from a Seq
object in your example but that the method fails as a bug, which I filed here: https://github.com/biopython/biopython/issues/1918
Here is another rather inefficient workaround, re-building the string each time, if you want to avoid using MutableSeq
all-together:
alignment = AlignIO.read('test.fasta', 'fasta')
for record in alignment:
if record.seq[2] == "G":
record.seq = record.seq[:2] + 'T' + record.seq[3:]
print(record)