Tags: python, biopython, fasta

Error 'FastaIterator' object has no attribute 'records' in Biopython 1.85


Today, when I ran the following code, it suddenly failed with the error 'FastaIterator' object has no attribute 'records'. This code has never raised an error before, so I'm confused.

from Bio import __version__

print('\n\nBiopython Version : ', __version__, '\n\n')

from Bio import SeqIO


seq = SeqIO.parse(concensus_path, "fasta")

for record in seq.records:
    SeqIO.write(record, folder + '/' + record.name.split('(')[0].replace('_0_', '_') + '.fasta', "fasta")

This is the first part of a longer script. It splits a FASTA file containing multiple DNA sequences into separate FASTA files, each containing a single DNA sequence.

Is there any way to fix this? The input FASTA file itself is fine: I also tried a file that worked before, and it gave the same error.


Solution

  • According to the docs here, you can access the records by just iterating over the returned iterator:

    from Bio import __version__
    
    print('\n\nBiopython Version : ', __version__, '\n\n')
    
    from Bio import SeqIO
    
    for record in SeqIO.parse("example.fasta", "fasta"):
        print(record.id)
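
Applied to the loop in the question, direct iteration could look like the sketch below. The names output_path, split_records and the write_record callback are illustrative and not part of Biopython; the callback is only there so the sketch runs without Biopython or a FASTA file on hand. It also swaps the string concatenation with '/' for os.path.join.

```python
import os
from types import SimpleNamespace


def output_path(folder, record_name):
    """Build the per-record file path the same way the question's loop does."""
    stem = record_name.split("(")[0].replace("_0_", "_")
    # os.path.join replaces the original folder + '/' + name concatenation.
    return os.path.join(folder, stem + ".fasta")


def split_records(records, folder, write_record):
    """Call write_record(record, path) once per record and return the paths.

    `records` is any iterable of objects with a `.name` attribute, such as
    the object returned by SeqIO.parse(path, "fasta"). It is iterated
    directly, so no `.records` attribute is needed.
    """
    written = []
    for record in records:
        path = output_path(folder, record.name)
        write_record(record, path)
        written.append(path)
    return written


# Stand-in records; real ones would come from SeqIO.parse(path, "fasta").
demo = [
    SimpleNamespace(name="sample_0_1(reversed)"),
    SimpleNamespace(name="sample_0_2"),
]
paths = split_records(demo, "out", lambda record, path: None)
print(paths)
```

With Biopython installed, the call would presumably be split_records(SeqIO.parse(concensus_path, "fasta"), folder, lambda rec, path: SeqIO.write(rec, path, "fasta")), which should behave the same in 1.84 and 1.85 because it never touches the records attribute.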
    

    Between versions 1.84 and 1.85, the object returned by SeqIO.parse(...), a <class 'Bio.SeqIO.FastaIO.FastaIterator'>, lost its records attribute, which I think was just unpacking the iterator in memory***.

    Try installing Biopython 1.84 with pip install -v biopython==1.84

    and then, for an input like:

    fasta_test.fasta:

    >DNA_sequence_1
    GCAAAAGAACCGCCGCCACTGGTCGTGAAAGTGGTCGATCCAGTGACATCCCAGGTGTTGTTAAATTGAT
    CATGGGCAGTGGCGGTGTAGGCTTGAGTACTGGCTACAACAACACTCGCACTACCCGGAGTGATAGTAAT
    GCCGGTGGCGGTACCATGTACGGTGGTGAAGT
    
    >DNA_sequence_2
    TCCCAGCCAGCAGGTAGGGTCAAAACATGCAAGCCGGTGGCGATTCCGCCGACAGCATTCTCTGTAATTA
    ATTGCTACCAGCGCGATTGGCGCCGCGACCAGGATCCTTTTTAACCATTTCAGAAAACCATTTGAGTCCA
    TTTGAACCTCCATCTTTGTTC
    
    
    >DNA_sequence_3
    AACAAAAGAATTAGAGATATTTAACTCCACATTATTAAACTTGTCAATAACTATTTTTAACTTACCAGAA
    AATTTCAGAATCGTTGCGAAAAATCTTGGGTATATTCAACACTGCCTGTATAACGAAACACAATAGTACT
    TTAGGCTAACTAAGAAAAAACTTT
    
    

    try to run:

    from Bio import __version__
    
    print('\n\nBiopython Version : ', __version__, '\n\n')
    
    from Bio import SeqIO
    
    import sys
    
    concensus_path = 'fasta_test.fasta'
    
    seq = SeqIO.parse(concensus_path, "fasta")
    
    print('\n\ntype(seq) : ', type(seq), '\n')
    
    print('\n\nseq.records size : ', sys.getsizeof(seq.records),'\n\n')
    
    print('\n\nseq. size : ', sys.getsizeof(seq),'\n\n')
    

    and tell us if you see any difference

    ADDENDUM:

    ***I was wrong: seq.records returns a generator!

    Try adding more records to the fasta_test.fasta file and compare the previous object size with:

    seq = SeqIO.parse(concensus_path, "fasta")
    recs = [i for i in seq.records]
    # print(recs)
        
    print('all records  size  : ' , sys.getsizeof(recs))
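
The size comparison above can be reproduced without Biopython. The sketch below uses a hypothetical numbered generator in place of seq.records. Note that sys.getsizeof is shallow: for a list it counts the list's own storage, not the objects the list holds.

```python
import sys


def numbered(n):
    """Yield n integers lazily, standing in for a generator of records."""
    for i in range(n):
        yield i


small_gen = numbered(3)
large_gen = numbered(100_000)
small_list = list(numbered(3))
large_list = list(numbered(100_000))

# A generator holds only its paused state, so its size does not depend on
# how many items it will eventually produce.
print("generators:", sys.getsizeof(small_gen), sys.getsizeof(large_gen))

# A list stores one reference per item, so its size grows with its length.
print("lists:     ", sys.getsizeof(small_list), sys.getsizeof(large_list))
```

This is why the reported size of seq.records should stay the same however many sequences the file contains, while the size of the list built from it grows.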
    

    I think that in 1.84 seq.records is created in class SequenceIterator, in biopython/Bio/SeqIO/Interfaces.py:

    ....
    ....
    try:
        self.records = self.parse(self.stream)
    ....
    ....
    

    This happens in the __init__ of class SequenceIterator. FastaIterator inherits it because it is defined as class FastaIterator(SequenceIterator), and that is the type of object SeqIO.parse returns.

    In 1.85, class FastaIterator(SequenceIterator) loses its parse method too.

    In 1.84 it is at line 189:

    def parse(self, handle):
        """Start parsing the file, and return a SeqRecord generator."""
        records = self.iterate(handle)  # iterate is the next method defined in the class
        return records
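
To make the mechanism described above concrete, here is a simplified stand-in for that 1.84-style pattern. It is not Biopython's actual source: the class names SequenceIteratorSketch and FastaIteratorSketch are made up, and records are plain (title, sequence) tuples instead of SeqRecord objects. It only illustrates that records exists because the base __init__ stores the generator returned by the subclass's parse.

```python
import io
import types


class SequenceIteratorSketch:
    """Stand-in for the 1.84-style base class described above."""

    def __init__(self, stream):
        self.stream = stream
        # The attribute exists only because __init__ stores the generator here.
        self.records = self.parse(self.stream)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self.records)

    def parse(self, handle):
        raise NotImplementedError


class FastaIteratorSketch(SequenceIteratorSketch):
    def parse(self, handle):
        """Return a generator of (title, sequence) tuples."""
        return self.iterate(handle)

    def iterate(self, handle):
        title, chunks = None, []
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if title is not None:
                    yield title, "".join(chunks)
                title, chunks = line[1:], []
            elif line and title is not None:
                chunks.append(line)
        if title is not None:
            yield title, "".join(chunks)


stream = io.StringIO(">DNA_sequence_1\nGCAA\nAAGA\n\n>DNA_sequence_2\nTCCC\n")
parser = FastaIteratorSketch(stream)
is_generator = isinstance(parser.records, types.GeneratorType)
parsed = list(parser)  # iterating the parser itself, without touching .records
print(is_generator, parsed)
```

If the base class stops assigning self.records, as appears to be the case in 1.85, code that reads seq.records breaks while code that iterates over the parser object keeps working.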