pythonout-of-memorybioinformaticsbiopythonsequence-alignment

MemoryError from BioPython's Align.PairwiseAligner()


I'm trying to write a Python3 script that performs a global alignment of two sequences, of ~ 10 kb and 11 kb length. Both are very similar to each other. (I'm trying to find the few points where they do not match, one of which I know is close to the end of the query sequence, which is why a local alignment is not sufficient (BLASTN just crops the last 2 bases to get a better alignment score).)

I had hoped to be able to use BioPython for this, specifically the Align.PairwiseAligner. However, I'm getting a MemoryError.

Is there a way to configure or use BioPython to use less memory (more runtime is completely fine - within reason, of course)? Or any other way to perform a global alignment of 2 sequences that I can use without having to resort to external software like Emboss Needle?

(This is part of a larger software. I already have to ship BLAST and BioPython along. I would prefer not to have to ship yet another software, if it can be avoided.)

Here's an MVCE:

from Bio import Align

ref_seq = "GAAAGACCTGAAAGATCACGGTGCCTTCATTTCAACTGTGAGACATGAAGTAATTTTCCCAAATCTACAACATTAAGATATGGTGCAATAAGGACCAGATTAAAGGTCTCCTGATTTGCGGCCATGTTCCCTCCATCTCCTTTACTCCTAAACACACTCACACTCACTACTGCAAATAGTTGTCTTGTCAAGTGGGAAATGAATGCTCTTACAAGGCTCAAACTTGTGAACACATCACTGACCAGCACAGAGCTGGCTACAATAGCTCCCCAATTAAGGTGTTTTACATGCAACTGGTTCAAACCTTCCAAGTGCTAAATTAAAACAATCCTTTAAAGAAGGAAATTCTGTTTCAGAAGAGGACCTTCATACAGCATCTCTGACCAGCAACTGATGATGCTATTGAACTCAGATGCTGATTGGTTCTCCAACACGAGATTACCCAACCCAGGAGCAAGGAAATCAGTAACTTCCTCCCTATAACTTGGAATGTGGGTGGAGGGGTTCATAGTTCTCCCTGAGTGAGACTTGCCTGCTAAGCTGGCCCCTGGTCCTGTCCTGTTCTCCAGCATGGTGTGTCTGAAGCTCCCTGGAGGCTCCTGCATCCTAGCGCTGACAGTGACACTGATGGTGCTGAGCTCCCCACTGGCTTTGGCTGGGGACACCCGACGTAAGTGCACATTGCGGGTGCTGAGCTACTATGGGGTGGGGAAAATAGGGAGTTTTGTTAACATTGTGCCCAGGCCATGTCCCTTAAGAAATTGTGACGTTTTCTTCAGAGATTGCCCATCTTTATCATTGGATCCCAAATTATTTCCTCCATAAAAGGAGCTTGGGTACTTGCCCTCTTCATGAGACTTGTGTAAGGGGCCTTTGCACAAGTCATTTCTTTTCAAATCTCCACCAATAAAACCTTTGCATCACATGTCCTCAGGGTCTTTAGAGGATTTAGAAATAAGGATTCTAAAATAAATTCCCCATACAGCACTTCCCTTTATTATGTTGACTTATGTCAGACAAAAGGAGGTTCTTACTGAAAATTTTGTGGGAGTCAAGGGAATTCAAAGGGTCTCTCCTAGACGATCCAGTGTTAGGTTCCCCACAGGACCTTTGGTGTTGGCCATAGTCCTCATATGTGAGGATGGACCCAGTGGCCTCCCCATTATCTCCTTTCTTTTCTTGCTGAACTCCAATGTTTATAAGGCCTGTATCCCTGTAGCGTATGTAGGTTCTCTGACAGAAGTTATACTTAGTGCTCTTCCTTTCTTGTGGGGAAAAGTCCCTGGAACTGAAGCTGAGATTGTTAGTACTTGGAGTCACCTTACAGATACAGAGCATTTATGAGGTATTCTTTGGTGCCTAAAGAACTTAAGGCATCCTCTGAAAAACTAGCCCAGGTTCGTGTTCATTATGAATCTTTTTTAACCTTTCTGTACTTGTTTCTCTTGCATCTCCTATGTGCTCTAACTAGACATGACAGAAGAGATTTAACTAATGTATAAATTATATGAAATTCTATTTTTTAAGTCAAAAATAATCAACTATCAGAAATTTAATAATGTTCAAACTATATACTCTGTGTGGGGTTACCGAGATGATGTGAACATTGTTCACGTCTCATAGGGCTGAAAGTCAATGGGCAAGTCTTGGGAACTCATTGTCTTACTGGGGTCTTGTCCTAAATTTCCTAGGTTCACCCATCATGCCCTCAGCTTTCCTTAACTAGCCATGTCTGCTTACCTCTTCCTCCAGTTTCTATTTTTCCCCAGCTATGTTGTCATCATTTCCAGAAATCTCTAAAGCTTGCACAGAACCTTAGCACTATGCGATTCATTGAAAGAGACTTTTTTTCTCTTTTTGAGGTAGGGCCTGGCTCTGTCACCCAGTCTATAGCTCAGTGGTGTGATCGAGGCTCACTGCAACCTCTGCCACCCATGCTCAAGTGATCCTCCCTCCTCAGGCTCCAGAGTAGCTGGGAATACAGGCAGGCAACCACGCCCAGCTAATTTTTGTAATTTTGGTAGACATGAGATTTTGCCATGTTGCCCAGGCTGGTCTTAAACTGCTGGTTTCAAGCAATCCTCCTGCCTTGGCCTCCCAACATGCTAGGATTATAGATGTGAGCCACTGTGCCCAGGCAAAAAGAAATGACTCTTAATAAAAAAATTTCCTTTTTCTTAAATCACTGTTTCTTTATCTGTGAATTCTTCTTCCAACTAGAAGGAGGAGAAAAAAGAAGTTTGCCTGTATTTCTCACCAGGAGGAGAAGGGGTCTAGTGTGACATTAAAATGAAAGGGTGCTGGAGCTTGAGCCCCTTCTTGCTTTCCAGGATCCCTACAGTGATCAGTTCCCATACCCTGGTTTATTCATGTAAACCACACTTATTTTTCTCAGCAGCTACTCTTTACTGGGCTCCATTCTAGGTTCAAATCATTCTATTTGATTAAGTTAGAGAGCGTCCCTACTCTCATGGAAGTTACACAAGAGTAGAGGAGACAGACACTAACCCAATAAGCATTTAACAAAGAAGATAATGTTAGAGAGTCATAGTGCACTGAAGAAAAGACATCAGGTTTGTGAAAAAGAGAGACATGGATTCACCTACTTTAGTTCATATGTTAGGGAGCTCCACCTGAGAAAGTGACATTCAGCTGAGACAACAAAATAAGTAGACAGTCGTGAAGATCTAAGGGATGAAAGTTCCAGGGAGACCAAATGGCGGGAAAGCCCTGGTGTGGGAAACTATGTGGAGGGAGAGAAAGAAAGCTAGAGGGGCTGAAGTATAGAAAGCAACGAAATGGAGAGGCAGAAGATGAGGTAGGACACAGAGAGGAAGTCAGGAGCCTCATCATTATAGGCTCTGATGTCCACGGTAAAAAAAATTTGAATTTTATTATTTATTTACCTATTTATTTGTTTATTTATTTTATTTTATTTTGTGATGGAGTCTCTCTCTGTTGCCCAGGCTAGATGGCAGTGGCACAATCTCGGCTCACTGCAACCTCCGCCTCCTGGGTTCAAGCAATTCTCCTGCCTCAGCCTCCCAATTAGCTGGGGCCACAGGTGCATGCCACCACACCTGGCTAATTTTTTGTATTTTTAGTAGGGATAGGATTTCACCGTGTTAGCCAGGATGGTCTCGATCTCCTGACCTCGTGATCCGCCCGCCTTGGCCTCCCAAAGTGCTGGGATTATAGGCGTGAGCCACTGCACCCGGCCTATATTCAATTTTTAAAACTAATTCTAGCTACTCTGTGGGGATTGGATTGTTGGGGTTCACCAGTGGTTAGGAAGACTATTTAGGAGCACAGCAGGGAATTCTCCAGGGAAAACAAGCTTGTGGCTTCATGGAGTGCATTAGTGATAAAGACGGTGAAGAAGATAAAGTGGACAGACTCGGCATGTATTTTTGCTTAGCTTGTTAATGAATTACTGTAAAGGGGGTAGAGCTTATTCCTAAGGATTTTCTTTTGACAAATAAGTGGGTGGTAGTGTTGTTTATTGAGATAGGAAAAACTATGGGAGGAAATGATTTGAAGTGGGTGGTTTGAAATAAAAGTTTTGTTTAAATATGAGATGATTGACTGACATTTATGTGGAGCAATCAGAAGGTCAATGGCATTTAAGAGACTCATGGTGAGGCTAGGGCTTCAGGTATTTATGTTGGAGGCATCAATACGTGTAGTGTGTTAAATTCCAGGGAGTGGAAGAGGATACATAGGGAGATGGATTGTGTGGAGAAACAAGAAGAGGGCACAGGCCAGCAAAGGGGGCTGAGAAAGAGCCCAGGGATGTTGGAGAAAAACCAAGAGAACATCATGTATGTAAGTCAAGGAAAACAGATTTTTTTCAAGGAGAAGGGAGAGGCCAATTGTGGTGAGTACCACTAAGAGGAGGGGGAAGTGAGAATATGACAGAGAAGCAAGTGCTGGGATTGGTGGAGTTGATATTTGCAGTCAATGGAGTATCCAGGGAGGAAACTGGATTGGACCATTTGGAGAGCAAATAAAAGTGAGGACGAGGTTAAGGTTGACTGTTCTGTGTAGAGAGCTTCAGGGAAGGATTGTTCTCTGGGTTCAGGGAGCCAGCTGAACCTAAAGGAAAAGGCTAAAGAAGCTGAAGAGAAGGAGGAGGACCTGTGAACCAGAGATGCTCAGCCATTATTAGCAAGGAAATACAAGAGAGCCCCTGTGTGCAGTGGTGACTACTCATGCAAAATGTCACACAGCCAATATTTAACACAGCCAGGATTTCACACAGCCAATATTTATTAGTGACATAGAATATATTTGTTATTGCTCTAGGTCATGAGAATGGAGTGACAAAATGAATCCGGTCGCCATCAGTATATGCCACATAACATTTTGCAGTGACTGTGTGCCAGGCCTATGAATTTCAGTATTCAATTTCAATAATGATCCTGTTGTATCTGTGGTATTTAAAAACATATACATCTCTGTAATCTAAAATTGAGAGGTTATAAGTAAAACCCAGTATTACAAATTTAGTGCTGGAAATTGATTGCAGTTTAAATCTGAGCATATAGAAAGTCCCTTTCTTCTATGTCAGCAGATGCCTTTTGTGTGAGGTTTAGGTATACTGCATTATTAGACATAAACCAGTGTTTCTGCCCTATGTTTTCAGAATGACAATTCTTTATGAAACTAATAGAAGAACAGAAGACAATTGCAAAATCATGATGAAGATACTAATTGCTTTAGAATTAAGGAATACAAATAATGTGAGCTGCAGTTATAGGGATCAAAAAAGTTACAATGGGAATGTATTTGAGTGTTTATTATGTGATCAGTGCTAAGAAGTGTCATCATTTAATTTTACACTTAACAGTAATCCTGTGAGGATTATGCTATTATTAAATGCACTTGATACATTACAAAAAGGCTTATGGTTGATATAAATTGACCCAAGTAGAAGAGATCATGTTTTTATTCAGGTTTTCTGATTCTAGAGTTTGAGAGTTTGACCATCATTAGTGAGTAGTGACTATATTGTGTCTGAATTATTGACAGAATTTCTGATATTCATATGTACCAGGTTGTTTCTTAGAGTGGGAGCAGAGATGCAAGGGCTGCTAGTTCCGATGTGTAGGAGAAACTATCATTCATTTTGCATTTATCATTTTAAACGTTCTATATGTCTATCCTGGGCATGTGTTGAAGAACACAAGGAAGTATTAAATCACTCCTTCTTCTGAAGTTTGACTGGCAAAATTGGGCTAGAGTTACGAAATAAAATACAGGTTCCTAGTGAAATCTGAATTTCAGATACACAACCATAATTTATTGGAAATCCAAATTTAACTGGATATCTTCTATTATTCTCTGTTCTAGAACTCCACACTTCTAACATTTCTCATTCCTGTCTAAGCTCTTGTGTGTTTGGTTTTTGGCCATCGCTTTCACTGCTCTTTAAGCTCCCCCAGTAGAGTGGAGAGGTCTGTTTTCCCTCGTTTGGATTCCTACAGGCAGCGCAGGCCTGGCACAAGGTCATCACTAAGGAAGTGTTCACAGGGTGAAGGCGGTGGGTGCTGTTGAAGGAACCGGTAAAGCCTGTGGGATGAGAGAAGGAGCAGAGAGTGTTTTTGGGGTGGAGGCTCCCAGGAGGAGGCGGGACGGGCTGCGGTGCTGGGCGGATCCTCCTCCAGCTCCTGCTTGGAGGTCTCCAGAACAGGCTGGAGGCAGGGAGGGGGTCCCAAAAGCCTGGGGATGAGAAGGGGTTTTCCCGCATGGTCCCCCAGGCCCCCGTTCGCCTCAGGAAGACGGAGGATGAGCTCCTGGGCTGCTGGTGGTGGGCGTTGCGGGTGGGGCCGGTTAAGGTTCCCAGTGCCCGCACCCTGCCCAGGGAGCCCCGGATGGTGGCGTCGCTGTCAGTGTCTTCTCAGGAGGCTGCCCGTGTGACCGGATCCTTCGTGTCCCCACAGCACGTTTCTTGTGGCAGCTTAAGTTTGAATGTCATTTCTTCAATGGGACGGAGCGGGTGCGGTTGCTGGAAAGATGCATCTATAACCAAGAGGAGTCCGTGCGCTTCGACAGCGACGTGGGGGAGTACCGGGCGGTGACGGAGCTGGGGCGGCCTGATGCCGAGTACTGGAACAGCCAGAAGGACCTCCTGGAGCAGAGGCGGGCCGCGGTGGACACCTACTGCAGACACAACTACGGGGTTGGTGAGAGCTTCACAGTGCAGCGGCGAGGTGAGCGCGGCGCGGGGCGGGGCCTGAGTCCCTGTAAGCGGAGAATCTGAGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTAAGAAAGAGAGAGAGCGCGCCATCTGTGAGCATTTAGAATCCTCTCAATCCCCAGCAAGCAGTTCTGAGAGCACAGGTGTGTGTGTAGAGTGTGGATTTGTCTGGGTATGTCTGTTGTGGGAGGGGAGGCAGGAGGGGGCTGATTCTTATCCTTGGAGACCTCTGTGGGGAGGTGACAAGGGAGGTGGGTGCTGGGGGCTGGAGAGAGAGGAGACCTTGATTGTCTCGGGTCCTTAGAGATGCGGGGAAGGGAAATGTAAGGGGTGTGTGATTGGGGTGAAGGTTTAGGGGAGGACTGCTGAGGGGTAAGGAAGGTTTGGGATAATGTGAGGAGGCCGGTTCCAGACTCTCCCTGGCATACACCCTTCGTGTAATCTCTGAAATAAAAGTGTGTGCTGTTTGTTTGTAAAAGCATTAGATTAATTTCTAGAGGAACTGAGTAGACCTCTGAGGCACTCCTGAAGCTTCTTTATATCTAAATTTCTTGCTAGTTTTTTGGGTTTTTTTAGTGTGTATATTTTTACATAGTAGAAATGACTGTGAAACTAACTTTTTGAATTAAAGTTTTAACACAGTTACTATTTTATTATAATGCTAGTTTTCTAGTAGTTACATATTATTCTTTTATATATAATAGTTATGACACAACTCACCTCACTTTCCCCTTTGTTGACCTTTATTATGACATTCACCAAAAGTTGAAAATGTATGTTTCTGGTTAATTTTTAATTTATATTTTTTATTTGTAATTGCTTTGAATTATTTTGACCTATTTATTGGCGGATTATAATTATTGCTCTAAGAATTCCCTATTGTATTTGGTAGGCAATGGACAATGATCTATGGTCTGATATCTTGAGGGCTTAGTATTTTTCTCAGTGACTTTGTGGGTTCTTTGTGCTATAAGATTATTAACACTTTATTGATATTTGATCCAGCATTTGCTTCAGTTTGTGGTTTGTATGTGGATTTTGAAAGTTCTTTTCCATGTTAAGAATTTGAACTTTTTATTTAATAAAATATATTGCAAAATTTTTATTAATGATTTACAATCCATCTTAAATCTGCCATTTTGTGGCATTGTTGTCTCCAGGTTCCTCCTTACTTCTAAAAAAAATAGTTGTATTTATTGAGAGTATGCTAGTGTTGGGGATTTTCCTGGGCATAAGCACCCCAAGTAACAAGTCCCAGACACTGCCTTAATCCAAATGTGACTCTGGAAAGAAAAATCATTTTACAATGATAGGCCTAATAATAATTATGCTTGTGTTACACGGGAGATGCATTGATCAGCTAAATGTAAATATAAGAACTTTCAAAACTAAAATGACATTCCCTAATCCTTTTCTCTGCTTCGGGACTCATGCTTTTCTAGGAACGTAAAAATTTGGAGAATCATTTCTGTCTGTCCCACATTCCCAGGGGCAGAACAATTTCTGTTTTGTTCTAAGGTGTGAGTGCATGGCAGTAGTATTCCTAAAATTCATACTCGGTTTCCTCATGTACCCAATTCTGTCCCTTTATCTATGCATATTTCTTTAAATCATATTTTTCTGTCATGGGGTACAAGGATGATAAATAGGTGCCAAGTGGAGCACCCAAGTGTGATGAGCGCCCTCACAGTGGAATGGAGTGAGAAGCTTTCTGACCTTATAAACTGAAGGCTATCTTCAGTCATTGTTTTATATATTTTACATGCATTAATCCTCATATAACCCCAAGAGGTAAATTAGTATAATTATCCTTCATTGTAGGTGACAAAGTTGAGACACAGAAGAATCAAAAAACTCTTCCAGGATCAACCAGTAAAAGGCAGACCTTGGATTCGAACCAAGCAACCTGGCTCAAATATCAGTTTTAATTACTACACTCTATACTTTCCAAGATTTGTAAACAGTTTGACAATGCATGCCAATTTAAAGCTATGAAGAAACAAACACAATTTTTCACAACACCTCTCAAATCTAATGGGTCCTCACTGTCAAGATTAAATTCCAGGCTGATGACACTGTAAGGTCACATGGCCAGCTGTGCTATAGGCCTGGTCAAGGTCAGAGCCTGGGTTTGCAGAGAAGCAGACACACAGCCAAACCAGGAGACTTACTCTGTCTTCCTGACTCATTCTCTCTACATTTGTTTTCTCCTAGTTGAGCCTAAGGTGACTGTGTATCCTTCAAAGACCCAGCCCCTGCAGCACCACAACCTCCTGGTCTGCTCTGTGAGTGGTTTCTATCCAGGCAGCATTGAAGTCAGGTGGTTCCGGAACGGCCAGGAAGAGAAGGCTGGGGTGGTGTCCACAGGCCTGATCCAGAATGGAGATTGGACCTTCCAGACCCTGGTGATGCTGGAAACAGTTCCTCGGAGTGGAGAGGTTTACACCTGCCAAGTGGAGCACCCAAGTGTGACGAGCCCTCTCACAGTGGAATGGAGTGAGCAGCTTTCTGACTTCATAAATTTCTCACCCACCAAGAAGGGGACTGTGCTAATCCCTGAGTGTCAGGTTTCTCCTCTCCCACACCCTATTTTCATTTGCTCCATGTTCTCATCTCCATCAGCACAGGTCACTGGGGGTAGCCCTGTAGGTGTTTCTAGAAACACCTGTACCTCCTGGAGAAGCAGTCTCACCTGCCAGGCAGGAGAGACTGTCCCTCTTTTGAACCTCCCCATGATTTCGCAGGTCAGGGTCACCCACTCTCCCCAGGCTCCAGGCCCTGCTTCTGGGTCTGAGACTGAGTTTCTGGTGCTGTTGCTCTGAGTTATTTTTTGTGATCTGGGAAGAGGAGAAGTGTAGGGGCCTACCTGACATGAGGGGAATCCAATCTCAGCCCTGCCTTTTATTAGCTCTGTCACTCTAGACAAACTACTTAGCTTCATTGAGTCTCAGGCTTTCTGTTGATCAGATGTTGAACGCTTGCCTTACATCAAGGCTGTAATATTTGAATGAGTTTGATGTCTGAACATCGTAACTGTTCAGCGTGATTTGAAATCCTTTTTTTCTCCTGAAATGGCTAGTTATTTTAATTCTTGTGGGGCAGGCTTCTGCCCCATTTTCAAAGCTCTGAATCTTAGAGTCTCAATTAAAGAGGTTCAATTTGGAATAAGCATCACTAAACCTGGATTCCTCTCTCAGGAGCACGGTCTGAATCTGCACAGAGCAAGATGCTGAGTGGAGTCGGGGGCTTCGTGCTGGGCCTGCTCTTCCTTGGGGCCGGGCTGTTCATCTACTTCAGGAATCAGAAAGGTGAGGAGCCTTTGGGAGCTGACTCTCTCCATAGGCTTTTCTGGAGGAGGAACCATGGTTTTGCTGAGAGGTTAGTTCTCAGTATACGAGTGGCCCTGAATAAAGCCTTTCTTTCCCCAAACGGCTCTAATGTCCTGCTAATCCAGAAATCATCAGTGCATGGTTACTATGTCAAAGCATAATAGCTTGTGGCCTGCAGAGACAAGAGGAAGGTTAACAAGTAGGGGTCCTTTTGTTTGAGATCCTGGAGCAAATTAAGGAAGAGCCACTAAGGCTAATGGAATTACACTGGATCCTGTGACAGACACTTCAGGCTTCATGGGTCACATGGTCTGTCTCTGCTCCTTTCTGCCCCTTTCTGCCCTGGTTGGTGCGGGTTGTGGTGTTAGAGAAATCTCAGGTGGGAGATCTGCGGCTGGGACATTGTGTTGGAGGACAGATTTGCTTCAATAACTTTTAAGTGTATTTCTTTTCCTCTTTTTCCCAGGACACTCTGGACTTCAGCCAACAGGTAATACCTTTTAATCCTCTTTTAGAAACAGACACAGTTTCCCTAGTGAGAGGTGAAGCCAGCTGGACTTCTGGGTGGGGTGGGGACTTGGAGAACTTTTCTTACAAGAGGTTTCTAAATGCCCCAATCAGTGCTTTGTAAAAACACACCAATAGGTTCTCTGTGGCTAGCTGGATGTTTGTAAAATGGACCAATTTGCACTCTGTAAAATGGACCAATCAGCATTCTTTAAAATGGACCAATCAGCACTTTTAAAATGGGCCAATCAGCACTCTTTAAAATGGACCAATCAGCACTCTTTAAAATGGACCAATCTGCAGGACATGGGCAGGGACAAATACAGGAATAAAAGCTGGCCACCCCAACCAGCAGTGGCAACCCACTCAGGTCCTCTTCCCTGCTGTGGAAGTTTTGTTCTTTTGATCTTCACAATAAATCTTGCTGCTGCCTACTCTTTGGGTCCCTGCCGCCTTTAAGAGCTGTAACACTCACTGTGAAGGTCTGCTTGAAGTCAGCGAGACCACAAACCCACTGGAAGGAACAAACTGCGGACACACTAGAATGATGGTAGAGGTGATAAGGCATGAGACAGAAATAATAGGAAAGACTTTGGATCCAAATTTCTGATCAGGCAATTTACACCAAAATTCCTCCTCTCCACTTAGAAAAGGGCTGTGCTCTGCGGGACTATTGGCTCAGGGGAGACTCAGGAACTTGTTTTTCTGCTTCCTGCAGTGCTCTCATCTGAGCCCTTGAAAGAGGAGAAAAGAAACTGTTAGTAGAGCCAGGTTGAAAACAACACTCTCCTCTGTCTTTTGCAGGATTCCTGAGCTGAAATGCAGATGACCACATTCAAGGAAGAACCTTCTGTCCCAGCTTTGCAGAATGAAAAGCTTTCCTGCTTGGCAGTTATTCTTCCACAAGAGAGGGCTTTCTCAGGACCTGGTTGCTACTGGTTCGGCAACTGCAGAAAATGTCCTCCCTTGTGGCTTCCTCAGCTCCTGCCCTTGGCCTGAAGTCCCAGCATTGATGACAGCGCCTCATCTTCAACTTTT"
query_seq = "GAGTGAGACTTGCCTGCTAAGCTGGCCCCTGCTCCTGTCCTGTTCTCCAGCATGGTGTGTCTGAAGCTCCCTGGAGGCTCCTGCATCCTAGCGCTGACAGTGACACTGATGGTGCTGAGCTCCCCACTGGCTTTGGCTGGGGACACCCGACGTAAGTGCACATTGCGGGTGCTGAGCTACTATGGGGTGGGGAAAATAGGGAGTTTTGTTAACATTGTGCCCAGGCCATGTCCCTTAAGAAATTGTGACGTTTTCTTCAGAGATTGCCCATCTTTATCATTGGATCCCAAATTATTTCCTCCATAAAAGGAGCTTGGGTACTTGCCCTCTTCATGAGACTTGTGTAAGGGGCCTTTGCACAAGTCATTTCTTTTCAAATCTCCACCAATAAAACCTTTGCATCACATGTCCTCAGGGTCTTTAGAGGATTTAGAAATAAGGATTCTAAAATAAATTCCCCATACAGCACTTCCCTTTATTATGTTGACTTATGTCAGACAAAAGGAGGTTCTTACTGAAAATTTTGTGGGAGTCAAGGGAATTCAAAGGGTCTCTCCTAGACGATCCAGTGTTAGGTTCCCCACAGGACCTTTGGTGTTGGCCATAGTCCTCATATGTGAGGATGGACCCAGTGGCCTCCCCATTATCTCCTTTCTTTTCTTGCTGAACTCCAATGTTTATAAGGCCTGTATCCCTGTAGCGTATGTAGGTTCTCTGACAGAAGTTATACTTAGTGCTCTTCCTTTCTTGTGGGGAAAAGTCCCTGGAACTGAAGCTGAGATTGTTAGTACTTGGAGTCACCTTACAGATACAGAGCATTTATGAGGTATTCTTTGGTGCCTAAAGAACTTAAGGCATCCTCTGAAAAACTAGCCCAGGTTCGTGTTCATTATGAATCTTTTTTAACCTTTCTGTACTTGTTTCTCTTGCATCTCCTATGTGCTCTAACTAGACATGACAGAAGAGATTTAACTAATGTATAAATTATATGAAATTCTATTTTTTAAGTCAAAAATAATCAACTATCAGAAATTTAATAATGTTCAAACTATATACTCTGTGTGGGGTTACCGAGATGATGTGAACATTGTTCACGTCTCATAGGGCTGAAAGTCAATGGGCAAGTCTTGGGAACTCATTGTCTTACTGGGGTCTTGTCCTAAATTTCCTAGGTTCACCCATCATGCCCTCAGCTTTCCTTAACTAGCCATGTCTGCTTACCTCTTCCTCCAGTTTCTATTTTTCCCCAGCTATGTTGTCATCATTTCCAGAAATCTCTAAAGCTTGCACAGAACCTTAGCACTATGCGATTCATTGAAAGAGACTTTTTTTCTCTTTTTGAGGTAGGGCCTGGCTCTGTCACCCAGTCTATAGCTCAGTGGTGTGATCGAGGCTCACTGCAACCTCTGCCACCCATGCTCAAGTGATCCTCCCTCCTCAGGCTCCAGAGTAGCTGGGAATACAGGCAGGCAACCACGCCCAGCTAATTTTTGTAATTTTGGTAGACATGAGATTTTGCCATGTTGCCCAGGCTGGTCTTAAACTGCTGGTTTCAAGCAATCCTCCTGCCTTGGCCTCCCAACATGCTAGGATTATAGATGTGAGCCACTGTGCCCAGGCAAAAAGAAATGACTCTTAATAAAAAAATTTCCTTTTTCTTAAATCACTGTTTCTTTATCTGTGAATTCTTCTTCCAACTAGAAGGAGGAGAAAAAAGAAGTTTGCCTGTATTTCTCACCAGGAGGAGAAGGGGTCTAGTGTGACATTAAAATGAAAGGGTGCTGGAGCTTGAGCCCCTTCTTGCTTTCCAGGATCCCTACAGTGATCAGTTCCCATACCCTGGTTTATTCATGTAAACCACACTTATTTTTCTCAGCAGCTACTCTTTACTGGGCTCCATTCTAGGTTCAAATCATTCTATTTGATTAAGTTAGAGAGCGTCCCTACTCTCATGGAAGTTACACAAGAGTAGAGGAGACAGACACTAACCCAATAAGCATTTAACAAAGAAGATAATGTTAGAGAGTCATAGTGCACTGAAGAAAAGACATCAGGTTTGTGAAAAAGAGAGACATGGATTCACCTACTTTAGTTCATATGTTAGGGAGCTCCACCTGAGAAAGTGACATTCAGCTGAGACAACAAAATAAGTAGACAGTCGTGAAGATCTAAGGGATGAAAGTTCCAGGGAGACCAAATGGCGGGAAAGCCCTGGTGTGGGAAACTATGTGGAGGGAGAGAAAGAAAGCTAGAGGGGCTGAAGTATAGAAAGCAACGAAATGGAGAGGCAGAAGATGAGGTAGGACACAGAGAGGAAGTCAGGAGCCTCATCATTATAGGCTCTGATGTCCACGGTAAAAAAAATTTGAATTTTATTATTTATTTACCTATTTATTTGTTTATTTATTTTATTTTATTTTGTGATGGAGTCTCTCTCTGTTGCCCAGGCTAGATGGCAGTGGCACAATCTCGGCTCACTGCAACCTCCGCCTCCTGGGTTCAAGCAATTCTCCTGCCTCAGCCTCCCAATTAGCTGGGGCCACAGGTGCATGCCACCACACCTGGCTAATTTTTTGTATTTTTAGTAGGGATAGGATTTCACCGTGTTAGCCAGGATGGTCTCGATCTCCTGACCTCGTGATCCGCCCGCCTTGGCCTCCCAAAGTGCTGGGATTATAGGCGTGAGCCACTGCACCCGGCCTATATTCAATTTTTAAAACTAATTCTAGCTACTCTGTGGGGATTGGATTGTTGGGGTTCACCAGTGGTTAGGAAGACTATTTAGGAGCACAGCAGGGAATTCTCCAGGGAAAACAAGCTTGTGGCTTCATGGAGTGCATTAGTGATAAAGACGGTGAAGAAGATAAAGTGGACAGACTCGGCATGTATTTTTGCTTAGCTTGTTAATGAATTACTGTAAAGGGGGTAGAGCTTATTCCTAAGGATTTTCTTTTGACAAATAAGTGGGTGGTAGTGTTGTTTATTGAGATAGGAAAAACTATGGGAGGAAATGATTTGAAGTGGGTGGTTTGAAATAAAAGTTTTGTTTAAATATGAGATGATTGACTGACATTTATGTGGAGCAATCAGAAGGTCAATGGCATTTAAGAGACTCATGGTGAGGCTAGGGCTTCAGGTATTTATGTTGGAGGCATCAATACGTGTAGTGTGTTAAATTCCAGGGAGTGGAAGAGGATACATAGGGAGATGGATTGTGTGGAGAAACAAGAAGAGGGCACAGGCCAGCAAAGGGGGCTGAGAAAGAGCCCAGGGATGTTGGAGAAAAACCAAGAGAACATCATGTATGTAAGTCAAGGAAAACAGATTTTTTTCAAGGAGAAGGGAGAGGCCAATTGTGGTGAGTACCACTAAGAGGAGGGGGAAGTGAGAATATGACAGAGAAGCAAGTGCTGGGATTGGTGGAGTTGATATTTGCAGTCAATGGAGTATCCAGGGAGGAAACTGGATTGGACCATTTGGAGAGCAAATAAAAGTGAGGACGAGGTTAAGGTTGACTGTTCTGTGTAGAGAGCTTCAGGGAAGGATTGTTCTCTGGGTTCAGGGAGCCAGCTGAACCTAAAGGAAAAGGCTAAAGAAGCTGAAGAGAAGGAGGAGGACCTGTGAACCAGAGATGCTCAGCCATTATTAGCAAGGAAATACAAGAGAGCCCCTGTGTGCAGTGGTGACTACTCATGCAAAATGTCACACAGCCAATATTTAACACAGCCAGGATTTCACACAGCCAATATTTATTAGTGACATAGAATATATTTGTTATTGCTCTAGGTCATGAGAATGGAGTGACAAAATGAATCCGGTCGCCATCAGTATATGCCACATAACATTTTGCAGTGACTGTGTGCCAGGCCTATGAATTTCAGTATTCAATTTCAATAATGATCCTGTTGTATCTGTGGTATTTAAAAACATATACATCTCTGTAATCTAAAATTGAGAGGTTATAAGTAAAACCCAGTATTACAAATTTAGTGCTGGAAATTGATTGCAGTTTAAATCTGAGCATATAGAAAGTCCCTTTCTTCTATGTCAGCAGATGCCTTTTGTGTGAGGTTTAGGTATACTGCATTATTAGACATAAACCAGTGTTTCTGCCCTATGTTTTCAGAATGACAATTCTTTATGAAACTAATAGAAGAACAGAAGACAATTGCAAAATCATGATGAAGATACTAATTGCTTTAGAATTAAGGAATACAAATAATGTGAGCTGCAGTTATAGGGATCAAAAAAGTTACAATGGGAATGTATTTGAGTGTTTATTATGTGATCAGTGCTAAGAAGTGTCATCATTTAATTTTACACTTAACAGTAATCCTGTGAGGATTATGCTATTATTAAATGCACTTGATACATTACAAAAAGGCTTATGGTTGATATAAATTGACCCAAGTAGAAGAGATCATGTTTTTATTCAGGTTTTCTGATTCTAGAGTTTGAGAGTTTGACCATCATTAGTGAGTAGTGACTATATTGTGTCTGAATTATTGACAGAATTTCTGATATTCATATGTACCAGGTTGTTTCTTAGAGTGGGAGCAGAGATGCAAGGGCTGCTAGTTCCGATGTGTAGGAGAAACTATCATTCATTTTGCATTTATCATTTTAAACGTTCTATATGTCTATCCTGGGCATGTGTTGAAGAACACAAGGAAGTATTAAATCACTCCTTCTTCTGAAGTTTGACTGGCAAAATTGGGCTAGAGTTACGAAATAAAATACAGGTTCCTAGTGAAATCTGAATTTCAGATACACAACCATAATTTATTGGAAATCCAAATTTAACTGGATATCTTCTATTATTCTCTGTTCTAGAACTCCACACTTCTAACATTTCTCATTCCTGTCTAAGCTCTTGTGTGTTTGGTTTTTGGCCATCGCTTTCACTGCTCTTTAAGCTCCCCCAGTAGAGTGGAGAGGTCTGTTTTCCCTCGTTTGGATTCCTACAGGCAGCGCAGGCCTGGCACAAGGTCATCACTAAGGAAGTGTTCACAGGGTGAAGGCGGTGGGTGCTGTTGAAGGAACCGGTAAAGCCTGTGGGATGAGAGAAGGAGCAGAGAGTGTTTTTGGGGTGGAGGCTCCCAGGAGGAGGCGGGACGGGCTGCGGTGCTGGGCGGATCCTCCTCCAGCTCCTGCTTGGAGGTCTCCAGAACAGGCTGGAGGCAGGGAGGGGGTCCCAAAAGCCTGGGGATGAGAAGGGGTTTTCCCGCATGGTCCCCCAGGCCCCCGTTCGCCTCAGGAAGACGGAGGATGAGCTCCTGGGCTGCTGGTGGTGGGCGTTGCGGGTGGGGCCGGTTAAGGTTCCCAGTGCCCGCACCCTGCCCAGGGAGCCCCGGATGGTGGCGTCGCTGTCAGTGTCTTCTCAGGAGGCTGCCCGTGTGACCGGATCCTTCGTGTCCCCACAGCACGTTTCTTGTGGCAGCTTAAGTTTGAATGTCATTTCTTCAATGGGACGGAGCGGGTGCGGTTGCTGGAAAGATGCATCTATAACCAAGAGGAGTCCGTGCGCTTCGACAGCGACGTGGGGGAGTACCGGGCGGTGACGGAGCTGGGGCGGCCTGATGCCGAGTACTGGAACAGCCAGAAGGACCTCCTGGAGCAGAGGCGGGCCGCGGTGGACACCTACTGCAGACACAACTACGGGGTTGGTGAGAGCTTCACAGTGCAGCGGCGAGGTGAGCGCGGCGCGGGGCGGGGCCTGAGTCCCTGTAAGCGGAGAATCTGAGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTAAGAAAGAGAGAGAGCGCGCCATCTGTGAGCATTTAGAATCCTCTCAATCCCCAGCAAGCAGTTCTGAGAGCACAGGTGTGTGTGTAGAGTGTGGATTTGTCTGGGTATGTCTGTTGTGGGAGGGGAGGCAGGAGGGGGCTGATTCTTATCCTTGGAGACCTCTGTGGGGAGGTGACAAGGGAGGTGGGTGCTGGGGGCTGGAGAGAGAGGAGACCTTGATTGTCTCGGGTCCTTAGAGATGCGGGGAAGGGAAATGTAAGGGGTGTGTGATTGGGGTGAAGGTTTAGGGGAGGACTGCTGAGGGGTAAGGAAGGTTTGGGATAATGTGAGGAGGCCGGTTCCAGACTCTCCCTGGCATACACCCTTCGTGTAATCTCTGAAATAAAAGTGTGTGCTGTTTGTTTGTAAAAGCATTAGATTAATTTCTAGAGGAACTGAGTAGACCTCTGAGGCACTCCTGAAGCTTCTTTATATCTAAATTTCTTGCTAGTTTTTTGGGTTTTTTTAGTGTGTATATTTTTACATAGTAGAAATGACTGTGAAACTAACTTTTTGAATTAAAGTTTTAACACAGTTACTATTTTATTATAATGCTAGTTTTCTAGTAGTTACATATTATTCTTTTATATATAATAGTTATGACACAACTCACCTCACTTTCCCCTTTGTTGACCTTTATTATGACATTCACCAAAAGTTGAAAATGTATGTTTCTGGTTAATTTTTAATTTATATTTTTTATTTGTAATTGCTTTGAATTATTTTGACCTATTTATTGGCGGATTATAATTATTGCTCTAAGAATTCCCTATTGTATTTGGTAGGCAATGGACAATGATCTATGGTCTGATATCTTGAGGGCTTAGTATTTTTCTCAGTGACTTTGTGGGTTCTTTGTGCTATAAGATTATTAACACTTTATTGATATTTGATCCAGCATTTGCTTCAGTTTGTGGTTTGTATGTGGATTTTGAAAGTTCTTTTCCATGTTAAGAATTTGAACTTTTTATTTAATAAAATATATTGCAAAATTTTTATTAATGATTTACAATCCATCTTAAATCTGCCATTTTGTGGCATTGTTGTCTCCAGGTTCCTCCTTACTTCTAAAAAAAATAGTTGTATTTATTGAGAGTATGCTAGTGTTGGGGATTTTCCTGGGCATAAGCACCCCAAGTAACAAGTCCCAGACACTGCCTTAATCCAAATGTGACTCTGGAAAGAAAAATCATTTTACAATGATAGGCCTAATAATAATTATGCTTGTGTTACACGGGAGATGCATTGATCAGCTAAATGTAAATATAAGAACTTTCAAAACTAAAATGACATTCCCTAATCCTTTTCTCTGCTTCGGGACTCATGCTTTTCTAGGAACGTAAAAATTTGGAGAATCATTTCTGTCTGTCCCACATTCCCAGGGGCAGAACAATTTCTGTTTTGTTCTAAGGTGTGAGTGCATGGCAGTAGTATTCCTAAAATTCATACTCGGTTTCCTCATGTACCCAATTCTGTCCCTTTATCTATGCATATTTCTTTAAATCATATTTTTCTGTCATGGGGTACAAGGATGATAAATAGGTGCCAAGTGGAGCACCCAAGTGTGATGAGCGCCCTCACAGTGGAATGGAGTGAGAAGCTTTCTGACCTTATAAACTGAAGGCTATCTTCAGTCATTGTTTTATATATTTTACATGCATTAATCCTCATATAACCCCAAGAGGTAAATTAGTATAATTATCCTTCATTGTAGGTGACAAAGTTGAGACACAGAAGAATCAAAAAACTCTTCCAGGATCAACCAGTAAAAGGCAGACCTTGGATTCGAACCAAGCAACCTGGCTCAAATATCAGTTTTAATTACTACACTCTATACTTTCCAAGATTTGTAAACAGTTTGACAATGCATGCCAATTTAAAGCTATGAAGAAACAAACACAATTTTTCACAACACCTCTCAAATCTAATGGGTCCTCACTGTCAAGATTAAATTCCAGGCTGATGACACTGTAAGGTCACATGGCCAGCTGTGCTATAGGCCTGGTCAAGGTCAGAGCCTGGGTTTGCAGAGAAGCAGACACACAGCCAAACCAGGAGACTTACTCTGTCTTCCTGACTCATTCTCTCTACATTTGTTTTCTCCTAGTTGAGCCTAAGGTGACTGTGTATCCTTCAAAGACCCAGCCCCTGCAGCACCACAACCTCCTGGTCTGCTCTGTGAGTGGTTTCTATCCAGGCAGCATTGAAGTCAGGTGGTTCCGGAACGGCCAGGAAGAGAAGGCTGGGGTGGTGTCCACAGGCCTGATCCAGAATGGAGATTGGACCTTCCAGACCCTGGTGATGCTGGAAACAGTTCCTCGGAGTGGAGAGGTTTACACCTGCCAAGTGGAGCACCCAAGTGTGACGAGCCCTCTCACAGTGGAATGGAGTGAGCAGCTTTCTGACTTCATAAATTTCTCACCCACCAAGAAGGGGACTGTGCTAATCCCTGAGTGTCAGGTTTCTCCTCTCCCACACCCTATTTTCATTTGCTCCATGTTCTCATCTCCATCAGCACAGGTCACTGGGGGTAGCCCTGTAGGTGTTTCTAGAAACACCTGTACCTCCTGGAGAAGCAGTCTCACCTGCCAGGCAGGAGAGACTGTCCCTCTTTTGAACCTCCCCATGATTTCGCAGGTCAGGGTCACCCACTCTCCCCAGGCTCCAGGCCCTGCTTCTGGGTCTGAGACTGAGTTTCTGGTGCTGTTGCTCTGAGTTATTTTTTGTGATCTGGGAAGAGGAGAAGTGTAGGGGCCTACCTGACATGAGGGGAATCCAATCTCAGCCCTGCCTTTTATTAGCTCTGTCACTCTAGACAAACTACTTAGCTTCATTGAGTCTCAGGCTTTCTGTTGATCAGATGTTGAACGCTTGCCTTACATCAAGGCTGTAATATTTGAATGAGTTTGATGTCTGAACATCGTAACTGTTCAGCGTGATTTGAAATCCTTTTTTTCTCCTGAAATGGCTAGTTATTTTAATTCTTGTGGGGCAGGCTTCTGCCCCATTTTCAAAGCTCTGAATCTTAGAGTCTCAATTAAAGAGGTTCAATTTGGAATAAGCATCACTAAACCTGGATTCCTCTCTCAGGAGCACGGTCTGAATCTGCACAGAGCAAGATGCTGAGTGGAGTCGGGGGCTTCGTGCTGGGCCTGCTCTTCCTTGGGGCCGGGCTGTTCATCTACTTCAGGAATCAGAAAGGTGAGGAGCCTTTGGGAGCTGACTCTCTCCATAGGCTTTTCTGGAGGAGGAACCATGGTTTTGCTGAGAGGTTAGTTCTCAGTATACGAGTGGCCCTGAATAAAGCCTTTCTTTCCCCAAACGGCTCTAATGTCCTGCTAATCCAGAAATCATCAGTGCATGGTTACTATGTCAAAGCATAATAGCTTGTGGCCTGCAGAGACAAGAGGAAGGTTAACAAGTAGGGGTCCTTTTGTTTGAGATCCTGGAGCAAATTAAGGAAGAGCCACTAAGGCTAATGGAATTACACTGGATCCTGTGACAGACACTTCAGGCTTCATGGGTCACATGGTCTGTCTCTGCTCCTTTCTGCCCCTTTCTGCCCTGGTTGGTGCGGGTTGTGGTGTTAGAGAAATCTCAGGTGGGAGATCTGCGGCTGGGACATTGTGTTGGAGGACAGATTTGCTTCAATAACTTTTAAGTGTATTTCTTTTCCTCTTTTTCCCAGGACACTCTGGACTTCAGCCAACAGGTAATACCTTTTAATCCTCTTTTAGAAACAGACACAGTTTCCCTAGTGAGAGGTGAAGCCAGCTGGACTTCTGGGTGGGGTGGGGACTTGGAGAACTTTTCTTACAAGAGGTTTCTAAATGCCCCAATCAGTGCTTTGTAAAAACACACCAATAGGTTCTCTGTGGCTAGCTGGATGTTTGTAAAATGGACCAATTTGCACTCTGTAAAATGGACCAATCAGCATTCTTTAAAATGGACCAATCAGCACTTTTAAAATGGGCCAATCAGCACTCTTTAAAATGGACCAATCAGCACTCTTTAAAATGGACCAATCTGCAGGACATGGGCAGGGACAAATACAGGAATAAAAGCTGGCCACCCCAACCAGCAGTGGCAACCCACTCAGGTCCTCTTCCCTGCTGTGGAAGTTTTGTTCTTTTGATCTTCACAATAAATCTTGCTGCTGCCTACTCTTTGGGTCCCTGCCGCCTTTAAGAGCTGTAACACTCACTGTGAAGGTCTGCTTGAAGTCAGCGAGACCACAAACCCACTGGAAGGAACAAACTGCGGACACACTAGAATGATGGTAGAGGTGATAAGGCATGAGACAGAAATAATAGGAAAGACTTTGGATCCAAATTTCTGATCAGGCAATTTACACCAAAATTCCTCCTCTCCACTTAGAAAAGGGCTGTGCTCTGCGGGACTATTGGCTCAGGGGAGACTCAGGAACTTGTTTTTCTGCTTCCTGCAGTGCTCTCATCTGAGCCCTTGAAAGAGGAGAAAAGAAACTGTTAGTAGAGCCAGGTTGAAAACAACACTCTCCTCTGTCTTTTGCAGGATTCCTGAGCTGAAATGCAGATGACCACATTCAAGGAAGAACCTTCTGTCCCAGCTTTGCAGGA"

aligner = Align.PairwiseAligner()
aligner.match_score = 2
aligner.mismatch_score = -3
aligner.open_gap_score = -5
aligner.extend_gap_score = -2
aligner.query_right_open_gap_score = 0
aligner.query_left_open_gap_score = 0
aligner.query_right_extend_gap_score = 0
aligner.query_left_extend_gap_score = 0
alignments = aligner.align(ref_seq, query_seq)  # <= this is where the MemoryError occurs

for a in sorted(alignments):
    print(a)

This is Python 3 on Windows 10. The Biopython version is 1.72.

Edit: and here's the Traceback:

Traceback (most recent call last):
  File "C:/foo/temp.py", line 15, in <module>
    alignments = aligner.align(ref_seq, query_seq)  # <= this is where the MemoryError occurs
  File "C:\Program Files (x86)\Python36\lib\site-packages\Bio\Align\__init__.py", line 1352, in align
    score, paths = _aligners.PairwiseAligner.align(self, seqA, seqB)
MemoryError: Out of memory

Solution

  • In your case, where the aligner.align() call led to the MemoryError, it was likely due to a memory leak in an earlier Biopython version (since the core code is written in C, not python). This has been fixed in the later Biopython versions, so the solution is to upgrade Biopython to the latest version (now 1.77).


    In other cases, sorted(alignments) could be the cause. You create, in memory, a list of all the alignments, which can be massive. You can check the size of this potential list in advance by calling print(len(alignments)).

    Since alignments is a lazy iterator, you just iterate over the alignments 1 by 1 for a in alignments:, solving your memory issue, but I would still not recommend printing them all. Maybe you just want the first alignment first_alignment = next(alignments) or the first say 10 alignments:

    import itertools
    
    for a in itertools.islice(alignments, 10):
        print(a)