python, io, bioinformatics, biopython, format-conversion

Multiple input files as single file output from biopython AlignIO


I'm writing code to convert alignments from multiple files to PHYLIP format and then output all of the alignments to a single file. I can't find a good way to have AlignIO.write() take multiple input files and produce a single output file. The following code works for a single input file, but with several inputs each call overwrites the output, so only the last alignment ends up in the file:

import glob
from Bio import AlignIO

path = "alignment?.nexus"

for filename in glob.glob(path):
    for alignment in AlignIO.parse(filename, "nexus"):
        AlignIO.write(alignment, "all_alignments", "phylip-relaxed")

Solution

  • You can use AlignIO.write() to effectively append to the output file by passing an open file handle rather than a file name string:

    with open("all_alignments", "w") as output_handle: 
        for filename in glob.glob(path):
            for alignment in AlignIO.parse(filename, "nexus"):
                AlignIO.write(alignment, output_handle, "phylip-relaxed")
    

    The alternative would be to yield all alignments (or store them in a list or similar) and then call .write() once afterwards with the iterable and string file name (and format) as arguments:

    def yield_alignments():
        for filename in glob.glob(path):
            for alignment in AlignIO.parse(filename, "nexus"):
                yield alignment
    
    AlignIO.write(yield_alignments(), "all_alignments", "phylip-relaxed")
    

    The second approach is more invasive to your current structure, but it might be slightly faster, at least on older Biopython versions.
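
To see why the file handle makes the difference, here is a minimal sketch that uses only the standard library. The function write_block() is a hypothetical stand-in, not part of Biopython. It assumes that a writer given a file name opens that file in "w" mode on every call, which is consistent with the overwriting seen in the question, and that a writer given an open handle simply writes at the current position.

```python
import os
import tempfile


def write_block(text, target):
    """Hypothetical stand-in for a writer that accepts a file name or an open handle."""
    if isinstance(target, str):
        # A file name is reopened in "w" mode, which truncates existing content.
        with open(target, "w") as handle:
            handle.write(text)
    else:
        # An open handle is written to at its current position and left open.
        target.write(text)


blocks = ["alignment 1\n", "alignment 2\n", "alignment 3\n"]

with tempfile.TemporaryDirectory() as tmpdir:
    by_name = os.path.join(tmpdir, "by_name.txt")
    by_handle = os.path.join(tmpdir, "by_handle.txt")

    # Passing the file name each time: every call truncates the file first.
    for block in blocks:
        write_block(block, by_name)

    # Passing one open handle: each call continues where the previous one stopped.
    with open(by_handle, "w") as output_handle:
        for block in blocks:
            write_block(block, output_handle)

    with open(by_name) as handle:
        name_result = handle.read()
    with open(by_handle) as handle:
        handle_result = handle.read()

print(repr(name_result))    # only the last block survives
print(repr(handle_result))  # all three blocks, in order
```

The same reasoning applies to the generator version: a single call with a file name opens the output only once, so nothing is overwritten.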