
Print each sequence 50 characters per line from a Clustal alignment


I have a multiple sequence alignment file in Clustal format. I want to read it and print the sequences in a clearer, more readable layout.

I am reading the file with Biopython's AlignIO module:

from Bio import AlignIO

alignment = AlignIO.read("opuntia.aln", "clustal")
print("Number of rows: %i" % len(alignment))

for record in alignment:
    print("%s - %s" % (record.id, record.seq))

My output is messy: each sequence is printed on one very long line, so I have to scroll a lot. What I want is to print only 50 characters of each sequence per line, in blocks, continuing until the end of the alignment.

I would like output similar to that of the ClustalW2 service at http://www.ebi.ac.uk/Tools/clustalw2/.
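For reference, a ClustalW-like block layout can be sketched in plain Python. This is a minimal sketch, not Biopython or ClustalW code: it assumes the alignment is available as a list of (name, aligned sequence) string pairs, that gaps are written as "-", and that the number at the end of each line is the running count of residues (gaps excluded), as ClustalW's sequence-number option shows. The helper name `clustal_style` and the name-column padding are hypothetical choices, and no conservation line (`*`, `:`, `.`) is drawn.

```python
def clustal_style(rows, width=50):
    """Lay out (name, aligned_sequence) pairs in ClustalW-like blocks.

    Hypothetical helper: each line shows `width` alignment columns of one
    sequence, followed by that sequence's running residue count. Gap
    characters ('-') are not counted, and no conservation line is drawn.
    """
    name_width = max(len(name) for name, _ in rows) + 4  # pad names into a column
    length = len(rows[0][1])  # all aligned sequences share this length
    counts = [0] * len(rows)  # residues seen so far, per sequence
    lines = []
    for start in range(0, length, width):
        for k, (name, seq) in enumerate(rows):
            chunk = seq[start:start + width]
            counts[k] += len(chunk) - chunk.count("-")
            lines.append("%s%s %d" % (name.ljust(name_width), chunk, counts[k]))
        lines.append("")  # blank line between blocks
    return "\n".join(lines)
```

With a Biopython alignment, the input could be built as `[(r.id, str(r.seq)) for r in alignment]`. Biopython can also write an alignment back out in Clustal format with `AlignIO.write(alignment, sys.stdout, "clustal")`, which wraps the sequences into fixed-width blocks; check the exact layout against your Biopython version.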


Solution

  • I don't have Biopython on this computer, so this is untested, but it should work:

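    chunk_size = 50
    length = alignment.get_alignment_length()

    # Step through the alignment columns chunk_size at a time
    for i in range(0, length, chunk_size):
        print("")  # blank line between blocks
        end = min(i + chunk_size, length)  # the last block may be shorter
        for record in alignment:
            print("%s\t%s %i" % (record.name, record.seq[i:end], end))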
    

    This uses the same trick as Eli's answer: range() generates the start index of each 50-column slice, and for each slice the code iterates over every record in the alignment.
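    The range-and-slice trick does not depend on Biopython, so it can be tried on plain strings. A minimal sketch, assuming the sequences are given as (name, aligned sequence) pairs of equal length; `column_blocks` and the sample `rows` are hypothetical, for illustration only.

```python
def column_blocks(rows, chunk_size=50):
    """Split equal-length aligned sequences into blocks of chunk_size columns.

    rows: list of (name, aligned_sequence) pairs.
    Returns a list of blocks; each block is a list of (name, slice, end_column).
    """
    length = len(rows[0][1])
    blocks = []
    # range() yields the start of every slice: 0, chunk_size, 2 * chunk_size, ...
    for i in range(0, length, chunk_size):
        end = min(i + chunk_size, length)  # the final block may be shorter
        # slice every sequence at the same columns so the block lines up
        blocks.append([(name, seq[i:end], end) for name, seq in rows])
    return blocks


# Hypothetical sample data: two aligned sequences of 120 columns
rows = [("seq1", "A" * 120), ("seq2", "C" * 120)]
for block in column_blocks(rows):
    print("")
    for name, chunk, end in block:
        print("%s\t%s %i" % (name, chunk, end))
```

    Returning the blocks instead of printing inside the loop keeps the slicing logic separate from the output format, so the same blocks can be printed with tabs, padded names, or written to a file.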