
Read Clustal file in Python


I have a multiple sequence alignment (MSA) file produced by MAFFT in Clustal format, which I want to import into Python and save as a PDF file. I need to import the file and then highlight some specific words. I've tried simply importing a PDF of the MSA, but then the highlight command doesn't work.

I need to print the file like this: [image: desired PDF output]

CLUSTAL format alignment by MAFFT FFT-NS-i (v7.453)


Consensus       --------------------------------------acgttttcgatatttatgccat
AMP             tttatattttctcctttttatgatggaacaagtctgcgacgttttcgatatttatgccat
                                                      **********************

Consensus       atgtgcatgttgtaaggttgaaagcaaaaatgaggggaaaaaaaatgaggtttttaataa
AMP             atgtgcatgttgtaaggttgaaagcaaaaatgaggggaaaaaaaatgaggtttttaataa
                ************************************************************

Consensus       ctacacatttagaggtctaggaaataaaggagtattaccatggaaatgtatttccctaga
AMP             ctacacatttagaggtctaggaaataaaggagtattaccatggaaatgtaattccctaga
                ************************************************** *********

Consensus       tatgaaatattttcgtgcagttacaacatatgtgaatgaatcaaaatatgaaaaattgaa
AMP             tatgaaatattttcgtgcagttacaacatatgtgaatgaatcaaaatatgaaaaattgaa
                ************************************************************

Consensus       atataagagatgtaaatatttaaacaaagaaactgtggataatgtaaatgatatgcctaa
AMP             atataagagatgtaaatatttaaacaaagaaactgtggataatgtaaatgatatgcctaa
                ************************************************************

Consensus       ttctaaaaaattacaaaatgttgtagttatgggaagaacaaactgggaaagcattccaaa
AMP             ttctaaaaaattacaaaatgttgtagttatgggaagaacaaactgggaaagcattccaaa
                ************************************************************

Consensus       aaaatttaaacctttaagcaataggataaatgttatattgtctagaaccttaaaaaaaga
AMP             aaaatttaaacctttaagcaataggataaatgttatattgtctagaaccttaaaaaaaga
                ************************************************************

Consensus       agattttgatgaagatgtttatatcattaacaaagttgaagatctaatagttttacttgg
AMP             agattttgatgaagatgtttatatcattaacaaagttgaagatctaatagttttacttgg
                ************************************************************

Consensus       gaaattaaattactataaatgttttattataggaggttccgttgtttatcaagaattttt
AMP             gaaattaaattactataaatgttttattataggaggttccgttgtttatcaagaattttt
                ************************************************************

Consensus       agaaaagaaattaataaaaaaaatatattttactagaataaatagtacatatgaatgtga
AMP             agaaaagaaattaataaaaaaaatatattttactagaataaatagtacatatgaatgtga
                ************************************************************

Consensus       tgtattttttccagaaataaatgaaaatgagtatcaaattatttctgttagcgatgtata
AMP             tgtattttttccagaaataaatgaaaatgagtatcaaattatttctgttagcgatgtata
                ************************************************************

Consensus       tactagtaacaatacaacattgga----------------------------------
AMP             tactagtaacaatacaacattggattttatcatttataagaaaacgaataataaaatg
                ************************                                  

How can I import the alignment and print it to a new PDF while keeping the sequences correctly aligned?

Thanks

Multi.txt


Solution

  • OK, I figured out a way; not sure it's the best one.

    You need to install fpdf2 (pip install fpdf2):

    from io import StringIO

    from Bio import AlignIO  # Biopython 1.80
    from fpdf import FPDF    # pip install fpdf2

    # Parse the MAFFT output (Clustal format) into a MultipleSeqAlignment
    alignment = AlignIO.read("Multi.txt", "clustal")

    # Write it back out as Clustal text into an in-memory buffer
    stri = StringIO()
    AlignIO.write(alignment, stri, "clustal")

    # One entry per line of Clustal text (sequence rows, conservation rows, blanks)
    stri_lines = stri.getvalue().splitlines()

    pdf = FPDF(orientation="P", unit="mm", format="A4")
    pdf.add_page()

    # A monospaced TrueType font keeps the alignment columns lined up
    # (FreeMono.ttf must be in the script directory)
    pdf.add_font("FreeMono", "", "FreeMono.ttf")
    pdf.set_font("FreeMono", size=8)

    for x in stri_lines:
        # Full-width cell, 5 mm high, then jump to the left margin of the next row
        pdf.cell(0, 5, x, border=0, new_x="LMARGIN", new_y="NEXT", align="L", fill=False)

    pdf.output("out.pdf")
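
The question also asks about highlighting specific words rather than whole lines. A minimal sketch, assuming fpdf2 and its built-in core font Courier (monospaced, so no TTF file is needed): each line is split into pieces around a motif, and each piece is drawn as its own cell exactly as wide as its text, with fill switched on only for the matching pieces. The helper names `split_on_motif` and `write_highlighted` and the example motif are hypothetical, not part of the answer.

```python
import re


def split_on_motif(line: str, motif: str) -> list[tuple[str, bool]]:
    """Split `line` into (text, is_match) pieces around non-overlapping,
    case-insensitive occurrences of `motif`."""
    pieces = []
    pos = 0
    for m in re.finditer(re.escape(motif), line, flags=re.IGNORECASE):
        if m.start() > pos:
            pieces.append((line[pos:m.start()], False))
        pieces.append((m.group(), True))
        pos = m.end()
    if pos < len(line):
        pieces.append((line[pos:], False))
    return pieces


def write_highlighted(lines: list[str], motif: str, out_path: str) -> None:
    """Draw every line with fpdf2, filling only the pieces that match `motif`."""
    from fpdf import FPDF  # pip install fpdf2 (imported here so split_on_motif works without it)

    pdf = FPDF(orientation="P", unit="mm", format="A4")
    pdf.add_page()
    pdf.set_font("Courier", size=8)  # core PDF font, monospaced
    pdf.set_fill_color(255, 255, 0)  # yellow
    for line in lines:
        for text, is_match in split_on_motif(line, motif):
            # Default new_x is RIGHT, so the pieces are placed side by side
            pdf.cell(pdf.get_string_width(text), 5, text, fill=is_match)
        pdf.ln(5)  # back to the left margin, one row down
    pdf.output(out_path)
```

With the variables from the answer this would be called as `write_highlighted(stri_lines, "gaaat", "highlighted.pdf")`. Because the search runs line by line, a motif that is wrapped across two alignment blocks will not be found.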
    

    Output PDF (out.pdf):

    [image: out.pdf]

    Not sure why the file header is changed; I think Biopython writes its own header line. You can check by adding:

    with open('file_output.txt', 'w') as filehandler:
        AlignIO.write(alignment, filehandler, 'clustal')
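
If the original MAFFT header should appear in the PDF, one option (a sketch, not something from the answer) is to take the first line of the input file and put it back in place of the first line Biopython wrote. The helper name `restore_header` is hypothetical, and it assumes the header is the first line of both texts.

```python
def restore_header(original_text: str, rewritten_text: str) -> str:
    """Return `rewritten_text` with its first line replaced by the
    first line (the header) of `original_text`."""
    original_header = original_text.splitlines()[0]
    body = rewritten_text.splitlines()[1:]  # everything after the rewritten header
    return "\n".join([original_header, *body]) + "\n"
```

With the answer's variables: `stri_lines = restore_header(open("Multi.txt").read(), stri.getvalue()).splitlines()`.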
    

    I had to place a TTF FreeMono font (a monospaced font) in my script directory, see pdf.add_font('FreeMono', '', 'FreeMono.ttf'); otherwise the alignment won't be printed correctly. See also: "Which fonts have the same width for every character?"
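
Why the monospaced font matters: when every glyph has the same advance width, the character in column `i` of every row starts at the same x position, left margin + i × character width, so the residues of different sequences stack up; in a proportional font `i` and `m` differ in width and the columns drift. Courier is also one of the standard PDF core fonts and is monospaced, so `pdf.set_font("Courier", size=8)` should keep the columns aligned without a TTF file (an alternative, not what the answer used). The sketch below assumes Courier's advance width of 600/1000 em and fpdf2's default 10 mm left margin; the function names and the row length are hypothetical.

```python
PT_TO_MM = 25.4 / 72  # 1 pt = 1/72 inch


def char_width_mm(font_size_pt: float, advance_em: float = 0.6) -> float:
    """Width in mm of one glyph of a monospaced font (Courier: 600/1000 em)."""
    return font_size_pt * advance_em * PT_TO_MM


def column_x_mm(column: int, font_size_pt: float, left_margin_mm: float = 10.0) -> float:
    """x position (mm) where the character in 0-based `column` starts."""
    return left_margin_mm + column * char_width_mm(font_size_pt)


# Hypothetical row length: name field plus 60 residues
row_chars = 76
print(f"{row_chars} characters at 8 pt: {row_chars * char_width_mm(8):.1f} mm wide")
```

The same arithmetic shows whether a row fits the page: A4 is 210 mm wide, which leaves 190 mm between the default margins.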

    Attached is a PNG of my PDF. Note that you can also highlight lines, using:

    pdf.set_fill_color(255, 255, 0)  # yellow highlight

    for x in stri_lines:
        # Fill (highlight) only the rows that belong to the "Consensus" sequence
        filling = "Consensus" in x
        pdf.cell(0, 5, x, border=0, new_x="LMARGIN", new_y="NEXT", align="L", fill=filling)
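
A variation on the same idea, sketched as an assumption rather than taken from the answer: instead of filling a whole line, find the alignment columns where the sequences disagree (the columns without a `*` in the conservation row) and fill only those residues. `mismatch_columns` and `pieces_by_columns` are hypothetical helpers; the second returns (text, highlighted) runs that can be drawn one cell per run, as in the per-word highlighting approach.

```python
def mismatch_columns(*rows: str) -> list[int]:
    """0-based columns where the rows do not all carry the same character."""
    return [i for i, chars in enumerate(zip(*rows)) if len(set(chars)) > 1]


def pieces_by_columns(seq: str, columns: set[int]) -> list[tuple[str, bool]]:
    """Group consecutive characters of `seq` into (text, highlighted) runs."""
    pieces: list[tuple[str, bool]] = []
    for i, ch in enumerate(seq):
        flag = i in columns
        if pieces and pieces[-1][1] == flag:
            pieces[-1] = (pieces[-1][0] + ch, flag)  # extend the current run
        else:
            pieces.append((ch, flag))  # start a new run
    return pieces


# Third block of the alignment in the question
consensus = "ctacacatttagaggtctaggaaataaaggagtattaccatggaaatgtatttccctaga"
amp = "ctacacatttagaggtctaggaaataaaggagtattaccatggaaatgtaattccctaga"
cols = mismatch_columns(consensus, amp)
print(cols)  # → [50], the blank in that block's conservation row
print(pieces_by_columns(amp, set(cols)))
```

Gap characters count as mismatches here, which matches how the conservation row leaves gapped columns without a `*`.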
    

    or something similar, so that lines are highlighted while printing:

    [image: PDF output with highlighted lines]